Tag
#literature-search
Articles tagged "literature-search" — 1 entry.
Frontier Scout
AutoResearchBench on Spark — Two NIMs, One Bench, Two Failure Modes
Two Spark-tuned NIMs run AutoResearchBench's three Deep-Research example questions. Llama-3.1-8B crashes by turn 5-6 on its 8K context; Nemotron-Nano-9B-v2 finishes cleanly at 128K. Both score 0% Accuracy@1 — for completely different reasons.
uses fieldkit.nimfieldkit.evalfieldkit.capabilities