Tag
#nim
Articles tagged "nim" — 13 entries.
Picking the Hermes Brain on a DGX Spark — When Throughput Stops Being the Answer
The Hermes serving-lane bakeoff couldn't pick a winner: all five lanes cleared the tool-call format bar. A graded brain-quality rubric breaks the tie — and shows the fastest serving lane is also the better agent, by a margin throughput could never have measured.
uses fieldkit.evalfieldkit.harness
The Hermes Serving Lane on a DGX Spark — MoE vs Dense, and the Number That Actually Picks the Lane
Five Hermes serving lanes on one DGX Spark: Qwen3-30B-A3B MoE vs Qwen3-32B dense across vLLM, llama.cpp, and NIM. The MoE runs ~8.5× faster for the same memory — but the lane is picked by tool-call reliability, which took two config fights to get to 0% everywhere.
uses fieldkit.capabilitiesfieldkit.harnessfieldkit.nim
The Hermes Harness on a DGX Spark — A Local Cockpit That Holds Tools, With No API Key
Installing the Hermes agent harness on a DGX Spark and running the first local agent turn against the cached Nemotron-Nano-9B-v2 NIM — reliable tool calls, no API key, no cloud hop. The defensible angle is NIM-first; everyone else's Spark Hermes write-up leads with Ollama.
uses fieldkit.nimfieldkit.capabilitiesfieldkit.harness
ClawGym on Spark — A 7B Base, A LoRA Adapter, and the +15 pp the Adapter Earned
ClawGym shipped only a .github profile, so we built the substrate ourselves — persona task synth, sandbox harness, 200-task corpus, LoRA SFT, matched-base eval. The adapter earns +3.8 pp task pass and +15.0 pp per-assertion against its own base. The diagnostic is the lift.
uses fieldkit.nim
AutoResearchBench on Spark — Two NIMs, One Bench, Two Failure Modes
Two Spark-tuned NIMs run AutoResearchBench's three Deep-Research example questions. Llama-3.1-8B crashes by turn 5-6 on its 8K context; Nemotron-Nano-9B-v2 finishes cleanly at 128K. Both score 0% Accuracy@1 — for completely different reasons.
uses fieldkit.nimfieldkit.evalfieldkit.capabilities
The Autoresearch Loop — 50 Iterations of an LLM Editing Its Own Trainer Overnight
NIM Llama 3.1 8B drives a structured-perturbation agent loop against a 354M GPT pretrain. 50 iterations, 73.4 min wall, 0.07 kWh of electricity. 8 keeps, 42 reverts, 0 rail blocks, 0 crashes. Best result: val_bpb 10.8534, +0.93% over baseline at d_model=768.
Second Brain as a Tool — Wrapping the RAG Stack in MCP for Claude Code
Closing the Second Brain arc. Four MCP tools wrap the RAG chain — embed, retrieve, optionally rerank, generate — and any Claude Code session anywhere on the box becomes a grounded research client. 200 lines of Python, one launcher, one .mcp.json entry.
Three Endpoints, One Answer — Naive RAG on a DGX Spark
Three endpoints in one curl chain — a query embeds through Nemotron, pgvector returns top-5 chunks in under 80 ms, and a Llama 3.1 8B NIM stuffs them into a strict-context prompt. The chain works; the 8B generator still refuses on questions its own context answers.
uses fieldkit.ragfieldkit.eval
Your Own Semantic Space — a Nemotron Embedding NIM on a DGX Spark
The embedding endpoint that every downstream RAG, wiki, and agent piece will reuse — a 2048-dim Nemotron Retriever NIM running locally on GB10, ready 52 seconds after docker run and holding 28 docs/s under batched load.
uses fieldkit.rag
Your First NIM on a DGX Spark — What 24.8 Tokens Per Second Doesn't Tell You
First-contact notes on NVIDIA's DGX-Spark-specific Llama 3.1 8B NIM. 9.4 GB image, ~108 s warm-cache cold-start, 24.8 tok/s steady, OpenAI-compatible on :8000 — and a confidently wrong Python one-liner that clarifies what small-model FP8 buys and what it costs.
uses fieldkit.nim
Field-Fixing the Hermes Harness on a DGX Spark — When the NIM Won't Stream Tool Calls, and Other Rough Edges
Fifth in the Harnesses series: the field fixes that take a fresh Hermes agent on a local NIM from 'mostly works' to 'just works.' Leads with the one that bit hardest — the Spark NIM ships a non-streaming tool parser, fixed by bind-mounting NVIDIA's own streaming parser.
uses fieldkit.harness
RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation — Spark reproduction notes
Reproducing the RaguTeam SemEval-2026 T8 winning system on a DGX Spark — judge-orchestrated 7-LLM ensemble (Qwen3-4B-FP8 + Meno-Lite-0.1 7B local + remote members) with Qwen3-32B judge, then extracting the pattern into `fieldkit.ensemble` + `fieldkit.judge`.
Heterogeneous Scientific Foundation Model Collaboration — Spark reproduction notes
Wrap a domain foundation model (Pangu-Weather) as a Triton tool, drive it from a NIM-served Llama 3.1 8B planner via NemoClaw, and show when specialist routing beats language-only reasoning — all inside the Spark 128 GB envelope.