Tag

#moe

Articles tagged "moe" — 2 entries.

Article №49 · agentic · NIM · 28 May 2026 · ~6 hours across three serving lanes, N=5 attempts per prompt

Picking the Hermes Brain on a DGX Spark — When Throughput Stops Being the Answer

The Hermes serving-lane bakeoff couldn't pick a winner: all five lanes cleared the tool-call format bar. A graded brain-quality rubric breaks the tie — and shows the fastest serving lane is also the better agent, by a margin throughput could never have measured.

uses fieldkit.eval · fieldkit.harness

Article №46 · deployment · NIM · 26 May 2026 · ~3 hours, most of it model pulls and four cold-starts

Harnesses

The Hermes Serving Lane on a DGX Spark — MoE vs Dense, and the Number That Actually Picks the Lane

Five Hermes serving lanes on one DGX Spark: Qwen3-30B-A3B MoE vs Qwen3-32B dense across vLLM, llama.cpp, and NIM. The MoE runs ~8.5× faster for the same memory, but the lane is picked by tool-call reliability, which took two config fights to bring to a 0% failure rate on every lane.

uses fieldkit.capabilities · fieldkit.harness · fieldkit.nim