Tag

#qwen3

Articles tagged "qwen3" — 5 entries.

Article №51 agentic Foundation 28 May 2026 ~4 hours including the OpenRouter bakeoff + harness publish

Cost-Routing the Hermes Harness — When Local Stops Being Enough on a DGX Spark

The local 30B-MoE on a Spark is at $0 marginal cost — until it isn't. H6 measures the failure-mode curve: where does local stop being enough, and what does the dollar curve look like when you escalate to OpenRouter only when you have to?

uses fieldkit.harness, fieldkit.eval

Article №50 agentic Foundation 28 May 2026 ~3 hours including bakeoff + harness publish

Harnesses

The Hermes Vertical Router on a DGX Spark — One Brain Always Warm, Five Specialists Summoned on Demand

Five published Orionfold verticals plus the pinned MoE brain become a router on one Spark — not by parallel inference (the unified-memory envelope forbids that), but by a deterministic keyword classifier that dispatches the prompt and serves the right specialist one-at-a-time.

uses fieldkit.harness

Article №49 agentic NIM 28 May 2026 ~6 hours across three serving lanes, N=5 attempts per prompt

Harnesses

Picking the Hermes Brain on a DGX Spark — When Throughput Stops Being the Answer

The Hermes serving-lane bakeoff couldn't pick a winner: all five lanes cleared the tool-call format bar. A graded brain-quality rubric breaks the tie — and shows the fastest serving lane is also the better agent, by a margin throughput could never have measured.

uses fieldkit.eval, fieldkit.harness

Article №46 deployment NIM 26 May 2026 ~3 hours, most of it model pulls and four cold-starts

Harnesses

The Hermes Serving Lane on a DGX Spark — MoE vs Dense, and the Number That Actually Picks the Lane

Five Hermes serving lanes on one DGX Spark: Qwen3-30B-A3B MoE vs Qwen3-32B dense across vLLM, llama.cpp, and NIM. The MoE runs ~8.5× faster for the same memory, but the lane is picked by tool-call reliability, and it took two config fights to get the tool-call failure rate to 0% everywhere.

uses fieldkit.capabilities, fieldkit.harness, fieldkit.nim

Article №40 deployment llama.cpp 16 May 2026 ~5 hours end-to-end on a DGX Spark

Machine that Builds Machines

Orionfold/II-Medical-8B-GGUF on Spark — five medical-reasoning variants, MedMCQA mini-eval, ChatML reasoning format

Five GGUF variants of Intelligent-Internet/II-Medical-8B (Qwen3-8B + DAPO reasoning recipe) measured on a DGX Spark. Q5_K_M lands at 36.4 tok/s, 5.45 GB, and 52% on a MedMCQA n=50 mini-eval — above F16. First reasoning recipe in the series.

uses fieldkit.quant, fieldkit.publish, fieldkit.eval, fieldkit.lineage