Orionfold Arena — Which local lane should drive your always-on Spark agent?

What it's for

Audience — DGX Spark power users running a local, no-API-key agent harness.

Quant economics: quality × speed per build

Variant	tok/s
NIM · Nemotron-Nano-9B-v2	27.7
llama.cpp · Qwen3-30B-A3B (MoE, Q4_K_M) [sweet spot]	88.0
llama.cpp · Qwen3-32B (dense, Q4_K_M)	10.2
vLLM · Qwen3-30B-A3B (MoE, FP8)	55.9
vLLM · Qwen3-32B (dense, FP8)	6.6
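
To make the table concrete, the tok/s figures can be turned into wall-clock time for a fixed-length reply. The sketch below is illustrative only: it assumes the numbers are sustained generation throughput, ignores prompt processing and time to first token, and uses a hypothetical 500-token reply (`REPLY_TOKENS` is not from the source).

```python
# Throughput in tokens/second, copied from the table above.
TOK_PER_S = {
    "NIM · Nemotron-Nano-9B-v2": 27.7,
    "llama.cpp · Qwen3-30B-A3B (MoE, Q4_K_M)": 88.0,
    "llama.cpp · Qwen3-32B (dense, Q4_K_M)": 10.2,
    "vLLM · Qwen3-30B-A3B (MoE, FP8)": 55.9,
    "vLLM · Qwen3-32B (dense, FP8)": 6.6,
}

REPLY_TOKENS = 500  # hypothetical reply length, an assumption for illustration


def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Generation time for `tokens` at a steady `tok_per_s`."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return tokens / tok_per_s


def ranked(rates: dict[str, float]) -> list[tuple[str, float]]:
    """Lanes sorted fastest first."""
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for lane, rate in ranked(TOK_PER_S):
        secs = seconds_for(REPLY_TOKENS, rate)
        print(f"{lane:45s} {rate:5.1f} tok/s  {secs:6.1f} s")
```

Under these assumptions the llama.cpp MoE lane finishes a 500-token reply in under 6 seconds, while the vLLM dense lane needs more than a minute.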

Known drift (bounded, honest)

Tool-call reliability sample size: the format-error rate is measured over 8 agentic tasks per lane; it is not a large-N guarantee.
Qwen3 context vs Hermes minimum: Qwen3 lanes serve at their native 40,960 tokens; Hermes's 64K floor is bypassed via the model.context_length and auxiliary.compression.context_length overrides.
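
To see why 8 tasks per lane is not a large-N guarantee, it helps to put an interval around a measured error rate. The sketch below uses the standard Wilson score interval; the error count passed in is hypothetical (the page does not report per-lane counts), and `wilson_interval` is an illustrative helper, not part of any Arena tooling.

```python
import math


def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an observed error proportion.

    errors: number of tasks with a format error
    n:      number of tasks attempted
    z:      normal quantile (1.96 gives roughly 95% coverage)
    """
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical outcome: zero format errors across the 8 tasks.
    low, high = wilson_interval(errors=0, n=8)
    print(f"95% interval for the true error rate: {low:.2f} to {high:.2f}")
```

Even a clean run of 8 tasks leaves the upper end of the interval at roughly 0.32, so a lane with no observed format errors could still fail on a substantial fraction of tasks at larger N.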

Get it

Run it local

Yours, offline, on the Spark:

pip install "fieldkit[arena]"
fieldkit arena up

Then drive this model from the cockpit; prompts and telemetry never leave the box.