GPU Util: % utilisation
GPU Temp: °C die
Unified: GB of 128 · 8 GB guard
Throughput: tok / second
TTFT: ms · first token
Active Lane: idle (no warm brain)
OpenRouter: $0.00 spend · since start
Unified · 60 s: 8 GB guard band shown at top
Benches: 3 (cached evidence sources)
Lanes ranked: 14 (unique (lane, bench) pairs)
Runs: 16 (bench + live)
Schema: v2 (leak-proof · ✓)
Generated: 2026-05-28 21:28:08 UTC (last fieldkit arena mirror)

Cost / quality efficiency frontier
6 models · 29 builds

Every quant variant the Spark has measured, plotted as quality × throughput. The gold line is the Pareto frontier: the builds that no other build beats on both axes at once. It is a frontier that public cloud arenas cannot draw, because they do not know what hardware their votes ran on. We do: the operator is the hardware.
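
As a rough illustration of how such a frontier can be picked out of a set of measured builds, here is a minimal sketch. The `Build` type, the `pareto_frontier` helper, and the example values are all hypothetical and are not taken from fieldkit. It assumes both axes are "higher is better" and treats a build as on the frontier when no other build is at least as fast with at least as much quality.

```python
from typing import List, NamedTuple


class Build(NamedTuple):
    name: str
    quality: float      # normalized quality index, higher is better
    throughput: float   # tok/s, higher is better


def pareto_frontier(builds: List[Build]) -> List[Build]:
    """Return the non-dominated builds, ordered from fastest to slowest."""
    frontier = []
    best_quality = float("-inf")
    # Sweep from fastest to slowest (ties: higher quality first). A build is
    # kept only if it improves on the quality of every faster-or-equal build.
    for b in sorted(builds, key=lambda b: (-b.throughput, -b.quality)):
        if b.quality > best_quality:
            frontier.append(b)
            best_quality = b.quality
    return frontier


# Hypothetical measurements, for illustration only.
EXAMPLE_BUILDS = [
    Build("a", 0.90, 83.5),
    Build("b", 0.95, 55.0),
    Build("c", 0.80, 60.0),   # slower and worse than "a"
    Build("d", 1.00, 20.0),
    Build("e", 0.95, 30.0),   # same quality as "b" but slower
]

if __name__ == "__main__":
    for b in pareto_frontier(EXAMPLE_BUILDS):
        print(b.name, b.quality, b.throughput)
```

With these made-up points, builds "c" and "e" fall off the frontier because a faster build already offers equal or better quality.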

Quality index is normalized per model (perplexity is corpus-dependent — only comparable within one base model). Each model's variants form its own curve; hover any point for the raw numbers. Per-model detail lives under Models.
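
The exact formula behind the quality index is not given on this page. The sketch below shows one plausible per-model normalization, under the assumption that the index is the best (lowest) perplexity within a base model divided by each variant's perplexity, so that the best variant of every model scores 1.0. The function name, row layout, and numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, float]  # (base model, variant, perplexity)


def quality_index(rows: Iterable[Row]) -> Dict[Tuple[str, str], float]:
    """Normalize perplexity within each base model (assumed formula)."""
    rows = list(rows)
    best = defaultdict(lambda: float("inf"))
    # First pass: lowest perplexity seen for each base model.
    for model, _variant, ppl in rows:
        best[model] = min(best[model], ppl)
    # Second pass: ratio to the model's own best, so 1.0 is the best variant.
    return {(m, v): best[m] / ppl for m, v, ppl in rows}


# Hypothetical perplexities, for illustration only.
EXAMPLE_ROWS = [
    ("model-a", "q8", 5.0),
    ("model-a", "q4", 6.25),
    ("model-b", "q8", 12.0),
    ("model-b", "q4", 15.0),
]

if __name__ == "__main__":
    for key, idx in quality_index(EXAMPLE_ROWS).items():
        print(key, round(idx, 3))
```

Note that model-b's raw perplexities are more than twice model-a's, yet both q8 variants score 1.0. That is the point of normalizing per model: indices are only meaningful along one model's own curve.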


⬢ Bench-anchored — cached evidence

hermes-cost-routing-local-and-openrouter:cost_router (3 lanes · 3 runs · metric: cost_router)

Rank  Lane           Quality  Throughput  Runs
1     frontier-only  100.0%   -           1
2     cost-routed    91.7%    -           1
3     local-only     66.7%    -           1

hermes-vertical-router-on-spark:vertical_router (6 lanes · 6 runs · metric: vertical_router)

Rank  Lane     Quality  Throughput  Runs
1     cyber    100.0%   -           1
2     finance  100.0%   -           1
3     medical  100.0%   -           1
4     brain    80.0%    -           1
5     legal    80.0%    -           1
6     patent   80.0%    -           1

picking-the-hermes-brain-on-spark:hermes_brain (3 lanes · 3 runs · metric: hermes_brain)

Rank  Lane                         Quality  Throughput  Runs
1     qwen3-30b-moe-llamacpp-q4km  90.0%    83.5 tok/s  1
2     qwen3-30b-moe-vllm-fp8       87.5%    55.0 tok/s  1
3     nim-incumbent                77.5%    23.9 tok/s  1

◉ Live cockpit runs — operator compares

cockpit · all rubrics (2 rows · 4 runs · metric: rubric mean)

Rank  Rubric · Lane                                   Quality  Throughput  TTFT     Runs  Human ↑
1     patent_claim_validity vs · openrouter-frontier  100.0%   27.3 tok/s  3179 ms  2     -
2     patent_claim_validity vs · resident-brain       50.0%    88.1 tok/s  100 ms   2     -

Source — fieldkit.arena.mirror.export_publishable_slice(); allowlist pinned by fieldkit/tests/arena/demo/test_mirror_does_not_leak.py. The chat_* tables, compare_runs.prompt, and compare_responses.{content,reasoning} are NEVER enumerated.
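
The internals of `export_publishable_slice()` are not shown on this page. As a hedged illustration of the allowlist idea only, the sketch below reads a fixed set of table and column names and never lists tables dynamically or uses `SELECT *`. The `ALLOWLIST` contents, `export_slice`, `assert_no_leak`, and every column name other than the excluded ones named above are hypothetical.

```python
import sqlite3

# Hypothetical allowlist: table -> columns that may leave the machine.
ALLOWLIST = {
    "compare_runs": ("id", "rubric", "lane"),
    "compare_responses": ("run_id", "tok_per_s", "ttft_ms"),
}

# Columns the page says must never be exported.
FORBIDDEN = {
    ("compare_runs", "prompt"),
    ("compare_responses", "content"),
    ("compare_responses", "reasoning"),
}


def export_slice(conn: sqlite3.Connection) -> dict:
    """Read only allowlisted columns; the schema is never introspected."""
    out = {}
    for table, cols in ALLOWLIST.items():
        # Identifiers come from the constant above, not from user input.
        sql = f"SELECT {', '.join(cols)} FROM {table}"
        rows = conn.execute(sql).fetchall()
        out[table] = [dict(zip(cols, row)) for row in rows]
    return out


def assert_no_leak(allowlist: dict) -> None:
    """The kind of check a pinning test might make (assumed, not fieldkit's)."""
    for table, cols in allowlist.items():
        assert not table.startswith("chat_"), table
        for col in cols:
            assert (table, col) not in FORBIDDEN, (table, col)
```

Because the query text is built only from the allowlist, adding a sensitive column to the database later does not change what is exported unless someone also edits the allowlist, which the pinning test would then flag.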