Orionfold Arena — Cockpit

resident brain idle · waiting for the sidecar

Artifacts 22 manifests in roster

Articles 67 published deep-dives

Benches 4 cached evidence sources

Runs scored 78 bench + live

Envelope 128 GB unified · 8 GB guard

Top runs · last cut M6 mirror

cyber · hermes-vertical-router-on-spark:vertical_router · 1 run

100.0% —

finance · hermes-vertical-router-on-spark:vertical_router · 1 run

100.0% —

medical · hermes-vertical-router-on-spark:vertical_router · 1 run

100.0% —

frontier-only · hermes-cost-routing-local-and-openrouter:cost_router · 1 run

100.0% —

cost-routed · hermes-cost-routing-local-and-openrouter:cost_router · 1 run

91.7% —

qwen3-30b-moe-llamacpp-q4km · picking-the-hermes-brain-on-spark:hermes_brain · 1 run

90.0% 84t/s

4b-sft-v0.2::curveball-v0.1 · the-refusal-floor-is-trainable:advisor_contract · 1 run

90.0% 42t/s

qwen3-30b-moe-vllm-fp8 · picking-the-hermes-brain-on-spark:hermes_brain · 1 run

87.5% 55t/s

Resident lane configured

This run vs the bar live

Measured on the Spark · numbers earned in the deep-dives · click through to the article

throughput · 100 tok/s · a brain that runs at · picking the hermes brain →

throughput · 20 tok/s · within sampling noise runs at · becoming a legal curator →

latency · 80 ms · streams the first token in · naive RAG →

latency · 2 ms · loop over distances finishes in · pgvector →

accuracy · 9% · accuracy use to land the headline · autoresearchbench →

accuracy · 9% · accuracy the query the bench's headline · autoresearchbench →

What is Spark Arena?

Spark Arena is the operator-driven alternative to public cloud model arenas. It offers private eval leaderboards, efficiency as a metric (quality alongside tok/s, unified-memory peak, TTFT, and $/M), a closed loop of eval → fine-tune → re-rank, tool-call replay, custom rubrics, and a cost-per-quality Pareto frontier anchored to the hardware the votes ran on. Here, the operator is the hardware.
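To make the Pareto-frontier idea concrete, here is a minimal sketch in Python. It is not the Arena's own implementation: the `Run` class and the `dominates` and `pareto_frontier` helpers are hypothetical names introduced for illustration. Because this page shows no $/M figures, the sketch uses throughput (tok/s) as a stand-in for the cost axis, and it reuses the three "Top runs" rows that report both a score and a throughput. Those rows come from different benches, so the comparison is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical record for one scored run (not the Arena's schema)."""
    name: str
    quality: float    # score in percent, higher is better
    tok_per_s: float  # throughput, higher is better (stand-in for cost)


def dominates(a: Run, b: Run) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    at_least_as_good = a.quality >= b.quality and a.tok_per_s >= b.tok_per_s
    strictly_better = a.quality > b.quality or a.tok_per_s > b.tok_per_s
    return at_least_as_good and strictly_better


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Keep every run that no other run dominates, preserving input order."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


# Score and throughput copied from the "Top runs" list on this page.
runs = [
    Run("qwen3-30b-moe-llamacpp-q4km", 90.0, 84.0),
    Run("4b-sft-v0.2::curveball-v0.1", 90.0, 42.0),
    Run("qwen3-30b-moe-vllm-fp8", 87.5, 55.0),
]

frontier = pareto_frontier(runs)
print([r.name for r in frontier])
```

With these three rows the frontier collapses to a single run, because one entry matches or beats the others on both axes. A frontier with several points appears only when runs trade one axis against the other, for example a slower run with a higher score.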