Now shipping · Cockpit

Orionfold Arena

The cockpit for running, comparing, and scoring local language models on a DGX Spark.

15.4h

to build

12,733

lines of code

125

tests

17

features

Launch the demo · Read the launch story

What it is

A single-screen cockpit for running, comparing, and scoring local language models on one NVIDIA DGX Spark — live GPU telemetry, an efficiency frontier, reference-based eval, and a leak-proof leaderboard over your own artifacts. Private by construction, on the machine under your desk.

Private by construction · Single-screen cockpit · Over your own artifacts · NVIDIA DGX Spark

Inside the cockpit

17 features, one screen

The cockpit

One screen to see every artifact, bench, and the warm model's live telemetry — the operator's home base.

Live telemetry rail

Always-on GPU, temperature, unified-memory, and throughput readouts — each with a fixed-window peak-bar chart — so you watch the Spark's envelope while a model runs.
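
One plausible reading of a fixed-window peak readout, sketched under assumptions: keep only the most recent N samples and report the highest value among them. The class name `PeakWindow` and the sample values are hypothetical and are not taken from Arena's source.

```python
from collections import deque


class PeakWindow:
    """Hold the last `size` samples and expose their peak (hypothetical helper)."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        # A deque with maxlen drops the oldest sample automatically.
        self._samples: deque[float] = deque(maxlen=size)

    def push(self, value: float) -> None:
        self._samples.append(value)

    def bars(self) -> list[float]:
        """Samples currently in the window, oldest first."""
        return list(self._samples)

    def peak(self) -> float:
        """Highest sample in the window; raises ValueError if the window is empty."""
        return max(self._samples)


# Made-up GPU utilisation samples, for illustration only.
gpu = PeakWindow(size=3)
for sample in (10.0, 50.0, 20.0, 30.0, 5.0):
    gpu.push(sample)
print(gpu.bars(), gpu.peak())
```

Because the window has a fixed size, an old spike drops out of the peak once enough newer samples arrive, so the readout reflects recent load rather than an all-time maximum.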

Leaderboard

Live bench-anchored rankings that fold in every chat and compare run, with Spark/OpenRouter source badges — served from a leak-proof public mirror, never your prompts or completions.

Efficiency frontier

Quality versus throughput on one chart with the Pareto skyline in orange — where you decide which quant is worth shipping.
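
A Pareto skyline over quality and throughput can be sketched as follows. This is a minimal illustration, not Arena's implementation: the `Point` type, the quant labels, and the numbers are invented for the example. A point sits on the skyline when no other point is at least as good on both axes and strictly better on one.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str         # hypothetical quant label
    quality: float    # higher is better
    tok_per_s: float  # higher is better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the non-dominated points, ordered fastest first."""
    # Sweep from fastest to slowest; ties on speed put the higher quality first.
    ordered = sorted(points, key=lambda p: (-p.tok_per_s, -p.quality))
    frontier: list[Point] = []
    best_quality = float("-inf")
    for p in ordered:
        # A slower point earns its place only by beating every faster point on quality.
        if p.quality > best_quality:
            frontier.append(p)
            best_quality = p.quality
    return frontier


# Made-up numbers, for illustration only.
candidates = [
    Point("quant-a", 0.70, 60.0),
    Point("quant-b", 0.74, 48.0),
    Point("quant-c", 0.73, 40.0),  # slower and lower quality than quant-b
    Point("quant-d", 0.78, 30.0),
]
print([p.name for p in pareto_frontier(candidates)])
```

In this toy data `quant-c` falls off the skyline because `quant-b` is both faster and higher quality; the remaining three are genuine trade-offs.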

Models browser

Every artifact you can run, filterable by kind and license, one click from chat or compare.

Model detail

Positioning, quant economics with the sweet-spot row, known drift, and a per-model efficiency curve — the full card before you commit GPU.

Chat against any lane

Talk to the warm resident model, an on-demand local GGUF, or a hosted lane — markdown, reasoning, and live tok/s in one composer.

Eval prompts + reference scoring

Pull the exact bench a model was measured on, autofill the composer, and auto-score the answer against gold without leaving chat.

Compare — any vs. any

Duel two lanes side by side with a deterministic rubric score and telemetry-style metric cards — quality, tok/s, TTFT, tokens, cost — each over a session sparkline. Local-vs-local, local-vs-hosted, your call.
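
For readers unfamiliar with the metric cards, TTFT (time to first token) and tok/s can be derived from token arrival times roughly as below. This is a sketch under assumptions: the function name and the choice to measure decode speed from the first token onward are illustrative, and Arena may define its metrics differently.

```python
def stream_metrics(request_sent: float, token_times: list[float]) -> dict[str, float]:
    """Compute TTFT and decode throughput from timestamps in seconds.

    request_sent: time the request left the client.
    token_times: arrival time of each streamed token, in order.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_sent
    decode_window = token_times[-1] - token_times[0]
    n = len(token_times)
    # Throughput counts the intervals between tokens, so the wait for the
    # first token (already reported as TTFT) does not drag the rate down.
    tok_per_s = (n - 1) / decode_window if decode_window > 0 else 0.0
    return {"ttft_s": ttft, "tokens": float(n), "tok_per_s": tok_per_s}


# Made-up timestamps: request at t=0, five tokens arriving every half second.
print(stream_metrics(0.0, [0.5, 1.0, 1.5, 2.0, 2.5]))
```

Separating TTFT from decode throughput keeps a lane with a slow start but fast generation distinguishable from one that is uniformly slow.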

Command palette

Hit ⌘K and jump anywhere — fuzzy-search every model, article, and lane, or fire a chat or compare without touching the mouse.

The Lab

A living board of what's shipped, what's next, and what's being explored, with a built-together timeline mined from the commit log.

Guarded lane lifecycle

Launch and tear down serving lanes from the cockpit with a teardown-first confirm, an envelope pre-flight, and an anchor on warm — one resident model in 128 GB, enforced visibly.

Measured benches in the eval drawer

Registered benches replay their measured packets — system contract and reasoning control included — so a row picked in chat scores against exactly what the tracked receipts measured.

Vertical-proof cards on Cortex

A model lane's whole promotion case on one screen: generator preflight, corpus pack with recall gates, routing costs per config, and the publish receipt with its gate chips.

Your first run — the welcome

A dedicated, self-dismissing welcome screen greets your AI Researcher, grounded in your real corpus numbers, with three prompts to try — one of which teaches honest refusal in a single turn. Re-reachable any time from the top bar.

Bring your own cloud key

Your Advisor runs locally with no cloud key. Add an OpenRouter key only if you want to compare against frontier cloud models — the catalog and spend tile stay hidden until you do. Your key is written to a private file on your box; Orionfold never sees it.
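
Writing a key to a private file usually means restricting it to the owning user. The sketch below shows one common way to do that on a POSIX system such as Linux; the function name, the file name, and the placeholder key are hypothetical, and the page does not say where or how Arena actually stores the key.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_private_key(path: Path, key: str) -> None:
    """Write `key` to `path`, readable and writable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        # Re-apply the mode in case the file already existed with looser bits.
        os.fchmod(fh.fileno(), 0o600)
        fh.write(key + "\n")


# Demo in a throwaway directory; the path and key are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "secrets" / "openrouter.key"
    write_private_key(demo_path, "sk-or-example-not-a-real-key")
    mode = stat.S_IMODE(demo_path.stat().st_mode)
    stored = demo_path.read_text(encoding="utf-8").strip()
    print(oct(mode))
```

Creating the file with mode 0o600 from the start avoids a window in which the key is briefly readable by other users on the machine.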

Guided console onboarding

One command brings the whole stack up: a preflight matrix, a free-key capture if you need one, a named download manifest, and orientation cards that teach the product while the model pulls — ending on a warm cockpit. curl-to-running, narrated.

Live preview

Drive the cockpit yourself

The preview is recorded on a DGX Spark and runs sidecar-less — no GPU, no backend, nothing phones home. Chat and Compare replay real sessions token-by-token; the telemetry rail, efficiency frontier, and leaderboard are the genuine cuts.

Walk the cockpit, open a model card, duel two lanes in Compare, then run it for real on your own Spark with one command.

Launch the demo · Artifact card

The build

How it came together

Orionfold Arena was built in one day and an overnight (~15.4 hours): 12,733 lines of code and 125 tests across 12 sessions of agentic coding with Claude Code. The launch story walks through every feature, the build metrics, and the workflow that produced it.

Read the launch story

Orionfold Arena · ships inside fieldkit

Run it on your own Spark.

Install fieldkit with the arena extra, start the sidecar, and open the cockpit over your own models, artifacts, and benches.

Install the cockpit

Terminal

$ pip install "fieldkit[arena]"

Live demo · Launch story · View source