Now shipping 6

Models browser

The dead "Models" tab is now a filterable capability catalog — positioning, quant economics, bounded drift, deep-links into chat & compare.

Cost/quality frontier

A Pareto frontier anchored to real Spark hardware — quality × throughput scatter with a deterministic gold skyline, on the leaderboard marquee.
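
One way to read "deterministic skyline" is as a non-dominated filter with a fixed tie-break. The sketch below is an illustration under that assumption, not the Arena's actual code: the `Point` fields, the higher-is-better convention on both axes, and the name-based tie-break are all hypothetical.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    name: str          # hypothetical lane or model label
    quality: float     # assumed: higher is better
    throughput: float  # assumed: tokens per second, higher is better


def pareto_skyline(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, fastest first."""
    # Sort by throughput (desc), then quality (desc), then name, so that
    # ties always resolve the same way regardless of input order.
    ordered = sorted(points, key=lambda p: (-p.throughput, -p.quality, p.name))
    skyline: List[Point] = []
    best_quality = float("-inf")
    for p in ordered:
        # Every point already seen is at least as fast, so this point
        # survives only if it strictly beats all of them on quality.
        if p.quality > best_quality:
            skyline.append(p)
            best_quality = p.quality
    return skyline
```

Sorting once and sweeping keeps the filter at O(n log n), and the strict comparison means exact duplicates collapse to the alphabetically first name.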

Compare depth + parity

Side-by-side now renders markdown + syntax highlighting at chat parity, with a rubric-derived winner banner and a head-to-head delta strip.
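
A rubric-derived winner and a per-criterion delta strip could be computed roughly as follows. This is a sketch only: the criterion names, the "left"/"right" labels, and the sum-of-deltas rule are assumptions made for illustration, and the real scorer may weight criteria differently.

```python
from typing import Dict, Tuple


def head_to_head(
    left: Dict[str, float],
    right: Dict[str, float],
    tie_margin: float = 0.0,
) -> Tuple[str, Dict[str, float]]:
    """Return (winner, per-criterion deltas) for two rubric score sheets."""
    # Compare only the criteria both sides were scored on, in a stable order.
    criteria = sorted(set(left) & set(right))
    # Positive delta means the left response scored higher on that criterion.
    deltas = {name: left[name] - right[name] for name in criteria}
    total = sum(deltas.values())
    if abs(total) <= tie_margin:
        winner = "tie"
    elif total > 0:
        winner = "left"
    else:
        winner = "right"
    return winner, deltas
```

Returning the deltas alongside the verdict lets one call feed both the banner and the strip, so the two can never disagree.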

Command palette (⌘K)

A global fuzzy palette over every model, article, lane, and page — keyboard-first, offline-safe, with ask/compare quick actions.
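
Fuzzy palettes are often built on subsequence matching with small bonuses for consecutive and word-initial hits. The sketch below shows that general technique with only the standard library; the scoring weights and function names are hypothetical and are not taken from the Arena source.

```python
from typing import List, Optional


def fuzzy_score(query: str, candidate: str) -> Optional[int]:
    """Score a case-insensitive subsequence match, or None if no match."""
    q, c = query.lower(), candidate.lower()
    score, pos, prev = 0, 0, -2
    for ch in q:
        idx = c.find(ch, pos)
        if idx == -1:
            return None          # query is not a subsequence of candidate
        score += 1               # base point per matched character
        if idx == prev + 1:
            score += 2           # bonus: adjacent to the previous match
        if idx == 0 or c[idx - 1] in " -_/":
            score += 3           # bonus: match starts a word
        prev, pos = idx, idx + 1
    return score


def search(query: str, items: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` matching items, best score first."""
    scored = []
    for item in items:
        s = fuzzy_score(query, item)
        if s is not None:
            # Negated score sorts best-first; length, then text, break ties.
            scored.append((-s, len(item), item))
    scored.sort()
    return [item for _, _, item in scored[:limit]]
```

Because everything runs in memory over a prebuilt list of titles, a matcher of this shape needs no network round-trip, which fits the offline-safe goal.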

in this build

Telemetry ↔ evidence bridge

The cockpit now contrasts live resident-brain throughput against the published baselines that 49 deep-dives measured.

This Lab

A public window into the operator+AI build loop, with an operator-private margin (pin a note on any card when the sidecar is live).

in this build
Next queued 11

Ships inside the fieldkit wheel

pip install fieldkit[arena] → fieldkit arena up → the full cockpit at 127.0.0.1:7866/arena/demo/ — no clone, no npm. The app rides the package.

Lane swap from the pill

Mutate ~/.hermes/config.yaml from the LanePill to hot-swap the resident brain without leaving chat.

Two local lanes in compare

Today, compare pits the resident brain against the OpenRouter frontier; v0.2 lets you duel two local quants head-to-head.

A/B regenerate on a turn

Regenerate a single assistant turn into a side-by-side variant instead of a hard replace.

Full history sidebar

Promote the session-switcher popover into a persistent left rail of prior chats.

Field-Fixing the Hermes Harness on a DGX Spark — When the NIM Won't Stream Tool Calls, and Other Rough Edges

Proposed deep-dive (placeholder).

LoRA on Nemotron Nano — Fine-tuning a 9B Without Blowing Unified Memory

Proposed deep-dive (placeholder).

Continued Pre-training on a DGX Spark — NeMo Framework Without a Cluster

Proposed deep-dive (placeholder).

Tracing a NIM Request with Nsight Systems — What the 24.8 tok/s Number Hides

Proposed deep-dive (placeholder).

Watching the GPU — DCGM, Prometheus, and a Local Grafana for the Spark

Proposed deep-dive (placeholder).

Synthetic Corpus Frameworks on the Spark — From a Bespoke Pipeline to an Orchestration Layer

Proposed deep-dive (placeholder).

Exploring open questions 4

gpt-oss-120b resident bakeoff

Whether a 120B MoE can repin as the resident brain if it clears the multi-step capacity wall the 30B-MoE hit on H6 (assumption A3).

arq + Redis job queue

A real background job surface for long eval sweeps, promoted from BackgroundTasks once the queue actually wires up (assumption A1).

HuggingFace Space demo

A static Arena preview as an Orionfold Space, matching where the models and datasets already live.

Blog search in the palette

Wire the second-brain MCP into ⌘K so "search the blog" returns ranked article chunks inline.

Built together 9 Arena commits · newest first
  1. 2026-05-28 v0.2 leap pt.1 — rebrand to Orionfold Arena, models browser, efficiency frontier, compare depth, ⌘K palette 716875d0
  2. 2026-05-28 v0.1.1 cockpit density + chat overhaul aa86d92d
  3. 2026-05-28 standalone web app shell + cockpit chrome polish f3de82bd
  4. 2026-05-28 M6 leaderboard + mirror exporter b22f2b4b
  5. 2026-05-28 M5 compare + rubric scorer 76993d57
  6. 2026-05-28 M4 chat against the resident brain 1a4e7000
  7. 2026-05-28 M3 telemetry SSE + cockpit gauge live d016d9bf
  8. 2026-05-28 M2 retroactive import (40 lanes + 17 bench rows + 55 articles) ee186de7
  9. 2026-05-28 M1 spec + skeleton (Cockpit series · spark-arena-v1) f6a6734e