Dashboard tiles: GPU Util (% utilisation) · GPU Temp (°C, die) · Unified memory (GB of 128, 8 GB guard) · Throughput (tok/s) · TTFT (ms to first token) · Active Lane (idle, no warm brain) · OpenRouter ($0.00 spend since start)
Chart: Unified memory over 60 s, 8 GB guard band shown at top


What it's for
  • Give a Hermes (or Claude Code) agent a DGX Spark serving + routing playbook
  • Bring up one local serving lane inside the 128 GB envelope and tear it down cleanly (spark-serve)
  • Route a request to the right Orionfold domain-expert GGUF, one expert at a time (vertical-route)
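A minimal Python sketch of the two ideas in the list above: checking that a lane fits inside the 128 GB envelope, and keeping only one expert resident at a time. It is not the spark-serve or vertical-route implementation. The `Expert` records, their sizes, the keyword-overlap routing rule, and the reading of the 8 GB guard as reserved headroom are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

TOTAL_GB = 128.0  # unified memory envelope named on the page
GUARD_GB = 8.0    # guard band named on the page; treating it as reserved headroom is an assumption


@dataclass(frozen=True)
class Expert:
    """Hypothetical description of one domain-expert GGUF."""
    name: str
    keywords: Tuple[str, ...]  # hypothetical routing keywords
    size_gb: float             # hypothetical resident footprint


def fits(size_gb: float, in_use_gb: float = 0.0) -> bool:
    """True if the model fits without eating into the guard band."""
    return in_use_gb + size_gb <= TOTAL_GB - GUARD_GB


def route(prompt: str, experts: Sequence[Expert]) -> Optional[Expert]:
    """Pick the expert whose keywords overlap the prompt most (None if no overlap)."""
    words = set(prompt.lower().split())
    best, best_hits = None, 0
    for expert in experts:
        hits = len(words & set(expert.keywords))
        if hits > best_hits:
            best, best_hits = expert, hits
    return best


class SingleLane:
    """Holds at most one active expert; the previous one is torn down first."""

    def __init__(self) -> None:
        self.active: Optional[Expert] = None

    def teardown(self) -> None:
        # A real lane would stop the server process here; this only clears state.
        self.active = None

    def serve(self, expert: Expert, in_use_gb: float = 0.0) -> None:
        if self.active is not None and self.active.name == expert.name:
            return  # already warm, nothing to do
        self.teardown()
        if not fits(expert.size_gb, in_use_gb):
            raise MemoryError(f"{expert.name} would breach the guard band")
        self.active = expert
```

A caller would pass the prompt through `route`, then hand the returned expert to `SingleLane.serve`. Swapping experts always goes through `teardown`, so two models are never counted against the envelope at once.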

Audience — DGX Spark power users wiring an agent harness to the box.

Quant economics: quality × speed per build
Columns: Variant · spark-serve sweet spot · vertical-route
Known drift: bounded · honest
  • Registry indexing latency: published to GitHub (manavsehgal/spark-skills); skills.sh re-crawls the repo within ~24 hours of push.
  • Scope: 2 skills at launch (spark-serve, vertical-route); the curated existing .claude/skills/ subset is portable but not yet republished.