spark-hermes-vertical-router
One always-on brain, five specialists, zero LLM-classifier overhead.
Positioning
A Spark holds one strong model warm at a time. The pinned MoE is excellent at general agentic work but is not your domain expert. The five Orionfold vertical GGUFs are domain experts but compete for the same 128 GB envelope. A router decides per prompt: keyword-matched prompts go to the right specialist (warmed on demand, ~5–10 s), and everything else stays with the brain.
- Route a Hermes agent prompt to a vertical specialist by keyword
- Reproduce the 30-prompt router-accuracy + per-vertical quality bench
- Embed a deterministic, auditable router into a Hermes config
Audience. DGX Spark power users running a local, no-API-key agent harness across multiple domains.
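As a rough illustration of the keyword routing described above, the following minimal sketch picks a lane by whole-word keyword match and falls back to the default brain. The vertical names (`legal`, `medical`), their keyword lists, and the `route` / `Decision` names are hypothetical placeholders, not the bundle's actual configuration; the real keyword sets ship in the harness profile.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Hypothetical verticals and keywords, for illustration only.
# The real keyword sets live in the harness bundle.
VERTICAL_KEYWORDS: Dict[str, List[str]] = {
    "legal": ["contract", "clause", "statute"],
    "medical": ["diagnosis", "dosage", "symptom"],
}
DEFAULT_LANE = "brain"  # the always-on MoE


class Decision(NamedTuple):
    lane: str               # which lane should serve the prompt
    matched: Optional[str]  # keyword that triggered the route (None for the brain)


def route(prompt: str, keywords: Optional[Dict[str, List[str]]] = None) -> Decision:
    """Return the first vertical whose keyword appears as a whole word.

    Iteration follows dict insertion order, so the same prompt always
    yields the same decision, and the matched keyword is returned so the
    decision can be logged and audited.
    """
    table = VERTICAL_KEYWORDS if keywords is None else keywords
    text = prompt.lower()
    for vertical, words in table.items():
        for word in words:
            if re.search(rf"\b{re.escape(word.lower())}\b", text):
                return Decision(vertical, word)
    return Decision(DEFAULT_LANE, None)


if __name__ == "__main__":
    print(route("Review this contract clause"))
    print(route("Write a haiku about rain"))
```

Because the decision is a plain lookup rather than an LLM call, it adds no classifier latency, and each routed prompt can be traced back to the keyword that matched.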
Lane variants
The harness profile records every serving lane that was driven through the same Hermes agent on the same DGX Spark. Throughput is recorded as measured tokens-per-second on the box.
Recommended lane. Default brain (MoE) — the lane the harness profile points at for an always-on Spark agent.
How to load
License: free · MIT. Published to Hugging Face as a harness profile bundle (config files, lane recipe, doctor checklist).
from huggingface_hub import snapshot_download
local = snapshot_download("Orionfold/spark-hermes-vertical-router")
print(local)  # local path to the harness bundle
Known drift
Every measurement has a measurement window. These are the bounds the harness profile is honest about.
- Router-accuracy sample size: router classification measured over 30 prompts (5 per vertical + 5 default-brain); not a large-N guarantee.
- Keyword-set tuning: vertical keywords were tuned against the 30 bench prompts (5 per vertical); out-of-distribution prompts may misroute.
- Per-vertical pass-rate basis: 5 prompts per vertical; deterministic substring/regex rubrics; open-ended answers (haiku, drafted claims) marked vibe.
- One-at-a-time vertical serving: verticals are served on demand on :8090 (~5–10 s to warm); the default brain stays warm on :8080 (always-on, ~32 GB).
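To make the sample-size caveat concrete, here is a minimal sketch of how a router-accuracy figure of this kind could be computed from labelled prompts. The `router_accuracy` helper, the toy router, and the example cases are assumptions for illustration; they are not the bundle's bench script or its 30 prompts.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def router_accuracy(
    cases: Iterable[Tuple[str, str]],
    route_fn: Callable[[str], str],
) -> Tuple[float, Dict[str, float]]:
    """Score a router against (prompt, expected_lane) pairs.

    Returns overall accuracy and per-lane accuracy. With only a handful
    of prompts per lane, a single misroute moves a per-lane score a lot.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for prompt, expected in cases:
        totals[expected] += 1
        if route_fn(prompt) == expected:
            hits[expected] += 1
    if not totals:
        raise ValueError("no bench cases supplied")
    per_lane = {lane: hits[lane] / totals[lane] for lane in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_lane


if __name__ == "__main__":
    # Hypothetical toy router and cases, standing in for the real bench.
    def toy_route(prompt: str) -> str:
        return "legal" if "contract" in prompt.lower() else "brain"

    toy_cases = [
        ("Summarise this contract", "legal"),
        ("Is this clause enforceable?", "legal"),
        ("Plan my week", "brain"),
    ]
    print(router_accuracy(toy_cases, toy_route))
```

With 5 prompts per lane, each misroute shifts that lane's score by 20 percentage points, which is why the figures above are bounded rather than presented as a large-N guarantee.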
Companion field note
The harness profile pairs with the field note hermes-vertical-router-on-spark — read the article for the lane-bakeoff narrative, the unified-memory math, and the tool-call reliability gate that decided the recommended lane.