harness
One always-on brain, five specialists, zero LLM-classifier overhead.
A Spark holds one strong model warm at a time. The pinned MoE is excellent at general agentic work but is not your domain expert. The five Orionfold vertical GGUFs are domain experts but compete for the same 128 GB envelope. A router picks per prompt: keyword-matched prompts get the right specialist (warm on demand, ~5–10 s), everything else stays with the brain.
- Route a Hermes agent prompt to a vertical specialist by keyword
- Reproduce the 30-prompt router-accuracy + per-vertical quality bench
- Embed a deterministic, auditable router into a Hermes config
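
A minimal sketch of the kind of keyword router described above, assuming plain word-boundary matching and a "most keyword hits wins" rule. The keyword lists, the `Route` type, the `route` function, and the `localhost` host are all hypothetical illustrations; only the ports (:8080 for the brain, :8090 for verticals) come from this document. The real tuned keyword sets and the Hermes config wiring are not shown here.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword sets for illustration only; the tuned lists are not
# given in this document.
VERTICAL_KEYWORDS = {
    "patent": ["patent", "prior art", "office action"],
    "legal": ["statute", "contract", "liability"],
    "finance": ["ebitda", "balance sheet", "cash flow"],
    "cyber": ["cve", "malware", "intrusion"],
    "clinical": ["diagnosis", "dosage", "symptom"],
}

# Ports are from the document; the host is an assumption.
BRAIN_URL = "http://localhost:8080"     # always-on default brain
VERTICAL_URL = "http://localhost:8090"  # on-demand vertical specialist


@dataclass(frozen=True)
class Route:
    target: str     # "brain" or a vertical name
    base_url: str   # endpoint that should serve the prompt
    matched: tuple  # keywords that fired, kept for auditing


def route(prompt: str) -> Route:
    """Pick the vertical with the most keyword hits; fall back to the brain."""
    text = prompt.lower()
    best, best_hits = "brain", ()
    for vertical, keywords in VERTICAL_KEYWORDS.items():
        hits = tuple(
            k for k in keywords
            if re.search(rf"\b{re.escape(k)}\b", text)
        )
        # Strict ">" keeps the first vertical on ties, so results are deterministic.
        if len(hits) > len(best_hits):
            best, best_hits = vertical, hits
    if not best_hits:
        return Route("brain", BRAIN_URL, ())
    return Route(best, VERTICAL_URL, best_hits)
```

Because the decision is a pure function of the prompt and the keyword table, every routing choice can be logged with the `matched` keywords and replayed later, which is what makes this style of router auditable without an LLM classifier in the loop.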
Audience — DGX Spark power users running a local, no-API-key agent harness across multiple domains.
| Variant |
|---|
| Patent prosecution |
| Legal reasoning |
| Financial analysis |
| Defensive cyber |
| Clinical reasoning |
| Default brain (MoE) sweet spot |
- Router-accuracy sample size: router classification was measured over 30 prompts (5 per vertical + 5 default-brain); this is not a large-N guarantee.
- Keyword-set tuning: vertical keywords were tuned against the 30 bench prompts (5 per vertical); out-of-distribution prompts may misroute.
- Per-vertical pass-rate basis: 5 prompts per vertical, graded with deterministic substring/regex rubrics; open-ended answers (haiku, drafted claims) are marked vibe.
- One-at-a-time vertical serving: verticals are served on demand on :8090 (~5–10 s warm); the default brain stays warm on :8080 (always-on, ~32 GB).
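
The deterministic rubric grading and the router-accuracy figure mentioned above could be computed roughly as follows. This is a sketch under stated assumptions: the rubric shape (`contains` / `regex` keys), the function names, and the convention of representing a "vibe" item as `None` (excluded from the pass rate) are hypothetical, not the bench's actual format.

```python
import re


def passes(answer: str, rubric: dict) -> bool:
    """True if every required substring and every regex is found in the answer.

    The rubric layout ({"contains": [...], "regex": [...]}) is an assumed shape.
    """
    text = answer.lower()
    substrings_ok = all(s.lower() in text for s in rubric.get("contains", []))
    regex_ok = all(
        re.search(p, answer, re.IGNORECASE) for p in rubric.get("regex", [])
    )
    return substrings_ok and regex_ok


def pass_rate(results):
    """Fraction of graded items that passed.

    Items marked "vibe" are assumed to be recorded as None and are skipped.
    Returns None when nothing was deterministically graded.
    """
    graded = [r for r in results if r is not None]
    return sum(graded) / len(graded) if graded else None


def router_accuracy(pairs):
    """Share of (expected, routed) pairs where the router chose the expected target."""
    return sum(1 for expected, routed in pairs if expected == routed) / len(pairs)
```

With only 5 prompts per vertical, a single failure moves a vertical's pass rate by 20 percentage points, so these numbers are best read as a smoke test rather than a benchmark.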