← Products Jun 2026

Orionfold Advisor

A governed local AI advisor over your corpus — exact citations, trusted refusals, and a receipt for every claim.

Cockpit NVIDIA DGX Spark Manav Sehgal

What it took to build

29.9h
Wall-clock
two days (~30 hours) from proof-spec to promoted serving lane
4,307
Lines of code
authored, bundles excluded
54
Tests
7 surfaces shipped
10
Sessions
871
Agent turns
645k
Tokens generated
97%
Cache hits
Built with Claude Fable 5 now: Claude Fable 5 Claude Code

An advisor that knows where its answers come from

Orionfold Advisor is a governed local AI advisor for your enterprise corpus. It runs on one NVIDIA DGX Spark and it is not a model — it is an appliance: a fine-tuned NVIDIA-native model lane, a recall-gated retriever, a deterministic frontier router with a spend cap, a corpus pack you can swap, and the Orionfold Arena cockpit as its operating surface. Ask it a question and it answers from retrieved sources, citing the exact source id every claim came from. Ask it something the corpus doesn’t contain — or something about private operator state — and it refuses, with empty citations, even when the question arrives dressed as an urgent exception, a roleplay, a false premise, or an instruction to miscite.

It is for the operator who needs corpus answers they can audit, and for the builder who wants the receipts: every gate the Advisor passed on its way to serving — retrieval recall, generator preflight, frozen out-of-distribution benches, routing accuracy, the publish verdict itself — is a tracked artifact you can re-run, rendered on cards in the cockpit.

What it unlocks

The hard part of a corpus assistant was never fluent answers — it’s trustable ones. A grounded answer with a paraphrased citation is unauditable; a confident answer to a question the corpus can’t support is worse. The Advisor makes both failure modes measurable, then trains them away: citation discipline means the cited id is drawn from the retrieved set (an alias like Source 2 is a strict failure), and the refusal floor is measured against adversarial pretexts that were frozen before training, so the number can’t flatter itself.

That discipline turned out to be trainable where prompting wasn’t. The most important measurement of the build: on the frozen curveball bench, the 30B model with a carefully engineered prompt contract scored 8/21 — and fabricated private-looking state on 3 rows. The 4B model fine-tuned for the behavior scored 18/21 with refusals 9/9 and zero private-state risk. The trained lane, not the prompt, carries the refusal floor. That’s a result you can only get cheaply when training, serving, and evaluation all live on the same box — the whole SFT run was ~21 minutes on the Spark, and the candidate was behind the same eval surface minutes later.

And because the corpus pack is a swappable unit — manifest, trust tiers, gold eval seed, refusal cases, rebuild gates — the same harness that advises over Orionfold’s public corpus today can take a customer’s corpus tomorrow. That swap was proven, not promised: a synthetic three-source fixture pack went through the unchanged recall scorer and passed its gate at 1.0@5.

The build story: gates first, training second

The Advisor went from proof spec to promoted serving lane in two days (~30 wall-clock hours, 31 commits). The infographic above is the honest receipt of those two days: 10 Claude Code sessions, 871 assistant turns, 118.9M tokens processed — 97.4% of them served from prompt cache — and 645k tokens actually generated, all on Claude Fable 5. The 4,307 lines are the authored Advisor harness (corpus generation, recall scorers, the preflight gate, the route bakeoff, the SFT pipeline, the receipt assembler); the cockpit cards and eval drawer ride the existing Arena codebase, with 54 advisor-specific test cases alongside.

The order of operations is the story. Before any training: a 182-source public corpus manifest with a frozen 28-row held-out bench; retrieval recall gates on two lanes (BM25 proxy 0.9885@5, live pgvector + NIM embedder 0.977@5, 1.0 on held-out answerable); then a raw-floor preflight of candidate bases through the visible cockpit. Two NVIDIA bases failed that floor honestly (one leaked its reasoning chain on 7 of 8 rows) before the 30B and the 4B cleared it. Only then came SFT — a teacher-verified corpus generated by the 30B on-box, trained into the 4B with NeMo LoRA in ~21 minutes, quantized to Q8_0, and launched behind the same gates.

Twice, the gates caught what would have silently shipped. A corpus rebuild pulled the Advisor’s own proof spec into its retrieval index — the bench’s erratum text started surfacing in eval context, and the full gate re-run flagged it before promotion (proof-control documents are now excluded at generation time). And the first SFT pass fixed citation discipline but regressed the refusal floor under novel pretexts, 14/15 → 9/15 — caught by the frozen curveball, fixed in v0.2 by training three new hint-free refusal families, and verified at 15/15 on the class that regressed. A pre-registered bench you can’t touch is the only reason either failure has a number attached.

The promotion itself is a script, not a sentiment: the receipt assembler reads every tracked receipt and fails if the evidence stops supporting a gate claim. It assembled nine green gates and a verdict — PROMOTED: the 4B-SFT-v0.2 lane — with the 30B retained as teacher and comparison lane, and four other candidates recorded as rejected, with reasons.

The feature tour

A one-click generator gate

The Advisor preflight card — 8 packets, reasoning off, all rows passing, scored receipt below

The citation/refusal gate as a button: 8 packets against the active lane, pass/fail per row.

The Cortex pane’s preflight card runs the 8-packet citation/refusal gate against the active lane — one button for the default run, one for reasoning-off — and renders pass/fail per row with the receipt underneath. Every base model the Advisor considered went through this card in the visible cockpit; the screenshot shows the promoted lane’s 8/8.

The corpus pack as a visible unit

The corpus pack card — 182 sources, manifest hash, both recall gates green, swap fixture, SFT handoffs

The corpus with a hash and gates, not a folder of files.

The Advisor’s knowledge boundary is a governed unit: its card shows the manifest (182 sources, 6b1e832d099c), both recall gates, the corpus-swap fixture proof, and the SFT corpus handoffs that were generated from it — the chain from “what the Advisor knows” to “what the model was trained on” on one card.

Governed escalation with a visible bill

The routing & cost card — three configs at 28/28, $0 to $0.0033, zero private-state escapes

Every hosted call has a tier, a provider, a cost, and a verdict — and a policy that keeps private state home.

The router is deterministic and observables-only: it escalates on detectable failure signals (a citation outside the retrieved set, a rank-sanity anomaly), never on vibes. The card shows each measured config with its score and bill, the per-escalation ledger, and the governance line: hosted models are an allow-list with a dollar cap, and private-state queries are blocked from egress by policy.

A publish verdict you can re-verify

The publish receipt card — PROMOTED, nine gate chips, the why-promoted drawer open

Promotion as evidence, not announcement: nine gates, each chip backed by a named receipt.

The §14 receipt card renders the promotion: the lane, nine gate chips each backed by a named evidence file, and the drawer explaining why this lane won and what happened to the other lanes considered. Re-running the assembler re-verifies every claim against the receipts on disk.

Guarded lane swaps

LaneTruth — the promoted 4B lane ACTIVE on , guarded launch form below

One resident model is the law of unified memory; LaneTruth makes it visible and safe.

One model resident at a time is the law of 128 GB unified memory, and LaneTruth enforces it visibly: teardown-first confirmation, a guarded pre-flight on every launch, and an anchor on warm. The Advisor’s promotion was executed here — the 30B (~40 GB, 14 s warm) torn down, the trained 4B (~12 GB, 2 s warm) anchored in its place.

Every measured bench row, pickable in chat

The eval drawer — Orionfold Advisor bench selected, 89 rows with family filters

The receipts’ exact packets, replayable from the chat composer.

All 89 measured Advisor rows — the frozen held-out and both curveballs — are pickable in the chat and compare eval drawer, replaying the exact packets the published receipts were scored on, system contract included. The bench rider even replays the measured reasoning control, so what you see in chat is what the receipts measured.

Refusals scored live at the wire

A live private-state refusal — empty citations, scored 100% against the deterministic contract

A refusal you can score the moment it lands — no judge model, no rubric drift.

The deterministic contract scorer runs on every eval turn as it lands: this row asks about live operator state, the serving lane refuses with empty citations, and the turn scores 100% instantly — refusal wording, empty citations, no private-state echo, no thinking leak, each a visible check.

Built on the substrate

The Advisor is the thinnest possible new layer over what the Spark stack already proved. Retrieval rides Orionfold Cortex (pgvector + the NIM embedder, with the recall-gate discipline from the Cortex launch). The operating surface is Orionfold Arena — LaneTruth, the jobs board, the eval drawer, telemetry — extended with four read-only Advisor cards. Training is the NeMo LoRA lane; quantization and publishing ride fieldkit.quant and fieldkit.publish, and the model card’s measurement-first shape is the same one every Orionfold artifact ships with. The newly authored code is the harness itself: packet construction, deterministic scorers, the route bakeoff, the receipt assembler.

The published artifacts close the loop: the model is Orionfold/Advisor-GGUF, the benches and corpus manifest are Orionfold/Advisor-bench, and every receipt quoted on this page is tracked in the public repo under evidence/orionfold-advisor/.

The workflow, generalized

The pattern under the product is reusable for any vertical: freeze a bench before you train; gate retrieval before you gate generation; preflight the raw base before you spend a training hour; let a deterministic router — not hope — decide when the frontier gets consulted; and make promotion a script that reads receipts. The corpus pack makes the whole loop portable — same harness, same gates, different corpus. One Spark ran every step: corpus generation by a local teacher, training, quantization, serving, evaluation, and the decision.

Get it

  • Run the model: Orionfold/Advisor-GGUF — Q8_0, ~12 GB resident, with the packet contract and reasoning-off serving notes on the card.
  • Score your own lane: Orionfold/Advisor-bench — the pool, the frozen held-out, both curveballs, and the corpus manifest they resolve against.
  • Read the receipts: evidence/orionfold-advisor/ — every number on this page, re-runnable.

The cockpit it runs in is the same Orionfold Arena that ships with fieldkit on PyPI.