advisor-gguf
Quantization of nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16.
What this model does
A corpus-grounded advisor is only trustworthy if it cites exactly and refuses cleanly, and prompting alone doesn't get there. On the frozen curveball bench, the prompted 30B teacher scored 8/21 with 3 fabrications, while this 4B-SFT-v0.2 lane scores 18/21 with refusals 9/9 and zero private-state leaks. The default Q4_K_M GGUF is 2.6 GB and serves at ~70 tok/s on a DGX Spark (Q8_0 serves at 42 tok/s with identical bench behavior), so the whole answer/refuse/route loop runs locally. The sibling release Orionfold/Advisor-bench carries the frozen OOD bench (pool 75, held-out 28, curveballs 40+21) and the sha-pinned 182-source corpus manifest.
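To make the comparison concrete, the short Python sketch below turns the quoted bench counts and throughput figures into ratios. It only restates numbers from this card and measures nothing. Reading the 0.86 vertical-eval value in the spec matrix as 18/21 is an assumption, though the arithmetic is consistent with it.

```python
# Figures quoted in this card; the script measures nothing itself.
BENCH_ITEMS = 21  # size of the frozen curveball set being scored


def pass_rate(passed: int, total: int = BENCH_ITEMS) -> float:
    """Fraction of curveball items passed, rounded to two decimals."""
    return round(passed / total, 2)


sft_rate = pass_rate(18)      # 4B-SFT-v0.2 lane
teacher_rate = pass_rate(8)   # prompted 30B teacher

# DGX Spark throughput as quoted (tok/s); the Q4_K_M figure is approximate (~70).
q4_tps, q8_tps = 70.0, 42.0
speedup = round(q4_tps / q8_tps, 2)

print(f"SFT lane: {sft_rate}, prompted teacher: {teacher_rate}, "
      f"Q4_K_M vs Q8_0 throughput: {speedup}x")
```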
Use cases
- Governed Q&A over an enterprise or personal corpus with exact source-id citations
- Refusal-gated handling of out-of-corpus and private-state questions
- The local citation lane in a governed routing stack that escalates to a frontier model with receipts
Audience: Operators who want a corpus-grounded local advisor whose citation and refusal behavior is bench-proven (frozen OOD curveballs, strict scoring), not a hosted assistant.
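As an illustration of how an operator might gate responses in such a stack, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `[src:ID]` citation syntax, the `gate` function, and the rule that an unknown source id triggers routing are all hypothetical. The model's actual citation format and the manifest layout are not specified in this card.

```python
import re

# Hypothetical citation syntax "[src:some-id]"; the real format is not given here.
CITATION = re.compile(r"\[src:([A-Za-z0-9_.-]+)\]")


def gate(answer: str, manifest_ids: set[str]) -> str:
    """Classify one model response as 'answer', 'refuse', or 'route'."""
    cited = CITATION.findall(answer)
    if not cited:
        # No citation at all: treat the response as ungrounded.
        return "refuse"
    if all(source_id in manifest_ids for source_id in cited):
        # Every cited id exists in the corpus manifest.
        return "answer"
    # At least one cited id is unknown: escalate instead of trusting it.
    return "route"


manifest = {"doc-001", "doc-002"}  # stand-in for the real source-id manifest
print(gate("Retention is 90 days [src:doc-001].", manifest))
```

Checking ids against a fixed manifest keeps the decision deterministic, so a fabricated source id can be caught without a second model call.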
Spec matrix
Ranks within each column drive the heatmap. Lower perplexity, higher throughput, and higher vertical eval are better; the sweet-spot row balances all three.
| Variant | Perplexity ↓ | Spark tok/s ↑ | Vertical eval ↑ |
|---|---|---|---|
| Q4_K_M Sweet spot | — | 70.00 | 0.86 |
| Q8_0 | — | 42.00 | 0.86 |
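The per-column ranking behind the heatmap can be sketched in Python as follows. The ranking rule used here (competition ranking, ties share a rank, missing values left unranked) is an assumption, since the card does not say how its ranks are computed. The values are copied from the table above, with the unreported perplexity cells represented as `None`.

```python
from typing import Optional

# Values copied from the spec matrix; None marks an unreported cell.
rows = {
    "Q4_K_M": {"perplexity": None, "tok_s": 70.0, "vertical_eval": 0.86},
    "Q8_0": {"perplexity": None, "tok_s": 42.0, "vertical_eval": 0.86},
}
# Direction of "better" for each column, following the arrows in the header.
HIGHER_IS_BETTER = {"perplexity": False, "tok_s": True, "vertical_eval": True}


def column_ranks(column: str) -> dict[str, Optional[int]]:
    """Rank variants within one column; 1 is best, ties share a rank."""
    values = {name: row[column] for name, row in rows.items()}
    present = [v for v in values.values() if v is not None]
    ranks: dict[str, Optional[int]] = {}
    for name, value in values.items():
        if value is None:
            ranks[name] = None  # nothing to rank
        elif HIGHER_IS_BETTER[column]:
            ranks[name] = 1 + sum(other > value for other in present)
        else:
            ranks[name] = 1 + sum(other < value for other in present)
    return ranks


for column in HIGHER_IS_BETTER:
    print(column, column_ranks(column))
```

With the two variants listed, only throughput separates the rows, because the vertical eval is tied and perplexity is not reported.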