Quant · GGUF · 2 variants

advisor-gguf

Quantization of nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16.

HF Orionfold/Advisor-GGUF · License free other · Published

What this model does

A corpus-grounded advisor is only trustworthy if it cites exactly and refuses cleanly, and prompting alone does not get there. On the frozen curveball bench, the prompted 30B teacher scored 8/21 with 3 fabrications, while this 4B-SFT-v0.2 lane scores 18/21 with refusals at 9/9 and zero private-state leaks. The default Q4_K_M GGUF is 2.6 GB and serves at ~70 tok/s on a DGX Spark (Q8_0 runs at 42 tok/s with identical bench behavior), so the whole answer/refuse/route loop runs locally. The sibling release Orionfold/Advisor-bench carries the frozen OOD bench (pool 75, held-out 28, curveballs 40+21) and the sha-pinned 182-source corpus manifest.
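The bench counts quoted above reduce to simple pass rates. A minimal sketch of that arithmetic, assuming the 0.86 vertical-eval figure in the spec matrix is the strict pass fraction rounded to two decimals (the page does not state the rounding rule, and `pass_rate` is an illustrative helper, not part of any release):

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass fraction rounded to two decimals (the rounding is an assumption)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(passed / total, 2)

# Counts taken from the text above.
student = pass_rate(18, 21)   # 4B-SFT-v0.2 lane on the curveball bench
teacher = pass_rate(8, 21)    # prompted 30B teacher on the same bench
refusals = pass_rate(9, 9)    # refusal subset for the 4B lane

print(student, teacher, refusals)
```

Under that assumption, 18/21 lines up with the 0.86 shown for both quantizations.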

Use cases

  • Governed Q&A over an enterprise or personal corpus with exact source-id citations
  • Refusal-gated handling of out-of-corpus and private-state questions
  • The local citation lane in a governed routing stack that escalates to a frontier model with receipts
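
The answer/refuse/route behavior in the use cases above could be wired into a routing stack roughly as follows. This is only a sketch under stated assumptions: the `[src:<id>]` citation syntax, the `REFUSE` marker, the `route` function, and the policy of escalating uncited or unverifiable replies are all hypothetical, since this page does not specify the model's output format or the routing rules.

```python
import re
from dataclasses import dataclass
from typing import Set, Tuple

# Hypothetical citation syntax "[src:<id>]"; the real format is not given here.
CITATION = re.compile(r"\[src:([A-Za-z0-9_.-]+)\]")

@dataclass
class Decision:
    action: str              # "answer", "refuse", or "escalate"
    cited: Tuple[str, ...]   # source ids found in the reply
    reason: str              # short receipt explaining the decision

def route(reply: str, manifest_ids: Set[str], refusal_marker: str = "REFUSE") -> Decision:
    """Classify a local reply; the marker and escalation policy are assumptions."""
    if reply.strip().startswith(refusal_marker):
        return Decision("refuse", (), "model declined")
    cited = tuple(CITATION.findall(reply))
    unknown = [c for c in cited if c not in manifest_ids]
    if not cited or unknown:
        # Uncited or unverifiable answers are handed off rather than served.
        return Decision("escalate", cited, "missing or unknown citation")
    return Decision("answer", cited, "all citations found in manifest")

manifest = {"policy-007"}
print(route("Rotate keys every 90 days [src:policy-007].", manifest))
```

Checking every cited id against the pinned manifest is what makes "exact" citations enforceable outside the model: a reply that cites nothing, or cites an id the manifest does not contain, never reaches the user as a grounded answer.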

Audience: Operators who want a corpus-grounded local advisor whose citation and refusal behavior is bench-proven (frozen OOD curveballs, strict scoring), not a hosted assistant.

Spec matrix

Ranks within each column drive the heatmap. Lower perplexity, higher throughput, and higher vertical eval are better; the sweet-spot row balances all three.

Vertical bench: advisor curveball-v0.2, frozen OOD bench (n=21, scored==strict; refusals 9/9, 0 private-state risk)

Variant               Perplexity   Spark tok/s   Vertical eval
Q4_K_M (sweet spot)   -            70.00         0.86
Q8_0                  -            42.00         0.86
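
One plausible way to turn per-column ranks into a sweet-spot pick is a rank sum. The sketch below assumes that scheme; the page does not say how the sweet spot is actually chosen, and `column_ranks` is an illustrative helper. Perplexity is left out because no perplexity values appear in the matrix; it would be ranked with `higher_is_better=False`.

```python
def column_ranks(values, higher_is_better=True):
    """Dense rank per entry: 1 is best, and ties share a rank."""
    ordered = sorted(set(values), reverse=higher_is_better)
    return [ordered.index(v) + 1 for v in values]

# Values copied from the spec matrix above.
variants = ["Q4_K_M", "Q8_0"]
tok_s = [70.00, 42.00]
vertical = [0.86, 0.86]

# Sum the ranks across columns; the lowest total is the assumed sweet spot.
rank_sum = [a + b for a, b in zip(column_ranks(tok_s), column_ranks(vertical))]
sweet_spot = variants[rank_sum.index(min(rank_sum))]
print(sweet_spot)
```

With the two variants tied on vertical eval, throughput alone separates them, which is consistent with the Q4_K_M row carrying the sweet-spot label.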

Methods

Field note: "The Refusal Floor Is Trainable: What a Frozen Curveball Proved About Prompts vs Weights"

A 30B model with a hand-tuned prompt contract refused 3 of 9 adversarial pretexts and fabricated private-looking state 3 times. A 4B trained for 21 minutes refused 9 of 9. The bench that saw the difference was frozen before training, and that discipline is the whole method.

Other Orionfold variants