Quant · GGUF · 4 variants

patent-strategist-v3-nemo-gguf

Quantization of deepseek-ai/DeepSeek-R1-0528-Qwen3-8B.

HF Orionfold/patent-strategist-v3-nemo-GGUF · License free apache-2.0 · Trained with nemo · Published

What this model does

Patent prosecution work (claim construction, MPEP-grounded office-action responses, Markush analysis, doctrine-of-equivalents reasoning) happens inside firms that can't ship privileged client text to a hosted frontier API. This release fine-tunes the R1-distilled base on a 5,000-row synthetic patent-reasoning corpus, carrying over DeepSeek-R1's chain-of-thought reasoning, so a single Spark-class box can run the workflow offline with full IRAC-shaped reasoning chains.

Use cases

  • Claim construction (Markush groups, doctrine of equivalents)
  • MPEP-grounded office-action argument drafting
  • Prior-art relevance + non-obviousness reasoning chains
  • Patent-licensing scenario analysis (most-favored-licensee, FTO)

Audience — Patent attorneys, prosecution-team engineers, and IP-strategy teams running privileged workflows offline on Spark-class hardware (GB10, 128 GB unified memory) or comparable edge devices.

Spec matrix

Ranks within each column drive the heatmap. Lower perplexity, higher throughput, and higher vertical eval are better; the sweet-spot row balances all three.

Variant               Perplexity   Spark tok/s   Vertical eval
Q4_K_M                10.2415      39.57
Q5_K_M (sweet spot)   10.0436      35.00
Q6_K                  9.9624       30.66
Q8_0                  9.9288       26.51
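
One way to read the spec matrix is as a lookup: fix a perplexity budget and take the fastest variant that fits. The sketch below is illustrative only. The numbers are copied from the table, while the function names `fastest_within` and `relative_to_q8` are hypothetical helpers, not part of any Orionfold tooling.

```python
from typing import Optional, Tuple

# Perplexity and Spark tok/s copied from the spec matrix above.
# The vertical-eval column has no values in the source, so it is omitted here.
SPEC_MATRIX = {
    "Q4_K_M": {"perplexity": 10.2415, "tok_s": 39.57},
    "Q5_K_M": {"perplexity": 10.0436, "tok_s": 35.00},
    "Q6_K": {"perplexity": 9.9624, "tok_s": 30.66},
    "Q8_0": {"perplexity": 9.9288, "tok_s": 26.51},
}


def fastest_within(max_perplexity: float) -> Optional[str]:
    """Return the highest-throughput variant at or below the perplexity budget."""
    eligible = {
        name: row
        for name, row in SPEC_MATRIX.items()
        if row["perplexity"] <= max_perplexity
    }
    if not eligible:
        return None  # no variant meets the budget
    return max(eligible, key=lambda name: eligible[name]["tok_s"])


def relative_to_q8(name: str) -> Tuple[float, float]:
    """Return (perplexity increase %, throughput gain %) relative to Q8_0."""
    ref = SPEC_MATRIX["Q8_0"]
    row = SPEC_MATRIX[name]
    ppl_pct = (row["perplexity"] / ref["perplexity"] - 1.0) * 100.0
    speed_pct = (row["tok_s"] / ref["tok_s"] - 1.0) * 100.0
    return ppl_pct, speed_pct


if __name__ == "__main__":
    print(fastest_within(10.1))
    for variant in SPEC_MATRIX:
        ppl_pct, speed_pct = relative_to_q8(variant)
        print(f"{variant}: +{ppl_pct:.2f}% perplexity, +{speed_pct:.1f}% tok/s vs Q8_0")
```

With a budget of 10.1 this picks Q5_K_M, which matches the sweet-spot label: roughly 1% higher perplexity than Q8_0 for about a third more throughput.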

Choosing this lane

Offline inference of the NeMo-trained model on llama.cpp / Ollama — four quants from Q4_K_M (fastest at 39.6 tok/s on Spark) up to Q8_0 (lowest perplexity, 9.93). Q5_K_M is the recommended balance of speed and fidelity. Reach for the BF16 LoRA sibling when you want the full-precision weights for transformers / vLLM GPU inference.
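
As a rough sketch of what offline inference might look like, the snippet below assembles a llama.cpp `llama-cli` invocation for a chosen quant. The GGUF filename and local directory are assumptions (the actual file names in the repository are not listed here), and `build_llama_cli_command` is a hypothetical helper. Only the common `-m`, `-p`, `-n`, and `-c` flags are used.

```python
import shlex
from typing import List


def build_llama_cli_command(
    model_path: str,
    prompt: str,
    max_tokens: int = 1024,
    ctx_size: int = 8192,
) -> List[str]:
    """Assemble an argv list for llama.cpp's llama-cli without running it."""
    return [
        "llama-cli",
        "-m", model_path,        # path to the local GGUF file
        "-p", prompt,            # prompt text
        "-n", str(max_tokens),   # number of tokens to generate
        "-c", str(ctx_size),     # context window size
    ]


if __name__ == "__main__":
    # Hypothetical filename: check the repository for the real one.
    model = "models/patent-strategist-v3-nemo.Q5_K_M.gguf"
    cmd = build_llama_cli_command(
        model,
        "Construe the Markush group in claim 1 and explain your reasoning.",
    )
    # Print a shell-safe version; pass `cmd` to subprocess.run(cmd, check=True)
    # once llama.cpp is installed and the model file exists locally.
    print(shlex.join(cmd))
```

Building the argument list separately from executing it keeps the privileged prompt out of shell history and makes the command easy to log or review before anything runs.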

Methods

Field note: "Two Trainers, One LoRA: NeMo Framework Beats Unsloth by 26% on a Patent-Strategist Fine-Tune"

Same recipe, same R1-distilled base, same 5,000-row patent corpus: once via Unsloth, once via NeMo Framework + Megatron-Bridge. NeMo finishes 26% faster and produces 44% longer patent-strategic chains. The cost is one YARN-defaults landmine and a stdout that lied for four hours.

Known drift

Disclosed limitations with explicit bounds — the scope is named, not implied.

"metes-and-times" terminology
Two known terminology drifts inherited from the v3 synthetic corpus; the remaining probe answers (~99%) cite real MPEP sections.
Fabricated MPEP §2163.05(s) citation
Same scope as above — corpus-generator artifact, not a model-wide hallucination pattern. Real §2163.05 has subsections (a)–(f) on written-description support.
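
Because both drifts are narrow and named, a downstream pipeline could flag them before output reaches a reviewer. The following is a minimal sketch of such a check. The patterns and the `find_known_drift` helper are assumptions built from the two items disclosed above and do not ship with the model.

```python
import re
from typing import List, Tuple

# Patterns for the two disclosed drifts only; this is not a general
# hallucination detector and will not catch other bad citations.
KNOWN_DRIFT = {
    "metes-and-times terminology": re.compile(
        r"metes[\s-]+and[\s-]+times", re.IGNORECASE
    ),
    "fabricated MPEP 2163.05(s) citation": re.compile(
        r"2163\.05\s*\(\s*s\s*\)", re.IGNORECASE
    ),
}


def find_known_drift(text: str) -> List[Tuple[str, str, int]]:
    """Return (label, matched text, character offset) for each known drift found."""
    hits = []
    for label, pattern in KNOWN_DRIFT.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0), match.start()))
    return hits


if __name__ == "__main__":
    sample = "Under MPEP §2163.05(s), the metes-and-times of claim 3 are unclear."
    for label, matched, offset in find_known_drift(sample):
        print(f"{label}: {matched!r} at offset {offset}")
```

Flagged passages are best routed to human review rather than auto-corrected, since the correct replacement depends on the surrounding argument.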

Other Orionfold variants