patent-strategist-v3-nemo-gguf
Quantization of deepseek-ai/DeepSeek-R1-0528-Qwen3-8B.
What this model does
Patent prosecution work — claim construction, MPEP-grounded office-action responses, Markush analysis, doctrine-of-equivalents reasoning — happens inside firms that can't ship privileged client text to a hosted frontier API. This release distills DeepSeek-R1's chain-of-thought reasoning onto a 5,000-row synthetic patent-reasoning corpus so a single Spark-class box can run the workflow offline, with full IRAC-shaped reasoning chains.
Use cases
- Claim construction (Markush groups, doctrine of equivalents)
- MPEP-grounded office-action argument drafting
- Prior-art relevance + non-obviousness reasoning chains
- Patent-licensing scenario analysis (most-favored-licensee, FTO)
Audience — Patent attorneys, prosecution-team engineers, and IP-strategy teams running privileged workflows offline on Spark-class hardware (GB10, 128 GB unified memory) or comparable edge devices.
Spec matrix
Lower perplexity is better; higher throughput and higher vertical-eval scores are better. The sweet-spot row is the variant that balances all three.
| Variant | Perplexity ↓ | Spark tok/s ↑ | Vertical eval ↑ |
|---|---|---|---|
| Q4_K_M | 10.2415 | 39.57 | — |
| Q5_K_M (sweet spot) | 10.0436 | 35.00 | — |
| Q6_K | 9.9624 | 30.66 | — |
| Q8_0 | 9.9288 | 26.51 | — |
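The trade-off in the matrix can be made concrete with a little arithmetic. The sketch below copies the perplexity and throughput figures from the table, ranks each column, and expresses every variant relative to Q8_0. The helper names (`column_ranks`, `tradeoff_vs`) are illustrative and not part of this release.

```python
# Figures copied from the spec matrix above: (perplexity, Spark tok/s).
VARIANTS = {
    "Q4_K_M": (10.2415, 39.57),
    "Q5_K_M": (10.0436, 35.00),
    "Q6_K": (9.9624, 30.66),
    "Q8_0": (9.9288, 26.51),
}


def column_ranks(values, lower_is_better):
    """Return {name: rank}, where rank 1 is the best entry in the column."""
    ordered = sorted(values, key=values.get, reverse=not lower_is_better)
    return {name: i + 1 for i, name in enumerate(ordered)}


def tradeoff_vs(baseline="Q8_0"):
    """Perplexity increase (%) and throughput ratio of each variant vs. baseline."""
    base_ppl, base_tps = VARIANTS[baseline]
    return {
        name: (100.0 * (ppl / base_ppl - 1.0), tps / base_tps)
        for name, (ppl, tps) in VARIANTS.items()
    }


ppl_rank = column_ranks({k: v[0] for k, v in VARIANTS.items()}, lower_is_better=True)
tps_rank = column_ranks({k: v[1] for k, v in VARIANTS.items()}, lower_is_better=False)

for name, (ppl_pct, speedup) in tradeoff_vs().items():
    print(
        f"{name}: ppl rank {ppl_rank[name]}, tok/s rank {tps_rank[name]}, "
        f"+{ppl_pct:.2f}% perplexity, {speedup:.2f}x throughput vs Q8_0"
    )
```

Read this way, Q5_K_M costs roughly 1.2% in perplexity relative to Q8_0 in exchange for about 1.32x the throughput on Spark.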
Choosing this lane
This lane covers offline inference of the NeMo-trained model on llama.cpp or Ollama. It ships four quants, from Q4_K_M (fastest, at 39.6 tok/s on Spark) up to Q8_0 (lowest perplexity, 9.93). Q5_K_M is the recommended balance of speed and fidelity. Reach for the BF16 LoRA sibling when you want the full-precision weights for transformers / vLLM GPU inference.
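As a rough illustration of running one of these quants locally, the sketch below assembles a command line for llama.cpp's `llama-cli` binary. The GGUF file naming is an assumption (check the repository's file list for the actual names), as are the prompt, token budget, and context size. The flags used (`-m`, `-p`, `-n`, `-c`) are standard llama.cpp options for model path, prompt, tokens to generate, and context size.

```python
import shlex


def gguf_filename(quant, stem="patent-strategist-v3-nemo"):
    # Hypothetical naming scheme; the real file names may differ.
    return f"{stem}.{quant}.gguf"


def llama_cli_argv(quant="Q5_K_M", prompt="", max_tokens=512, ctx_size=8192):
    """Build an argv list for llama.cpp's llama-cli; nothing is executed here."""
    return [
        "llama-cli",
        "-m", gguf_filename(quant),  # model file
        "-p", prompt,                # prompt text
        "-n", str(max_tokens),       # number of tokens to generate
        "-c", str(ctx_size),         # context window size
    ]


argv = llama_cli_argv(prompt="Construe the term 'substantially planar' in claim 1.")
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run(argv, check=True)` would launch the run, assuming `llama-cli` is on the PATH and the model file exists in the working directory.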
Methods
Read the field note: Two Trainers, One LoRA: NeMo Framework Beats Unsloth by 26% on a Patent-Strategist Fine-Tune
Same recipe, same R1-distilled base, same 5000-row patent corpus: once via Unsloth, once via NeMo Framework + Megatron-Bridge. NeMo finishes 26% faster and produces 44% longer patent-strategic chains. The cost is one YARN-defaults landmine and a stdout that lied for four hours.
Known drift
Disclosed limitations with explicit bounds — the scope is named, not implied.
- "metes-and-times" terminology. Two known terminology drifts are inherited from the v3 synthetic corpus; the balance of probe answers (~99%) cite real MPEP sections.
- Fabricated MPEP §2163.05(s) citation. Same scope as above: a corpus-generator artifact, not a model-wide hallucination pattern. Real §2163.05 has subsections (a)–(f) on written-description support.
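Because both drifts are narrowly scoped, they lend themselves to a simple post-generation check. The following is a minimal sketch, not part of this release: it flags the drifted term and any §2163.05 citation whose lettered subsection falls outside the (a)–(f) range stated above. The allowed-subsection table is taken from this card's own statement and should be verified against the MPEP before being relied on; the function names are illustrative.

```python
import re

# Assumption: the allowed letters come from this card's statement that
# §2163.05 has subsections (a)-(f). Extend or correct this from the MPEP itself.
KNOWN_SUBSECTIONS = {"2163.05": set("abcdef")}

# The drifted term listed above.
DRIFT_TERMS = ("metes-and-times",)

# Matches citations such as "§2163.05(s)" or "§ 2163.05(b)".
CITATION = re.compile(r"§\s*(\d{3,4}(?:\.\d{2})?)\(([a-z])\)")


def suspicious_citations(text):
    """Return citations whose lettered subsection is outside the known range."""
    flagged = []
    for match in CITATION.finditer(text):
        section, letter = match.groups()
        allowed = KNOWN_SUBSECTIONS.get(section)
        if allowed is not None and letter not in allowed:
            flagged.append(match.group(0))
    return flagged


def drift_terms_found(text):
    """Return the known drifted terms that appear in the text."""
    lowered = text.lower()
    return [term for term in DRIFT_TERMS if term in lowered]


sample = "See MPEP §2163.05(s) and MPEP §2163.05(b) on the metes-and-times of the claim."
print(suspicious_citations(sample))
print(drift_terms_found(sample))
```

Sections absent from `KNOWN_SUBSECTIONS` are passed through unflagged, so the check only catches what it has been told about and does not replace attorney review.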