Name: patent-strategist-v3-nemo-gguf
Published: 2026-05-22T01:26:26Z
License: open

What this model does

Patent prosecution work (claim construction, MPEP-grounded office-action responses, Markush analysis, doctrine-of-equivalents reasoning) happens inside firms that can't ship privileged client text to a hosted frontier API. This release fine-tunes a DeepSeek-R1-distilled base on a 5,000-row synthetic patent-reasoning corpus so a single Spark-class box can run the workflow offline, with full IRAC-shaped reasoning chains.

Use cases

Claim construction (Markush groups, doctrine of equivalents)
MPEP-grounded office-action argument drafting
Prior-art relevance + non-obviousness reasoning chains
Patent-licensing scenario analysis (most-favored-licensee, FTO)

Audience — Patent attorneys, prosecution-team engineers, and IP-strategy teams running privileged workflows offline on Spark-class hardware (GB10, 128 GB unified memory) or comparable edge devices.

Spec matrix

Ranks within each column drive the heatmap: lower perplexity, higher throughput, and higher vertical eval are better. No vertical-eval scores are listed for this release, so the sweet-spot row balances perplexity against throughput.

Variant	Perplexity ↓	Spark tok/s ↑	Vertical eval ↑
Q4_K_M	10.2415	39.57	—
Q5_K_M (sweet spot)	10.0436	35.00	—
Q6_K	9.9624	30.66	—
Q8_0	9.9288	26.51	—
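
The trade-off in the table can be made concrete with a little arithmetic. The sketch below uses only the figures listed above and computes each quant's perplexity increase and throughput gain relative to Q8_0. The names SPEC and relative_to_q8 are illustrative and not part of the release.

```python
# Figures copied from the spec matrix above: (perplexity, Spark tok/s).
SPEC = {
    "Q4_K_M": (10.2415, 39.57),
    "Q5_K_M": (10.0436, 35.00),
    "Q6_K": (9.9624, 30.66),
    "Q8_0": (9.9288, 26.51),
}


def relative_to_q8(spec):
    """Return {variant: (perplexity_increase_pct, throughput_gain_pct)} vs Q8_0."""
    base_ppl, base_tps = spec["Q8_0"]
    out = {}
    for name, (ppl, tps) in spec.items():
        ppl_pct = 100 * (ppl - base_ppl) / base_ppl
        tps_pct = 100 * (tps - base_tps) / base_tps
        out[name] = (ppl_pct, tps_pct)
    return out


if __name__ == "__main__":
    for name, (dppl, dtps) in relative_to_q8(SPEC).items():
        print(f"{name:8s} perplexity +{dppl:.2f}%  throughput +{dtps:.1f}%")
```

On these numbers Q5_K_M gives up roughly 1.2% perplexity against Q8_0 in exchange for about 32% more throughput, which is consistent with its sweet-spot label.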

Choosing this lane

Offline inference of the NeMo-trained model on llama.cpp / Ollama — four quants from Q4_K_M (fastest at 39.6 tok/s on Spark) up to Q8_0 (lowest perplexity, 9.93). Q5_K_M is the recommended balance of speed and fidelity. Reach for the BF16 LoRA sibling when you want the full-precision weights for transformers / vLLM GPU inference.
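
As a rough illustration of the offline path, the sketch below sends a prompt to a locally running Ollama server over its HTTP generate endpoint. It assumes the server is on its default port and that you have already created a local model from one of the GGUF files; the tag patent-strategist-v3-nemo:q5_k_m is a hypothetical name chosen for this example, not one published with the release.

```python
import json
import urllib.request

# Assumptions: a local Ollama server on its default port, and a model tag you
# created yourself from one of the GGUF files (the tag below is hypothetical).
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "patent-strategist-v3-nemo:q5_k_m"


def build_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming generate request for the local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(generate("Explain how a Markush group limits claim scope."))
```

Because the request only ever targets localhost, privileged text stays on the box, which is the point of this lane.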

Methods

Read the field note: "Two Trainers, One LoRA: NeMo Framework Beats Unsloth by 26% on a Patent-Strategist Fine-Tune"

Same recipe, same R1-distilled base, same 5,000-row patent corpus, run once via Unsloth and once via NeMo Framework + Megatron-Bridge. NeMo finishes 26% faster and produces 44% longer patent-strategic chains. The cost is one YARN-defaults landmine and a stdout that lied for four hours.

Known drift

Disclosed limitations with explicit bounds: the scope is named, not implied.

"metes-and-times" terminology: one of two known drifts inherited from the v3 synthetic corpus (the standard term is "metes and bounds"); the balance of probe answers (~99%) cite real MPEP sections.
Fabricated MPEP §2163.05(s) citation: Same scope as above — corpus-generator artifact, not a model-wide hallucination pattern. Real §2163.05 has subsections (a)–(f) on written-description support.
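
Since both drifts are named explicitly, a reviewer could screen model output for them before it reaches a draft. The following is a minimal sketch under that assumption: it checks only the two items disclosed above, matches citations written with a § sign, and is not a general MPEP citation validator. The function and constant names are illustrative.

```python
import re

# The two drifts disclosed above; extend these lists only with verified cases.
KNOWN_DRIFT_TERMS = ["metes-and-times"]
KNOWN_FABRICATED_CITATIONS = ["2163.05(s)"]

# Matches section numbers after a § sign, e.g. "§2163.05(s)" or "§ 2111.01".
SECTION_CITATION = re.compile(r"§\s*(\d{3,4}(?:\.\d{2})?(?:\([a-z]\))?)", re.IGNORECASE)


def flag_known_drift(text):
    """Return (kind, value) pairs for each disclosed drift found in text."""
    flags = []
    lowered = text.lower()
    for term in KNOWN_DRIFT_TERMS:
        if term in lowered:
            flags.append(("term", term))
    for cited in SECTION_CITATION.findall(text):
        if cited.lower() in KNOWN_FABRICATED_CITATIONS:
            flags.append(("citation", cited))
    return flags


if __name__ == "__main__":
    sample = "Under MPEP §2163.05(s), the metes-and-times of the claim are unclear."
    for kind, value in flag_known_drift(sample):
        print(f"flagged {kind}: {value}")
```

A flagged answer still needs attorney review; the check only catches the disclosed artifacts and says nothing about citations outside that list.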

Other Orionfold variants

patent-strategist-v3-nemo: full-precision BF16 weights of the same NeMo fine-tune for transformers / vLLM GPU inference.