
What it's for
  • Builder: reproduce the build — baseline, corpus gates, backend decision, train, probe, quantize, publish — as fieldkit calls
  • User: claim construction, MPEP-grounded office-action drafting, prior-art relevance, and licensing-scenario analysis with reasoning chains surfaced
  • User: ground answers in MPEP text with fieldkit.rag and gate quality with fieldkit.eval scorers
  • Both: run the whole workflow offline on a DGX Spark or on a free Colab / Kaggle GPU (dual-path, runtime-detected)
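The "dual-path, runtime-detected" idea in the last bullet can be sketched with the standard library alone. This is a minimal illustration, not fieldkit's actual detection logic: the function name `detect_runtime` and the "local" label are hypothetical, and the checks rely on commonly used hints (the `google.colab` module and `COLAB_RELEASE_TAG` variable on Colab, the `KAGGLE_KERNEL_RUN_TYPE` variable on Kaggle) that are assumptions here rather than anything the page confirms.

```python
import os
import sys


def detect_runtime(env=None, modules=None):
    """Guess where the notebook is running: 'colab', 'kaggle', or 'local'.

    Hypothetical helper for illustration only. `env` and `modules` default
    to the live process state but can be passed in for testing.
    """
    env = os.environ if env is None else env
    modules = sys.modules if modules is None else modules

    # Assumption: Colab kernels preload google.colab and/or set COLAB_RELEASE_TAG.
    if "google.colab" in modules or "COLAB_RELEASE_TAG" in env:
        return "colab"

    # Assumption: Kaggle kernels export KAGGLE_KERNEL_RUN_TYPE.
    if "KAGGLE_KERNEL_RUN_TYPE" in env:
        return "kaggle"

    # Anything else is treated as a local machine, such as a DGX Spark.
    return "local"


if __name__ == "__main__":
    print(detect_runtime())
```

A notebook could branch once on this value near the top (for example, to pick a model path or cache directory) and keep every later cell identical across runtimes.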

Audience — AI researchers and engineers who want to reproduce the build, and app developers who want to call the model — on Spark-class hardware (GB10, 128 GB unified memory) or a free cloud GPU.

Quant economics: quality × speed per build
[Chart: quant variants, with tags for the builder sweet spot and the user notebook's pick]
Known drift: bounded, honest
  • The user notebook pins the Q5_K_M quant on both the Spark and cloud paths. Q5_K_M is the fast and accurate sweet spot: 10.04 wikitext perplexity at 35 tok/s on a GB10, within 0.8% of Q6_K's accuracy. Heavier and lighter variants are one keyword away; see the sibling GGUF card for the full matrix.
  • The builder notebook's heavy steps (baseline, corpus, train, probe, quantize) render the recorded Spark run, not a live re-execution. These are 5 recorded Spark-only cells; the remaining cells (feasibility envelope, backend decision, the four viz figures, publish dry-run) run live on any runtime.
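The Q5_K_M figures quoted above can be turned into two quick back-of-envelope numbers. This sketch assumes that "within 0.8%" means a relative perplexity gap measured against Q6_K, and that Q6_K has the lower perplexity; the page states neither explicitly. The helper names are hypothetical, and the decode estimate ignores time to first token.

```python
def implied_reference_floor(ppl, rel_gap):
    """Lowest reference perplexity consistent with a relative gap of at most rel_gap.

    If (ppl - ref) / ref <= rel_gap, then ref >= ppl / (1 + rel_gap).
    """
    return ppl / (1.0 + rel_gap)


def decode_seconds(n_tokens, tok_per_s):
    """Steady-state generation time, excluding time to first token."""
    return n_tokens / tok_per_s


q5_ppl = 10.04   # wikitext perplexity quoted for Q5_K_M
q5_speed = 35.0  # tok/s quoted for a GB10

floor = implied_reference_floor(q5_ppl, 0.008)
print(f"Q6_K perplexity is at least about {floor:.2f}")  # 9.96
print(f"500 tokens take about {decode_seconds(500, q5_speed):.1f} s")  # 14.3
```

Under those assumptions the gap between the two quants is roughly 0.08 perplexity points. The sibling GGUF card remains the source for the measured values.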