Name: ii-medical-8b-notebooks
Published: 2026-05-23T17:30:00Z
License: open

What this notebook does

The artifact → card → article loop sells the outcome but offers no runnable on-ramp: a researcher who wants to reproduce the five-variant quant, or a developer who wants to call the model, has to reconstruct the journey from prose. These two notebooks close that gap. The builder notebook walks the feasibility → quantize → measure → publish journey as typed fieldkit API calls; the user notebook calls II-Medical-8B on real clinical-reasoning tasks and surfaces its <think> chains. Both are one-click via Open in Colab / Open in Kaggle and run offline on a DGX Spark — no patient text leaves the box.
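
As a rough illustration of what "surfacing the <think> chain" can mean in practice, the sketch below separates the reasoning block from the final answer in a raw completion. It assumes the model wraps its reasoning in a single <think>...</think> pair; `split_think` is a hypothetical helper written for this example, not part of fieldkit.

```python
import re
from typing import Tuple

# Assumption: reasoning is delimited by one <think>...</think> pair.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(response: str) -> Tuple[str, str]:
    """Return (reasoning, answer) for a raw model response.

    If no closed <think> block is found, the reasoning is empty and the
    whole response is treated as the answer.
    """
    match = THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is the user-facing answer.
    answer = (response[: match.start()] + response[match.end():]).strip()
    return reasoning, answer


raw = "<think>Rule out PE first.</think>\nMost likely diagnosis: pneumonia."
reasoning, answer = split_think(raw)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the two parts separate lets a notebook show the chain in a collapsible cell while scoring or displaying only the answer. A generation that is cut off before the closing tag falls through to the "no block" branch here, which a real pipeline may want to handle differently.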

Use cases

Builder: reproduce the release as fieldkit calls: feasibility envelope, quantize sweep, the spark_quad panel and variants table, and publish
User: differential diagnosis, management-and-contraindication reasoning, drug-interaction analysis, and documented second opinions with the <think> chain surfaced
User: ground answers in guideline text with fieldkit.rag and gate the MCQ shape with fieldkit.eval.mcq_letter
Both: run offline on a DGX Spark or on a free Colab / Kaggle GPU (dual-path, runtime-detected)
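
The idea of gating the MCQ shape, as named in the use cases above, can be sketched as a check that a final answer commits to exactly one option letter. The code below is an illustrative stand-in, not the implementation of fieldkit.eval.mcq_letter: the function name `gate_mcq_letter`, the A-D option set, and the accepted phrasings are all assumptions made for this example.

```python
import re
from typing import Optional

# Assumption: options are labelled A-D and the model states its choice as
# "answer is X", "Answer: X", "option X", or a bare letter.
_LETTER_RE = re.compile(
    r"(?i:answer|option)\s*(?i:is)?\s*:?\s*\(?([A-D])\)?(?![A-Za-z])"
)


def gate_mcq_letter(text: str) -> Optional[str]:
    """Return the last explicitly stated option letter, or None.

    None means the response does not have the expected MCQ shape and
    should be counted as unparsed rather than silently guessed.
    """
    found = _LETTER_RE.findall(text)
    if found:
        return found[-1]
    bare = text.strip()
    return bare if re.fullmatch(r"[A-D]", bare) else None


for sample in ["The answer is B.", "Answer: C", "Beta blockers are contraindicated."]:
    print(repr(sample), "->", gate_mcq_letter(sample))
```

Taking the last match rather than the first is a deliberate choice for reasoning models, which often discuss and reject other options before committing. The pattern is intentionally strict; free-text answers return None instead of being coerced into a letter.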

Audience: AI researchers and engineers who want to reproduce the quant, plus clinicians, medical educators, and health-app developers who want a private, on-device reasoning assistant. Both groups can run on Spark-class hardware (GB10, 128 GB unified memory) or a free cloud GPU.

Choosing the variant

Two notebooks for the same release; pick by your goal.

builder: Walks the build journey on Spark — fieldkit API calls replacing ad-hoc scripts; surfaces speed, feasibility, and viability.
user: Demonstrates the published model on realistic domain tasks — runtime-detected, runs on Spark or on a free Colab/Kaggle GPU.
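
A minimal sketch of what runtime detection could look like follows. It is not the notebooks' actual logic: the environment variables checked (`KAGGLE_KERNEL_RUN_TYPE`, `COLAB_RELEASE_TAG`, `COLAB_GPU`) are ones commonly present on those platforms and are assumed here, and treating every other environment as the local Spark path is a simplification. The quant mapping follows the Known drift section (Q4_K_M on cloud, Q5_K_M on Spark).

```python
import os
from typing import Mapping

# Quant served per runtime, per the card's Known drift section.
QUANT_BY_RUNTIME = {"colab": "Q4_K_M", "kaggle": "Q4_K_M", "local": "Q5_K_M"}


def detect_runtime(env: Mapping[str, str]) -> str:
    """Classify the runtime from environment variables (assumed markers)."""
    if "KAGGLE_KERNEL_RUN_TYPE" in env:
        return "kaggle"
    if "COLAB_RELEASE_TAG" in env or "COLAB_GPU" in env:
        return "colab"
    # Assumption: anything else is a local machine such as a DGX Spark.
    return "local"


def pick_quant(env: Mapping[str, str]) -> str:
    """Return the GGUF quant label for the detected runtime."""
    return QUANT_BY_RUNTIME[detect_runtime(env)]


print(detect_runtime(os.environ), pick_quant(os.environ))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the detection easy to exercise from a test cell on any machine.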

Methods

Read the field note: Orionfold/II-Medical-8B-GGUF on Spark — five medical-reasoning variants, MedMCQA mini-eval, ChatML reasoning format

Five GGUF variants of Intelligent-Internet/II-Medical-8B (Qwen3-8B + DAPO reasoning recipe) measured on a DGX Spark. Q5_K_M lands at 36.4 tok/s, 5.45 GB, and 52% on a MedMCQA n=50 mini-eval, which is above the F16 score. First reasoning recipe in the series.

Known drift

Bounded limitations — Colab/Kaggle runs use the published quant; reasoning quality may differ from the BF16 weights on Spark. Each entry carries an explicit bound.

Cloud (Colab / Kaggle) path serves the Q4_K_M quant; the Spark path serves Q5_K_M: The two are one quant level apart, and the difference is most visible on the medical benchmark: Q4_K_M scores 42% on the MedMCQA n=50 mini-eval versus Q5_K_M's 52% (10 points, ~1.4× the binomial noise floor). Both run the identical code path. See the sibling GGUF card.
The builder notebook's quantize + publish steps render the recorded Spark run, not a live re-execution: 2 recorded Spark-only cells (the quantize sweep and the publish dry-run); the remaining cells — feasibility envelope, the spark_quad panel, and the variants table — run live on any runtime from the manifest.
The user notebook's live model-chat cells are not captured in the published marketing snapshot: 4 use-case cells call the model live on any runtime; the snapshot defers their capture pending a serving-detok check, so the reasoning-chain output is described, not screenshotted (the deterministic charts + banners are captured).
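
The "~1.4× the binomial noise floor" figure in the first entry can be reproduced with a short calculation. The sketch below assumes the noise floor means the one-sigma standard error of a single accuracy estimate at n=50, taken at the worst case p=0.5; the card does not spell out its exact definition, so treat this as one plausible reading.

```python
import math


def binomial_se(p: float, n: int) -> float:
    """Standard error of an accuracy estimate from n independent items."""
    return math.sqrt(p * (1.0 - p) / n)


n = 50                      # MedMCQA mini-eval size from the card
acc_q4, acc_q5 = 0.42, 0.52  # Q4_K_M and Q5_K_M accuracies from the card

gap = acc_q5 - acc_q4
se = binomial_se(0.5, n)    # assumption: worst-case p = 0.5
ratio = gap / se

print(f"gap = {gap * 100:.0f} points")
print(f"standard error = {se * 100:.1f} points")
print(f"gap / standard error = {ratio:.1f}")
```

Under this reading the standard error is about 7.1 points, so the 10-point gap is roughly 1.4 standard errors: suggestive, but well within what a 50-question sample can produce by chance.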

Sibling artifacts

The model this notebook targets, plus other variants in the same family.

ii-medical-8b-gguf: Target model (quantization)