Quant · GGUF · 5 variants

ii-medical-8b-gguf

Quantization of Intelligent-Internet/II-Medical-8B.

HF: Orionfold/II-Medical-8B-GGUF · License: apache-2.0 (free) · Published

Spec matrix

Ranks within each column drive the heatmap. Lower perplexity, higher throughput, and a higher vertical-eval score are better; the sweet-spot row balances all three.

Vertical bench: MedMCQA (n=50, mcq_letter)

Variant              Perplexity  Spark tok/s  Vertical eval
Q4_K_M               16.5500     43.57        0.42
Q5_K_M (sweet spot)  16.2418     36.36        0.52
Q6_K                 16.0139     32.80        0.46
Q8_0                 16.2957     28.42        0.48
F16                  16.2676     15.94        0.48
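
To make the per-column ranking concrete, here is a minimal sketch that ranks the five variants within each column and sums the ranks. The page does not say how the sweet spot is chosen, so the rank-sum rule, the tie handling (tied values share the better rank), and all names in the code (`ROWS`, `column_ranks`, `rank_sums`) are assumptions made for illustration. The numbers are copied from the spec matrix.

```python
# Values copied from the spec matrix; the column keys are illustrative names.
ROWS = {
    "Q4_K_M": {"perplexity": 16.5500, "tok_s": 43.57, "vertical": 0.42},
    "Q5_K_M": {"perplexity": 16.2418, "tok_s": 36.36, "vertical": 0.52},
    "Q6_K":   {"perplexity": 16.0139, "tok_s": 32.80, "vertical": 0.46},
    "Q8_0":   {"perplexity": 16.2957, "tok_s": 28.42, "vertical": 0.48},
    "F16":    {"perplexity": 16.2676, "tok_s": 15.94, "vertical": 0.48},
}

# True means a higher value is better in that column.
HIGHER_IS_BETTER = {"perplexity": False, "tok_s": True, "vertical": True}


def column_ranks(rows, column, higher_is_better):
    """Rank variants within one column (1 = best).

    A variant's rank is 1 plus the number of variants that are strictly
    better in this column, so tied values share the same rank.
    """
    ranks = {}
    for name, row in rows.items():
        value = row[column]
        strictly_better = sum(
            1
            for other in rows.values()
            if (other[column] > value if higher_is_better else other[column] < value)
        )
        ranks[name] = strictly_better + 1
    return ranks


def rank_sums(rows):
    """Sum each variant's per-column ranks (assumed balancing rule)."""
    totals = {name: 0 for name in rows}
    for column, higher_is_better in HIGHER_IS_BETTER.items():
        for name, rank in column_ranks(rows, column, higher_is_better).items():
            totals[name] += rank
    return totals


totals = rank_sums(ROWS)
best = min(totals, key=totals.get)  # lowest rank sum = best overall balance
print(best, totals)
```

Under this assumed rule, Q5_K_M has the lowest rank sum (second on perplexity, second on throughput, first on the vertical eval), which agrees with the row marked as the sweet spot. A different weighting of the three columns could pick a different row.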
Read the field note: "Orionfold/II-Medical-8B-GGUF on Spark: five medical-reasoning variants, MedMCQA mini-eval, ChatML reasoning format"