GPU Util: % utilisation
GPU Temp: °C die
Unified: GB of 128 · 8 GB guard
Throughput: tok / second
TTFT: ms · first token
Active Lane: idle · no warm brain
OpenRouter: $0.00 spend · since start
Unified · 60 s: 8 GB guard band shown at top


Quant economics: quality × speed per build
Variant               Perplexity  tok/s  MedMCQA (n=50, mcq_letter)
Q4_K_M                16.550      43.6   0.42
Q5_K_M (sweet spot)   16.242      36.4   0.52
Q6_K                  16.014      32.8   0.46
Q8_0                  16.296      28.4   0.48
F16                   16.268      15.9   0.48

Perplexity: lower is better. tok/s was measured on the DGX Spark (GB10, 128 GB unified memory).
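Because the MedMCQA column is based on only 50 questions per variant, it may help to see how wide the sampling uncertainty is. The sketch below is not part of the dashboard; it assumes each score is a simple fraction correct out of n=50 and uses the normal approximation to the binomial, which is an assumption about how the scores were computed.

```python
import math

# MedMCQA scores copied from the table above; n = 50 questions per variant.
# Assumption: each score is a plain fraction of correct answers.
N = 50
ACCURACY = {
    "Q4_K_M": 0.42,
    "Q5_K_M": 0.52,
    "Q6_K": 0.46,
    "Q8_0": 0.48,
    "F16": 0.48,
}


def standard_error(p: float, n: int) -> float:
    """Binomial standard error of a proportion p observed over n trials."""
    return math.sqrt(p * (1.0 - p) / n)


for variant, acc in ACCURACY.items():
    half_width = 1.96 * standard_error(acc, N)  # 95% interval, normal approx.
    print(f"{variant:8s} acc={acc:.2f} +/- {half_width:.2f}")
```

Under these assumptions every interval has a half-width of roughly 0.14, which is larger than the 0.10 gap between the best and worst variants, so the MedMCQA ranking here is suggestive rather than conclusive.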

Efficiency curve: quality index × tok/s
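The page does not say how the quality index is defined. As a rough illustration only, the sketch below assumes the MedMCQA score stands in for the quality index and multiplies it by tok/s, using the rows from the quant economics table. The `efficiency` function and the choice of score are hypothetical, not the dashboard's actual formula.

```python
# Rows copied from the table: (variant, perplexity, tok/s, MedMCQA score).
ROWS = [
    ("Q4_K_M", 16.550, 43.6, 0.42),
    ("Q5_K_M", 16.242, 36.4, 0.52),
    ("Q6_K", 16.014, 32.8, 0.46),
    ("Q8_0", 16.296, 28.4, 0.48),
    ("F16", 16.268, 15.9, 0.48),
]


def efficiency(quality: float, tok_per_s: float) -> float:
    """Hypothetical efficiency score: quality index times throughput."""
    return quality * tok_per_s


# Assumption: MedMCQA score is used as the quality index.
scores = {name: efficiency(acc, tps) for name, _ppl, tps, acc in ROWS}
best = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} {score:6.2f}")
print("highest score:", best)
```

With this stand-in definition, Q5_K_M scores highest, which is consistent with its "sweet spot" label. A different quality index (for example, one derived from perplexity) could rank the variants differently.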