GPU Util % utilisation
GPU Temp °C die
Unified GB of 128 · 8 GB guard
Throughput tok / second
TTFT ms · first token
throughput & first-token from the active lane
Active Lane idle · no warm brain
OpenRouter $0.00 spend · session
Reward signal eval-is-reward · verifier gauge · AV-R1 watch · read-only — never dispatches

One gauge across the build: the SFT-init step-0 baseline (does the warm-start produce boxed, scorable answers?) and the live rl_run reward as it trains (AE-1). The verifier IS the reward.
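
As a rough illustration of "the verifier is the reward", the sketch below scores a completion by extracting its final \boxed{...} answer and comparing it with a reference. It also reports the fraction of completions that are scorable at all, which is the question the step-0 baseline asks. The names extract_boxed, verifier_reward and gauge are hypothetical, and exact string match is an assumed stand-in for whatever check the real verifier performs.

```python
from typing import Dict, List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward from the opening brace, tracking nesting depth.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed: treat as unscorable


def verifier_reward(completion: str, reference: str) -> float:
    """Assumed rule: 1.0 for an exact match on the boxed answer, else 0.0."""
    answer = extract_boxed(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0


def gauge(completions: List[str], references: List[str]) -> Dict[str, float]:
    """Summarise a batch: share of scorable outputs and mean verifier reward."""
    n = len(completions)
    if n == 0:
        return {"scorable_rate": 0.0, "mean_reward": 0.0}
    scorable = sum(extract_boxed(c) is not None for c in completions)
    total = sum(verifier_reward(c, r) for c, r in zip(completions, references))
    return {"scorable_rate": scorable / n, "mean_reward": total / n}
```

Under these assumptions the same gauge function serves both readings: run once on step-0 samples it gives the baseline, and run on each training batch it gives the live reward.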