One gauge across the build: the SFT-init step-0 baseline (does the warm-start produce boxed, scorable answers?) and the live rl_run reward as it trains (AE-1). The verifier IS the reward.
Run fieldkit arena serve on the Spark to feed this rail.
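To make "the verifier IS the reward" concrete, here is a minimal sketch of a boxed-answer verifier used directly as a reward, plus the step-0 check of how many completions are scorable at all. The function names (extract_boxed, verifier_reward, boxed_rate) are hypothetical and are not fieldkit's API; the whitespace-insensitive exact-match comparison is also an assumption, since the actual verifier may normalize answers differently.

```python
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer: not scorable
    i = start + len(marker)
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as not scorable


def verifier_reward(completion: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the boxed answer matches, else 0.0."""
    answer = extract_boxed(completion)
    if answer is None:
        return 0.0
    # Assumption: whitespace-insensitive exact match stands in for the verifier.
    same = answer.replace(" ", "") == reference.replace(" ", "")
    return 1.0 if same else 0.0


def boxed_rate(completions: List[str]) -> float:
    """Fraction of completions with a parseable boxed answer (step-0 check)."""
    if not completions:
        return 0.0
    scorable = sum(extract_boxed(c) is not None for c in completions)
    return scorable / len(completions)
```

In this sketch, boxed_rate over the SFT-init samples plays the role of the step-0 baseline, and the mean of verifier_reward over a batch is the value the gauge would track during training.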