Tag
#rlvr
Articles tagged "rlvr" — 2 entries.
Machine that Builds Machines
The Machine Improves Itself — Closed-Loop RLVR on a DGX Spark, Where the Eval Harness Is the Reward
Closed-loop RLVR on one box: an eval→reward→fine-tune loop where the Spark's own verifiers ARE the reward — no learned reward model. The hero finding is defensive: pick the checkpoint on a frozen held-out split, never the training pool, or the loop reports success while it regresses.
uses fieldkit.rl, fieldkit.reward, fieldkit.eval, fieldkit.lineage
Machine that Builds Machines
The Gate Before the GPU — Deciding SFT vs RL vs RLVR Before You Spend the Run
Building Kepler — a numeric astrodynamics reasoner — from scratch on one Spark. The method choice (SFT vs RL vs RLVR) is decided by cheap gates before any GPU run: a base preflight, an SFT gate, and a Goldilocks headroom gate. A flawless RLVR run that changed nothing is the proof.
uses fieldkit.rl, fieldkit.reward, fieldkit.eval