Tag
#reinforce
Articles tagged "reinforce" — 1 entry.
Frontier Scout
ClawGym GRPO on Spark — Closing the Loop the SFT Adapter Couldn't
Phase 5 SFT taught the agent to keep working but never to stop. 34 GRPO steps with a shaped reward unlearn that failure mode: same model, same base, same LoRA init, yet task_complete climbs 0/158 → 154/158, mean turns drop 12 → 5, and the per-assertion score still inches up by 3.1 pp.