Tag

#reinforce

Articles tagged "reinforce" — 1 entry.

Article №33 · fine-tuning · NeMo · 05 May 2026 · ~9 hours wall (34 GRPO steps + two evals)

ClawGym GRPO on Spark — Closing the Loop the SFT Adapter Couldn't

Phase 5 SFT taught the agent to keep working but never to stop. 34 GRPO steps with a shaped reward unlearn that failure mode: same model, same base, same LoRA init, yet task_complete climbs 0/158 → 154/158, mean turns drop 12 → 5, and the per-assertion rate still inches up by 3.1 pp.