Tag

#grpo

Articles tagged "grpo" — 5 entries.

Article №36 fine-tuning NeMo ~30 min read
Machine that Builds Machines

Adaptive Turn Clipping on a Single Spark — A²TGPO, Studied from Source

A²TGPO redesigns how Information Gain feeds GRPO: turn-group normalization, variance-rescaled accumulation, and adaptive turn-level clipping. The paper's release is the code; the Spark's contribution is the lineage primitive that records what each trial learned.

uses fieldkit.capabilitiesfieldkit.trainingfieldkit.lineage

Article №34 fine-tuning NeMo ~18.5 hours wall (50 T²PO steps + three evals)
Frontier Scout

T²PO on Spark — When the Training Pool Says 28/32 and Held-out Says 9/158

T²PO's two deltas on the Phase 6 ClawGym harness: mean turns 5.00 → 4.61, task_complete 154/158, but the per-assertion ceiling stays flat at 47.7%. The strongest training-side step (45) is the worst held-out checkpoint — pool saturation lies on a single Spark.

uses fieldkit.capabilitiesfieldkit.evalfieldkit.training

Article №33 fine-tuning NeMo ~9 hours wall (34 GRPO steps + two evals)
Frontier Scout

ClawGym GRPO on Spark — Closing the Loop the SFT Adapter Couldn't

Phase 5 SFT taught the agent to keep working but never to stop. 34 GRPO steps with a shaped reward unlearn the failure mode — same model, same base, same LoRA-init, but task_complete climbs 0/158 → 154/158, mean turns drop 12 → 5, and per-assertion still inches up +3.1 pp.

Article №32 fine-tuning NeMo ~3 days end-to-end (mostly waiting on rollouts)
Frontier Scout

ClawGym on Spark — A 7B Base, A LoRA Adapter, and the +15 pp the Adapter Earned

ClawGym shipped only a .github profile, so we built the substrate ourselves — persona task synth, sandbox harness, 200-task corpus, LoRA SFT, matched-base eval. The adapter earns +3.8 pp task pass and +15.0 pp per-assertion against its own base. The diagnostic is the lift.

uses fieldkit.nim

Upcoming agentic NemoClaw ~30 min read
Machine that Builds Machines

SkillOS: Learning Skill Curation for Self-Evolving Agents — Spark reproduction notes

Reproducing the SkillOS curator/executor split on a DGX Spark — both Qwen3-8B (frozen executor + LoRA-trained curator) over a markdown SkillRepo with BM25 retrieval, then extracting the pattern into `fieldkit.skills`.