Stage

Training

When a single rig trains from scratch or continues pre-training — and when it should not. On-device training economics for an individual.

Article №53 fine-tuning Foundation 03 Jun 2026 ~16 min read — a synthesis of a proven run plus the engine it became

Machine that Builds Machines

The Machine Improves Itself — Closed-Loop RLVR on a DGX Spark, Where the Eval Harness Is the Reward

Closed-loop RLVR on one box: an eval→reward→fine-tune loop where the Spark's own verifiers ARE the reward — no learned reward model. The hero finding is defensive: pick the checkpoint on a frozen held-out split, never the training pool, or the loop reports success while it regresses.

uses fieldkit.rl, fieldkit.reward, fieldkit.eval, fieldkit.lineage
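A minimal sketch of the selection rule this card describes, under stated assumptions: the `Checkpoint` record, the `select_checkpoint` helper, and every number below are hypothetical illustrations, not the article's code, not the fieldkit API, and not measurements from any run. It only shows why ranking by the training pool and ranking by a frozen held-out split can pick different checkpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    step: int
    pool_pass_rate: float      # fraction of training-pool tasks passed
    heldout_pass_rate: float   # fraction of frozen held-out tasks passed


def select_checkpoint(checkpoints):
    """Rank by held-out score only; ties go to the earlier step."""
    if not checkpoints:
        raise ValueError("no checkpoints to select from")
    return max(checkpoints, key=lambda c: (c.heldout_pass_rate, -c.step))


# Illustrative values only (hypothetical, chosen to show the divergence).
history = [
    Checkpoint(step=10, pool_pass_rate=0.50, heldout_pass_rate=0.40),
    Checkpoint(step=20, pool_pass_rate=0.70, heldout_pass_rate=0.45),
    Checkpoint(step=30, pool_pass_rate=0.90, heldout_pass_rate=0.30),
]

best = select_checkpoint(history)
pool_favorite = max(history, key=lambda c: c.pool_pass_rate)

print(f"held-out pick: step {best.step}")
print(f"pool pick:     step {pool_favorite.step}")
```

In this toy history the pool keeps climbing while the held-out score peaks earlier, so the two rules disagree. That is the regression the card warns a pool-only loop would report as success.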

Article №36 fine-tuning NeMo 11 May 2026 ~30 min read

Machine that Builds Machines

Adaptive Turn Clipping on a Single Spark — A²TGPO, Studied from Source

A²TGPO redesigns how Information Gain feeds GRPO: turn-group normalization, variance-rescaled accumulation, and adaptive turn-level clipping. The paper's release is the code; the Spark's contribution is the lineage primitive that records what each trial learned.

uses fieldkit.capabilities, fieldkit.training, fieldkit.lineage

Article №35 agentic NeMo 10 May 2026 ~28 min read

Machine that Builds Machines

Reading the Lineage Primitive — cxcscmu Auto-Research, Studied from release_artifacts

cxcscmu's own lineage_on vs lineage_off ablation closes the case: same agent, same trial budget, same prompt template — only the rendered lineage block differs, and the run with lineage produces 5.3× more keeps and 3.2× less wall-time waste. This piece extracts that primitive into fieldkit.lineage.

uses fieldkit.capabilities, fieldkit.training, fieldkit.lineage

Article №34 fine-tuning NeMo 09 May 2026 ~18.5 hours wall (50 T²PO steps + three evals)

Frontier Scout

T²PO on Spark — When the Training Pool Says 28/32 and Held-out Says 9/158

T²PO's two deltas on the Phase 6 ClawGym harness: mean turns 5.00 → 4.61, task_complete 154/158, but the per-assertion ceiling stays flat at 47.7%. The strongest training-side step (45) is the worst held-out checkpoint — pool saturation lies on a single Spark.

uses fieldkit.capabilities, fieldkit.eval, fieldkit.training

Article №33 fine-tuning NeMo 05 May 2026 ~9 hours wall (34 GRPO steps + two evals)

Frontier Scout

ClawGym GRPO on Spark — Closing the Loop the SFT Adapter Couldn't

Phase 5 SFT taught the agent to keep working but never to stop. 34 GRPO steps with a shaped reward unlearn the failure mode — same model, same base, same LoRA-init, but task_complete climbs 0/158 → 154/158, mean turns drop 12 → 5, and per-assertion still inches up +3.1 pp.

Article №32 fine-tuning NeMo 05 May 2026 ~3 days end-to-end (mostly waiting on rollouts)

Frontier Scout

ClawGym on Spark — A 7B Base, A LoRA Adapter, and the +15 pp the Adapter Earned

ClawGym shipped only a .github profile, so we built the substrate ourselves — persona task synth, sandbox harness, 200-task corpus, LoRA SFT, matched-base eval. The adapter earns +3.8 pp task pass and +15.0 pp per-assertion against its own base. The diagnostic is the lift.

uses fieldkit.nim

Article №25 fine-tuning NeMo Customizer 01 May 2026 ~2 hours wall — 4 min LoRA training, 4 min race, the rest writing

Machine that Builds Machines

Distilling the Architect — A 3B LoRA Trained on the Agent's Own Trajectory

A4's 50-iter trajectory becomes training data for a Qwen2.5-3B LoRA proposer. Holding out 8 iters, the 3B mode-collapses onto d_model=768 (the trajectory's most-frequent keep) and matches 0 / 8 exact; the 8B at T=0.5 matches 4 / 8 of its own past picks.

Article №24 training Foundation 30 Apr 2026 ~30 minute read · math + economics, no GPU required

Looking Beyond Spark

Derisking the Cloud Pretrain — How a $5K Spark Saves $50K on H100 Rentals

The Spark is too small for a serious pretrain — but it's the right size for the recipe-search that precedes one. Cull 100 candidate architectures down to 3 on one Spark for ~$1 of electricity, then book the cloud node knowing what to train. The expected savings per campaign run into the thousands.

Article №23 foundations Foundation 25 Apr 2026 ~15 minute read · no GPU required

Looking Beyond Spark

What the Agent Actually Built — Five Articles in Plain English, and Why You Probably Don't Want to Train From Scratch

Five technical articles in one day built an unattended AI research loop on a desk for $0.02 of electricity. The plain-English readout: what the agent built (not a usable model), what it changes for one person, and a four-tier roadmap from LoRA in minutes to from-scratch in weeks.

Article №22 agentic NeMo 25 Apr 2026 ~3 hours — 90 min to scaffold the loop, 73 min for the unattended run, the rest is reading the trajectory

Machine that Builds Machines

The Autoresearch Loop — 50 Iterations of an LLM Editing Its Own Trainer Overnight

NIM Llama 3.1 8B drives a structured-perturbation agent loop against a 354M GPT pretrain. 50 iterations, 73.4 min wall, 0.07 kWh of electricity. 8 keeps, 42 reverts, 0 rail blocks, 0 crashes. Best result: val_bpb 10.8534, +0.93% over baseline at d_model=768.

Article №20 training NeMo 25 Apr 2026 ~2 hours — 5 min for the corpus pull, 45 min for a derived container build, 2 min for the Curator pipeline + 40s tokenize, 3 min for the 8-config sweep, the rest is reading the numbers

Machine that Builds Machines

The Data-Path Envelope — When Real Tokens Beat Random Tokens at Pretrain Throughput

Curator-cleaned wikitext-103 (109M tokens, 417 MiB packed) feeding the same 354M GPT pretrain loop from A2. Eight configs swept; data-path overhead is 0.01–0.04% across all of them. New peak: 14,980 tok/s — slightly above A2's random-token ceiling.

Article №19 training NeMo 25 Apr 2026 ~30 min once the NeMo container is on disk — 7.4 min wall for the 16-config sweep, the rest is reading the numbers

Machine that Builds Machines

The GB10 Pretrain Envelope — Sweeping Batch, Sequence, and Precision on One Spark

Same 354M GPT, same training loop, swept across micro-batch (2,4,8,16), sequence length (1024,2048), and precision (bf16,fp8). 16 configurations, 30 steps each. Peak: 14,266 tokens/sec at batch=16, seq=1024, fp8 — 18% above the hand-rolled PyTorch baseline.
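The sweep grid in this card can be enumerated directly. The sketch below is illustrative, not the article's sweep script: `sweep_grid` and the dictionary keys are hypothetical names, and `tokens_per_step` assumes a single device with no gradient accumulation, which the card does not state.

```python
from itertools import product

# Axis values as listed in the card.
MICRO_BATCHES = (2, 4, 8, 16)
SEQ_LENGTHS = (1024, 2048)
PRECISIONS = ("bf16", "fp8")


def sweep_grid():
    """Return one config dict per point in the batch x sequence x precision grid."""
    return [
        {
            "micro_batch": mb,
            "seq_len": sl,
            "precision": prec,
            # Assumption: one device, no gradient accumulation.
            "tokens_per_step": mb * sl,
        }
        for mb, sl, prec in product(MICRO_BATCHES, SEQ_LENGTHS, PRECISIONS)
    ]


grid = sweep_grid()
print(len(grid), "configurations")

# The configuration the card reports as the throughput peak.
peak = next(
    c for c in grid
    if c["micro_batch"] == 16 and c["seq_len"] == 1024 and c["precision"] == "fp8"
)
print(peak)
```

Four micro-batch sizes, two sequence lengths, and two precisions give the 16 configurations the card mentions.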

Article №18 training NeMo 25 Apr 2026 ~3 hours — 90 min for two container pulls (PyTorch 30 GB, NeMo Framework Megatron Backend 70 GB), 30 min for the matched scripts, 10 min for the two pretrain runs and analysis

Machine that Builds Machines

NeMo Framework on the Spark — What It Earns Over a Hand-Rolled train.py

Same 354M GPT, same 100 steps, same random tokens — once in a hand-rolled train.py against vanilla PyTorch, once via Megatron-Core inside the NeMo Framework container. Same hardware (GB10, 128 GB unified). The framework earns +5.8% throughput and 30% less GPU memory.

Upcoming training NeMo Framework + Llama 3.1 8B planned ~2 days of wall-clock, one long weekend

Machine that Builds Machines

Continued Pre-training on a DGX Spark — NeMo Framework Without a Cluster

When does it make sense to continue pre-training on a single GB10 box, and when is it a category error? A planned run that pushes NeMo Framework, Megatron-LM parallelism, and BF16 mixed precision against the 128 GB unified-memory wall with a small domain corpus.