Tag

#fine-tuning

Articles tagged "fine-tuning" — 14 entries.

Article №53 fine-tuning Foundation ~16 min read — a synthesis of a proven run plus the engine it became
Machine that Builds Machines

The Machine Improves Itself — Closed-Loop RLVR on a DGX Spark, Where the Eval Harness Is the Reward

Closed-loop RLVR on one box: an eval→reward→fine-tune loop where the Spark's own verifiers ARE the reward — no learned reward model. The hero finding is defensive: pick the checkpoint on a frozen held-out split, never the training pool, or the loop reports success while it regresses.

uses fieldkit.rl, fieldkit.reward, fieldkit.eval, fieldkit.lineage
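
The defensive rule above fits in a few lines. Below is a minimal sketch, assuming each checkpoint has already been scored twice by the same verifiers: once on the training pool and once on a frozen held-out split. The class, function names, and numbers are hypothetical and are not the fieldkit API. Selection reads only the held-out column, so a checkpoint that saturates the pool cannot win on that basis alone.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointScore:
    step: int
    train_pool: float  # verifier pass rate on prompts the loop trains on
    held_out: float    # verifier pass rate on a frozen split the loop never trains on


def pick_checkpoint(scores: list[CheckpointScore]) -> CheckpointScore:
    """Choose by held-out score only; the train-pool score is logged but never consulted."""
    if not scores:
        raise ValueError("no checkpoints to choose from")
    # On a held-out tie, prefer the earlier step (less drift from the base).
    return max(scores, key=lambda s: (s.held_out, -s.step))


# Illustrative numbers only, not results from the article.
scores = [
    CheckpointScore(step=0, train_pool=0.50, held_out=0.40),
    CheckpointScore(step=20, train_pool=0.70, held_out=0.44),
    CheckpointScore(step=45, train_pool=0.88, held_out=0.31),
]

best = pick_checkpoint(scores)
naive = max(scores, key=lambda s: s.train_pool)
print(f"held-out pick: step {best.step}; train-pool pick would be step {naive.step}")
```

With these made-up numbers, the pool-selected step 45 is also the worst held-out checkpoint. This is the "reports success while it regresses" failure mode.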

Article №52 fine-tuning NeMo ~18 min read — synthesis of a multi-day greenfield-vertical build on one Spark
Machine that Builds Machines

The Gate Before the GPU — Deciding SFT vs RL vs RLVR Before You Spend the Run

Building Kepler — a numeric astrodynamics reasoner — from scratch on one Spark. The method choice (SFT vs RL vs RLVR) is decided by cheap gates before any GPU run: a base preflight, an SFT gate, and a Goldilocks headroom gate. A flawless RLVR run that changed nothing is the proof.

uses fieldkit.rl, fieldkit.reward, fieldkit.eval
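
The article's exact gate thresholds aren't given here. The sketch below shows one plausible shape for a headroom check, assuming you sample k completions per prompt from the base model and score each with a binary verifier. The reasoning behind it: in group-normalized methods such as GRPO, a prompt whose k samples all earn the same reward contributes zero advantage. A base that already passes (or already fails) nearly everything therefore gives RLVR nothing to learn from. The function name and the 0.3 threshold are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def headroom_report(rewards_per_prompt: list[list[int]], min_mixed_fraction: float = 0.3) -> dict:
    """rewards_per_prompt: k binary verifier rewards (0/1) sampled from the base, per prompt."""
    if not rewards_per_prompt or any(not r for r in rewards_per_prompt):
        raise ValueError("need at least one sample for every prompt")
    rates = [mean(r) for r in rewards_per_prompt]
    n = len(rates)
    # "Mixed" prompts have some passing and some failing samples: the only ones with signal.
    frac_mixed = sum(0 < p < 1 for p in rates) / n
    return {
        "base_pass_rate": mean(rates),
        "frac_all_fail": sum(p == 0 for p in rates) / n,
        "frac_all_pass": sum(p == 1 for p in rates) / n,
        "frac_mixed": frac_mixed,
        # Hypothetical threshold: enough prompts must carry a learnable reward signal.
        "gate_open": frac_mixed >= min_mixed_fraction,
    }


# Illustrative inputs: a saturated base vs. one with room to improve.
saturated = [[1, 1, 1, 1]] * 8 + [[1, 0, 1, 1]] * 2
headroomy = [[1, 0, 0, 1]] * 5 + [[0, 0, 0, 0]] * 5
print(headroom_report(saturated))
print(headroom_report(headroomy))
```

A closed gate on the saturated case is the cheap version of "a flawless RLVR run that changed nothing".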

Article №44 fine-tuning NeMo ~16 hours wall (7h 34m Unsloth + 5h 38m NeMo + conversion + merge + probe)
Looking Beyond Spark

Two Trainers, One LoRA: NeMo Framework Beats Unsloth by 26% on a Patent-Strategist Fine-Tune

Same recipe, same R1-distilled base, same 5000-row patent corpus — once via Unsloth, once via NeMo Framework + Megatron-Bridge. NeMo finishes 26% faster and produces 44% longer patent-strategic chains. The cost is one YARN-defaults landmine and a stdout that lied for four hours.
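
Assuming the headline compares the two train wall times listed in this entry's metadata (7h 34m Unsloth, 5h 38m NeMo), "26% faster" reads as 26% less wall time. Expressed as a throughput ratio, the same gap is about 1.34×. The quick check:

```python
def minutes(h: int, m: int) -> int:
    return 60 * h + m


unsloth = minutes(7, 34)  # Unsloth train wall time from the metadata line
nemo = minutes(5, 38)     # NeMo train wall time from the metadata line

time_saved = (unsloth - nemo) / unsloth  # fraction of wall time removed
speedup = unsloth / nemo                 # same gap expressed as a throughput ratio
print(f"{time_saved:.1%} less wall time, {speedup:.2f}x throughput")
```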

Article №43 fine-tuning Foundation ~1 hour (one container, six gates, two GGUFs)
Machine that Builds Machines

Unsloth on the Spark — When the Train-Time Peak Equals the Base-Load Peak

Six gates clear in one container against the v1 reset: pip install --no-deps preserves the s40 stack, FastLanguageModel loads at 16.94 GB peak, a 100-step LoRA train holds the same envelope, save_pretrained_gguf() emits both quants in 207 seconds end-to-end.

Article №42 fine-tuning Foundation ~12 hours (2× 131-min trains + diagnosis)
Machine that Builds Machines

The Trainer Was Fine, the Corpus Wasn't: Three Misdiagnoses on a Patent-Specialist Fine-Tune

Five thousand rows of synthetic patent reasoning, two clean 131-minute LoRA trains, three rounds of confident diagnosis — and none of them found the bug. The bug was the corpus all along. A field report on the cheapest mistake to make on the Spark.

Article №34 fine-tuning NeMo ~18.5 hours wall (50 T²PO steps + three evals)
Frontier Scout

T²PO on Spark — When the Training Pool Says 28/32 and Held-out Says 9/158

T²PO's two deltas on the Phase 6 ClawGym harness: mean turns 5.00 → 4.61, task_complete 154/158, but the per-assertion ceiling stays flat at 47.7%. The strongest training-side step (45) is the worst held-out checkpoint — pool saturation lies on a single Spark.

uses fieldkit.capabilities, fieldkit.eval, fieldkit.training

Article №33 fine-tuning NeMo ~9 hours wall (34 GRPO steps + two evals)
Frontier Scout

ClawGym GRPO on Spark — Closing the Loop the SFT Adapter Couldn't

Phase 5 SFT taught the agent to keep working but never to stop. 34 GRPO steps with a shaped reward unlearn that failure mode. Same model, same base, same LoRA init, yet task_complete climbs 0/158 → 154/158, mean turns drop 12 → 5, and per-assertion still inches up 3.1 pp.
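
GRPO scores each rollout relative to the other rollouts for the same task. It subtracts the group mean reward and divides by the group standard deviation, so no learned critic is needed. The sketch below pairs that normalization with a shaped reward. The reward terms and weights (a completion bonus, partial credit per assertion, a small per-turn penalty) are hypothetical stand-ins, because the article's actual shaping isn't spelled out in this summary.

```python
from __future__ import annotations

from statistics import mean, pstdev


def shaped_reward(task_complete: bool, assertions_passed: int, assertions_total: int, turns: int) -> float:
    # Hypothetical shaping: reward declaring completion, give partial credit, penalize long loops.
    frac = assertions_passed / assertions_total if assertions_total else 0.0
    return 1.0 * task_complete + 0.5 * frac - 0.02 * turns


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # Group-relative advantage: (r - group mean) / (group std + eps).
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four illustrative rollouts of one task: two loop until the turn cap, two stop cleanly.
rollouts = [(False, 3, 5, 12), (True, 3, 5, 5), (True, 4, 5, 4), (False, 2, 5, 12)]
rewards = [shaped_reward(*r) for r in rollouts]
adv = group_advantages(rewards)
print([round(a, 3) for a in adv])
```

The rollouts that stop get positive advantage and the looping ones get negative advantage. If every rollout in a group earned the same reward, every advantage would be zero.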

Article №32 fine-tuning NeMo ~3 days end-to-end (mostly waiting on rollouts)
Frontier Scout

ClawGym on Spark — A 7B Base, A LoRA Adapter, and the +15 pp the Adapter Earned

ClawGym shipped only a .github profile, so we built the substrate ourselves — persona task synth, sandbox harness, 200-task corpus, LoRA SFT, matched-base eval. The adapter earns +3.8 pp task pass and +15.0 pp per-assertion against its own base. The diagnostic is the lift.

uses fieldkit.nim
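
The two lifts measure different things. Task pass credits a task only when every assertion passes. Per-assertion credits each assertion on its own. A matched-base eval runs the same harness on the same tasks with and without the adapter. The sketch below uses made-up assertion outcomes to show how an adapter can move per-assertion a lot while task pass barely moves.

```python
from __future__ import annotations


def pass_rates(results: list[list[bool]]) -> tuple[float, float]:
    """results: one list of assertion outcomes per task."""
    task_pass = sum(all(task) for task in results) / len(results)
    per_assertion = sum(sum(task) for task in results) / sum(len(task) for task in results)
    return task_pass, per_assertion


# Illustrative outcomes on the same four tasks; not the article's data.
base = [[True, False, False], [False, False], [True, True, True], [False, True, False, False]]
adapter = [[True, True, False], [True, False], [True, True, True], [True, True, False, False]]

(base_task, base_assert), (ad_task, ad_assert) = pass_rates(base), pass_rates(adapter)
task_lift_pp = (ad_task - base_task) * 100
assertion_lift_pp = (ad_assert - base_assert) * 100
print(f"task pass {task_lift_pp:+.1f} pp, per-assertion {assertion_lift_pp:+.1f} pp")
```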

Article №25 fine-tuning NeMo Customizer ~2 hours wall — 4 min LoRA training, 4 min race, the rest writing
Machine that Builds Machines

Distilling the Architect — A 3B LoRA Trained on the Agent's Own Trajectory

A4's 50-iter trajectory becomes training data for a Qwen2.5-3B LoRA proposer. With 8 iters held out, the 3B mode-collapses onto d_model=768 (the trajectory's most frequent keep) and matches 0/8 exactly; the 8B at T=0.5 matches 4/8 of its own past picks.

Article №23 foundations Foundation ~15 minute read · no GPU required
Looking Beyond Spark

What the Agent Actually Built — Five Articles in Plain English, and Why You Probably Don't Want to Train From Scratch

Five technical articles in one day built an unattended AI research loop on a desk for $0.02 of electricity. The plain-English readout: what the agent built (not a usable model), what it changes for one person, and a four-tier roadmap from LoRA in minutes to from-scratch in weeks.

Article №16 foundations Foundation ~25 minute read
Looking Beyond Spark

Looking Beyond Spark — Fine-Tuning a 100B Nemotron

A working answer to the question: how many GPUs does it take to fine-tune a 100B Nemotron? Three methods, three memory footprints: full FT ≈ 1.6 TB needs 24× H100; LoRA ≈ 250 GB fits 8× H100; QLoRA ≈ 65 GB fits 1× H200. The Spark's 3B LoRA teaches the math.

uses fieldkit.capabilities
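
The full-FT figure follows from the usual mixed-precision Adam accounting of 16 bytes per parameter: bf16 weights and gradients, plus an fp32 master copy and two fp32 Adam moments. The sketch below reproduces the weight-and-state floors under those assumed byte counts and turns them into capacity-only GPU minimums. It excludes activations, which we assume account for most of the gap between these floors and the entry's 250 GB and 65 GB totals, and for the headroom behind 24 rather than 20 H100s.

```python
import math

GB = 1e9
PARAMS = 100e9  # 100B-parameter model

# Assumed bytes per parameter for weights plus training state; activations are excluded.
BYTES_PER_PARAM = {
    "full FT (mixed-precision Adam)": 2 + 2 + 4 + 4 + 4,  # bf16 W + bf16 grad + fp32 master + Adam m + Adam v
    "LoRA (frozen bf16 base)": 2,  # adapter weights and their optimizer state are comparatively tiny
    "QLoRA (4-bit base)": 0.5,
}
H100_GB, H200_GB = 80, 141


def floor_gb(method: str) -> float:
    return PARAMS * BYTES_PER_PARAM[method] / GB


def min_gpus(total_gb: float, gpu_gb: float) -> int:
    # Capacity-only lower bound: ignores activations, fragmentation, and parallelism overhead.
    return math.ceil(total_gb / gpu_gb)


for method in BYTES_PER_PARAM:
    gb = floor_gb(method)
    print(f"{method:32s} {gb:6.0f} GB floor -> at least {min_gpus(gb, H100_GB)} x H100, "
          f"{min_gpus(gb, H200_GB)} x H200")
```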

Article №14 fine-tuning Hugging Face PEFT + Qwen2.5-3B-Instruct ~45 minutes end-to-end — 5 min corpus via NIM 8B, 69 s training, 3 min benchmark, plus a 6 GB base-model download
Second Brain

LoRA on Your Own Q&A — What 231 Pairs Actually Teach a 3B Model

231 own-voice Q&A pairs, a rank-16 LoRA, 69 s of training on a GB10 Spark. The adapter won't memorize your exact numbers, but it will take a model that refuses 61% of questions about your work and turn it into one that answers all of them in your voice. For facts, you still need RAG.

uses fieldkit.eval
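
Part of why 69 s is enough is that LoRA trains very few parameters. For a frozen weight W of shape d_out × d_in, a rank-r adapter adds B (d_out × r) and A (r × d_in). That is r · (d_in + d_out) trainable values. The sketch below uses an illustrative square 2048 × 2048 projection. Which modules the adapter actually targets is an assumption here, not taken from the article.

```python
def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA freezes W (d_out x d_in) and trains B (d_out x r) and A (r x d_in).
    return d_out * r + r * d_in


d = 2048  # illustrative square projection; real target modules and shapes are assumptions
frozen = d * d
added = lora_added_params(d, d, r=16)
print(f"{added:,} trainable vs {frozen:,} frozen ({added / frozen:.2%})")
```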

Upcoming fine-tuning NeMo Customizer + Nemotron Nano 9B v2 planned ~4 hours per sweep
LLM Wiki

LoRA on Nemotron Nano — Fine-tuning a 9B Without Blowing Unified Memory

A planned walkthrough of LoRA fine-tuning on Nemotron Nano 9B with NeMo Customizer: rank and alpha sweeps, a tiny domain corpus, and the memory accounting that keeps a PEFT run from tripping the Spark's 128 GB unified-memory wall.
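
One detail a rank and alpha sweep has to account for: standard LoRA applies its update as W + (alpha / r) · B A. Sweeping r and alpha independently therefore also changes the effective scale. The sketch below enumerates a hypothetical 3 × 3 grid with that scale attached. It also computes the bf16 base-weight floor for a model of roughly 9 × 10⁹ parameters, which is the fixed cost every run in the sweep pays. The sweep values are made up, and NeMo Customizer's own configuration keys are not shown.

```python
from itertools import product

RANKS = [8, 16, 32]    # hypothetical sweep values
ALPHAS = [16, 32, 64]  # hypothetical sweep values


def lora_scale(alpha: float, r: int) -> float:
    # Standard LoRA scaling: the update is applied as W + (alpha / r) * B @ A.
    return alpha / r


grid = [{"r": r, "alpha": a, "scale": lora_scale(a, r)} for r, a in product(RANKS, ALPHAS)]

# Frozen bf16 base weights for a ~9B model: 2 bytes per parameter, before activations.
base_weights_gb = 9e9 * 2 / 1e9
print(f"{len(grid)} runs; bf16 base weights ~ {base_weights_gb:.0f} GB of the 128 GB unified pool")
for cfg in grid:
    print(cfg)
```

Holding alpha / r fixed across ranks is one way to isolate the effect of rank from the effect of scale.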

Upcoming fine-tuning Foundation planned ~45 min read
Machine that Builds Machines

Synthetic Corpus Frameworks on the Spark — From a Bespoke Pipeline to an Orchestration Layer

A bespoke synth pipeline got 200 rows into a 5000-row reasoning corpus before a fourth meta-state surface form forced a retreat. The diagnosis: a regex-floor approach cannot catch novel surface forms by construction. The fix is the open-source orchestration layer.
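
The "by construction" argument is easy to demonstrate. A regex floor is written after each surface form is observed, so it matches exactly the forms already seen. The sketch below uses hypothetical patterns for three previously seen meta-state phrasings, plus a hypothetical fourth phrasing that passes straight through. None of these strings come from the article's corpus.

```python
import re

# Hypothetical patterns, each added after a leaked phrasing was spotted in the corpus.
KNOWN_FORMS = [
    re.compile(r"\bas an ai\b", re.IGNORECASE),
    re.compile(r"\bi (?:will|shall) now\b", re.IGNORECASE),
    re.compile(r"\blet me (?:re-?check|reconsider)\b", re.IGNORECASE),
]


def regex_floor_flags(text: str) -> bool:
    # True if any known meta-state pattern appears in the row.
    return any(p.search(text) for p in KNOWN_FORMS)


rows = [
    "As an AI, I computed the claim scope.",
    "I will now summarize the prior art.",
    "Let me recheck the dependent claims.",
    "Switching into review mode before answering.",  # a fourth, novel phrasing
]
print([regex_floor_flags(r) for r in rows])
```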