Tag

#qwen2.5

Articles tagged "qwen2.5" — 1 entry.

Article №25 · fine-tuning · NeMo Customizer · 01 May 2026 · ~2 hours wall-clock (4 min LoRA training, 4 min race, the rest writing)

Distilling the Architect — A 3B LoRA Trained on the Agent's Own Trajectory

A4's 50-iteration trajectory becomes training data for a Qwen2.5-3B LoRA proposer. On 8 held-out iterations, the 3B mode-collapses onto d_model=768 (the trajectory's most frequent keep) and scores 0 / 8 exact matches; the 8B at T=0.5 matches 4 / 8 of its own past picks.