Tag

#gradient-checkpointing

Articles tagged "gradient-checkpointing" — 1 entry.

Article №43 · fine-tuning · Foundation · ~1 hour (one container, six gates, two GGUFs)
Machine that Builds Machines

Unsloth on the Spark — When the Train-Time Peak Equals the Base-Load Peak

Six gates clear in one container against the v1 reset: pip install --no-deps preserves the s40 stack, FastLanguageModel loads at a 16.94 GB peak, a 100-step LoRA training run stays within the same memory envelope, and save_pretrained_gguf() emits both quants in 207 seconds end-to-end.