Tag

#gpt

Articles tagged "gpt" — 1 entry.

Article №18 · training · NeMo · ~3 hours: 90 min for two container pulls (PyTorch 30 GB, NeMo Framework Megatron Backend 70 GB), 30 min for the matched scripts, 10 min for the two pretrain runs and analysis
Machine that Builds Machines

NeMo Framework on the Spark — What It Earns Over a Hand-Rolled train.py

Same 354M GPT, same 100 steps, same random tokens: once in a hand-rolled train.py against vanilla PyTorch, once via Megatron-Core inside the NeMo Framework container. Same hardware (GB10, 128 GB unified memory). The framework delivers 5.8% higher throughput and uses 30% less GPU memory.
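The card does not list the model's shape. As a rough sanity check on the "354M" figure, here is a minimal sketch that counts parameters for a GPT-2-style decoder with tied input/output embeddings. It assumes a GPT-2-medium-like configuration (24 layers, hidden size 1024, 50,257-token vocabulary, 1,024-token context); those values and the helper `gpt_param_count` are hypothetical and not taken from the article.

```python
def gpt_param_count(n_layer: int, d_model: int, vocab: int, n_ctx: int) -> int:
    """Count parameters of a GPT-2-style decoder with tied embeddings (hypothetical helper)."""
    # Token embeddings (shared with the output head) + learned position embeddings
    embed = vocab * d_model + n_ctx * d_model
    # Attention: fused QKV projection with bias, then output projection with bias
    attn = d_model * 3 * d_model + 3 * d_model
    attn += d_model * d_model + d_model
    # MLP: 4x up-projection with bias, then down-projection with bias
    mlp = d_model * 4 * d_model + 4 * d_model
    mlp += 4 * d_model * d_model + d_model
    # Two LayerNorms per block, each with a weight and a bias vector
    norms = 2 * 2 * d_model
    per_layer = attn + mlp + norms
    # Final LayerNorm before the (tied) output head
    final_norm = 2 * d_model
    return embed + n_layer * per_layer + final_norm


# Assumed GPT-2-medium-like shape, not confirmed by the article
total = gpt_param_count(n_layer=24, d_model=1024, vocab=50257, n_ctx=1024)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under those assumptions it prints `354,823,168 parameters (~354.8M)`, which is consistent with the headline size. A framework that pads the vocabulary to a convenient multiple would shift the total slightly.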