Tag
#pytorch
Articles tagged "pytorch" — 2 entries.
Machine that Builds Machines
The GB10 Pretrain Envelope — Sweeping Batch, Sequence, and Precision on One Spark
Same 354M GPT, same training loop, swept across micro-batch (2,4,8,16), sequence length (1024,2048), and precision (bf16,fp8). 16 configurations, 30 steps each. Peak: 14,266 tokens/sec at batch=16, seq=1024, fp8 — 18% above the hand-rolled PyTorch baseline.
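The sweep described above is a plain Cartesian product of the three axes. A minimal sketch of how such a grid could be enumerated, assuming a hypothetical `run_config` hook that trains 30 steps and reports tokens/sec (not the article's actual harness):

```python
from itertools import product

# The three swept axes from the article.
micro_batches = [2, 4, 8, 16]
seq_lens = [1024, 2048]
precisions = ["bf16", "fp8"]

configs = list(product(micro_batches, seq_lens, precisions))
assert len(configs) == 16  # 4 x 2 x 2 configurations

for mb, seq, prec in configs:
    tokens_per_step = mb * seq  # tokens consumed by one micro-batch step
    # run_config(mb, seq, prec, steps=30) would go here -- hypothetical hook.
    print(f"batch={mb:2d} seq={seq} {prec}: {tokens_per_step} tokens/step")
```

At the peak configuration (batch=16, seq=1024) each step consumes 16,384 tokens, so 14,266 tokens/sec works out to roughly 1.15 s per step.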
NeMo Framework on the Spark — What It Earns Over a Hand-Rolled train.py
Same 354M GPT, same 100 steps, same random tokens — once in a hand-rolled train.py against vanilla PyTorch, once via Megatron-Core inside the NeMo Framework container. Same hardware (GB10, 128 GB unified). The framework earns +5.8% throughput and 30% less GPU memory.
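A fair A/B comparison like the one above hinges on timing both loops identically. A minimal sketch of a throughput probe, assuming a hypothetical `step_fn` hook standing in for one optimizer step of either loop (not the article's actual code):

```python
import time

def measure_tokens_per_sec(step_fn, tokens_per_step, n_steps=100, warmup=10):
    """Time a training loop and report throughput in tokens/sec.

    step_fn: callable running one training step -- a hypothetical hook
    that could wrap either the hand-rolled loop or the Megatron-Core one.
    warmup steps are excluded so one-time costs (kernel autotuning,
    compilation) don't skew the measurement.
    """
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps * tokens_per_step / elapsed
```

Running the same probe, with the same step count and the same synthetic token stream, against both implementations is what makes the +5.8% figure an apples-to-apples delta.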