Tag

#h200

Articles tagged "h200" — 3 entries.

Article №27 foundations TensorRT-LLM ~22 minute read
Looking Beyond Spark

Looking Beyond Spark — KV-Cache Arithmetic at Inference

The serving memory bill is not weights. It's KV cache, and KV scales with concurrent users × context length, not parameters. Same four bills as training; different weights. A 70B at 32 users × 16k context wants 168 GB just for KV — and the Spark teaches you the per-token math.

uses fieldkit.capabilities

Article №24 training Foundation ~30 minute read · math + economics, no GPU required
Looking Beyond Spark

Derisking the Cloud Pretrain — How a $5K Spark Saves $50K on H100 Rentals

The Spark is too small for a serious pretrain — but it's the right size for the recipe-search that precedes one. Cull 100 candidate architectures down to 3 on one Spark for ~$1 of electricity, then book the cloud node knowing what to train. The expected savings per campaign run into the thousands.

Article №16 foundations Foundation ~25 minute read
Looking Beyond Spark

Looking Beyond Spark — Fine-Tuning a 100B Nemotron

A working answer to: how many GPUs to fine-tune a 100B Nemotron? Three methods, three memory footprints — full FT ≈ 1.6 TB needs 24× H100; LoRA ≈ 250 GB fits 8× H100; QLoRA ≈ 65 GB fits 1× H200. The Spark's 3B LoRA teaches the math.

uses fieldkit.capabilities