Tag

#triton

Articles tagged "triton" — 2 entries.

Article №13 deployment TensorRT-LLM + Triton Inference Server ~4 hours including two container pulls and three engine builds
Second Brain

TensorRT-LLM on the Spark — FP8 Isn't the Reason to Drop NIM. NVFP4 Is.

Dropping below NIM to raw TensorRT-LLM on a GB10 Spark. FP8 beats NIM's vLLM by 10-15% — barely worth the rebuild. NVFP4 beats it by 76% on decode, 43% on TTFT, and ships a 34%-smaller engine. The reason to drop NIM is the Blackwell-native 4-bit kernel, not FP8.

Upcoming agentic NemoClaw ~30 min read
Machine that Builds Machines

Heterogeneous Scientific Foundation Model Collaboration — Spark reproduction notes

Wrap a domain foundation model (Pangu-Weather) as a Triton tool, drive it from a NIM-served Llama 3.1 8B planner via NemoClaw, and show when specialist routing beats language-only reasoning — all inside the Spark 128 GB envelope.