Tag

#trtllm

Articles tagged "trtllm" — 1 entry.

Upcoming · dev-tools · NVIDIA Nsight Systems + CUDA Toolkit · planned ~4 hours including trace analysis

Tracing a NIM Request with Nsight Systems — What the 24.8 tok/s Number Hides

A planned kernel-level trace of a single NIM inference request on GB10. Where does the wall-clock time actually go: tokenization, KV-cache attention, the sampling loop, or memcpy? The article turns 24.8 tokens per second into a timeline where you can point at a single row and say 'that line is the bottleneck'.