Tag: roofline
All the articles with the tag "roofline".
- The roofline model for LLM inference
Why single-stream LLM decode uses ~0.3% of an H100's tensor-core throughput, and how the roofline model explains nearly every inference optimization that matters.