Posts
All the articles I've posted.
-
The roofline model for LLM inference
Why single-stream LLM decode uses only ~0.3% of an H100's peak tensor-core throughput, and how the roofline model explains nearly every inference optimization that matters.
-
Building in public: a commitment to mastering GPU kernels
Why I'm documenting my path to becoming a top-tier AI kernel engineer for NVIDIA GPUs — in the open, with code and numbers.