About

I’m Jared Frost, a GPU/AI kernel engineer. I come from a hardware engineering background and work primarily in C++ and CUDA, down at the level where performance is won or lost: memory hierarchies, warp scheduling, occupancy, and the gap between a kernel’s theoretical and achieved throughput.

My focus is LLM inference and GPU acceleration, end to end — from the NVIDIA Jetson Orin Nano at the edge to multi-GPU Hopper in the datacenter.

What I build

A selection of what I’m working on (more on the projects page):

genie-ai-runtime — a from-scratch AI inference engine for the Jetson Orin Nano, specialized for Qwen 3.5 4B and shipping in the GeniePod product. Faster than llama.cpp on the same silicon.
hopper-qwen-72b — hand-written CUDA kernels serving Qwen 2.5 72B Instruct across 2×/4×/8× Hopper GPUs (TP=8), built to beat vLLM throughput at the kernel level.
triton-vm-prover — an 11,000-line C++/CUDA zk-STARK prover for Triton VM on the RTX PRO 6000, designed from scratch and ~10× faster than the top-ranked CPU prover. GPU acceleration beyond machine learning.
tether — deploying and optimizing Vision-Language-Action models on Jetson for real-time robotics.
genie-claw, plus maintaining OpenClaw — low-latency AI-agent harnesses.

The common thread: take a workload and squeeze the hardware — a 4B model on 8 GB of edge memory, a 72B model sharded across NVLink, or a STARK prover saturating a workstation GPU.

This blog

A public lab notebook. Deep technical write-ups, build logs, and reproducible benchmarks — numbers with the setup attached, the why explained rather than just the what, and peak vs. achievable kept honest.

The mission

Near term, I’m working toward becoming a top-tier AI kernel engineer for NVIDIA GPUs — the person who takes an inference workload and recovers the last fraction of a percent.

Longer term, that feeds a bigger goal I track in the open: master AI inference, AI agent harness systems, and hardware engineering, then design a physical AI chip. My ai-hardware-engineer-roadmap repo (200+ stars) is the plan and my daily learning log; this blog is the lab notebook that fills it in. See why I frame efficient AI in three layers.

If that’s interesting to you, or you’re hiring for exactly this kind of work, let’s talk.

Contact

GitHub — github.com/ai-hpc
LinkedIn — Jared Frost
Email — jared@fastcrest.com