Posts

All the articles I've posted.

Proof you can't fake: shipping dual-vendor Intel TDX + NVIDIA CC attestation for AI training claims

12 Jul, 2026

I closed every trust gap in a real proof-of-training system: TDX measured-VM quotes, Intel DCAP signature verification, and NVIDIA JWKS-verified GPU tokens — live, not claimed.

Distilling frontier reasoning into an edge model — Unsloth + W&B in Colab, Axolotl on Blackwell

9 Jul, 2026

Distill Opus 4.6 / Qwen3.5 reasoning into ~14k SFT samples; prototype on Llama 3.2 3B in Colab with Unsloth + W&B, then Axolotl on RTX PRO 6000 Blackwell.

TRT-LLM vs vLLM vs llama.cpp on Blackwell — sparkinfer v0.3.7 owns decode on the RTX 5090

28 Jun, 2026

sparkinfer v0.3.7 decodes ~30% faster than llama.cpp on the RTX 5090 at the same LLM quality, from a 2.5 MB native binary. I develop and maintain it. Also covers TRT-LLM vs vLLM vs llama.cpp on the RTX PRO 6000.

The confidential API: sending a prompt to a cloud GPU the provider can't read

25 Jun, 2026

HTTPS protects the wire, not the server. A confidential API binds an attested TEE to an ECDH session key — so prompts stay encrypted even from the provider.

Trust before secrets: confidential AI inference with CPU TEEs and GPU Confidential Computing

25 Jun, 2026

Renting a GPU means trusting someone else's host. Confidential computing flips it: attest the CPU TEE and GPU, then release API keys only to a verified CVM.

Training a DFlash drafter on a B200 — the real grind behind block-diffusion speculative decoding

24 Jun, 2026

I trained a DFlash drafter for a 72B on 2×B200 with the latest SGLang: the version-drift tax, the warm-start trap, accept 1.49→1.71, and what's next.
