AI Studio

Applied AI, with a long memory.

Where I keep my working notes on what actually works in production — and what only works in a demo.

Currently shipping

An eval-first workflow for LLM features that gives you feedback in minutes, not weeks.

I'm packaging the playbook I use with teams into a small toolkit. If your AI feature keeps "feeling worse" after every prompt change, we should talk.

Practical multi-step agents that plan, call tools, and recover from failure.

Reproducible offline + online evals. Treat your eval set like a product.

Hybrid retrieval, reranking, and grounding with citation guarantees.

Distillation, quantization, and on-device deployment patterns.

Red-teaming, jailbreak monitors, and PII-aware logging at scale.

Caching, speculative decoding, and routing — making AI bills sane.