AI Studio

Applied AI, with a long memory.

Where I keep my working notes on what actually works in production — and what only works in a demo.

Currently shipping

An eval-first workflow for LLM features that takes minutes, not weeks.

I'm packaging the playbook I use with teams into a small toolkit. If your AI feature keeps "feeling worse" after every prompt change, we should talk.

Agents

Practical multi-step agents that plan, call tools, and recover from failure.

Evals

Reproducible offline and online evals. Treat your eval set like a product.

Retrieval

Hybrid retrieval, reranking, and grounding with citation guarantees.

Small models

Distillation, quantization, and on-device deployment patterns.

Safety

Red-teaming, jailbreak monitors, and PII-aware logging at scale.

Cost & latency

Caching, speculative decoding, and routing — making AI bills sane.