Field notes

Notes on evaluating voice systems.

Methodology drafts, benchmark write-ups, and engineering lessons from building repeatable checks for clinical and production Voice AI.

Benchmarks · May 2026 · 9 min

Five voice stacks, one consent script.

Same scenario, same pass/fail checks, different failure surfaces across realtime providers.

Benchmark

Engineering · Apr 2026 · 7 min

Replay infrastructure for non-deterministic agents.

How to preserve comparable traces when every model run can choose a different path.

Draft

Product · Apr 2026 · 5 min

Why we picked JSON Schema over Protobuf.

Evaluation specs need to be diffable, reviewable, and easy to author before they need binary speed.

Draft

Release notes

Get benchmark drops when the spec changes.