Field notes

Notes on evaluating voice systems.

Methodology drafts, benchmark write-ups, and engineering lessons from building repeatable checks for clinical and production Voice AI.

Methodology · May 2026 · 6 min

Why time-to-first-token is not enough.

A fast first packet can still hide repair loops, consent drift, and interruption failures.

Field note · Full post coming soon

Benchmarks · May 2026 · 9 min

Same scenario, same pass/fail checks, different failure surfaces across realtime providers.

Benchmark

Engineering · Apr 2026 · 7 min

How to preserve comparable traces when every model run can choose a different path.

Draft

Product · Apr 2026 · 5 min

Evaluation specs need to be diffable, reviewable, and easy to author long before they need the speed of a binary format.

Draft