Why time-to-first-token is not enough.
A fast first packet can still hide repair loops, consent drift, and interruption failures.
Field note. Full post coming soon.
Methodology drafts, benchmark write-ups, and engineering lessons from building repeatable checks for clinical and production Voice AI.
Same scenario, same pass/fail checks, different failure surfaces across realtime providers.
How to preserve comparable traces when every model run can choose a different path.
Evaluation specs need to be diffable, reviewable, and easy to author before they need binary speed.