A rewrite of the voice inference pipeline ships in v2.3.1. 40ms shaved off the end-to-end round-trip, taking us from 280ms to 240ms median latency.
The wins came from three places: smarter chunking on the ASR side, removing a JSON serialization step in the inference broker, and pre-warming the TTS voice instance before the first user turn finishes.
Why this matters
40ms doesn't sound like much, but it compounds. On a 20-turn voice call, that's 800ms returned to the customer. More importantly, it pushes us under the 250ms threshold where conversation starts feeling natural rather than stilted.
The engineering write-up has the full story, including the failed approaches that led us here. Read the write-up.
No customer action required. Rolled out gradually over 48 hours; all customers now on the new pipeline.