v2.3.1 changelog

v2.3.1 ships a rewrite of the voice inference pipeline. It cuts 40ms from the end-to-end round-trip, taking median latency from 280ms to 240ms.

The wins came from three places: smarter chunking on the ASR side, removing a JSON serialization step in the inference broker, and pre-warming the TTS voice instance before the first user turn finishes.
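Of the three changes, pre-warming is the easiest to illustrate in isolation. The following is a minimal sketch, not the production pipeline: `TTSVoice`, `listen`, `first_turn`, and the two timing constants are hypothetical names and values invented for illustration. It shows the general idea of starting the voice warm-up while the user is still speaking, so the first reply does not pay the warm-up cost.

```python
import asyncio
import time

# Hypothetical timings, chosen only to make the effect visible.
WARMUP_S = 0.05  # assumed cost of loading a TTS voice
ASR_S = 0.10     # assumed duration of the first user turn


class TTSVoice:
    """Hypothetical stand-in for a TTS voice instance."""

    def __init__(self) -> None:
        self.ready = False

    async def warm_up(self) -> None:
        # Simulate loading the voice.
        await asyncio.sleep(WARMUP_S)
        self.ready = True

    async def speak(self, text: str) -> str:
        # A cold voice has to warm up before it can answer.
        if not self.ready:
            await self.warm_up()
        return f"<audio:{text}>"


async def listen() -> str:
    # Simulate transcribing the first user turn.
    await asyncio.sleep(ASR_S)
    return "hello"


async def first_turn(prewarm: bool) -> float:
    """Return the wall-clock time for one listen-then-speak turn."""
    voice = TTSVoice()
    start = time.perf_counter()
    # With pre-warming, the warm-up runs concurrently with listening.
    warm_task = asyncio.create_task(voice.warm_up()) if prewarm else None
    text = await listen()
    if warm_task is not None:
        await warm_task
    await voice.speak(text)
    return time.perf_counter() - start


def compare() -> tuple[float, float]:
    """Time the first turn without and with pre-warming."""
    cold = asyncio.run(first_turn(prewarm=False))
    warm = asyncio.run(first_turn(prewarm=True))
    return cold, warm
```

Calling `compare()` returns the cold and pre-warmed timings in seconds. In this toy setup the pre-warmed turn is faster because the warm-up overlaps with the user's speech instead of following it.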

Why this matters

40ms doesn't sound like much, but it adds up. On a 20-turn voice call, that's 800ms of waiting returned to the customer. More importantly, it brings median latency under the 250ms threshold where conversation starts to feel natural rather than stilted.

The engineering write-up has the full story, including the failed approaches that led us here.

No customer action required. Rolled out gradually over 48 hours; all customers now on the new pipeline.
