AIVA 2.0: the rebuild.

In early 2025 we made a call that most startups never make voluntarily: we stopped shipping features for four months and rebuilt the entire product from scratch. This is the story of why we did it, what we changed, and what it cost us.

The original AIVA was built fast. We had a working voice agent in six weeks, customers in production in three months. The code reflected that. We'd glued together four third-party services — a transcription API, a language model, a TTS engine, and Twilio — with a thin orchestration layer that was held together mostly by optimism. It worked. Until it didn't.

The performance ceiling

By mid-2025 we had 80 customers and were hitting walls we'd built ourselves. Average voice latency was stuck at 380ms. Adding a new language meant touching five different configuration files and praying nothing broke. Our infrastructure bill was growing faster than our revenue because we were paying for four services to do what one well-designed system could do. And our on-call rotation was a nightmare: when something went wrong at 2am, it was never obvious which of the four vendors was the culprit.

The performance ceiling was the most visible problem. 380ms sounds fast, but in voice it isn't. Human conversation flows with under 200ms of response lag, and anything above 250ms starts to feel like the call is dropping. We were losing customers not because AIVA gave wrong answers, but because it felt slow. In voice, feeling slow and being slow are the same thing.

What we rebuilt

The new stack collapses everything into a single inference pipeline we own end-to-end. Speech recognition, language understanding, response generation, and synthesis all run in one unified process instead of four API calls chained together. The round-trip that used to cross four network boundaries now crosses zero.
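
To make the arithmetic behind that claim concrete, here is a minimal sketch of a latency budget. It is an illustration, not the AIVA implementation: the `Stage` type, the function names, the stage timings, and the per-hop cost are all hypothetical values chosen for the example, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    compute_ms: float  # time spent doing the actual work in this stage


def chained_latency_ms(stages, network_hop_ms):
    """Each stage is a separate remote service, so each one pays a network hop."""
    return sum(stage.compute_ms + network_hop_ms for stage in stages)


def unified_latency_ms(stages):
    """All stages run in one process, so no per-stage network hop is paid."""
    return sum(stage.compute_ms for stage in stages)


# Hypothetical timings for illustration only, not AIVA measurements.
STAGES = [
    Stage("speech_recognition", 25.0),
    Stage("language_understanding", 25.0),
    Stage("response_generation", 25.0),
    Stage("synthesis", 25.0),
]

if __name__ == "__main__":
    print("chained:", chained_latency_ms(STAGES, network_hop_ms=30.0))
    print("unified:", unified_latency_ms(STAGES))
```

The point of the model is that the hop cost scales with the number of service boundaries, so removing the boundaries removes that term entirely, whatever the compute time of each stage turns out to be.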

We also moved to regional deployment. The original AIVA ran from a single region in Mumbai. The rebuilt version runs in Mumbai, Frankfurt, and Virginia — with automatic routing to the closest region based on the caller's network path. European and North American deployments of our customers' voice agents now see latencies under 180ms. Mumbai customers see under 160ms.
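
One simple way to picture closest-region routing is to pick the healthy region with the lowest measured round-trip time. The sketch below assumes that approach. The `pick_region` function, the region identifiers, and the idea of a per-caller RTT measurement are assumptions made for illustration, and the post does not say how the real routing decision is made.

```python
# Region names taken from the post; the identifiers themselves are hypothetical.
REGIONS = ("mumbai", "frankfurt", "virginia")


def pick_region(rtt_ms_by_region, healthy=REGIONS):
    """Return the healthy region with the lowest measured round-trip time.

    rtt_ms_by_region maps a region name to a round-trip time in milliseconds,
    assumed to have been measured along the caller's network path.
    """
    candidates = {
        region: rtt
        for region, rtt in rtt_ms_by_region.items()
        if region in healthy
    }
    if not candidates:
        raise ValueError("no healthy region with a latency measurement")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical measurements for a caller in Europe.
    measured = {"mumbai": 120.0, "frankfurt": 35.0, "virginia": 95.0}
    print(pick_region(measured))
    # If Frankfurt is unavailable, fall back to the next closest region.
    print(pick_region(measured, healthy=("mumbai", "virginia")))
```

Filtering by health before taking the minimum means a regional outage degrades latency for nearby callers instead of dropping their calls.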

The language pipeline is the part I'm most proud of. In AIVA 1.0, adding a language was a project. In AIVA 2.0, it's a config file and a model checkpoint. We've shipped six new languages since the rebuild launched — each in under two weeks — compared to six weeks minimum in the old system.
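
As a rough picture of what "a config file and a model checkpoint" can look like, the sketch below loads a language definition from JSON and adds it to a registry. Everything in it is hypothetical: the field names, the `LanguageConfig` type, the checkpoint path, and the use of JSON are assumptions for illustration, not the actual AIVA 2.0 config format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageConfig:
    code: str              # language code, for example "hi"
    checkpoint_path: str   # where the model checkpoint for this language lives
    sample_rate_hz: int = 16000  # hypothetical default


def load_language(config_text):
    """Parse one language config and check that the required fields are present."""
    raw = json.loads(config_text)
    missing = {"code", "checkpoint_path"} - raw.keys()
    if missing:
        raise ValueError(f"language config is missing fields: {sorted(missing)}")
    return LanguageConfig(**raw)


def register_language(registry, config):
    """Add a language to the registry, refusing to overwrite an existing one."""
    if config.code in registry:
        raise ValueError(f"language already registered: {config.code}")
    registry[config.code] = config


if __name__ == "__main__":
    registry = {}
    # Hypothetical config contents; in practice this would be read from a file.
    text = '{"code": "hi", "checkpoint_path": "checkpoints/hi.pt"}'
    register_language(registry, load_language(text))
    print(sorted(registry))
```

With a layout like this, adding a language touches one new file and one new checkpoint and leaves the pipeline code unchanged.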

What it cost us

Four months of no new features. Two customers who couldn't wait and churned. A team that was exhausted by the end. And a difficult conversation with early investors who wanted to see the metrics moving.

We don't regret any of it. Every meaningful thing we've shipped in the last six months, including analytics and the new language pipeline, was possible because of the 2.0 infrastructure. Over those six months, the rebuild has acted as a multiplier on every feature that came after it.

The lesson we'd pass on: if you're hitting performance ceilings in month 12, the ceiling is probably architectural, not algorithmic. You can't optimize your way out of a bad architecture. Sometimes the right call is to stop and rebuild.

The voice latency numbers today: 158ms average, 210ms p95. Last month our first customer passed 1 million calls, with zero outages. That's what the rebuild bought us.

Engineering, Voice, ML

Written by

Arjun Patel

Co-founder
