Sonic-fast TTS purpose-built for real-time voice agents.
Cartesia's Sonic model targets the latency frontier: sub-100 ms first-byte streaming TTS that pairs well with Deepgram and a fast LLM for an end-to-end response under 600 ms. Burki ships native Sonic support as a first-class TTS adapter.
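As a rough illustration of how a sub-600 ms turn breaks down across the pipeline, here is a minimal latency-budget sketch. The per-stage numbers are illustrative assumptions for the example, not measured benchmarks of Deepgram, any particular LLM, or Sonic:

```python
# Illustrative latency budget for one voice-agent turn.
# All stage numbers are assumptions for the sketch, not benchmarks.
stages_ms = {
    "stt_final_transcript": 250,  # streaming STT endpointing (e.g. Deepgram)
    "llm_first_token": 200,       # a fast LLM's time-to-first-token
    "tts_first_byte": 90,         # Sonic's sub-100 ms first-byte target
}

total_ms = sum(stages_ms.values())
print(f"estimated time to first audio: {total_ms} ms")  # prints 540 ms
assert total_ms < 600  # inside the sub-600 ms end-to-end target
```

The point of the exercise: when the TTS first-byte drops from several hundred milliseconds to under 100, the whole budget fits under the threshold where a pause starts to feel like dead air.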
A voice AI assistant is only useful when it can connect to the rest of your stack. The Cartesia integration helps Burki fit into the systems your team already uses for calling, transcription, reasoning, speech, routing, or customer data. That means the assistant can move from a demo into a production workflow without forcing you to replace your existing tools.
Burki keeps the integration flexible. You can run one assistant with Cartesia, pair it with other providers for a full voice pipeline, and change providers later if your cost, latency, coverage, or quality requirements change. This is especially useful for teams that need different settings per assistant, region, campaign, or customer segment.
The result is a voice agent that is easier to operate: fewer hard-coded assumptions, clearer provider boundaries, and a pricing model that separates Burki's platform fee from the third-party services you already trust.
Choose Cartesia when latency is the ceiling on user experience and you can trade a small amount of voice character for speed. Choose ElevenLabs when voice quality and emotion are the differentiator.
```bash
# .env
CARTESIA_API_KEY=...
```

Per-assistant config:

- TTS Provider: Cartesia
- Model: sonic-english
- Voice ID: 79f8b5fb-2cc8-479a-80df-29f7a7cf1a3e

For full setup, see the docs.
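In code, the per-assistant settings above amount to a small provider config keyed off the environment variable. A minimal sketch, assuming a plain dict shape; the field names mirror the settings listed here but are illustrative, not Burki's actual schema:

```python
import os

# Hypothetical per-assistant TTS config. Field names are illustrative,
# not Burki's real schema; values come from the setup steps above.
assistant_tts_config = {
    "provider": "cartesia",
    "model": "sonic-english",
    "voice_id": "79f8b5fb-2cc8-479a-80df-29f7a7cf1a3e",
    "api_key": os.environ.get("CARTESIA_API_KEY", ""),
}

def validate_tts_config(cfg: dict) -> bool:
    """Check the minimum fields any Cartesia TTS adapter would need."""
    required = ("provider", "model", "voice_id", "api_key")
    return all(cfg.get(field) for field in required)
```

Keeping the config per-assistant rather than global is what lets one assistant run Cartesia while another runs a different provider.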
Burki charges a $0.03/min platform fee. Cartesia bills per character, which for a typical voice agent works out to roughly $0.03/min. That is significantly cheaper than ElevenLabs while delivering best-in-class latency. In BYO mode, Cartesia's rates pass through unchanged.
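Putting the two line items together, a quick cost estimate looks like this. The per-minute figures are the approximations from above; a real bill depends on Cartesia's actual character counts:

```python
# Rough per-call cost: Burki platform fee plus Cartesia's approximate
# per-minute-equivalent TTS cost. Figures are the estimates quoted above.
BURKI_PLATFORM_PER_MIN = 0.03
CARTESIA_TTS_PER_MIN = 0.03  # typical voice-agent rate, billed per character

def estimated_call_cost(minutes: float) -> float:
    """Estimated combined cost of a call of the given length, in USD."""
    return round(minutes * (BURKI_PLATFORM_PER_MIN + CARTESIA_TTS_PER_MIN), 4)

print(estimated_call_cost(5))  # a 5-minute call: prints 0.3
```

At roughly $0.06/min all-in, a thousand 5-minute calls lands around $300, which is the kind of envelope math worth doing before picking a provider.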