Voices & Languages
Configure Text-to-Speech voices, languages, and pronunciation to create natural-sounding agents.
Overview
Language Configuration
Fillers (Natural Speech)
Adding a Language
Basic Configuration
Voice Format
The voice parameter uses the format engine.voice:model where model is optional:
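As a rough illustration of the format, the sketch below splits a voice string into its parts. `parse_voice` and `VoiceSpec` are hypothetical helpers written for this example, not part of any SDK, and the engine, voice, and model names are placeholders.

```python
from typing import NamedTuple, Optional


class VoiceSpec(NamedTuple):
    engine: str
    voice: str
    model: Optional[str]  # None when the optional ':model' suffix is absent


def parse_voice(spec: str) -> VoiceSpec:
    """Split an 'engine.voice:model' string; the ':model' part is optional."""
    # The engine is everything before the first dot.
    engine, dot, rest = spec.partition(".")
    if not dot or not engine or not rest:
        raise ValueError(f"expected 'engine.voice[:model]', got {spec!r}")
    # The model, if present, follows the first colon after the voice name.
    voice, colon, model = rest.partition(":")
    if not voice or (colon and not model):
        raise ValueError(f"expected 'engine.voice[:model]', got {spec!r}")
    return VoiceSpec(engine, voice, model or None)


# Placeholder names, used only to show both forms of the string.
without_model = parse_voice("example_engine.example_voice")
with_model = parse_voice("example_engine.example_voice:example_model")
```

Splitting on the first dot keeps voice names that themselves contain dashes or further dots intact, which is an assumption about how voice identifiers are shaped.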
Available TTS Engines
Filler Phrases
Add natural pauses and filler words:
Speech fillers: Used during natural conversation pauses
Function fillers: Used while the AI is executing a function
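To make the two filler types concrete, here is a minimal sketch that assembles one language entry as a plain dictionary. The `language_entry` helper and its key names (`speech_fillers`, `function_fillers`) are assumptions chosen to mirror the terms above, and the voice string and phrases are placeholders; check your SDK's reference for the actual call.

```python
from typing import Any, Dict, List, Optional


def language_entry(
    name: str,
    code: str,
    voice: str,
    speech_fillers: Optional[List[str]] = None,
    function_fillers: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a language configuration dict (hypothetical shape)."""
    entry: Dict[str, Any] = {"name": name, "code": code, "voice": voice}
    # Speech fillers cover natural pauses in conversation.
    if speech_fillers:
        entry["speech_fillers"] = list(speech_fillers)
    # Function fillers cover the wait while a function executes.
    if function_fillers:
        entry["function_fillers"] = list(function_fillers)
    return entry


english = language_entry(
    name="English",
    code="en-US",
    voice="example_engine.example_voice",  # placeholder voice string
    speech_fillers=["Um...", "Let me think..."],
    function_fillers=["One moment please...", "Let me check that for you..."],
)
```

Keeping the two lists separate lets the pause phrases stay short while the function phrases explicitly signal that work is in progress.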
Multi-Language Support
Use code="multi" for automatic language detection and matching:
The multi code supports: English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch.
Note: Speech recognition hints do not work when using code="multi". If you need hints for specific terms, use individual language codes instead.
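The decision described above can be sketched as a small check. `should_use_multi` is a hypothetical helper for illustration; the language set is copied from the list above, and the rule about hints follows the note.

```python
from typing import Iterable

# Languages the text lists as covered by code="multi".
MULTI_LANGUAGES = frozenset({
    "English", "Spanish", "French", "German", "Hindi",
    "Russian", "Portuguese", "Japanese", "Italian", "Dutch",
})


def should_use_multi(languages: Iterable[str], needs_hints: bool) -> bool:
    """Return True when code="multi" fits the requirements."""
    wanted = list(languages)
    if not wanted:
        raise ValueError("at least one language is required")
    # Hints do not work with code="multi", so fall back to individual codes.
    if needs_hints:
        return False
    # Every requested language must be in the supported set.
    return all(lang in MULTI_LANGUAGES for lang in wanted)
```

For example, `should_use_multi(["English", "Spanish"], needs_hints=False)` is true, while the same languages with `needs_hints=True` call for individual language codes.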
For more control over individual languages with custom fillers:
Pronunciation Rules
Fix pronunciation of specific words:
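To show what a pronunciation rule does conceptually, the sketch below applies replacement rules to text before it would reach the TTS engine. It is a standalone illustration, not the SDK's implementation; the rule keys (`replace`, `with`, `ignore_case`) are assumptions for this example.

```python
import re
from typing import Any, Dict, List


def apply_pronunciations(text: str, rules: List[Dict[str, Any]]) -> str:
    """Replace whole-word matches of each rule's term with its spoken form."""
    for rule in rules:
        flags = re.IGNORECASE if rule.get("ignore_case", False) else 0
        # Word boundaries keep 'API' from matching inside 'RAPID'.
        pattern = r"\b" + re.escape(rule["replace"]) + r"\b"
        spoken = rule["with"]
        # A function replacement avoids backslash interpretation in 'spoken'.
        text = re.sub(pattern, lambda _m, s=spoken: s, text, flags=flags)
    return text


rules = [
    {"replace": "API", "with": "A P I"},
    {"replace": "SQL", "with": "sequel", "ignore_case": True},
]
result = apply_pronunciations("Our API uses sql.", rules)
# result == "Our A P I uses sequel."
```

Rules run in order, so a later rule sees the output of earlier ones; keep replacement text free of terms that other rules would match.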
Set Multiple Pronunciations
Voice Selection Guide
Choosing the right TTS engine and voice significantly impacts caller experience. Consider these factors:
Use Case Recommendations
TTS Engine Comparison
Choosing an Engine
Prioritize latency (Rime, Polly, Deepgram):
- Interactive conversations where quick response matters
- High-volume production systems
- Cost-sensitive deployments
Prioritize quality (ElevenLabs, OpenAI):
- Premium customer experiences
- Brand-sensitive applications
- When voice quality directly impacts business outcomes
Prioritize features (Google Cloud, Azure):
- Need SSML for fine-grained control
- Complex multilingual requirements
- Specific enterprise integrations
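The grouping above can be captured as a simple lookup when building a shortlist. This is only a sketch of the guidance in this section; `shortlist_engines` is a hypothetical helper, and the engine names are the ones listed above.

```python
from typing import Dict, List

# Engine groups as given in the guidance above.
ENGINES_BY_PRIORITY: Dict[str, List[str]] = {
    "latency": ["Rime", "Polly", "Deepgram"],
    "quality": ["ElevenLabs", "OpenAI"],
    "features": ["Google Cloud", "Azure"],
}


def shortlist_engines(priorities: List[str]) -> List[str]:
    """Return candidate engines, ordered by the caller's priorities."""
    candidates: List[str] = []
    for priority in priorities:
        if priority not in ENGINES_BY_PRIORITY:
            raise KeyError(f"unknown priority: {priority!r}")
        for engine in ENGINES_BY_PRIORITY[priority]:
            if engine not in candidates:  # keep first occurrence only
                candidates.append(engine)
    return candidates


candidates = shortlist_engines(["latency", "quality"])
# candidates == ["Rime", "Polly", "Deepgram", "ElevenLabs", "OpenAI"]
```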
Testing and Evaluation Process
Before selecting a voice for production:
- Create test content with domain-specific terms, company names, and typical phrases
- Test multiple candidates from your shortlisted engines
- Evaluate each voice:
  - Pronunciation accuracy (especially brand names)
  - Natural pacing and rhythm
  - Emotional appropriateness
  - Handling of numbers, dates, prices
- Test with real users if possible, such as internal team members or beta callers
- Measure latency in your deployment environment
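The latency step can be sketched as a small timing harness. The `synthesize` argument is a stand-in for whatever call produces audio in your deployment (an assumption, since that call depends on your engine); the harness only times it with the standard library.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(
    synthesize: Callable[[str], object],
    phrases: List[str],
    runs: int = 3,
) -> Dict[str, float]:
    """Time synthesize() over each phrase and summarize in seconds."""
    if not phrases or runs < 1:
        raise ValueError("need at least one phrase and one run")
    samples: List[float] = []
    for phrase in phrases:
        for _ in range(runs):
            start = time.perf_counter()
            synthesize(phrase)  # hypothetical TTS call supplied by the caller
            samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(samples),
        "max_s": max(samples),
        "count": float(len(samples)),
    }


def fake_synthesize(text: str) -> bytes:
    """Stand-in for a real engine call; returns dummy bytes immediately."""
    return text.encode("utf-8")


stats = measure_latency(fake_synthesize, ["Your total is $42.50.", "Order 1187"])
```

Run the harness from the same network location as production, with phrases drawn from your test content, so the numbers reflect what callers would experience.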
Voice Personality Considerations
Match voice to brand:
- Formal brands → authoritative, measured voices
- Friendly brands → warm, conversational voices
- Tech brands → clear, modern-sounding voices
Consider your audience:
- Older demographics may prefer clearer, slower voices
- Technical audiences tolerate more complex terminology
- Regional preferences may favor certain accents
Test edge cases:
- Long monologues (product descriptions)
- Lists and numbers (order details, account numbers)
- Emotional content (apologies, celebrations)
Dynamic Voice Selection
Change voice based on context:
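One way to think about this is a lookup from call context to a voice string, with a default fallback. The sketch below is illustrative only: the `department` context key, the `select_voice` helper, and every voice string are placeholders, and how the chosen voice is applied to the agent depends on your SDK.

```python
from typing import Dict

# Placeholder voice strings in the engine.voice format.
DEFAULT_VOICE = "example_engine.neutral_voice"
VOICE_BY_DEPARTMENT: Dict[str, str] = {
    "support": "example_engine.warm_voice",
    "billing": "example_engine.formal_voice",
}


def select_voice(context: Dict[str, str]) -> str:
    """Pick a voice for the call, falling back to the default."""
    department = context.get("department", "")
    return VOICE_BY_DEPARTMENT.get(department, DEFAULT_VOICE)


support_voice = select_voice({"department": "support"})
unknown_voice = select_voice({"department": "sales"})
```

Keeping the mapping in data rather than in branching code makes it easier to review voice choices alongside the brand guidance above.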
Language Codes Reference
Supported language codes: