---
id: ca0bbd75-cf3f-49c9-a0d2-b4c077dcdd57
title: Voices & Languages
sidebar-title: Voices & Languages
position: 4
slug: /python/guides/voice-language
max-toc-depth: 3
subtitle: >-
  Configure Text-to-Speech voices, languages, and pronunciation to create
  natural-sounding agents.
---

### Overview

#### Language Configuration

| Parameter | Description           | Example                                                 |
| --------- | --------------------- | ------------------------------------------------------- |
| `name`    | Human-readable name   | `"English"`                                             |
| `code`    | Language code for STT | `"en-US"`                                               |
| `voice`   | TTS voice identifier  | `"rime.spore"` or `"elevenlabs.josh:eleven_turbo_v2_5"` |

#### Fillers (Natural Speech)

| Parameter          | Description                             | Example                                |
| ------------------ | --------------------------------------- | -------------------------------------- |
| `speech_fillers`   | Used during natural conversation pauses | `["Um", "Well", "So"]`                 |
| `function_fillers` | Used while executing a function         | `["Let me check...", "One moment..."]` |

### Adding a Language

#### Basic Configuration

```python
from signalwire_agents import AgentBase


class MyAgent(AgentBase):
    def __init__(self):
        super().__init__(name="my-agent")

        # Basic language setup
        self.add_language(
            name="English",      # Display name
            code="en-US",        # Language code for STT
            voice="rime.spore"   # TTS voice
        )
```

#### Voice Format

The voice parameter uses the format `engine.voice:model` where model is optional:

```python
# Simple voice (engine.voice)
self.add_language("English", "en-US", "rime.spore")

# With model (engine.voice:model)
self.add_language("English", "en-US", "elevenlabs.josh:eleven_turbo_v2_5")
```

### Available TTS Engines

| Provider        | Engine Code  | Example Voice                                   | Reference                                                |
| --------------- | ------------ | ----------------------------------------------- | -------------------------------------------------------- |
| Amazon Polly    | `amazon`     | `amazon.Joanna-Neural`                          | [Voice IDs](/docs/platform/voice/tts/amazon-polly#usage) |
| Cartesia        | `cartesia`   | `cartesia.a167e0f3-df7e-4d52-a9c3-f949145efdab` | [Voice IDs](/docs/platform/voice/tts/cartesia#usage)     |
| Deepgram        | `deepgram`   | `deepgram.aura-asteria-en`                      | [Voice IDs](/docs/platform/voice/tts/deepgram)           |
| ElevenLabs      | `elevenlabs` | `elevenlabs.thomas`                             | [Voice IDs](/docs/platform/voice/tts/elevenlabs#usage)   |
| Google Cloud    | `gcloud`     | `gcloud.en-US-Casual-K`                         | [Voice IDs](/docs/platform/voice/tts/gcloud#usage)       |
| Microsoft Azure | `azure`      | `azure.en-US-AvaNeural`                         | [Voice IDs](/docs/platform/voice/tts/azure#usage)        |
| OpenAI          | `openai`     | `openai.alloy`                                  | [Voice IDs](/docs/platform/voice/tts/openai#voices)      |
| Rime            | `rime`       | `rime.luna:arcana`                              | [Voice IDs](/docs/platform/voice/tts/rime#voices)        |

### Filler Phrases

Add natural pauses and filler words:

```python
self.add_language(
    name="English",
    code="en-US",
    voice="rime.spore",
    speech_fillers=[
        "Um",
        "Well",
        "Let me think",
        "So"
    ],
    function_fillers=[
        "Let me check that for you",
        "One moment please",
        "I'm looking that up now",
        "Bear with me"
    ]
)
```

**Speech fillers**: Used during natural conversation pauses

**Function fillers**: Used while the AI is executing a function

### Multi-Language Support

Use `code="multi"` for automatic language detection and matching:

```python
class MultilingualAgent(AgentBase):
    def __init__(self):
        super().__init__(name="multilingual-agent")

        # Multi-language support (auto-detects and matches caller's language)
        self.add_language(
            name="Multilingual",
            code="multi",
            voice="rime.spore"
        )

        self.prompt_add_section(
            "Language",
            "Automatically detect and match the caller's language without "
            "prompting or asking them to verify. Respond naturally in whatever "
            "language they speak."
        )
```

The `multi` code supports: English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch.

**Note**: Speech recognition hints do not work when using `code="multi"`. If you need hints for specific terms, use individual language codes instead.
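
The two constraints on `code="multi"` (a fixed set of supported languages, and no speech recognition hints) can be checked before an agent is configured. The sketch below is illustrative only: `MULTI_LANGUAGES` and `can_use_multi` are hypothetical names that are not part of the SDK, and the language set is copied from the list in this guide.

```python
# Languages covered by code="multi", copied from the list in this guide.
MULTI_LANGUAGES = {
    "English", "Spanish", "French", "German", "Hindi",
    "Russian", "Portuguese", "Japanese", "Italian", "Dutch",
}


def can_use_multi(languages, needs_hints=False):
    """Return True when code="multi" fits the stated requirements.

    Hypothetical helper, not an SDK function. `multi` is treated as
    suitable only when no speech recognition hints are needed and every
    requested language is in the supported set.
    """
    if needs_hints:
        # Hints do not work with code="multi"; use individual codes instead.
        return False
    return all(language in MULTI_LANGUAGES for language in languages)


if __name__ == "__main__":
    print(can_use_multi(["English", "Spanish"]))
    print(can_use_multi(["English", "Korean"]))
    print(can_use_multi(["English"], needs_hints=True))
```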

For more control over individual languages with custom fillers:

```python
class CustomMultilingualAgent(AgentBase):
    def __init__(self):
        super().__init__(name="custom-multilingual")

        # English (primary)
        self.add_language(
            name="English",
            code="en-US",
            voice="rime.spore",
            speech_fillers=["Um", "Well", "So"],
            function_fillers=["Let me check that"]
        )

        # Spanish
        self.add_language(
            name="Spanish",
            code="es-MX",
            voice="rime.luna",
            speech_fillers=["Eh", "Pues", "Bueno"],
            function_fillers=["Dejame verificar", "Un momento"]
        )

        # French
        self.add_language(
            name="French",
            code="fr-FR",
            voice="rime.claire",
            speech_fillers=["Euh", "Alors", "Bon"],
            function_fillers=["Laissez-moi verifier", "Un instant"]
        )

        self.prompt_add_section(
            "Language",
            "Automatically detect and match the caller's language without "
            "prompting or asking them to verify."
        )
```

### Pronunciation Rules

Fix pronunciation of specific words:

```python
class AgentWithPronunciation(AgentBase):
    def __init__(self):
        super().__init__(name="pronunciation-agent")
        self.add_language("English", "en-US", "rime.spore")

        # Fix brand names
        self.add_pronunciation(
            replace="ACME",
            with_text="Ack-me"
        )

        # Fix technical terms
        self.add_pronunciation(
            replace="SQL",
            with_text="sequel"
        )

        # Case-insensitive matching
        self.add_pronunciation(
            replace="api",
            with_text="A P I",
            ignore_case=True
        )

        # Fix names
        self.add_pronunciation(
            replace="Nguyen",
            with_text="win"
        )
```

### Set Multiple Pronunciations

```python
# Set all pronunciations at once
self.set_pronunciations([
    {"replace": "ACME", "with": "Ack-me"},
    {"replace": "SQL", "with": "sequel"},
    {"replace": "API", "with": "A P I", "ignore_case": True},
    {"replace": "CEO", "with": "C E O"},
    {"replace": "ASAP", "with": "A sap"}
])
```

### Voice Selection Guide

Choosing the right TTS engine and voice significantly impacts caller experience.
Consider these factors:

#### Use Case Recommendations

| Use Case          | Recommended Voice Style                   |
| ----------------- | ----------------------------------------- |
| Customer Service  | Warm, friendly (`rime.spore`)             |
| Technical Support | Clear, professional (`rime.marsh`)        |
| Sales             | Energetic, persuasive (elevenlabs voices) |
| Healthcare        | Calm, reassuring                          |
| Legal/Finance     | Formal, authoritative                     |

#### TTS Engine Comparison

| Engine           | Latency   | Quality   | Cost   | Best For                       |
| ---------------- | --------- | --------- | ------ | ------------------------------ |
| **Rime**         | Very fast | Good      | Low    | Production, low-latency needs  |
| **ElevenLabs**   | Medium    | Excellent | Higher | Premium experiences, emotion   |
| **Google Cloud** | Medium    | Very good | Medium | Multilingual, SSML features    |
| **Amazon Polly** | Fast      | Good      | Low    | AWS integration, Neural voices |
| **OpenAI**       | Medium    | Excellent | Medium | Natural conversation style     |
| **Azure**        | Medium    | Very good | Medium | Microsoft ecosystem            |
| **Deepgram**     | Fast      | Good      | Medium | Speech-focused applications    |
| **Cartesia**     | Fast      | Good      | Medium | Specialized voices             |

#### Choosing an Engine

**Prioritize latency (Rime, Polly, Deepgram):**

* Interactive conversations where quick response matters
* High-volume production systems
* Cost-sensitive deployments

**Prioritize quality (ElevenLabs, OpenAI):**

* Premium customer experiences
* Brand-sensitive applications
* When voice quality directly impacts business outcomes

**Prioritize features (Google Cloud, Azure):**

* Need SSML for fine-grained control
* Complex multilingual requirements
* Specific enterprise integrations

#### Testing and Evaluation Process

Before selecting a voice for production:

1. **Create test content** with domain-specific terms, company names, and typical phrases
2. **Test multiple candidates** from your shortlisted engines
3. **Evaluate each voice:**
   * Pronunciation accuracy (especially brand names)
   * Natural pacing and rhythm
   * Emotional appropriateness
   * Handling of numbers, dates, prices
4. **Test with real users** if possible (internal team members or beta callers)
5. **Measure latency** in your deployment environment

#### Voice Personality Considerations

**Match voice to brand:**

* Formal brands → authoritative, measured voices
* Friendly brands → warm, conversational voices
* Tech brands → clear, modern-sounding voices

**Consider your audience:**

* Older demographics may prefer clearer, slower voices
* Technical audiences tolerate more complex terminology
* Regional preferences may favor certain accents

**Test edge cases:**

* Long monologues (product descriptions)
* Lists and numbers (order details, account numbers)
* Emotional content (apologies, celebrations)

### Dynamic Voice Selection

Change voice based on context:

```python
class DynamicVoiceAgent(AgentBase):
    DEPARTMENT_VOICES = {
        "support": {"voice": "rime.spore", "name": "Alex"},
        "sales": {"voice": "rime.marsh", "name": "Jordan"},
        "billing": {"voice": "rime.coral", "name": "Morgan"}
    }

    def __init__(self):
        super().__init__(name="dynamic-voice")

    def on_swml_request(self, request_data=None, callback_path=None, request=None):
        # Determine department from called number
        call_data = (request_data or {}).get("call", {})
        called_num = call_data.get("to", "")

        if "555-1000" in called_num:
            dept = "support"
        elif "555-2000" in called_num:
            dept = "sales"
        else:
            dept = "billing"

        config = self.DEPARTMENT_VOICES[dept]

        self.add_language("English", "en-US", config["voice"])

        self.prompt_add_section(
            "Role",
            f"You are {config['name']}, a {dept} representative."
        )
```

### Language Codes Reference

Supported language codes:

| Language     | Codes                                                                                            |
| ------------ | ------------------------------------------------------------------------------------------------ |
| Multilingual | `multi` (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, Dutch) |
| Bulgarian    | `bg`                                                                                             |
| Czech        | `cs`                                                                                             |
| Danish       | `da`, `da-DK`                                                                                    |
| Dutch        | `nl`                                                                                             |
| English      | `en`, `en-US`, `en-AU`, `en-GB`, `en-IN`, `en-NZ`                                                |
| Finnish      | `fi`                                                                                             |
| French       | `fr`, `fr-CA`                                                                                    |
| German       | `de`                                                                                             |
| Hindi        | `hi`                                                                                             |
| Hungarian    | `hu`                                                                                             |
| Indonesian   | `id`                                                                                             |
| Italian      | `it`                                                                                             |
| Japanese     | `ja`                                                                                             |
| Korean       | `ko`, `ko-KR`                                                                                    |
| Norwegian    | `no`                                                                                             |
| Polish       | `pl`                                                                                             |
| Portuguese   | `pt`, `pt-BR`, `pt-PT`                                                                           |
| Russian      | `ru`                                                                                             |
| Spanish      | `es`, `es-419`                                                                                   |
| Swedish      | `sv`, `sv-SE`                                                                                    |
| Turkish      | `tr`                                                                                             |
| Ukrainian    | `uk`                                                                                             |
| Vietnamese   | `vi`                                                                                             |

### Complete Voice Configuration Example

```python
from signalwire_agents import AgentBase


class FullyConfiguredVoiceAgent(AgentBase):
    def __init__(self):
        super().__init__(name="voice-configured")

        # Primary language with all options
        self.add_language(
            name="English",
            code="en-US",
            voice="rime.spore",
            speech_fillers=[
                "Um",
                "Well",
                "Let me see",
                "So"
            ],
            function_fillers=[
                "Let me look that up for you",
                "One moment while I check",
                "I'm searching for that now",
                "Just a second"
            ]
        )

        # Secondary language
        self.add_language(
            name="Spanish",
            code="es-MX",
            voice="rime.luna",
            speech_fillers=["Pues", "Bueno"],
            function_fillers=["Un momento", "Dejame ver"]
        )

        # Pronunciation fixes
        self.set_pronunciations([
            {"replace": "ACME", "with": "Ack-me"},
            {"replace": "www", "with": "dub dub dub"},
            {"replace": ".com", "with": "dot com"},
            {"replace": "@", "with": "at"}
        ])

        self.prompt_add_section(
            "Role",
            "You are a friendly customer service agent."
        )
```
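
Voice strings in this guide follow the `engine.voice:model` format, so a typo in one can be caught locally before the agent is deployed. The following is a minimal sketch of such a check, not part of the SDK: `VoiceSpec`, `parse_voice`, and `KNOWN_ENGINES` are hypothetical names, and the engine codes are copied from the Available TTS Engines table. It assumes the engine code never contains a `.` and the model is whatever follows the first `:`.

```python
from typing import NamedTuple, Optional

# Engine codes copied from the "Available TTS Engines" table in this guide.
KNOWN_ENGINES = {
    "amazon", "cartesia", "deepgram", "elevenlabs",
    "gcloud", "azure", "openai", "rime",
}


class VoiceSpec(NamedTuple):
    """Parsed parts of an `engine.voice:model` string (hypothetical type)."""
    engine: str
    voice: str
    model: Optional[str]


def parse_voice(voice_id: str) -> VoiceSpec:
    """Split a voice string into engine, voice, and optional model.

    Hypothetical helper, not an SDK function. Raises ValueError when the
    string does not match `engine.voice` or `engine.voice:model`.
    """
    # The engine code is everything before the first dot.
    engine, dot, rest = voice_id.partition(".")
    if not dot or not rest:
        raise ValueError(f"expected 'engine.voice[:model]', got {voice_id!r}")
    if engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown engine code {engine!r}")

    # The model, when present, follows the first colon.
    voice, colon, model = rest.partition(":")
    if not voice or (colon and not model):
        raise ValueError(f"missing voice or model in {voice_id!r}")
    return VoiceSpec(engine, voice, model or None)


if __name__ == "__main__":
    for candidate in ("rime.spore", "elevenlabs.josh:eleven_turbo_v2_5"):
        print(parse_voice(candidate))
```

Running such a check over every `voice` value in a configuration catches misspelled engine codes early; it does not confirm that the voice itself exists, which only the provider's voice list can do.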