languages | SignalWire

Use ai.languages to configure the spoken language of your AI Agent, as well as the TTS engine, voice, and fillers.

Properties

ai.languages

object[]

An array of objects that accept the following properties.

languages[].name

string | Required

Name of the language (“French”, “English”, etc.). This value is used in the system prompt to tell the LLM which language is being spoken.

languages[].code

string | Required

The language code used for speech recognition (ASR, also called speech-to-text or STT). By default, SignalWire uses Deepgram’s Nova-3 STT engine, so this value should match a code from Deepgram’s Nova-3 language codes table.

If a different STT model is selected using the openai_asr_engine parameter, you must select a code supported by that engine.

languages[].voice

string | Required

String format: <engine id>.<voice id>. Select the engine from gcloud, polly, elevenlabs, deepgram, cartesia, rime, inworld, or minimax. Select the voice from the TTS provider reference. For example, "gcloud.fr-FR-Neural2-B". See voice usage for more details.

languages[].emotion

string | Defaults to None

Enables automatic emotion for the configured TTS engine, allowing the AI to express emotions when speaking. A global emotion, or specific emotions for certain topics, can be set within the prompt of the AI. Valid values: auto

Only works with the Cartesia and MiniMax TTS engines. For a fixed MiniMax emotion, use params.emotion instead.
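As a minimal sketch of automatic emotion, the fragment below sets emotion to auto on a Cartesia voice. YOUR_VOICE_ID is a placeholder rather than a real Cartesia voice, and the ai.prompt.text property and its wording are assumptions added for illustration; they do not appear in this reference.

```yaml
ai:
  prompt:
    # Assumed prompt text; the emotion guidance is illustrative only.
    text: You are a friendly assistant. Sound cheerful when greeting the caller.
  languages:
    - name: English
      code: en-US
      # Placeholder: replace YOUR_VOICE_ID with an ID from the TTS provider reference.
      voice: cartesia.YOUR_VOICE_ID
      # The only documented value is auto.
      emotion: auto
```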

languages[].function_fillers

string[] | Defaults to None

An array of strings to be used as fillers in the conversation when the agent is calling a SWAIG function. The filler is played asynchronously during the function call.
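A short sketch of function_fillers, reusing the elevenlabs.rachel voice from the examples on this page. The filler phrases themselves are illustrative.

```yaml
languages:
  - name: English
    code: en-US
    voice: elevenlabs.rachel
    # Spoken only while a SWAIG function call is in progress.
    function_fillers:
      - let me check that for you
      - one second while I look that up
```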

languages[].model

string | Defaults to None

The model to use for the specified TTS engine (e.g. arcana). Check the TTS provider reference for the available models.
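The fragment below sketches how model might sit alongside voice. It assumes that arcana is a model offered by the Rime engine, and YOUR_VOICE_ID is a placeholder; confirm both against the TTS provider reference before use.

```yaml
languages:
  - name: English
    code: en-US
    # Placeholder: replace YOUR_VOICE_ID with an ID from the TTS provider reference.
    voice: rime.YOUR_VOICE_ID
    # Assumes arcana is available for the selected engine.
    model: arcana
```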

languages[].speech_fillers

string[] | Defaults to None

An array of strings to be used as fillers in the conversation. This helps the AI break silence between responses.

speech_fillers are used between every ‘turn’ taken by the LLM, including at the beginning of the call. For more targeted fillers, consider using function_fillers.

languages[].speed

string | Defaults to None

The speaking speed for the specified TTS engine. When set to auto, the AI can speak at different speeds at different points in the conversation. The speed behavior can be defined in the prompt of the AI. Valid values: auto

Only works with the Cartesia TTS engine.
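A minimal sketch of automatic speed on a Cartesia voice. YOUR_VOICE_ID is a placeholder, not a real voice ID.

```yaml
languages:
  - name: English
    code: en-US
    # Placeholder: replace YOUR_VOICE_ID with an ID from the TTS provider reference.
    voice: cartesia.YOUR_VOICE_ID
    # The only documented value is auto; describe the desired pacing in the prompt.
    speed: auto
```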

languages[].params

object | Defaults to None

TTS engine-specific parameters for this language.

params.similarity

number | Defaults to 0.75

The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. The higher the similarity, the closer the AI will sound to the original voice. Valid values range from 0.0 to 1.0.

Only works with the ElevenLabs TTS engine.

params.stability

number | Defaults to 0.50

The stability slider determines how stable the voice is and the randomness between each generation. Lowering this slider introduces a broader emotional range for the voice. Valid values range from 0.0 to 1.0.

Only works with the ElevenLabs TTS engine.

params.speakingRate

number | Defaults to 1.0

Adjusts how quickly the voice speaks. Values below 1.0 slow the voice down; values above 1.0 speed it up. Valid values range from 0.5 to 1.5.

Only works with the Inworld TTS engine.

params.temperature

number | Defaults to 1.0

Controls the randomness and expressiveness of the generated speech. Lower values produce a more consistent, predictable delivery; higher values introduce more variation. Valid values range from 0.0 to 2.0.

Only works with the Inworld TTS engine.
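The two Inworld parameters above could be combined as in the following sketch. YOUR_VOICE_ID is a placeholder, and the values are arbitrary picks from within the documented ranges.

```yaml
languages:
  - name: English
    code: en-US
    # Placeholder: replace YOUR_VOICE_ID with an ID from the TTS provider reference.
    voice: inworld.YOUR_VOICE_ID
    params:
      # Slightly slower than the 1.0 default (valid range 0.5 to 1.5).
      speakingRate: 0.9
      # Slightly more consistent than the 1.0 default (valid range 0.0 to 2.0).
      temperature: 0.8
```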

params.speed

number | Defaults to 1.0

How quickly the voice speaks. Values below 1.0 slow the voice down; values above 1.0 speed it up. Valid values range from 0.5 to 2.0.

Only works with the MiniMax TTS engine.

params.vol

number | Defaults to 1.0

The speaking volume. Lower values are quieter. Valid values range from 0.1 to 1.0.

Only works with the MiniMax TTS engine.

params.pitch

integer | Defaults to 0

The pitch shift in semitones. Negative values lower the pitch; positive values raise it. Valid values range from -12 to 12.

Only works with the MiniMax TTS engine.

params.emotion

string

A fixed emotional tone for the generated speech. Valid values are happy, sad, angry, fearful, disgusted, surprised, and neutral. To vary the emotion automatically during a conversation, use languages[].emotion set to auto instead.

Only works with the MiniMax TTS engine.
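As a sketch of the MiniMax parameters described above, the fragment below sets a fixed emotion together with speed, volume, and pitch. YOUR_VOICE_ID is a placeholder, and the values are arbitrary picks from within the documented ranges.

```yaml
languages:
  - name: English
    code: en-US
    # Placeholder: replace YOUR_VOICE_ID with an ID from the TTS provider reference.
    voice: minimax.YOUR_VOICE_ID
    params:
      speed: 1.1      # valid range 0.5 to 2.0
      vol: 0.8        # valid range 0.1 to 1.0
      pitch: -2       # integer semitones, valid range -12 to 12
      emotion: happy  # fixed tone; use languages[].emotion set to auto to vary it
```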

languages[].fillers

string[] | Defaults to None | Deprecated

An array of strings to be used as fillers in the conversation and when the agent is calling a SWAIG function. Deprecated: use speech_fillers and function_fillers instead.

languages[].engine

string | Defaults to gcloud | Deprecated

The engine to use for the language. For example, "elevenlabs". Deprecated: set the engine with the voice parameter instead.
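A configuration that avoids both deprecated properties might look like the sketch below. The filler phrases are illustrative, and the voice is the elevenlabs.rachel string used in the examples on this page.

```yaml
languages:
  - name: English
    code: en-US
    # Instead of the deprecated engine property, the engine is the
    # prefix of the voice string.
    voice: elevenlabs.rachel
    # Instead of the deprecated fillers array, use the two targeted arrays.
    speech_fillers:
      - one moment please
    function_fillers:
      - let me look that up
```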

Use `voice` strings

Compose the voice string using the <engine id>.<voice id> syntax.

First, select your engine using the gcloud, polly, elevenlabs, deepgram, cartesia, rime, inworld, or minimax identifier. Append a period (.), and then the specific voice ID (for example, en-US-Casual-K) from the TTS provider. Refer to SignalWire’s Supported Voices and Languages for guides on configuring voice ID strings for each provider.
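Following the steps above, the gcloud engine and the en-US-Casual-K voice ID combine into a single voice string. This sketch assumes en-US-Casual-K is a Google Cloud voice; the French entry reuses the gcloud.fr-FR-Neural2-B string shown in the property description.

```yaml
languages:
  - name: English
    code: en-US
    # <engine id> "gcloud" + "." + <voice id> "en-US-Casual-K"
    voice: gcloud.en-US-Casual-K
  - name: French
    code: fr-FR
    voice: gcloud.fr-FR-Neural2-B
```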

Supported voices and languages

SignalWire’s cloud platform integrates with leading text-to-speech providers. For a comprehensive list of supported engines, languages, and voices, refer to our documentation on Supported Voices and Languages.

Examples

Set a single language

If left unset, SWML automatically assigns the language (and other required parameters) the defaults listed in the properties above. This example uses ai.languages to configure a specific English-speaking voice from ElevenLabs.

languages:
  - name: English
    code: en-US
    voice: elevenlabs.rachel
    speech_fillers:
      - one moment please,
      - hmm...
      - let's see,

Set multiple languages

This example uses ai.languages to configure multiple languages using different TTS engines.

languages:
  - name: Mandarin
    code: cmn-TW
    voice: gcloud.cmn-TW-Standard-A
  - name: English
    code: en-US
    voice: elevenlabs.rachel

Configure per-language ElevenLabs parameters

Configure different stability and similarity values for each language using languages[].params:

ai:
  languages:
    - name: English
      code: en-US
      voice: elevenlabs.josh
      params:
        stability: 0.6
        similarity: 0.8
    - name: Spanish
      code: es-ES
      voice: elevenlabs.maria
      params:
        stability: 0.4
        similarity: 0.9
