Say

The <Say> verb reads the supplied text back to the caller. It is useful for text that is difficult to pre-record. The gender and language in which the text is read are customizable.

Verb attributes

language

string

The language attribute specifies the dialect (language and locale) of the voice. See below for all language specifications.

loop

integer. Defaults to 1.

The loop attribute specifies the number of times the text is read. If loop is set to 0, the text is repeated continuously until the call is terminated.
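As a rough illustration of the loop behavior described above, the sketch below repeats a hold message until the caller hangs up. The message text is made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- loop="0" repeats the text until the call is terminated -->
  <Say loop="0">All of our agents are busy. Please stay on the line.</Say>
</Response>
```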

voice

string. Defaults to woman.

The voice attribute supports the following values: man, woman, and alice; Amazon Polly voices, prefixed with Polly. (for example, Polly.Joanna); Amazon Polly Neural voices, prefixed with Polly. and suffixed with -Neural (for example, Polly.Joanna-Neural); and Google Cloud voices, prefixed with gcloud. (for example, gcloud.en-US-Standard-A). Polly Neural and Google WaveNet voices are charged at a premium compared to Polly Standard and Google Standard voices. alice is deprecated and provided only for backward compatibility. See below for the language specifications of each of these voices.

Supported voices and languages

The supported voices and languages can be found here.

Nouns

The noun of a cXML verb is the content nested within the verb, upon which the verb acts. <Say> has the following noun:

Noun	Description
`plain text`	The text that will be read to the caller. Limit: 4,096 Unicode characters.

Speech synthesis markup language (SSML)

Speech Synthesis Markup Language (SSML) is an XML-based markup language that provides a standard way to mark up text for synthesized speech.

SSML is usually wrapped within <speak> tags. When using SSML with the <Say> verb, however, you can omit the <speak> tags and place the remaining SSML tags directly inside the <Say> verb.

Below are the supported SSML tags. When using an Amazon Polly voice, please refer to Amazon Polly SSML Documentation instead.

Tag	Description
`<break>`	A pause in speech. Set the length of the pause with the `time` attribute. Maximum pause time is 10s. Include the unit `s` or `ms` when setting a `time`. The `strength` attribute can also be used for pauses. See below for possible values.
`<emphasis>`	Emphasize words or phrases. This tag changes the rate and volume of speech. More emphasis generates louder and slower speech while less emphasis generates quieter and faster speech. Emphasis can be modified with the `level` attribute. See below for possible values.
`<lang>`	Specify another language for specific words or phrases. Set the language with the `xml:lang` attribute. Possible languages are: `en-US`, `en-GB`, `en-IN`, `en-AU`, `en-CA`, `de-DE`, `es-ES`, `it-IT`, `ja-JP`, `fr-FR` (English, German, Spanish, Italian, Japanese, French).
`<p>`	Add a pause between paragraphs.
`<phoneme>`	Phonetic pronunciation for specified words or phrases. Set the phonetic alphabet to use with the `alphabet` attribute. See below for possible values. In addition, you can use the `ph` attribute to set the phonetic pronunciation to speak. See here for a list of supported symbols.
`<prosody>`	Modify the `volume`, `pitch`, and `rate` of the tagged speech.
`<s>`	Add a pause between sentences.
`<say-as>`	Describe how text should be interpreted. See below for all the possible values of the `interpret-as` attribute of the `<say-as>` tag.
`<sub>`	Pronounce the specified word or phrase as a different word or phrase. Specify the pronunciation to substitute with the `alias` attribute.
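To make the `<lang>` and `<sub>` rows above more concrete, here is a minimal sketch. The sentence text and the abbreviation being expanded are invented for illustration; only the tag and attribute names come from the table.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>
    <!-- Read one word with French pronunciation inside an otherwise English sentence -->
    The French word for hello is <lang xml:lang="fr-FR">bonjour</lang>.
    <!-- Speak the alias instead of the abbreviation written in the text -->
    This markup is standardized by the <sub alias="World Wide Web Consortium">W3C</sub>.
  </Say>
</Response>
```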

`strength` attribute

The strength attribute has the following values. Default is medium.

Value	Description
`none`	No pause. Can be used to remove a pause that would normally occur.
`x-weak`	No pause.
`weak`	Treat adjacent words as if separated by a single comma.
`medium`	Treat adjacent words as if separated by a single comma.
`strong`	Sentence break.
`x-strong`	Paragraph break.
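The two ways of controlling a pause can be sketched as follows: an explicit duration with `time`, or a relative pause with `strength`. The spoken text is made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>
    Your order has shipped.
    <!-- Explicit two-second pause; the unit (s or ms) is required and the maximum is 10s -->
    <break time="2s"/>
    It should arrive on Friday.
    <!-- Sentence-length pause selected by strength instead of duration -->
    <break strength="strong"/>
    Thank you for your purchase.
  </Say>
</Response>
```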

`level` attribute

The level attribute has the following values. Default is moderate.

Value	Description
`strong`	Increase the volume and slow down the speaking rate. Speech is louder and slower.
`moderate`	Increase the volume and slow down the speaking rate, but not as much as `strong`.
`reduced`	Decrease the volume and speed up the speaking rate. Speech is softer and faster.
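As a small sketch of the `level` values above, the following contrasts `strong` and `reduced` emphasis in one message. The wording is illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>
    <!-- strong: louder and slower -->
    Please <emphasis level="strong">do not</emphasis> share your PIN with anyone.
    <!-- reduced: softer and faster -->
    <emphasis level="reduced">Standard message rates may apply.</emphasis>
  </Say>
</Response>
```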

`alphabet` attribute

The alphabet attribute has the following values.

Value	Description
`ipa`	The International Phonetic Alphabet (IPA).
`x-sampa`	The Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA).
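The sketch below shows the same word transcribed in each alphabet. The transcriptions of "pecan" are a commonly used illustration and are an assumption here, not taken from this page; check them against the supported symbol list for the voice you use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>
    <!-- IPA transcription -->
    I like <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme> pie.
    <!-- Equivalent X-SAMPA transcription; the attribute is single-quoted because the
         X-SAMPA stress mark is a double quote -->
    I like <phoneme alphabet="x-sampa" ph='pI"kA:n'>pecan</phoneme> pie.
  </Say>
</Response>
```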

`volume` attribute

The volume attribute has the following values. Set the volume with one of the values below. Then, you can specify a percentage to increase or decrease the volume of the speech. See here for more information.

Value	Description
`silent`	No volume.
`x-soft`	Lowest volume.
`soft`	Lower volume.
`medium`	Normal volume.
`loud`	Louder volume.
`x-loud`	Loudest volume.

`pitch` attribute

The pitch attribute has the following values. Set the pitch with one of the values below. Then, you can specify a percentage to increase or decrease the pitch of the speech. See here for more information.

Value	Description
`x-low`	Lowest pitch.
`low`	Lower pitch.
`medium`	Normal pitch.
`high`	Higher pitch.
`x-high`	Highest pitch.

`rate` attribute

The rate attribute has the following values. Set the rate with one of the values below. Then, you can specify a percentage to increase or decrease the speed of the speech. See here for more information.

Value	Description
`x-slow`	Slowest rate.
`slow`	Slower rate.
`medium`	Normal rate.
`fast`	Faster rate.
`x-fast`	Fastest rate.
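A minimal sketch of `<prosody>` using the named values from the `volume`, `pitch`, and `rate` tables, followed by the relative form that also appears in this page's SSML example. The spoken text is invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>
    <!-- Named values: quieter, lower, and slower than normal -->
    <prosody volume="soft" pitch="low" rate="slow">This part is calm and unhurried.</prosody>
    <!-- Relative values: lower pitch by 10 percent and speak at 85 percent of normal speed -->
    <prosody pitch="-10%" rate="85%">This part is adjusted by percentage.</prosody>
  </Say>
</Response>
```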

`interpret-as` attribute

The interpret-as attribute has the following values.

Value	Description
`characters`	Spell out each letter.
`spell-out`	Spell out each letter.
`cardinal`	Interpret value as cardinal number.
`number`	Interpret value as cardinal number.
`ordinal`	Interpret value as ordinal number.
`digits`	Spell each digit separately.
`fraction`	Interpret value as fraction.
`unit`	Interpret value as measurement.
`date`	Interpret value as a date. Use `format` attribute to indicate format of date: `mdy`, `dmy`, `ymd`, `md`, `dm`, `ym`, `my`, `d`, `m`, `y`.
`time`	Interpret as a duration of minutes and seconds.
`telephone`	Interpret as telephone number.
`address`	Interpret as part of a street address.
`interjection`	Interpret as an interjection.
`expletive`	“Bleep” out the content in the tag.
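To illustrate a few of the `interpret-as` values above, here is a short sketch. The confirmation code, date, and position number are hypothetical sample data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>
    <!-- digits: each digit is spoken separately -->
    Your confirmation code is <say-as interpret-as="digits">4821</say-as>.
    <!-- date: the format attribute says the value is month, day, year -->
    Your appointment is on <say-as interpret-as="date" format="mdy">12/25/2025</say-as>.
    <!-- ordinal: read as an ordinal number -->
    You are <say-as interpret-as="ordinal">3</say-as> in the queue.
  </Say>
</Response>
```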

Example

<Response>
  <Say>
    Welcome to SignalWire
    <break strength="x-weak" time="100ms"/>
    <emphasis level="moderate">Emphasized words</emphasis>
    <p>Words in a paragraph</p>
    <phoneme alphabet="ipa" ph="pɪˈkɑːn">Phonetic pronunciation</phoneme>
    <prosody pitch="-10%" rate="85%" volume="-6dB">Words to speak</prosody>
    <s>Words in a sentence.</s>
    <say-as interpret-as="spell-out">Words</say-as>
    <sub alias="alias">Words to be substituted</sub>
  </Say>
</Response>

Here is an example of how to use some of the SSML tags within the <Say> verb.

Nesting

No other verbs can be nested within <Say>. However, <Say> can be nested within <Gather>.
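A minimal sketch of <Say> nested within <Gather>, so that the prompt is read while the caller's input is collected. The `numDigits` and `action` attributes and the `/handle-choice` URL are assumptions for illustration and are not described on this page; consult the <Gather> documentation for its actual attributes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- numDigits and action are assumed <Gather> attributes; /handle-choice is a hypothetical URL -->
  <Gather numDigits="1" action="/handle-choice">
    <Say>Press 1 for sales. Press 2 for support.</Say>
  </Gather>
</Response>
```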

Examples

A simple message to be read

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Hello World.</Say>
</Response>

‘Hello World’ will be read once using the default voice.

A simple message to be read using Amazon Polly voice

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="Polly.Joanna">Hello World.</Say>
</Response>

‘Hello World’ will be read once using the Amazon Polly “Joanna” voice.

A simple message to be read using Amazon Polly Neural voice

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="Polly.Joanna-Neural">Hello World.</Say>
</Response>

‘Hello World’ will be read once using the Amazon Polly “Joanna” Neural voice. Amazon Polly Neural voices are charged a premium price compared to Amazon Polly Standard voices.

A simple message to be read using Google Cloud text-to-speech voice

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="gcloud.en-US-Standard-A">Hello World.</Say>
</Response>

‘Hello World’ will be read once using the Google Cloud text-to-speech en-US-Standard-A voice.

Repetition of a message in a foreign language

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="alice" language="fr-CA" loop="5">Bonjour.</Say>
</Response>

‘Bonjour’ (‘Hello’) will be read 5 times in Canadian French.

Notes on usage

There is a 4,096 Unicode character limit on the text.
Numbers are spoken, or read, based on context. For example, ‘234’ is read as “two hundred thirty-four”, whereas ‘2 3 4’ is read as “two three four”.
Short pauses in spoken text are accomplished by inserting punctuation, such as commas and periods, in the written text. For longer pauses, place the text in separate <Say> verbs and place a <Pause> verb between them.
Dates, times, money amounts, and abbreviations may not follow intuitive pronunciations. Test these situations to ensure they are pronounced to your liking.
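The longer-pause pattern from the notes above can be sketched as two <Say> verbs separated by a <Pause>. The `length` attribute (in seconds) is assumed from the <Pause> verb and is not documented on this page; the message text is made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Please have your account number ready.</Say>
  <!-- Assumed attribute: length is the pause duration in seconds -->
  <Pause length="3"/>
  <Say>Thank you. Connecting you now.</Say>
</Response>
```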

*Twilio and TwiML are trademarks of Twilio, Inc. SignalWire, Inc. and its products are not affiliated with or endorsed by Twilio, Inc.

