Text to Speech (TTS) lets your voice application speak dynamic prompts to callers without pre-recorded audio, which is useful for customer service greetings, announcements, and menu experiences that need to change frequently. This guide shows how to play TTS with the SignalWire RELAY Realtime SDK using the playTTS method, including choosing synthesized voices and languages, and using Speech Synthesis Markup Language (SSML) for finer control over pronunciation, cadence, and emphasis.
Text-to-Speech with SignalWire RELAY Realtime SDK
Programmable voice technology allows you to expand your customer service operations: greeting guests, providing routine assistance, or directing callers to relevant agents and associates. Synthesized voice, also known as text-to-speech (TTS), empowers businesses with scalable support systems that can reach diverse audiences at any time of day.
We recently discussed text-to-speech using our Compatibility API, but with SignalWire, there is more than one way to implement this tool. In this post, we’ll go over how to use TTS with our RELAY Realtime SDK using the playTTS method.
RELAY Realtime SDK: Voice
Drafting a programmable voice application in Node.js that makes and receives voice calls, to and from both traditional PSTN numbers and SIP endpoints, is quick and easy with the RELAY Realtime SDK. Our RELAY APIs use WebSocket technology, which allows simultaneous, bidirectional data transmission. With the RELAY Realtime SDK, you can deploy low-latency voice applications featuring text-to-speech from a simple script.
Check out our developer documentation for instructions on how to install the Realtime SDK, and take a glance at our Realtime API guides for tips on configuring your SignalWire phone numbers for use with your RELAY application.
Reading a message
For the purposes of this blog, we’ll focus specifically on the playTTS method. As the name implies, the playTTS method enables you to play text-to-speech to the caller.
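As a minimal sketch of how this can look inside a call handler: the function below answers a call, speaks one sentence with playTTS, waits for the audio to finish, and hangs up. It assumes the call object offers `answer()`, `playTTS({ text })`, and `hangup()`, and that the returned playback object has an `ended()` method; check these against the SDK reference for your version. The name `greetCaller`, the greeting text, and the wiring shown in the comment are illustrative only.

```javascript
// Sketch of an inbound-call handler. `call` is assumed to be the call object
// that the Realtime SDK hands to an incoming-call listener.
async function greetCaller(call, text = "Hello! Thanks for calling.") {
  await call.answer();

  // playTTS takes the sentence to speak in the `text` field.
  const playback = await call.playTTS({ text });

  // Assumption: the playback object exposes ended(), which resolves
  // once the audio has finished playing.
  await playback.ended();

  await call.hangup();
}

// Hypothetical wiring (needs @signalwire/realtime-api and real credentials;
// the exact client setup depends on your SDK version):
//   const client = new Voice.Client({ project, token, contexts: ["office"] });
//   client.on("call.received", (call) => greetCaller(call));
```

Because the handler only depends on the call object it receives, it can be exercised with a stub before it is connected to a live number.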
A diversity of languages
SignalWire can tap into the synthesized voices provided by Amazon Polly and Google Cloud, so you can customize the gender and language of your text-to-speech. For example, you can set the language of your synthesized voice to address an audience that speaks European Portuguese.
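One way to sketch the European Portuguese case is a small helper that builds the options object passed to playTTS. It assumes playTTS accepts `language` (a locale code such as `pt-PT`) and `gender` (`"male"` or `"female"`) alongside `text`; the helper name `ttsOptions`, its defaults, and the sample sentence are illustrative and not part of the SDK.

```javascript
// Hypothetical helper that builds the options object for playTTS.
// Assumption: playTTS accepts `text`, `language`, and `gender`; confirm the
// supported values in the SDK reference for your version.
function ttsOptions(text, { language = "en-US", gender = "female" } = {}) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new TypeError("text must be a non-empty string");
  }
  if (gender !== "male" && gender !== "female") {
    throw new RangeError('gender must be "male" or "female"');
  }
  return { text, language, gender };
}

// European Portuguese greeting ("Hello! Thank you for calling.").
const greeting = ttsOptions("Olá! Obrigado por ligar.", {
  language: "pt-PT",
  gender: "male",
});

// Inside a call handler you would then pass it along:
//   await call.playTTS(greeting);
```

Validating the options up front keeps a typo in the gender or an empty prompt from surfacing only once a caller is already on the line.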
Speech synthesis markup language (SSML)
SignalWire’s RELAY Realtime SDK can also employ the Speech Synthesis Markup Language (SSML) for more granular control over how your synthesized voice is presented to the caller.
By wrapping your text in the <speak> tag, you can make use of subsequent SSML tags to alter the cadence and pronunciation, among other parameters, of your text-to-speech. See Amazon Polly or Google Cloud documentation for supported tags.
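A minimal sketch of building such a message: the helper below wraps dynamic text in `<speak>`, inserts a half-second pause with `<break>`, and has an order number read out character by character with `<say-as>`. The helper names and the order-status wording are hypothetical, and tag support varies by provider and voice, so treat the tag choice as an assumption to verify. Dynamic values are XML-escaped so that a stray `&` or `<` does not break the markup.

```javascript
// Escape characters that are special in XML so that dynamic text
// cannot break the SSML markup.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: greets the caller, pauses briefly, then reads the
// order number one character at a time.
function orderStatusSsml(name, orderNumber) {
  return (
    "<speak>" +
    `Hello ${escapeXml(name)}.` +
    '<break time="500ms"/>' +
    `Your order <say-as interpret-as="characters">${escapeXml(orderNumber)}</say-as> ` +
    "has shipped." +
    "</speak>"
  );
}

const ssml = orderStatusSsml("Ana & Co", "4821");

// Inside a call handler, the SSML string goes in the `text` field:
//   await call.playTTS({ text: ssml });
```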
What will SignalWire inspire you to develop? If you have any questions while getting started with our programmable Voice API, stop by our Community Slack or our Forum to connect with our team.
Frequently asked questions
What is Text to Speech (TTS) in a voice application?
Text to Speech (TTS) converts text into spoken audio during a live call, so your application can generate prompts dynamically instead of relying only on prerecorded files.
How do you play Text to Speech (TTS) using the SignalWire RELAY Realtime SDK?
Use the playTTS method to speak a text string to the caller, which lets you generate and play prompts on demand inside your call handling logic.
How do you change the language or voice for Text to Speech (TTS)?
SignalWire can use synthesized voices from providers like Amazon Polly and Google Cloud, allowing you to select voices and languages that fit your audience.
What is Speech Synthesis Markup Language (SSML), and when should you use it?
Speech Synthesis Markup Language (SSML) is markup you add around TTS text, such as a <speak> wrapper and additional tags, to control delivery details like cadence and pronunciation.
Can the RELAY Realtime SDK handle calls to phone numbers and Session Initiation Protocol (SIP) endpoints?
Yes, the RELAY Realtime SDK supports building Node.js voice applications that make and receive calls to Public Switched Telephone Network (PSTN) numbers and Session Initiation Protocol (SIP) endpoints.