Programmable voice technology allows you to expand your customer service operations: greeting guests, providing routine assistance, or directing callers to relevant agents and associates. Synthesized voice, also known as text-to-speech, empowers businesses with scalable support systems that can reach diverse audiences at any time of day.
We recently discussed text-to-speech using our Compatibility API, but with SignalWire, there is more than one way to implement it. In this post, we’ll go over how to use TTS with our RELAY Realtime SDK using the playTTS method. If you’d prefer a video format, you can watch that here.
RELAY Realtime SDK: Voice
Drafting a Node.js application that makes and receives voice calls, to and from both traditional PSTN numbers and SIP endpoints, is quick and easy with the RELAY Realtime SDK. Our RELAY APIs use WebSocket technology, which allows for simultaneous, bi-directional data transmission. Using the RELAY Realtime SDK, you too can deploy low-latency telecom applications featuring text-to-speech with a simple script.
Check out our developer documentation for instructions on how to install the Realtime SDK, and take a glance at our Realtime API guides for tips on configuring your SignalWire phone numbers for use with your RELAY application.
Reading a Message
For the purposes of this blog, we’ll focus specifically on the playTTS method. As the name implies, the playTTS method enables you to play text-to-speech to the caller.
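As a minimal sketch of that flow, the handler below answers an inbound call and reads a greeting. It assumes `call` is the call object the Realtime SDK hands to a "call.received" handler; the `greetCaller` name, the greeting text, the "office" context, and the placeholder credentials in the commented wiring are illustrative assumptions, not values from this post.

```javascript
// Sketch: answer an inbound call, then speak a message with playTTS.
// `call` only needs the answer() and playTTS() methods used here.
async function greetCaller(call, text = "Welcome to SignalWire!") {
  await call.answer();
  // playTTS takes an object whose `text` field is the message to speak.
  return call.playTTS({ text });
}

// Wiring into the SDK (assumes @signalwire/realtime-api is installed and
// the placeholder project, token, and context are replaced with your own):
//
//   const { Voice } = require("@signalwire/realtime-api");
//   const client = new Voice.Client({
//     project: "<project-id>",
//     token: "<api-token>",
//     contexts: ["office"],
//   });
//   client.on("call.received", greetCaller);
```

Keeping the handler separate from the client setup makes it easy to exercise with a stand-in call object before pointing it at a real phone number.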
A Diversity of Languages
SignalWire can tap into the synthesized voices provided by Amazon Polly and Google Cloud, so you can customize the gender and language of your text-to-speech. For example, you can set the language and gender of your synthesized voice to address an audience that speaks European Portuguese.
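A European Portuguese greeting might be configured roughly as follows. This is a sketch that assumes playTTS accepts `language` and `gender` fields alongside `text`; the `portugueseGreeting` helper and the sample message are hypothetical, and "pt-PT" is the standard locale code for European Portuguese.

```javascript
// Hypothetical helper: builds the options object passed to call.playTTS().
// Assumption: `language` takes a locale code and `gender` is "male" or "female".
function portugueseGreeting(text, gender = "female") {
  return { text, language: "pt-PT", gender };
}

// Usage inside a call handler (the `call` object comes from the SDK):
//   await call.playTTS(portugueseGreeting("Bem-vindo à SignalWire!"));
```

Which voices are available for a given language and gender depends on the underlying provider, so it is worth checking the supported list before settling on a combination.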
Speech Synthesis Markup Language (SSML)
SignalWire’s RELAY Realtime SDK can also employ the Speech Synthesis Markup Language for more granular control over how our synthesized voice is presented to the caller.
By wrapping your text in the <speak> tag, you can use SSML tags inside it to alter the cadence and pronunciation, among other parameters, of your text-to-speech. See the Amazon Polly or Google Cloud documentation for supported tags.
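To illustrate, the sketch below builds an SSML string that slows the delivery with <prosody> and adds a pause with <break>, two tags documented by both providers. The `buildSsml` and `escapeXml` helpers are hypothetical, and the example assumes the SSML string is passed as the `text` value of playTTS.

```javascript
// Escape characters that are reserved in XML so plain text is safe inside SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wraps a message in <speak>, sets the speaking rate
// with <prosody>, and appends a pause with <break>.
function buildSsml(message, { rate = "slow", pauseMs = 500 } = {}) {
  return (
    "<speak>" +
    `<prosody rate="${rate}">${escapeXml(message)}</prosody>` +
    `<break time="${pauseMs}ms"/>` +
    "</speak>"
  );
}

// Usage inside a call handler (the `call` object comes from the SDK):
//   await call.playTTS({ text: buildSsml("Thanks for calling. Please hold.") });
```

Escaping the message first matters because a stray ampersand or angle bracket in caller-facing text would otherwise make the markup invalid.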