If you want to add speech recognition to a call to make it more interactive, SignalWire lets you do just that. If, instead, you want to transcribe entire calls, SignalWire lets you stream call audio to a server of your choice and then do anything you like with it.
In this post, we'll explore how to stream call audio out to Deepgram for transcriptions using SignalWire's Compatibility API.
Setting Up the Environment
Before diving into the code, let's ensure you have everything you’ll need to get started. In order to follow along you should have:
A basic understanding of Node.js and Git
Docker installed on your machine
An Ngrok account for local development and testing
Running the Application
To begin, clone the signalwire-in-seconds repository and navigate to the streaming-to-deepgram folder. Inside this folder, locate the .env.example file and make a copy named .env. Fill in the required information, including your Deepgram token, Ngrok token, and port number.
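Before building the container, it can help to check that the .env file is complete. The following is a minimal sketch of such a check, not code from the repository. The variable names DEEPGRAM_TOKEN, NGROK_TOKEN, and PORT are assumptions made for illustration; use the names that actually appear in .env.example.

```javascript
// Hypothetical variable names; replace them with the ones in .env.example.
const REQUIRED_VARS = ['DEEPGRAM_TOKEN', 'NGROK_TOKEN', 'PORT'];

// Validate an environment-like object and return a small config object.
function readConfig(env) {
  const missing = REQUIRED_VARS.filter(
    (name) => typeof env[name] !== 'string' || env[name].trim() === ''
  );
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }

  const port = Number.parseInt(env.PORT, 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be a number between 1 and 65535, got "${env.PORT}"`);
  }

  return {
    deepgramToken: env.DEEPGRAM_TOKEN,
    ngrokToken: env.NGROK_TOKEN,
    port,
  };
}

// Example with inline values; in an application you would pass process.env.
const config = readConfig({
  DEEPGRAM_TOKEN: 'dg-example-token',
  NGROK_TOKEN: 'ngrok-example-token',
  PORT: '3000',
});
console.log(`Configured to listen on port ${config.port}`);
```

Failing fast on a missing token tends to be easier to debug than a rejected connection to Deepgram or Ngrok later on.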
Once the environment is set up, you can build and run the Docker container. Open a terminal and execute the following commands:
docker build . -t deepgram
docker run -it --rm -p 3000:3000 --name deepgram --env-file .env deepgram
With the container up and running, your application is now ready to receive POST requests to the /startStreaming endpoint. You should see the following in your terminal:
Please connect a Phone Number with the following Webhook URL: https://####-###-##-##-##.ngrok-free.app/startStreaming
For simplicity, we’ll connect a phone number to this endpoint by updating the number’s settings, so that when a call comes in, SignalWire asks /startStreaming what to do next.
Understanding the Code
Now that our application is running, let's explore the code step-by-step to gain a clear understanding of how the call audio streaming and transcription process works.
We start by setting up the dependencies: dotenv to load environment variables, the ngrok package to create a tunnel to our machine, Express for the web application, the RestClient from the SignalWire Compatibility API, and the Deepgram SDK.
The /startStreaming endpoint is the entry point for call initiation and audio streaming.
When a POST request is made to this endpoint, a SignalWire VoiceResponse object is created and a few instructions are added to it. The result is sent back to SignalWire as XML instructions.
SignalWire processes these instructions and starts streaming call audio to the provided WebSocket URL. Because Stream runs asynchronously, SignalWire continues with the remaining instructions right away. In this case, we’re just pausing and saying a few words that Deepgram will pick up later on.
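To make the shape of such a response concrete, here is a rough sketch that assembles comparable XML by hand. It is an illustration only: the sample application builds its response with VoiceResponse rather than string concatenation, and the WebSocket URL, the one-second pause, the track setting, and the spoken text below are placeholder assumptions rather than the values the application uses.

```javascript
// Escape characters that cannot appear verbatim in XML text or attributes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build instructions that start an asynchronous audio stream, pause briefly,
// and then speak a short message.
function buildStreamingResponse(websocketUrl, message) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Start>',
    // Assumption: both call legs are streamed so each can be transcribed.
    `    <Stream url="${escapeXml(websocketUrl)}" track="both_tracks" />`,
    '  </Start>',
    '  <Pause length="1" />',
    `  <Say>${escapeXml(message)}</Say>`,
    '</Response>',
  ].join('\n');
}

// Placeholder URL and text, for illustration only.
const xml = buildStreamingResponse(
  'wss://example.ngrok-free.app',
  'Hello, this call is being transcribed.'
);
console.log(xml);
```

Wrapping Stream in Start is what lets the instructions after it run while audio is already flowing to the WebSocket server.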
Handling Transcriptions with Deepgram
In order to clean up our code and facilitate printing transcription results to the terminal, we use two helper functions:
The createDeepgramConnection function sets up a new connection to Deepgram’s transcription service with a few parameters, which you can tweak by following Deepgram’s Streaming API reference.
The listenForTranscriptionResults function sets up a listener for transcriptReceived events from Deepgram and logs the transcribed text, specifying the call SID and the call leg.
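The listener helper described above could look roughly like the sketch below. It is written against any object that emits transcriptReceived events, so a plain Node.js EventEmitter stands in for the Deepgram connection here. The message shape (channel.alternatives[0].transcript) follows Deepgram's streaming results, while the extra log parameter and the stand-in connection are assumptions added for illustration.

```javascript
const { EventEmitter } = require('node:events');

// Log every non-empty transcript together with the call SID and the call leg.
function listenForTranscriptionResults(connection, callSid, leg, log = console.log) {
  connection.on('transcriptReceived', (message) => {
    let result;
    try {
      result = JSON.parse(message);
    } catch (err) {
      return; // Ignore anything that is not valid JSON.
    }

    // Some messages carry no transcript or an empty one; skip those.
    const transcript = result?.channel?.alternatives?.[0]?.transcript;
    if (transcript) {
      log(`${callSid} ${leg} ${transcript}`);
    }
  });
}

// Stand-in for a Deepgram live connection, used only for this example.
const fakeConnection = new EventEmitter();
const lines = [];
listenForTranscriptionResults(
  fakeConnection,
  '18862b57-e0db-4262-8245-8ca67b64fbd6',
  'outbound',
  (line) => lines.push(line)
);

fakeConnection.emit(
  'transcriptReceived',
  JSON.stringify({ channel: { alternatives: [{ transcript: 'Please wait one moment.' }] } })
);
fakeConnection.emit(
  'transcriptReceived',
  JSON.stringify({ channel: { alternatives: [{ transcript: '' }] } })
);

console.log(lines);
```

Skipping empty transcripts keeps the terminal output limited to lines where something was actually said.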
Handling SignalWire’s Stream
When we use <Stream>, SignalWire starts streaming audio to our WebSocket server. In this part of the code we listen for new connections (each call will generate a new one), and set up connections to Deepgram for both inbound and outbound tracks.
We then listen for new messages coming over that WebSocket connection, and process them depending on the kind of event:
Connected - We just print that a new call was connected
Start - The Stream is about to start, so we start listening for transcription results
Media - Depending on the track the audio belongs to, we send the audio payload to Deepgram for transcription
Stop - We just print that the call ended
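The four events above can be handled with a single dispatch function. The sketch below assumes each message is a JSON string with an event field, that media messages carry a base64-encoded payload and a track name, and that the start message includes the call SID. The context object and its inbound, outbound, onStart, and log members are hypothetical names introduced for this example; the two connections are stand-ins that only need a send method.

```javascript
// Route one WebSocket message from the stream to the right action.
function handleStreamMessage(rawMessage, context) {
  const message = JSON.parse(rawMessage);

  switch (message.event) {
    case 'connected':
      context.log('A new call was connected.');
      break;
    case 'start':
      // The stream is about to begin: remember the call and start listening.
      context.callSid = message.start.callSid;
      context.onStart(context.callSid);
      break;
    case 'media': {
      // Decode the audio and forward it to the connection for its track.
      const audio = Buffer.from(message.media.payload, 'base64');
      if (message.media.track === 'inbound') {
        context.inbound.send(audio);
      } else if (message.media.track === 'outbound') {
        context.outbound.send(audio);
      }
      break;
    }
    case 'stop':
      context.log('Call has ended.');
      break;
    default:
      break; // Ignore events this sketch does not know about.
  }
}

// Stand-ins for the two Deepgram connections: they only record what was sent.
const sent = { inbound: [], outbound: [] };
const logs = [];
const context = {
  inbound: { send: (audio) => sent.inbound.push(audio) },
  outbound: { send: (audio) => sent.outbound.push(audio) },
  onStart: (callSid) => logs.push(`Listening for transcriptions on ${callSid}`),
  log: (line) => logs.push(line),
};

const payload = Buffer.from('fake-audio').toString('base64');
[
  { event: 'connected' },
  { event: 'start', start: { callSid: 'example-call-sid' } },
  { event: 'media', media: { track: 'inbound', payload } },
  { event: 'stop' },
].forEach((message) => handleStreamMessage(JSON.stringify(message), context));

console.log(logs);
```

Keeping one connection per track is what makes it possible to label each transcript as inbound or outbound.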
The End Result
When you call the number associated with your /startStreaming endpoint, you’ll start seeing the following output in your console:
New Connection Initiated.
A new call was connected.
18862b57-e0db-4262-8245-8ca67b64fbd6 outbound Please wait one moment.
18862b57-e0db-4262-8245-8ca67b64fbd6 outbound Ok.
18862b57-e0db-4262-8245-8ca67b64fbd6 inbound Thank you so much for your help, by the way.
18862b57-e0db-4262-8245-8ca67b64fbd6 outbound We are open Monday through Friday.
18862b57-e0db-4262-8245-8ca67b64fbd6 inbound Cool. Thank you!
Call has ended.
With this knowledge, you can now leverage the capabilities of Deepgram and SignalWire’s programmable voice API to build innovative voice-based applications with seamless and efficient voice transcriptions. Sign up for a free trial and start exploring the possibilities with SignalWire today!
To learn more about SignalWire's capabilities, visit our developer docs. Join the vibrant SignalWire community on Slack and our Forum to interact with the team and share your projects. We can't wait to see what you build!