The <Stream> instruction sends the raw audio of a running phone call over a WebSocket connection to a specified URL in near real time. The audio frames are base64 encoded and embedded in a JSON string, together with other information such as a sequence number and a timestamp. The feature can be used with Speech-To-Text systems, among others.
An example of how to use Stream:
This cXML instructs SignalWire to make a copy of the audio frames of the current call and send them in near real time over WebSocket to wss://your-application.com/audiostream.
<Stream> starts the audio stream asynchronously, so execution continues with the next cXML instruction at once. If there is no further instruction, SignalWire will disconnect the call.
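The behaviour described above can be sketched by generating the cXML from Python. This is a minimal sketch, not an official sample: the `<Response>` root, the `<Start>` wrapper, and the `url` attribute name are assumptions based on common cXML usage, and `build_stream_cxml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_stream_cxml(url: str) -> str:
    """Return a cXML document that starts a one-way audio stream to `url`."""
    response = ET.Element("Response")        # assumed cXML root element
    start = ET.SubElement(response, "Start")  # assumed wrapper for one-way streams
    ET.SubElement(start, "Stream", {"url": url})  # "url" attribute name is assumed
    return ET.tostring(response, encoding="unicode")


cxml = build_stream_cxml("wss://your-application.com/audiostream")
print(cxml)
```

Because the stream starts asynchronously, a real document would normally place further instructions after the `<Start>` element so that the call is not disconnected.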
Absolute or relative URL. A WebSocket connection to the URL will be established and audio will start flowing towards the WebSocket server. The only supported protocol is wss. For security reasons, ws is NOT supported.
An authentication Bearer token that can be supplied when starting a stream. The remote server can then authenticate the WebSocket connection request using the supplied token. More information can be found in the WebSocket connection section.
Specifies the audio codec for the stream. See Supported Codecs for the full list.
Unique name for the Stream, per Call. It is used to stop a Stream by name.
If true, and the stream is bidirectional, the stream offers a realtime experience to the call parties by managing packet delays and bursts. If false, the user benefits from buffered audio, which can be played out with delay.
Absolute or relative URL. SignalWire will make an HTTP GET or POST request to this URL when a Stream is started or stopped, or when an error occurs.
GET or POST. The type of HTTP request to use when requesting a statusCallback.
This attribute can be one of: inbound_track, outbound_track, both_tracks. For both_tracks there will be both inbound_track and outbound_track events. If the stream is bidirectional, the only available value is inbound_track.
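The constraints stated in the attribute descriptions above (wss only, three track values, inbound_track only for bidirectional streams) can be checked before the cXML is served. The following is a sketch under assumptions: `validate_stream_attrs` is a hypothetical helper, and SignalWire performs its own validation regardless.

```python
from urllib.parse import urlparse

VALID_TRACKS = {"inbound_track", "outbound_track", "both_tracks"}


def validate_stream_attrs(url: str, track: str, bidirectional: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the attributes look valid."""
    problems = []
    scheme = urlparse(url).scheme
    # Relative URLs have no scheme and are allowed; absolute URLs must use wss.
    if scheme and scheme != "wss":
        problems.append(f"unsupported scheme {scheme!r}: only wss is supported")
    if track not in VALID_TRACKS:
        problems.append(f"unknown track {track!r}")
    elif bidirectional and track != "inbound_track":
        problems.append("bidirectional streams only support inbound_track")
    return problems


print(validate_stream_attrs("ws://example.com/audio", "both_tracks", bidirectional=True))
```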
StatusCallback parameters
For a statusCallback, SignalWire will send a request with the following parameters:
The unique ID of the Account this call is associated with.
A unique identifier for the call. May be used to later retrieve this call from the REST API.
If the stream is part of a conference, the unique identifier for the conference.
If an error has occurred, this will contain a detailed error message. See StreamError Values for possible values.
One of stream-started, stream-stopped, or stream-error.
If defined, this is the unique name of the Stream. Defaults to the StreamSid.
The unique identifier for this Stream.
The track configuration for this stream: inbound_track, outbound_track, or both_tracks.
The time of the event in ISO 8601 format.
The unique call identifier.
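A statusCallback handler typically reads these parameters and branches on the event type. The sketch below assumes the request arrives as a form-encoded POST body, which this page does not state. Only the names StreamEvent, StreamError, and StreamSid are taken from the text; `summarize_stream_callback` and the sample error text are hypothetical.

```python
from urllib.parse import parse_qs


def summarize_stream_callback(body: str) -> str:
    """Summarize a form-encoded statusCallback body (encoding is an assumption)."""
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(body).items()}
    event = params.get("StreamEvent", "unknown")
    stream_sid = params.get("StreamSid", "unknown")
    if event == "stream-error":
        return f"{stream_sid}: error: {params.get('StreamError', 'no details')}"
    return f"{stream_sid}: {event}"


# Illustrative body; the identifier and error text are made up.
print(summarize_stream_callback("StreamSid=abc&StreamEvent=stream-error&StreamError=Connection+refused"))
```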
When StreamEvent is stream-error, the StreamError field will contain one of the following values:
The codec attribute allows you to control the audio codec used for the stream. The following codecs are supported:
Codec Format Examples:
PCMU@8000h (default if no codec specified)
L16@24000h
L16@16000h
When establishing a stream, SignalWire initiates a WebSocket connection to your specified URL endpoint. The connection begins with an HTTP upgrade request containing the following headers:
Bearer token for authentication if authBearerToken attribute is provided (format: “Bearer token_here”).
Connection type for the upgrade (value: “Upgrade”).
The destination server hosting the WebSocket endpoint (e.g., “example.com”).
Base64-encoded random value used for the WebSocket handshake.
WebSocket protocol version (value: “13”).
Protocol upgrade request indicating a switch to WebSocket (value: “websocket”).
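On the server side, the two headers that need real work are the Bearer token and the handshake key. The sketch below assumes the token arrives in the standard `Authorization` header and computes the handshake reply as defined in RFC 6455; the function names are hypothetical, and most WebSocket libraries perform the key computation for you.

```python
import base64
import hashlib
import hmac

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def bearer_token_ok(headers: dict[str, str], expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header (header name is assumed)."""
    normalized = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    # Constant-time comparison avoids leaking how much of the token matched.
    return scheme == "Bearer" and hmac.compare_digest(
        token.encode("utf-8"), expected_token.encode("utf-8")
    )


print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```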
Once the WebSocket connection is established, SignalWire will send various events throughout the stream’s lifecycle.
These events are delivered as JSON-formatted WebSocket messages, each containing an event property that identifies the message type.
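Since every message carries an `event` property, a receiver can route messages with a small dispatch table. This is a minimal sketch: the lowercase event names used here ("connected", "start", "media", "stop", "dtmf", "mark") are assumptions, and `dispatch` is a hypothetical helper that is independent of any particular WebSocket library.

```python
import json
from typing import Callable, Optional


def dispatch(raw_message: str, handlers: dict[str, Callable[[dict], object]]) -> Optional[object]:
    """Parse one JSON WebSocket message and call the handler for its event type."""
    message = json.loads(raw_message)
    handler = handlers.get(message.get("event"))
    if handler is None:
        return None  # unknown or missing event type: ignore it
    return handler(message)


seen = []
# Event names below are assumed, not taken from this page.
handlers = {
    name: (lambda message, name=name: seen.append(name))
    for name in ("connected", "start", "media", "stop", "dtmf", "mark")
}

dispatch('{"event": "connected"}', handlers)
dispatch('{"event": "something-else"}', handlers)
print(seen)
```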
SignalWire sends the following event types to your WebSocket server:
When using bidirectional streams with <Connect><Stream>, you can also send messages to SignalWire:
SignalWire sends the Connected event immediately after establishing the WebSocket connection. This initial message outlines the communication protocol for all subsequent interactions.
Example Connected message
SignalWire delivers this message right after the Connected event, providing essential stream configuration details.
This message appears only once when the stream initializes.
Example Start message
Media messages deliver the actual audio content from the call as it flows through the stream.
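Decoding a received media message comes down to base64-decoding the audio and, if needed, working out how much audio it holds. The sketch below assumes the audio sits under `media.payload`, mirroring the outbound format described later on this page, and defaults to the PCMU@8000h codec (one byte per sample); `decode_media` is a hypothetical helper.

```python
import base64
import json


def decode_media(raw_message: str, sample_rate: int = 8000, bytes_per_sample: int = 1):
    """Return (raw audio bytes, duration in milliseconds) for one media message."""
    message = json.loads(raw_message)
    audio = base64.b64decode(message["media"]["payload"])  # field path is assumed
    duration_ms = 1000 * len(audio) / (sample_rate * bytes_per_sample)
    return audio, duration_ms


# 160 bytes of PCMU at 8000 Hz correspond to 20 ms of audio.
sample = json.dumps({"event": "media", "media": {"payload": base64.b64encode(b"\xff" * 160).decode("ascii")}})
audio, duration_ms = decode_media(sample)
print(len(audio), duration_ms)
```

For the L16 codecs, pass the matching sample rate and `bytes_per_sample=2`.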
Example Media message
SignalWire transmits a stop message when the stream terminates or the associated call concludes.
Example stop message
SignalWire generates DTMF messages whenever touch-tone key presses are detected in the audio stream.
Example DTMF message
SignalWire delivers mark messages as acknowledgments for completed audio playback or cleared buffer operations. These responses match the mark identifiers from your earlier transmissions to SignalWire.
Example Mark message
When you create a Stream within a <Connect><Stream> element, the connection becomes bidirectional. Your application can transmit WebSocket messages to SignalWire, enabling you to inject audio into the active call and manage the stream’s behavior.
The messages that your WebSocket server can send back to SignalWire are:
Transmitting audio to SignalWire requires constructing a valid media message with the correct structure.
The payload encoding depends on the codec specified in your Stream configuration:
audio/x-mulaw with 8000 Hz sample rate
All audio must be base64 encoded. SignalWire queues incoming media messages and plays them sequentially. To stop playback and clear the queue, transmit a clear message.
Ensure your media.payload contains only raw audio data without file format headers. Including format headers will result in corrupted audio playback.
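The rules above can be enforced when building the message. In this sketch only `media.payload` comes from the text; the `event` value, the `streamSid` field, and the helper name `build_media_message` are assumptions. The header check only catches WAV (RIFF) containers, so other file formats would need their own checks.

```python
import base64
import json


def build_media_message(raw_audio: bytes, stream_sid: str) -> str:
    """Wrap raw audio bytes in a JSON media message with a base64 payload."""
    if raw_audio[:4] == b"RIFF":
        # A WAV container header would be played back as noise.
        raise ValueError("payload looks like a WAV file; strip the header first")
    return json.dumps({
        "event": "media",          # assumed event name
        "streamSid": stream_sid,   # assumed field name
        "media": {"payload": base64.b64encode(raw_audio).decode("ascii")},
    })


# "example-stream-id" is a placeholder, not a real identifier.
print(build_media_message(b"\xff" * 8, "example-stream-id"))
```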
Example media message (payload abbreviated):
Transmit a mark message following your media messages to receive confirmation when audio playback finishes. SignalWire responds with a matching mark identifier once the audio completes playing (or immediately if no audio is queued).
You’ll also receive mark confirmations when the audio queue is cleared via a clear message.
Example mark message:
Transmit a clear message to halt audio playback and flush the audio queue. This action triggers SignalWire to return any pending mark messages for the cleared audio segments.
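The interaction between media, mark, and clear messages can be illustrated with a toy model of the queue semantics described above. This is a local simulation for reasoning about ordering only, not SignalWire's implementation; the class and method names are hypothetical.

```python
from collections import deque


class PlaybackQueueModel:
    """Toy model: media plays in order, marks come back when preceding audio ends."""

    def __init__(self) -> None:
        self._queue = deque()      # items are ("media", payload) or ("mark", name)
        self.returned_marks = []   # mark names echoed back, in order

    def send_media(self, payload: str) -> None:
        self._queue.append(("media", payload))

    def send_mark(self, name: str) -> None:
        if self._queue:
            self._queue.append(("mark", name))  # wait for the queued audio
        else:
            self.returned_marks.append(name)    # nothing queued: echoed at once

    def play_next(self) -> None:
        """Finish the next queued audio item, then release any marks behind it."""
        if self._queue:
            self._queue.popleft()  # head of a non-empty queue is always media
        while self._queue and self._queue[0][0] == "mark":
            self.returned_marks.append(self._queue.popleft()[1])

    def clear(self) -> None:
        """Drop queued audio and return every pending mark."""
        self.returned_marks.extend(value for kind, value in self._queue if kind == "mark")
        self._queue.clear()


model = PlaybackQueueModel()
model.send_mark("a")                            # returned immediately
model.send_media("chunk-1"); model.send_mark("b")
model.send_media("chunk-2"); model.send_mark("c")
model.play_next()                               # chunk-1 done, "b" returned
model.clear()                                   # chunk-2 dropped, "c" returned
print(model.returned_marks)
```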
Example clear message:
Transmit a DTMF message to inject touch-tone digits into the call. This allows you to programmatically send DTMF tones as if they were pressed on a keypad.
Example DTMF message:
The <Stream> instruction can also let you send audio into the call. In
this case, the stream must be bidirectional. The external service (e.g., an AI
agent) will then be able to both hear the call and play audio.
To initialize a bidirectional stream, wrap the <Stream> instruction in <Connect> instead of <Start>.
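The difference between the two modes is only the wrapping element, which the following sketch makes explicit. The `<Response>` root and the `url` attribute name are assumptions, `wrap_stream` is a hypothetical helper, and the URL is a placeholder.

```python
import xml.etree.ElementTree as ET


def wrap_stream(url: str, bidirectional: bool) -> str:
    """Build cXML with <Stream> inside <Connect> (two-way) or <Start> (one-way)."""
    response = ET.Element("Response")  # assumed cXML root element
    wrapper = ET.SubElement(response, "Connect" if bidirectional else "Start")
    ET.SubElement(wrapper, "Stream", {"url": url})  # "url" attribute name is assumed
    return ET.tostring(response, encoding="unicode")


two_way = wrap_stream("wss://your-application.com/audiostream", bidirectional=True)
print(two_way)
```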
It is possible to stop a stream at any time by name. For instance, if you name the Stream “mystream”, you can later use that unique name to stop it.
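As a sketch of the naming flow, the snippet below generates one document that starts a named stream and another that stops it. The `<Stop>` wrapper is an assumption borrowed from the TwiML-style convention and is not confirmed by this page, as are the `<Response>` root and the `url` and `name` attribute names; `named_stream_cxml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def named_stream_cxml(action: str, name: str, url: Optional[str] = None) -> str:
    """Build cXML that starts ("Start") or stops ("Stop") a stream by name."""
    if action not in ("Start", "Stop"):
        raise ValueError("action must be 'Start' or 'Stop'")
    response = ET.Element("Response")            # assumed cXML root element
    wrapper = ET.SubElement(response, action)    # <Stop> wrapper is an assumption
    attrs = {"name": name}
    if url is not None:
        attrs["url"] = url                       # only needed when starting
    ET.SubElement(wrapper, "Stream", attrs)
    return ET.tostring(response, encoding="unicode")


start_doc = named_stream_cxml("Start", "mystream", "wss://your-application.com/audiostream")
stop_doc = named_stream_cxml("Stop", "mystream")
print(start_doc)
print(stop_doc)
```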
To pass parameters to the wss server, you can include additional key-value pairs.
This is done with the nested <Parameter> cXML noun. These parameters will be added to the Start message as JSON.
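A sketch of nesting parameters inside a stream follows. The `name` and `value` attribute names on `<Parameter>`, the `<Response>` root, and the `<Start>` wrapper are assumptions; `stream_with_parameters` and the sample keys are hypothetical.

```python
import xml.etree.ElementTree as ET


def stream_with_parameters(url: str, parameters: dict[str, str]) -> str:
    """Build cXML for a stream carrying custom key-value pairs."""
    response = ET.Element("Response")  # assumed cXML root element
    start = ET.SubElement(response, "Start")
    stream = ET.SubElement(start, "Stream", {"url": url})
    for key, value in parameters.items():
        # One nested <Parameter> per pair; attribute names are assumed.
        ET.SubElement(stream, "Parameter", {"name": key, "value": value})
    return ET.tostring(response, encoding="unicode")


doc = stream_with_parameters(
    "wss://your-application.com/audiostream",
    {"language": "en-US", "customer_id": "12345"},  # illustrative values
)
print(doc)
```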
inbound represents the audio SignalWire receives from the call; outbound represents the audio generated by SignalWire for the call.
*Twilio and TwiML are trademarks of Twilio, Inc. SignalWire, Inc. and its products are not affiliated with or endorsed by Twilio, Inc.