Stream

The <Stream> instruction sends raw audio from a running phone call over a WebSocket connection, in near real time, to a specified URL. The audio frames are base64-encoded and embedded in a JSON string together with other information such as the sequence number and timestamp. The feature can be used with speech-to-text systems, among others.

Attributes

An example of how to use Stream:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://your-application.com/audiostream" />
  </Start>
</Response>

This cXML instructs SignalWire to copy the audio frames of the current call and send them in near real time over WebSocket to wss://your-application.com/audiostream.

<Stream> starts the audio stream asynchronously, and execution continues immediately with the next cXML instruction. If there is no next instruction, SignalWire disconnects the call.

const { RestClient } = require("@signalwire/compatibility-api");
const response = new RestClient.LaML.VoiceResponse();

const start = response.start();
start.stream({
  name: "Example Audio Stream",
  url: "wss://your-application.com/audiostream",
});

console.log(response.toString());

Conference Stream

url

string, required

Absolute or relative URL. A WebSocket connection to the url will be established and audio will start flowing towards the WebSocket server. The only supported protocol is wss. For security reasons, ws is not supported.

authBearerToken

string

An authentication Bearer token that can be supplied when starting a stream. The remote server can then authenticate the WebSocket connection request using the supplied token. More information can be found in the WebSocket connection section.

codec

string, defaults to PCMU@8000h

Specifies the audio codec for the stream. See Supported Codecs for full list.

name

string

Unique name for the Stream, per Call. It is used to stop a Stream by name.

realtime

boolean, defaults to false

If true, and the stream is bidirectional, the stream offers a realtime experience to the call parties by managing packet delays and bursts. If false, the user benefits from buffered audio, which can be played out with delay.

statusCallback

string

Absolute or relative URL. SignalWire will make an HTTP GET or POST request to this URL when a Stream starts, stops, or encounters an error.

statusCallbackMethod

string, defaults to POST

GET or POST. The type of HTTP request to use when requesting a statusCallback.

track

string, defaults to inbound_track

This attribute can be one of: inbound_track, outbound_track, both_tracks. For both_tracks there will be both inbound_track and outbound_track events. If the stream is bidirectional, the only available value is inbound_track.

Looking to use our REST APIs?

You can utilize our REST API to both start and stop streams.

`StatusCallback` parameters

For a statusCallback, SignalWire will send a request with the following parameters:

AccountSid

string

The unique ID of the Account this call is associated with.

CallSid

string

A unique identifier for the call. May be used to later retrieve this call from the REST API.

ConferenceSid

string

If the stream is part of a conference, the unique identifier for the conference.

StreamError

string

If an error has occurred, this will contain a detailed error message. See StreamError Values for possible values.

StreamEvent

string

One of stream-started, stream-stopped, or stream-error.

StreamName

string

If defined, this is the unique name of the Stream. Defaults to the StreamSid.

StreamSid

string

The unique identifier for this Stream.

StreamTrack

string

The track configuration for this stream: inbound_track, outbound_track, or both_tracks.

Timestamp

string

The time of the event in ISO 8601 format.

Unique-ID

string

The unique call identifier.

StreamError values

When StreamEvent is stream-error, the StreamError field will contain one of the following values:

Error Value	Meaning	Common Causes
`invalid_url`	Invalid WebSocket URL format	URL doesn’t use `wss://` protocol
`missing_url`	No URL provided	`url` attribute not specified
`invalid_track`	Invalid track configuration	Track value not one of: `inbound_track`, `outbound_track`, `both_tracks`
`codec_error`	Codec not supported or misconfigured	Requested codec not enabled for your account, or multi-channel audio requested (only mono is supported)
`connection_refused`	Remote endpoint rejected connection	Your WebSocket server refused the connection
`connection_refused_timeout_or_ssl_error`	Connection timeout or SSL failure	WebSocket server unreachable or SSL certificate issues
`general_error`	Internal stream initialization failed	Internal error during stream setup - contact support with StreamSid
`Duplicated stream ID`	Stream ID already in use	Conference already has a stream with this ID
`Duplicated stream name`	Stream name already in use	Conference already has a stream with this name
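
As a rough illustration of how the callback parameters and error values above might be consumed, the sketch below turns one set of statusCallback parameters into a log line. The function name `describeStreamStatus` is hypothetical, and the sketch assumes your web framework has already parsed the request into a plain object of parameters.

```javascript
// Hypothetical helper: summarizes one statusCallback request.
// `params` is assumed to be an already-parsed object of request parameters.
function describeStreamStatus(params) {
  const { StreamEvent, StreamSid, StreamName, StreamError } = params;
  // StreamName defaults to the StreamSid, so fall back to it if absent.
  const label = StreamName || StreamSid;

  switch (StreamEvent) {
    case "stream-started":
      return `stream ${label} started`;
    case "stream-stopped":
      return `stream ${label} stopped`;
    case "stream-error":
      // StreamError carries one of the values from the table above.
      return `stream ${label} failed: ${StreamError || "unknown error"}`;
    default:
      return `stream ${label}: unrecognized event ${StreamEvent}`;
  }
}

console.log(
  describeStreamStatus({
    StreamEvent: "stream-error",
    StreamSid: "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
    StreamError: "connection_refused",
  })
);
```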

Supported codecs

The codec attribute allows you to control the audio codec used for the stream. The following codecs are supported:

Codec Value	Sample Rates Available
`PCMU` (default)	`8000h`
`L16`	`16000h`, `24000h`

Codec Format Examples:

PCMU@8000h (default if no codec specified)
L16@24000h
L16@16000h
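
If you build the codec attribute programmatically, it can help to validate the value before emitting cXML. The following is a minimal sketch; `parseCodec` and `SUPPORTED_CODECS` are hypothetical names, and the allowed pairs are copied from the table above.

```javascript
// Supported codec and sample-rate pairs, copied from the table above.
const SUPPORTED_CODECS = new Map([
  ["PCMU", [8000]],
  ["L16", [16000, 24000]],
]);

// Hypothetical helper: parses a codec attribute value such as "L16@24000h".
// Returns { codec, sampleRate } or throws if the value is not in the table.
function parseCodec(value = "PCMU@8000h") {
  const match = /^([A-Za-z0-9]+)@(\d+)h$/.exec(value);
  if (!match) {
    throw new Error(`Malformed codec value: ${value}`);
  }
  const codec = match[1];
  const sampleRate = Number(match[2]);
  const rates = SUPPORTED_CODECS.get(codec);
  if (!rates || !rates.includes(sampleRate)) {
    throw new Error(`Unsupported codec value: ${value}`);
  }
  return { codec, sampleRate };
}

console.log(parseCodec("L16@24000h"));
```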

WebSocket connection

When establishing a stream, SignalWire initiates a WebSocket connection to your specified URL endpoint. The connection begins with an HTTP upgrade request containing the following headers:

Authorization

string

Bearer token for authentication if authBearerToken attribute is provided (format: “Bearer token_here”).

Connection

string

Connection type for the upgrade (value: “Upgrade”).

Host

string

The destination server hosting the WebSocket endpoint (e.g., “example.com”).

Sec-WebSocket-Key

string

Base64-encoded random value used for the WebSocket handshake.

Sec-WebSocket-Version

string

WebSocket protocol version (value: “13”).

Upgrade

string

Protocol upgrade request indicating a switch to WebSocket (value: “websocket”).
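
One way a server might check the Authorization header during the upgrade request is sketched below. The function name `isAuthorized` is hypothetical, and the sketch assumes a Node.js-style headers object with lowercase header names. How you hook it into your WebSocket server depends on the library you use.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: compares the Authorization header of the upgrade
// request with the token you supplied as authBearerToken.
// `headers` is assumed to be an object with lowercase header names.
function isAuthorized(headers, expectedToken) {
  const header = headers["authorization"] || "";
  const prefix = "Bearer ";
  if (!header.startsWith(prefix)) {
    return false;
  }
  const supplied = Buffer.from(header.slice(prefix.length), "utf8");
  const expected = Buffer.from(expectedToken, "utf8");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return (
    supplied.length === expected.length &&
    crypto.timingSafeEqual(supplied, expected)
  );
}

console.log(isAuthorized({ authorization: "Bearer token_here" }, "token_here"));
```

The constant-time comparison is a design choice of this sketch rather than a documented requirement.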

Once the WebSocket connection is established, SignalWire will send various events throughout the stream’s lifecycle. These events are delivered as JSON-formatted WebSocket messages, each containing an event property that identifies the message type.

SignalWire sends the following event types to your WebSocket server:

Connected - Initial handshake message confirming the connection
Start - Stream metadata and configuration details
Media - Audio data packets
DTMF - Touch-tone digit events
Mark - Audio playback completion acknowledgments (echoed back when you send a mark)
Stop - Stream termination notification
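
Because every message carries an event property, a receiver can route messages with a simple lookup. The sketch below assumes each WebSocket message arrives as a JSON string. `dispatchStreamMessage` and the handler bodies are illustrative names, not part of any SignalWire library.

```javascript
// Hypothetical dispatcher: parses one WebSocket text message and calls the
// handler registered for its event type.
function dispatchStreamMessage(raw, handlers) {
  const message = JSON.parse(raw);
  const known = Object.prototype.hasOwnProperty.call(handlers, message.event);
  if (!known) {
    return false; // unknown or unhandled event type
  }
  handlers[message.event](message);
  return true;
}

const seen = [];
const handlers = {
  connected: (m) => seen.push(`protocol ${m.protocol} v${m.version}`),
  start: (m) => seen.push(`stream ${m.start.streamSid}`),
  media: (m) => seen.push(`media on ${m.media.track}`),
  dtmf: (m) => seen.push(`digit ${m.dtmf.digit}`),
  mark: (m) => seen.push(`mark ${m.mark.name}`),
  stop: () => seen.push("stopped"),
};

dispatchStreamMessage(
  '{"event":"connected","protocol":"Call","version":"0.2.0"}',
  handlers
);
console.log(seen);
```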

When using bidirectional streams with <Connect><Stream>, you can also send messages to SignalWire:

Media - Send audio data into the call
Mark - Request playback completion acknowledgment
Clear - Flush the audio buffer
DTMF - Inject DTMF tones into the call

Connected message

SignalWire sends the Connected event immediately after establishing the WebSocket connection. This initial message outlines the communication protocol for all subsequent interactions.

Property	Description
event	The string value of `connected`.
protocol	Defines the protocol for the WebSocket connection’s lifetime. Value: `Call`
version	Semantic version of the protocol. Current version: `0.2.0`

Example Connected message

{
  "event": "connected",
  "protocol": "Call",
  "version": "0.2.0"
}

Start message

SignalWire delivers this message right after the Connected event, providing essential stream configuration details. This message appears only once when the stream initializes.

Property	Description
event	The string value of `start`.
sequenceNumber	Message sequence tracking, starting from “1” and incrementing with each message.
start	Container holding stream configuration and metadata details.
start.streamSid	The unique identifier of the Stream.
start.accountSid	The Account identifier that created the Stream.
start.callSid	Call session identifier where the stream originated.
start.tracks	Array specifying which audio directions will be transmitted. Possible values: `["inbound"]`, `["outbound"]`, or `["inbound", "outbound"]`.
start.customParameters	Object containing custom key-value pairs configured during stream creation. Only present when custom parameters are defined.
start.mediaFormat	Configuration details for audio data formatting.
start.mediaFormat.encoding	Audio codec format. Possible values: `audio/x-mulaw` (PCMU), `audio/x-L16` (L16).
start.mediaFormat.sampleRate	Audio sampling frequency in Hz. Possible values: 8000 (PCMU), 16000 (L16), or 24000 (L16).
start.mediaFormat.channels	Audio channel count. Always 1 (mono). Multi-channel audio is not supported.

Example Start message

{
  "event": "start",
  "sequenceNumber": "1",
  "start": {
    "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
    "accountSid": "b08dacad-2f6c-4de1-93d6-cc732e0c69c5",
    "callSid": "76ac3c36-56da-4a3e-a0d6-b5f8df6da9ad",
    "tracks": [
      "inbound"
    ],
    "customParameters": {},
    "mediaFormat": {
      "encoding": "audio/x-L16",
      "sampleRate": 24000,
      "channels": 1
    }
  }
}
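
A consumer usually needs the media format from the Start message before it can interpret any audio. The sketch below derives a byte rate from the fields described above. `describeStart` is a hypothetical name, and the bytes-per-sample figures follow from the encodings themselves (mu-law uses 8 bits per sample, L16 uses 16 bits).

```javascript
// Bytes per sample for each encoding listed above.
const BYTES_PER_SAMPLE = new Map([
  ["audio/x-mulaw", 1],
  ["audio/x-L16", 2],
]);

// Hypothetical helper: extracts the fields a consumer typically needs
// from a parsed Start message.
function describeStart(message) {
  const { streamSid, callSid, tracks, mediaFormat } = message.start;
  const bytesPerSample = BYTES_PER_SAMPLE.get(mediaFormat.encoding);
  if (bytesPerSample === undefined) {
    throw new Error(`Unexpected encoding: ${mediaFormat.encoding}`);
  }
  return {
    streamSid,
    callSid,
    tracks,
    sampleRate: mediaFormat.sampleRate,
    bytesPerSecond:
      mediaFormat.sampleRate * mediaFormat.channels * bytesPerSample,
  };
}
```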

Media message

Media messages deliver the actual audio content from the call as it flows through the stream.

Property	Description
event	The string value of `media`.
sequenceNumber	Sequential message counter for ordering, starting at “1” and incrementing per transmission.
media	Container with audio data and associated metadata.
media.track	Audio track identifier. One of: `inbound` or `outbound`.
media.chunk	Chunk counter for this track. Starts at “1” and increments with each chunk.
media.timestamp	Presentation timestamp in milliseconds from the start of the stream.
media.payload	Base64-encoded raw audio data.

Example Media message

{
  "event": "media",
  "sequenceNumber": "42",
  "media": {
    "track": "inbound",
    "chunk": "1",
    "timestamp": "0",
    "payload": "<base64-encoded-audio>"
  }
}
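
Decoding a Media message comes down to parsing the JSON and base64-decoding the payload. The following sketch assumes you already know the stream's byte rate from the Start message. `decodeMedia` is a hypothetical name, and the 160-byte payload is a placeholder rather than real call audio.

```javascript
// Hypothetical helper: decodes the base64 payload of a parsed Media message
// and estimates how much audio it holds. `bytesPerSecond` is assumed to be
// derived from the Start message (for example 8000 for PCMU at 8000 Hz).
function decodeMedia(message, bytesPerSecond) {
  const audio = Buffer.from(message.media.payload, "base64");
  return {
    track: message.media.track,
    chunk: Number(message.media.chunk),
    timestampMs: Number(message.media.timestamp),
    audio,
    durationMs: (audio.length * 1000) / bytesPerSecond,
  };
}

// 160 placeholder bytes standing in for raw PCMU audio.
const payload = Buffer.alloc(160, 0xff).toString("base64");
const decoded = decodeMedia(
  {
    event: "media",
    sequenceNumber: "42",
    media: { track: "inbound", chunk: "1", timestamp: "0", payload },
  },
  8000
);
console.log(decoded.audio.length, decoded.durationMs);
```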

Stop message

SignalWire transmits a stop message when the stream terminates or the associated call concludes.

Property	Description
event	The string value of `stop`.
sequenceNumber	Message sequence counter.

Example Stop message

{
  "event": "stop",
  "sequenceNumber": "999"
}

DTMF message

SignalWire generates DTMF messages whenever touch-tone key presses are detected in the audio stream.

Property	Description
event	Event type identifier set to `dtmf`.
sequenceNumber	Message sequence counter.
streamSid	Stream identifier. Only included for bidirectional streams.
dtmf	Container holding the detected touch-tone details.
dtmf.digit	The digit that was pressed. Values: 0-9, *, #, A-D.
dtmf.duration	Duration of the key press in milliseconds.

Example DTMF message

{
  "event": "dtmf",
  "sequenceNumber": "123",
  "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
  "dtmf": {
    "digit": "5",
    "duration": 2000
  }
}

Mark message

SignalWire delivers mark messages as acknowledgments for completed audio playback or cleared buffer operations. These responses match the mark identifiers from your earlier transmissions to SignalWire.

Property	Description
event	Event type designation set to `mark`.
streamSid	Stream connection identifier. Only included for bidirectional streams.
mark	Container with the mark acknowledgment details.
mark.name	The mark identifier echoed back from your original transmission.

Example Mark message

{
  "event": "mark",
  "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
  "mark": {
    "name": "my-custom-mark"
  }
}

Sending WebSocket messages

When you create a Stream within a <Connect><Stream> element, the connection becomes bidirectional. Your application can transmit WebSocket messages to SignalWire, enabling you to inject audio into the active call and manage the stream’s behavior.

The messages that your WebSocket server can send back to SignalWire are:

Media - Send audio data back into the call
Mark - Track when audio playback completes
Clear - Interrupt buffered audio
DTMF - Inject DTMF tones into the call

Send a media message

Transmitting audio to SignalWire requires constructing a valid media message with the correct structure.

The payload encoding depends on the codec specified in your Stream configuration:

Default (PCMU/mulaw): audio/x-mulaw with 8000 Hz sample rate
L16@16000h: Linear PCM with 16000 Hz sample rate
L16@24000h: Linear PCM with 24000 Hz sample rate

All audio must be base64 encoded. SignalWire queues incoming media messages and plays them sequentially. To stop playback and clear the queue, transmit a clear message.

Ensure your `media.payload` contains only raw audio data without file format headers. Including format headers will result in corrupted audio playback.

Property	Description
event	Specifies the message type. Set to `"media"` for audio data.
streamSid	Target stream identifier for audio playback
media	Container object holding the audio payload
media.payload	Base64-encoded audio data (format varies by codec configuration)

Example media message (payload abbreviated):

{
  "event": "media",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
  "media": {
    "payload": "a3242sa..."
  }
}
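
The sketch below shows one way to wrap raw audio bytes in the message structure above. `buildMediaMessage` and `splitIntoFrames` are hypothetical helpers. Splitting the audio into 160-byte frames (20 ms of PCMU at 8000 Hz) is an assumption of this sketch, not a documented requirement.

```javascript
// Hypothetical helper: wraps raw audio bytes (no WAV or other file header)
// in the media message structure shown above.
function buildMediaMessage(streamSid, rawAudio) {
  return JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: rawAudio.toString("base64") },
  });
}

// Hypothetical helper: splits a buffer into frames of at most bytesPerFrame.
function splitIntoFrames(rawAudio, bytesPerFrame) {
  if (!Number.isInteger(bytesPerFrame) || bytesPerFrame <= 0) {
    throw new RangeError("bytesPerFrame must be a positive integer");
  }
  const frames = [];
  for (let offset = 0; offset < rawAudio.length; offset += bytesPerFrame) {
    frames.push(rawAudio.subarray(offset, offset + bytesPerFrame));
  }
  return frames;
}

const streamSid = "c0c7d59b-df06-435e-afbc-9217ce318390";
const audio = Buffer.alloc(400, 0xff); // placeholder bytes, not real audio
const messages = splitIntoFrames(audio, 160).map((frame) =>
  buildMediaMessage(streamSid, frame)
);
console.log(messages.length);
```

Each string in `messages` would then be sent as one WebSocket text message, in order.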

Send a mark message

Transmit a mark message following your media messages to receive confirmation when audio playback finishes. SignalWire responds with a matching mark identifier once the audio completes playing (or immediately if no audio is queued).

You’ll also receive mark confirmations when the audio queue is cleared via a clear message.

Property	Description
event	Message type identifier. Set to `"mark"` for completion tracking.
streamSid	Target stream identifier for the mark operation
mark	Container object with mark details
mark.name	Custom identifier to track specific audio segments or playback events

Example mark message:

{
  "event": "mark",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
  "mark": {
    "name": "my label"
  }
}
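
Since marks are echoed back by name, an application can pair each mark it sends with a pending promise. The class below is a minimal sketch of that idea. `MarkTracker` is a hypothetical name, and the sketch assumes mark names are unique among the marks still pending on a stream.

```javascript
// Hypothetical tracker: remembers the marks you sent and resolves a promise
// when a mark with the same name is echoed back.
class MarkTracker {
  constructor() {
    this.pending = new Map(); // mark name -> resolve function
  }

  // Returns the mark message to send and a promise that settles on the echo.
  create(streamSid, name) {
    let resolve;
    const done = new Promise((r) => {
      resolve = r;
    });
    this.pending.set(name, resolve);
    const message = JSON.stringify({ event: "mark", streamSid, mark: { name } });
    return { message, done };
  }

  // Call this with every parsed incoming message whose event is "mark".
  handleEcho(message) {
    const resolve = this.pending.get(message.mark.name);
    if (!resolve) {
      return false; // not a mark this tracker is waiting for
    }
    this.pending.delete(message.mark.name);
    resolve(message.mark.name);
    return true;
  }
}

const tracker = new MarkTracker();
const { message, done } = tracker.create("c0c7d59b-df06-435e-afbc-9217ce318390", "my label");
done.then((name) => console.log(`playback finished for "${name}"`));
tracker.handleEcho(JSON.parse(message)); // simulate the echoed mark
```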

Send a clear message

Transmit a clear message to halt audio playback and flush the audio queue. This action triggers SignalWire to return any pending mark messages for the cleared audio segments.

Property	Description
event	Message type identifier. Set to `"clear"` for audio interruption.
streamSid	Target stream identifier where audio should be stopped.

Example clear message:

{
  "event": "clear",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390"
}

Send a DTMF message

Transmit a DTMF message to inject touch-tone digits into the call. This allows you to programmatically send DTMF tones as if they were pressed on a keypad.

Property	Description
event	Message type identifier. Set to `"dtmf"` for DTMF injection.
streamSid	Target stream identifier where DTMF should be sent.
dtmf	Container object with DTMF details
dtmf.digit	The digit to send. Valid values: 0-9, *, #, A-D

Example DTMF message:

{
  "event": "dtmf",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
  "dtmf": {
    "digit": "5"
  }
}
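
To send a sequence such as a PIN, one option is to emit one DTMF message per digit after validating each against the allowed values. The helper below is a sketch under that assumption, and `buildDtmfMessages` is a hypothetical name.

```javascript
// Valid digits as listed above: 0-9, *, #, A-D.
const VALID_DIGIT = /^[0-9*#A-D]$/;

// Hypothetical helper: builds one DTMF message per character of `digits`.
function buildDtmfMessages(streamSid, digits) {
  return [...digits].map((digit) => {
    if (!VALID_DIGIT.test(digit)) {
      throw new Error(`Invalid DTMF digit: ${digit}`);
    }
    return JSON.stringify({ event: "dtmf", streamSid, dtmf: { digit } });
  });
}

console.log(buildDtmfMessages("c0c7d59b-df06-435e-afbc-9217ce318390", "12#"));
```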

Examples

Conference stream

<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Dial trim="do-not-trim">
  <Conference beep="false" startConferenceOnEnter="true" trim="do-not-trim" streamUrl="wss://206.189.19.130:8765/">test
    <Stream name="my_conference_stream"
            url="wss://206.189.19.130:8765/"
            streamStartConferenceOnEnter="true"
            bidir="true">
      <Parameter name="foo1" value="bar1"/>
      <Parameter name="foo2" value="bar2"/>
    </Stream>
  </Conference>
</Dial>
</Response>

Bidirectional stream

The <Stream> instruction can also send audio into the call. In this case, the stream must be bidirectional. The external service (e.g., an AI agent) can then both hear the call and play audio into it.

To initialize a bidirectional stream, wrap the <Stream> instruction in <Connect> instead of <Start>.

<Connect>
    <Stream url="wss://mystream.ngrok.io/audiostream" />
</Connect>

Starting and stopping streams

A stream can be stopped at any time by name. For instance, if you name the Stream “mystream”, you can later use that name to stop it.

<Start>
    <Stream name="mystream" url="wss://mystream.ngrok.io/audiostream" />
</Start>

<Stop>
    <Stream name="mystream" />
</Stop>

Custom parameters

To pass parameters to the WebSocket server, include additional key-value pairs using the nested <Parameter> cXML noun. These parameters are added to the Start message as JSON.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://your-application.com/audiostream">
      <Parameter name="Cookie" value="948f9938-299a-d43e-0df4-af3a7eccb0ac"/>
      <Parameter name="Type" value="SIP"/>
    </Stream>
  </Start>
</Response>
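
On the receiving side, the parameters can be read from start.customParameters. The sketch below assumes each <Parameter> arrives as a name-to-value entry in that object. This follows the "key-value pairs" description but is not confirmed by an example payload in this document. `getCustomParameter` is a hypothetical helper.

```javascript
// Hypothetical helper: reads one custom parameter from a parsed Start message.
// customParameters is only present when parameters were defined, so a missing
// object is treated as empty.
function getCustomParameter(startMessage, name, fallback) {
  const params = startMessage.start.customParameters || {};
  return Object.prototype.hasOwnProperty.call(params, name)
    ? params[name]
    : fallback;
}

// Assumed shape of a Start message for the cXML above (abbreviated).
const exampleStart = {
  event: "start",
  start: {
    streamSid: "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
    customParameters: {
      Cookie: "948f9938-299a-d43e-0df4-af3a7eccb0ac",
      Type: "SIP",
    },
  },
};

console.log(getCustomParameter(exampleStart, "Type", "unknown"));
```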

Notes on usage

The url does not support query string parameters. To pass custom key-value pairs to the WebSocket, use Custom Parameters instead.
There is a one-to-one mapping between a stream and a WebSocket connection, so at most one call is streamed over a single WebSocket connection. The Start message includes the unique stream identifier (StreamSid), so you can handle multiple inbound connections and track which connection belongs to which stream.
Every call has inbound and outbound tracks: inbound is the audio SignalWire receives from the call, and outbound is the audio SignalWire generates for the call.
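
The one-to-one mapping noted above suggests keeping a small registry keyed by StreamSid. The following is a minimal sketch. `StreamRegistry` is a hypothetical name, and `connection` stands for whatever socket object your WebSocket library provides.

```javascript
// Hypothetical registry: associates each StreamSid with its connection.
class StreamRegistry {
  constructor() {
    this.bySid = new Map();
  }

  // Call when a connection receives its Start message.
  register(startMessage, connection) {
    this.bySid.set(startMessage.start.streamSid, connection);
  }

  lookup(streamSid) {
    return this.bySid.get(streamSid);
  }

  // Call when the Stop message arrives or the socket closes.
  remove(streamSid) {
    return this.bySid.delete(streamSid);
  }
}

const registry = new StreamRegistry();
const fakeConnection = { id: "socket-1" }; // stand-in for a real socket
registry.register(
  { event: "start", start: { streamSid: "7d56cc11-536d-4a45-b4fb-ed3d55be843b" } },
  fakeConnection
);
console.log(registry.lookup("7d56cc11-536d-4a45-b4fb-ed3d55be843b") === fakeConnection);
```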

*Twilio and TwiML are trademarks of Twilio, Inc. SignalWire, Inc. and its products are not affiliated with or endorsed by Twilio, Inc.
