Audio Architecture for Voice AI

Your Audio Pipeline Fails Without Errors

Codec transcoding between vendors breaks silently. No exception, no log entry, no alert. Callers hear garbled audio for days before anyone notices.

  • < 1.2s typical AI response latency
  • 0 vendor hops in the media path
  • 2.7B+ minutes processed on the platform
  • 4+ codec conversions in a typical multi-vendor audio path

The problem nobody warns you about

Silent Failures Across Vendor Boundaries

Codec mismatch produces no error

Phone networks speak G.711 mu-law at 8kHz. Your STT model expects PCM at 16kHz. Your TTS generates audio at 24kHz. When the transcoding between vendors fails, there is no error message. Only garbled audio and confused callers.
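To make the conversion concrete, here is a minimal sketch of the telephony-to-STT step: decoding G.711 mu-law bytes to 16-bit PCM and doubling the sample rate from 8kHz to 16kHz. The function names are illustrative, not part of any SDK, and the linear-interpolation resampler is a deliberately simple stand-in for the filtered resampler a production pipeline would likely use.

```python
from typing import List

MULAW_BIAS = 0x84  # 132, the bias added during G.711 mu-law encoding


def mulaw_to_pcm16(data: bytes) -> List[int]:
    """Decode G.711 mu-law bytes into signed 16-bit PCM sample values."""
    samples = []
    for byte in data:
        byte = ~byte & 0xFF            # mu-law bytes are stored bit-inverted
        sign = byte & 0x80             # top bit: sign
        exponent = (byte >> 4) & 0x07  # next three bits: segment number
        mantissa = byte & 0x0F         # low four bits: step within segment
        magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
        samples.append(-magnitude if sign else magnitude)
    return samples


def upsample_2x(samples: List[int]) -> List[int]:
    """Double the sample rate (8kHz to 16kHz) by linear interpolation.

    Assumption: a simple illustration only. It adds samples, not
    information, and applies no low-pass filtering.
    """
    out = []
    for i, current in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        out.append(current)
        out.append((current + nxt) // 2)
    return out
```

If either step is skipped, the bytes still flow and nothing raises: a service that expects 16kHz PCM will read the mu-law bytes as if they were PCM samples and produce noise.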

WebSocket connects, audio vanishes

The HTTP 101 handshake completes successfully. Media packets drop silently between services. Developers spend hours correlating logs across providers before discovering the audio never arrived.
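Because a successful handshake says nothing about whether media is flowing, a multi-vendor pipeline usually needs its own liveness check. The sketch below is one possible approach, with a hypothetical `MediaWatchdog` class and an assumed one-second threshold: it tracks when the last non-empty media frame arrived and reports a stall when the gap grows too large.

```python
import time
from typing import Callable


class MediaWatchdog:
    """Flags a stream whose socket is open but whose media has stopped."""

    def __init__(self, max_gap_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_gap_s = max_gap_s     # assumed threshold; tune per pipeline
        self._clock = clock
        self._last_activity = clock()  # connection time is the baseline
        self.frames_seen = 0

    def on_media(self, payload: bytes) -> None:
        """Call for every media frame that actually arrives."""
        if payload:                    # empty payloads do not count as audio
            self._last_activity = self._clock()
            self.frames_seen += 1

    def is_stalled(self) -> bool:
        """True when no media has arrived within the allowed gap."""
        return self._clock() - self._last_activity > self.max_gap_s
```

Polling `is_stalled()` from a timer and logging the result turns "the audio never arrived" into an event you can alert on, rather than something discovered hours later in cross-provider logs.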

TTS chunks fragment at codec boundaries

When your TTS provider generates audio in chunks and your transcoding layer uses fixed frame sizes, boundaries misalign. The result: glitches at every chunk transition that require custom buffer management per provider.
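The custom buffer management mentioned above often looks something like the following sketch: a re-framer that accepts TTS chunks of any size and emits fixed-size frames. The `FrameAligner` name is hypothetical, and the 160-byte frame assumes 20ms of 8kHz mu-law audio (8000 samples per second × 0.020 s × 1 byte per sample).

```python
from typing import List

FRAME_BYTES = 160  # 20 ms of 8kHz mu-law at one byte per sample


class FrameAligner:
    """Re-slices arbitrarily sized audio chunks into fixed-size frames."""

    def __init__(self, frame_bytes: int = FRAME_BYTES):
        self.frame_bytes = frame_bytes
        self._pending = bytearray()

    def push(self, chunk: bytes) -> List[bytes]:
        """Add a chunk and return every complete frame now available."""
        self._pending.extend(chunk)
        frames = []
        while len(self._pending) >= self.frame_bytes:
            frames.append(bytes(self._pending[:self.frame_bytes]))
            del self._pending[:self.frame_bytes]
        return frames

    def flush(self, pad_byte: int = 0xFF) -> List[bytes]:
        """Emit the final partial frame, padded with mu-law silence (0xFF)."""
        if not self._pending:
            return []
        padding = bytes([pad_byte]) * (self.frame_bytes - len(self._pending))
        frame = bytes(self._pending) + padding
        self._pending.clear()
        return [frame]
```

Leftover bytes from one chunk are carried into the next, so a chunk boundary never lands in the middle of an emitted frame.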

Barge-in races across external services

Old TTS audio keeps playing because the buffer was not cleared. The 5ms window between voice activity detection and buffer flush is nearly impossible to hit when audio processing spans services with variable network latency.
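One common way to reason about this race is a generation counter: every barge-in clears the queue and bumps a counter, and any TTS frame tagged with an older generation is rejected when it arrives late. The `PlaybackBuffer` class below is a hypothetical single-process sketch of that idea, not a description of how any particular vendor implements it.

```python
from collections import deque
from typing import Deque, Optional


class PlaybackBuffer:
    """Outbound audio queue that discards stale TTS audio on barge-in."""

    def __init__(self) -> None:
        self._frames: Deque[bytes] = deque()
        self.generation = 0  # incremented on every barge-in

    def enqueue(self, frame: bytes, generation: int) -> bool:
        """Queue a frame unless it belongs to an interrupted response."""
        if generation != self.generation:
            return False     # late frame from before the barge-in: drop it
        self._frames.append(frame)
        return True

    def barge_in(self) -> int:
        """Caller started speaking: clear queued audio, invalidate old frames."""
        dropped = len(self._frames)
        self._frames.clear()
        self.generation += 1
        return dropped

    def next_frame(self) -> Optional[bytes]:
        """Return the next frame to play, or None if the queue is empty."""
        return self._frames.popleft() if self._frames else None
```

Inside one process the flush and the counter update happen together. When the queue, the voice activity detector, and the TTS stream live in different services, each of those steps crosses a network, which is where the race described above comes from.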

Build a Voice AI Agent

from signalwire_agents import AgentBase
from signalwire_agents.core.function_result import SwaigFunctionResult

class SupportAgent(AgentBase):
    def __init__(self):
        super().__init__(name="Support Agent", route="/support")
        self.prompt_add_section("Instructions",
            body="You are a customer support agent. "
                 "Greet the caller and resolve their issue.")
        self.add_language("English", "en-US", "rime.spore:mistv2")

    @AgentBase.tool(name="check_order")
    def check_order(self, order_id: str):
        """Check the status of a customer order.

        Args:
            order_id: The order ID to look up
        """
        return SwaigFunctionResult(f"Order {order_id}: shipped, ETA April 2nd")

if __name__ == "__main__":
    # Start serving the agent at the /support route defined above
    agent = SupportAgent()
    agent.run()

Multi-vendor audio pipeline vs. single media engine

Multi-Vendor Pipeline

  • G.711 to PCM transcoding between telephony and STT providers
  • WebSocket relay adds encoding, decoding, and network hops
  • TTS chunk boundaries misalign with codec frame sizes
  • Barge-in requires coordinating buffer flush across external services
  • Silent failures produce no errors, no logs, no alerts

SignalWire

  • PSTN audio enters the media engine and gets processed in place
  • No WebSocket relay, no base64 encoding, no sample rate mismatch
  • TTS streaming and STT processing share the same audio buffer
  • Barge-in handled natively with sub-millisecond buffer management
  • Format mismatches caught and reported as structured errors

How codec transcoding fails silently

| Failure | What callers hear | Error message |
|---|---|---|
| Mu-law sent to a service expecting PCM | Garbled, unintelligible audio | None |
| TTS buffer fragmented at codec boundaries | Choppy, stuttering responses | None |
| Sample rate mismatch after WebSocket relay | Chipmunk or slow-motion audio | None |
| Base64 encoding error in the mu-law path | Static or silence | None |
| Media packets dropped between services | Complete silence mid-call | WebSocket reports connected |

Twilio sends mu-law at 8kHz. If the STT expects PCM at 16kHz, you get garbled audio. No error. No log. The call sounds broken and you have no idea why.
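One way to make this class of failure loud instead of silent is to declare the audio format at each hop and compare it against what the next service expects. The sketch below uses hypothetical names (`AudioFormat`, `FormatMismatchError`, `check_hop`) and assumes each service's format is known up front. It is an illustration of the idea, not any platform's actual error model.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AudioFormat:
    encoding: str        # e.g. "mulaw" or "pcm_s16le" (labels are assumptions)
    sample_rate_hz: int
    channels: int = 1


class FormatMismatchError(Exception):
    """Structured error raised when two hops disagree on the audio format."""

    def __init__(self, hop: str, produced: AudioFormat, expected: AudioFormat):
        self.hop = hop
        self.produced = produced
        self.expected = expected
        super().__init__(f"{hop}: produced {produced}, expected {expected}")

    def to_dict(self) -> dict:
        """Machine-readable form suitable for logs or alerts."""
        return {
            "hop": self.hop,
            "produced": asdict(self.produced),
            "expected": asdict(self.expected),
        }


def check_hop(hop: str, produced: AudioFormat, expected: AudioFormat) -> None:
    """Raise instead of forwarding audio the next service will misread."""
    if produced != expected:
        raise FormatMismatchError(hop, produced, expected)
```

Calling `check_hop("telephony->stt", AudioFormat("mulaw", 8000), AudioFormat("pcm_s16le", 16000))` raises before any audio is sent, so the mismatch shows up in logs rather than in the caller's ear.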

From five-vendor audio pipeline to one

1. Define your agent

Write a YAML document or Python class. Point a phone number at it.

2. Audio handled internally

The media engine negotiates codecs, processes STT and TTS, and manages buffers in one process.
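For readers unfamiliar with codec negotiation, the core idea can be sketched in a few lines: one side offers codecs in preference order, and the first one the other side also supports wins. The `negotiate_codec` function is a hypothetical illustration of that general pattern, not the media engine's actual logic.

```python
from typing import Optional, Sequence


def negotiate_codec(offered: Sequence[str],
                    supported: Sequence[str]) -> Optional[str]:
    """Return the first offered codec the other side supports, else None."""
    supported_names = {name.lower() for name in supported}
    for codec in offered:              # offered list is in preference order
        if codec.lower() in supported_names:
            return codec
    return None                        # no common codec: fail loudly upstream
```

For example, `negotiate_codec(["opus", "G722", "PCMU"], ["PCMU", "PCMA"])` selects `"PCMU"`, the RTP name for G.711 mu-law.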

3. Test with real calls

Call the number. Audio quality is consistent because there are no vendor boundaries to cross.

4. Ship to production

Same platform, same audio path, same behavior at scale. No integration surprises.

Upsampling 8kHz audio to 16kHz does not recover lost frequency information: audio sampled at 8kHz carries nothing above 4kHz, and interpolation only adds samples, not detail. STT accuracy is bounded by the input audio, whatever model you run. If your audio pipeline transcodes through G.711 before reaching the AI, you have already lost the data that matters.
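The arithmetic behind this is the Nyquist limit: a stream sampled at a given rate can only represent frequencies up to half that rate. The helper names below are illustrative. The second function shows where a pure tone would appear if it were sampled without an anti-aliasing filter.

```python
def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a stream at this sample rate can represent."""
    return sample_rate_hz / 2


def alias_frequency_hz(tone_hz: int, sample_rate_hz: int) -> int:
    """Apparent frequency of a pure tone after sampling with no filtering."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)
```

At 8kHz the limit is 4kHz, so a 6kHz component of speech cannot survive the trip through G.711. Unfiltered, it would fold down to 2kHz. In practice it is filtered out before sampling. Either way, resampling to 16kHz afterwards raises the limit to 8kHz but has nothing to put in the new range.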

FAQ

Can I bring my own STT or TTS provider?

Yes. The platform supports multiple STT and TTS providers. Codec negotiation and format conversion happen inside the media engine regardless of which provider you choose.

What codecs does SignalWire support?

G.711 (mu-law and A-law), G.722, and Opus with automatic negotiation. Conversion between PSTN codecs and AI model requirements happens inside the media engine.

How does barge-in work without external WebSocket coordination?

Voice activity detection, TTS streaming, and STT processing share the same audio buffer inside the media engine. Barge-in is a native operation, not a race condition across external services.

Who built this platform?

The team that wrote FreeSWITCH, the open-source telephony engine that processes trillions of minutes across the industry. Production audio engineering is the foundation.

Trusted by 2,000+ companies

Stop debugging codecs. Start shipping agents.

One media engine. No vendor gaps. No silent failures. Built by the team that wrote FreeSWITCH.