Every Vendor Hop Costs You 200ms | SignalWire
Latency Benchmark

Most platforms measure one step and call it latency. Full roundtrip is what callers feel: sentence ends, AI responds. That gap decides whether they stay or hang up.

  • < 1.2s typical AI response latency
  • 1 platform for the full AI pipeline
  • 2.7B minutes processed
  • 40% more abandonment above 1.2s

Trusted by 2,000+ companies

The problem

Bolt-on pipelines stack latency at every boundary

Six hops between caller and AI

PSTN to telephony provider, telephony to WebSocket, WebSocket to your server, server to STT, STT to LLM, LLM to TTS, then TTS back through the chain. Each hop adds tens to hundreds of milliseconds; the breakdown below puts individual hops between 20 and 800ms.

Partial metrics hide the real number

STT-to-first-token measures one step. TTS time-to-first-byte measures another. Neither measures the time a caller waits between finishing a sentence and hearing the AI respond.
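The difference between a partial metric and the full roundtrip can be sketched from per-turn timestamps. This is a minimal illustration, not SignalWire's instrumentation: the `TurnTimestamps` class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurnTimestamps:
    """Millisecond timestamps for one conversational turn (hypothetical fields)."""
    caller_stopped_speaking: int
    transcript_final: int
    llm_first_token: int
    tts_first_byte: int
    caller_hears_audio: int


def stt_to_first_token_ms(t: TurnTimestamps) -> int:
    # Partial metric: covers only the STT-to-LLM step.
    return t.llm_first_token - t.transcript_final


def tts_time_to_first_byte_ms(t: TurnTimestamps) -> int:
    # Partial metric: covers only the LLM-to-TTS step.
    return t.tts_first_byte - t.llm_first_token


def full_roundtrip_ms(t: TurnTimestamps) -> int:
    # What the caller waits through: end of speech to first audible response.
    return t.caller_hears_audio - t.caller_stopped_speaking


# Illustrative values only, not measurements.
turn = TurnTimestamps(
    caller_stopped_speaking=0,
    transcript_final=300,
    llm_first_token=650,
    tts_first_byte=900,
    caller_hears_audio=1150,
)
print(stt_to_first_token_ms(turn), tts_time_to_first_byte_ms(turn), full_roundtrip_ms(turn))
```

In this made-up turn, each partial metric looks healthy on its own (350ms and 250ms) while the caller still waits 1,150ms.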

Optimization cannot eliminate architecture

Switching to a faster TTS provider saves 200ms but does not remove the other five network boundaries. Co-locating servers helps, but four to six network transits remain.

Streaming helps, but boundaries remain

Streaming STT and TTS reduces batch delays. Each stream still crosses a network boundary. Streaming over WebSocket to an external service is faster than batch, but slower than processing inside one engine.

Build a Voice AI Agent

from signalwire_agents import AgentBase
from signalwire_agents.core.function_result import SwaigFunctionResult

class SupportAgent(AgentBase):
    def __init__(self):
        # Serve this agent at the /support route.
        super().__init__(name="Support Agent", route="/support")
        # Prompt section that defines the agent's role.
        self.prompt_add_section("Instructions",
            body="You are a customer support agent. "
                 "Greet the caller and resolve their issue.")
        # Language name, locale code, and TTS voice.
        self.add_language("English", "en-US", "rime.spore:mistv2")

    # Expose check_order as a tool the AI can call mid-conversation.
    @AgentBase.tool(name="check_order")
    def check_order(self, order_id: str):
        """Check the status of a customer order.

        Args:
            order_id: The order ID to look up
        """
        # Placeholder response; a real agent would query an order system here.
        return SwaigFunctionResult(f"Order {order_id}: shipped, ETA April 2nd")

if __name__ == "__main__":
    agent = SupportAgent()
    agent.run()

Multi-vendor pipeline vs. single engine

Bolt-on pipeline

  • Six to nine network hops per turn
  • 770ms to 2,080ms roundtrip (sum of per-hop ranges)
  • Each vendor measures its own slice
  • Codec transcoding between services (G.711 to PCM)
  • WebSocket piping adds silent failure modes
  • Co-location helps one hop, not six

SignalWire

  • Orchestration inside the media stack
  • 800-1200ms typical full roundtrip
  • One engine measures the entire path
  • Native codec handling, no transcoding step
  • Audio stays inside the media engine
  • Built by the FreeSWITCH team
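To make the codec transcoding step in the bolt-on column concrete, here is a rough sketch of what a server typically does with a base64-encoded G.711 mu-law media frame before handing it to an STT service that expects 16-bit PCM. The function names and the 20ms frame size are assumptions for illustration; the expansion follows the standard G.711 mu-law decoding formula.

```python
import base64
import struct


def ulaw_byte_to_pcm16(u: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear PCM sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_payload(b64_payload: str) -> bytes:
    """Base64-decode a mu-law frame and transcode it to little-endian PCM16."""
    ulaw = base64.b64decode(b64_payload)
    samples = [ulaw_byte_to_pcm16(b) for b in ulaw]
    return struct.pack(f"<{len(samples)}h", *samples)


# Assumed frame: 20 ms of mu-law silence at 8 kHz (160 bytes of 0xFF).
frame = bytes([0xFF]) * 160
payload = base64.b64encode(frame).decode("ascii")
pcm = decode_media_payload(payload)
print(len(payload), len(pcm))
```

Every frame pays this encode, transmit, decode, and expand cycle at each service boundary, which is the work a single media engine avoids.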

Independent latency measurements by platform

| Platform | Measured latency | Source |
|---|---|---|
| Twilio | 950ms average | Telnyx: Voice AI Agents Compared |
| Vonage | 800 to 1,200ms | Telnyx: Voice AI Agents Compared |
| Vapi (India region) | 1,450ms | Trustpilot reviews, production reports |
| Bland AI | 800ms average | G2 reviews |
| DIY WebSocket stack | 1,920ms median | DEV Community benchmark |
| DIY WebRTC stack | 2,060ms median | DEV Community benchmark |
| LiveKit + Twilio (EU) | 4,000ms+ per turn | GitHub issues, production reports |
| SignalWire | 800-1200ms typical | Full roundtrip measurement |

Where milliseconds accumulate in a bolt-on stack

| Hop | What happens | Latency added |
|---|---|---|
| PSTN to telephony provider | Call ingress, media stream setup | 50 to 100ms |
| Telephony to WebSocket | Base64 encode mu-law, open stream | 30 to 80ms |
| WebSocket to your server | Network transit, decode, buffer | 20 to 50ms |
| Server to STT | Codec convert, stream audio, wait for transcript | 200 to 400ms |
| STT to LLM | Send transcript, wait for first tokens | 200 to 800ms |
| LLM to TTS | Send text, wait for first audio chunk | 150 to 400ms |
| TTS back through chain | Encode, transmit, decode at each boundary | 120 to 250ms |
| Total | | 770 to 2,080ms |
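The total row is simply the sum of the per-hop ranges. A short sketch that reproduces it from the table's own figures (the `HOPS` list and function names are illustrative):

```python
# (hop, low_ms, high_ms) copied from the table above.
HOPS = [
    ("PSTN to telephony provider", 50, 100),
    ("Telephony to WebSocket", 30, 80),
    ("WebSocket to your server", 20, 50),
    ("Server to STT", 200, 400),
    ("STT to LLM", 200, 800),
    ("LLM to TTS", 150, 400),
    ("TTS back through chain", 120, 250),
]


def total_range_ms(hops):
    """Sum the best-case and worst-case latency across all hops."""
    low = sum(lo for _, lo, _ in hops)
    high = sum(hi for _, _, hi in hops)
    return low, high


def largest_worst_case(hops):
    """Return the name of the hop with the highest worst-case latency."""
    return max(hops, key=lambda hop: hop[2])[0]


print(total_range_ms(HOPS))       # (770, 2080)
print(largest_worst_case(HOPS))   # STT to LLM
```

Speeding up any single row moves the total by at most that row's range; the other rows still add up.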

How SignalWire achieves sub-second response

1. Call arrives at the media engine

PSTN ingress with no external telephony provider in the path. The audio is already inside the engine.

2. STT streams concurrently

Audio is processed in 250ms increments while the caller speaks, so transcription begins before the caller finishes.

3. LLM inference runs in parallel

The transcript streams to the LLM while the caller is still speaking. Response generation overlaps with transcription.

4. TTS generates audio inside the engine

No network hop to an external synthesis service. Audio goes from TTS to PSTN egress without leaving the platform.
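The effect of streaming transcription during speech can be sketched with a toy timing model. This is a simplified illustration under stated assumptions, not a description of the engine's internals: only the 250ms chunk size comes from the text above, while the per-chunk STT cost, LLM time, and TTS time are made-up parameters, and the overlap of LLM inference with transcription is not modeled separately.

```python
import math


def chunks_in(utterance_ms: int, chunk_ms: int = 250) -> int:
    """Number of audio chunks needed to cover the utterance."""
    return math.ceil(utterance_ms / chunk_ms)


def batch_gap_ms(utterance_ms, stt_ms_per_chunk, llm_first_token_ms,
                 tts_first_audio_ms, chunk_ms=250):
    """Gap after speech ends when transcription starts only once the caller stops."""
    stt_total = chunks_in(utterance_ms, chunk_ms) * stt_ms_per_chunk
    return stt_total + llm_first_token_ms + tts_first_audio_ms


def streaming_gap_ms(stt_ms_per_chunk, llm_first_token_ms,
                     tts_first_audio_ms, chunk_ms=250):
    """Gap after speech ends when earlier chunks were transcribed during speech.

    Only the final chunk is left to transcribe once the caller stops.
    Assumes each chunk is transcribed faster than real time.
    """
    if stt_ms_per_chunk > chunk_ms:
        raise ValueError("STT slower than real time; a backlog would build up")
    return stt_ms_per_chunk + llm_first_token_ms + tts_first_audio_ms


# Hypothetical numbers: 2 s utterance, 40 ms STT per chunk, 300 ms LLM, 200 ms TTS.
print(batch_gap_ms(2000, 40, 300, 200))   # 820
print(streaming_gap_ms(40, 300, 200))     # 540
```

In this model the batch gap grows with utterance length while the streaming gap stays constant, which is why concurrency matters more on long caller turns.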

Some platforms report STT-to-first-token. Others report TTS time-to-first-byte. Full roundtrip measures what the caller experiences: the gap between finishing a sentence and hearing the AI respond.

What latency feels like to callers

| Latency | Caller experience | Business impact |
|---|---|---|
| Under 500ms | Feels instantaneous | Optimal engagement |
| 500 to 800ms | Slight pause, still conversational | Acceptable for most use cases |
| 800 to 1,200ms | Noticeable delay, like a bad international call | Callers start talking over the agent |
| 1,200 to 2,000ms | Awkward pauses, callers check if the line dropped | 40% increase in call abandonment |
| Above 2,000ms | Caller hangs up or asks for a human | Support escalation, lost revenue |
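If you log roundtrip times per turn, the bands above can be applied as a simple lookup. A minimal sketch; the function name is hypothetical, and because the table's ranges share their endpoints, treating each upper bound as inclusive is an assumption.

```python
def caller_experience(roundtrip_ms: float) -> tuple[str, str]:
    """Map a full-roundtrip latency to (caller experience, business impact)."""
    if roundtrip_ms < 500:
        return ("Feels instantaneous", "Optimal engagement")
    if roundtrip_ms <= 800:
        return ("Slight pause, still conversational", "Acceptable for most use cases")
    if roundtrip_ms <= 1200:
        return ("Noticeable delay, like a bad international call",
                "Callers start talking over the agent")
    if roundtrip_ms <= 2000:
        return ("Awkward pauses, callers check if the line dropped",
                "40% increase in call abandonment")
    return ("Caller hangs up or asks for a human", "Support escalation, lost revenue")


for ms in (450, 950, 2500):
    print(ms, caller_experience(ms))
```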

FAQ

How is 800-1200ms measured?

Full roundtrip: the moment the caller stops speaking to the moment the caller hears the AI respond. Not a partial metric like STT-to-first-token or TTS time-to-first-byte. With speech-to-speech voice models, latency can be as low as 600ms.

Where do the competitor numbers come from?

Twilio and Vonage numbers come from a Telnyx benchmark (a competitor publishing independent measurements). Vapi numbers come from Trustpilot reviews and production reports. Bland AI numbers come from G2 reviews. DIY stack numbers come from DEV Community benchmarks. LiveKit + Twilio numbers come from GitHub issues and production reports.

Can I reduce latency by switching to a faster TTS provider?

Switching providers saves time on one hop but does not eliminate the other five to eight network boundaries. Architecture determines the floor. Optimization determines how close you get to it.

Does SignalWire lock me into specific STT or LLM providers?

No. You can bring your own models. The AI kernel orchestrates them from inside the media engine, eliminating the orchestration overhead of bolt-on pipelines.

What about caching LLM responses to reduce latency?

Caching helps for common queries but removes the benefit of having an AI agent that handles novel conversations. Every external API call is a network round-trip that no cache eliminates.

Measure the full roundtrip yourself.

Run the same conversation on your current stack and on SignalWire. Compare what your callers actually experience.
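One way to compare the two runs is to record the full roundtrip for every turn and summarize each stack the same way. A small sketch, assuming you already have per-turn roundtrip times in milliseconds; the sample lists are placeholders, not measurements, and the 1,200ms threshold is taken from the abandonment figure above.

```python
from statistics import median


def summarize(roundtrips_ms, threshold_ms=1200):
    """Summarize per-turn full-roundtrip latencies for one stack."""
    over = sum(1 for ms in roundtrips_ms if ms > threshold_ms)
    return {
        "median_ms": median(roundtrips_ms),
        "worst_ms": max(roundtrips_ms),
        "share_over_threshold": over / len(roundtrips_ms),
    }


# Placeholder data: replace with your own recorded turns.
current_stack = [1900, 2100, 1750, 2300, 1950]
candidate_stack = [900, 1100, 850, 1300, 950]

print("current:", summarize(current_stack))
print("candidate:", summarize(candidate_stack))
```

Comparing medians alongside the worst turn and the share of turns above the threshold keeps a single slow turn from hiding behind a good average.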