You are managing five SDKs, five invoices, and five escalation paths to make one voice call. There is a simpler architecture.
Building a voice AI agent today means stitching together a telephony provider, a speech-to-text service, a language model, a text-to-speech engine, and your own orchestration layer. Your code becomes the glue between five independent systems with different timing guarantees, different error models, and different billing.
Every network hop between components adds latency. Every vendor boundary adds a failure mode. Every integration adds state you need to track. The result: 2-4 seconds of response latency, race conditions at scale, and five support queues when something breaks at 2am.
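To make the hop cost concrete, here is a back-of-the-envelope tally. The per-hop figures are illustrative assumptions, not measurements:

```python
# Illustrative per-hop latencies for a bolt-on voice stack (assumed values, ms)
hops = {
    "telephony -> your server": 50,
    "your server -> STT": 100,
    "STT processing": 300,
    "your server -> LLM": 100,
    "LLM inference": 900,
    "your server -> TTS": 100,
    "TTS synthesis": 400,
    "audio back to caller": 50,
}

total_ms = sum(hops.values())
print(f"Round-trip estimate: {total_ms} ms")  # 2000 ms with these assumptions
```

Even with generous numbers, the vendor boundaries alone put you in the multi-second range before you write a line of business logic.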
from signalwire_agents import AgentBase
from signalwire_agents.core.function_result import SwaigFunctionResult

class SupportAgent(AgentBase):
    def __init__(self):
        super().__init__(name="Support Agent", route="/support")
        self.prompt_add_section(
            "Instructions",
            body="You are a customer support agent. "
                 "Greet the caller and resolve their issue.")
        self.add_language("English", "en-US", "rime.spore:mistv2")

    @AgentBase.tool(name="check_order")
    def check_order(self, order_id: str):
        """Check the status of a customer order.

        Args:
            order_id: The order ID to look up
        """
        return SwaigFunctionResult(f"Order {order_id}: shipped, ETA April 2nd")

agent = SupportAgent()
agent.run()
The platform sends tool call requests to your agent when tools are invoked. Your agent returns results. The platform handles everything between the caller and your code.
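That request/response loop can be sketched without the SDK: the platform sends a tool name and arguments, your code dispatches to a handler and returns the result. A minimal stand-in, where the payload shape and names are illustrative rather than the actual wire format:

```python
def check_order(order_id: str) -> str:
    # Your business logic; the platform never sees how this is implemented
    return f"Order {order_id}: shipped, ETA April 2nd"

# Tool table: the only surface the platform interacts with
TOOLS = {"check_order": check_order}

def handle_tool_call(request: dict) -> dict:
    # Platform -> agent: {"tool": ..., "arguments": {...}}
    handler = TOOLS[request["tool"]]
    result = handler(**request["arguments"])
    # Agent -> platform: the result the model will speak from
    return {"result": result}

response = handle_tool_call({"tool": "check_order", "arguments": {"order_id": "A1234"}})
print(response["result"])  # Order A1234: shipped, ETA April 2nd
```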
@agent.tool("check_availability",
            description="Check available appointment slots for a given date",
            parameters={
                "date": {"type": "string", "description": "Date in YYYY-MM-DD format"}
            })
def check_availability(args):
    slots = scheduling_api.get_slots(args["date"])
    if slots:
        return f"Available times: {', '.join(slots)}"
    return "No availability on that date."

@agent.tool("book_appointment",
            description="Book an appointment",
            parameters={
                "date": {"type": "string"},
                "time": {"type": "string"},
                "patient_name": {"type": "string"}
            })
def book_appointment(args):
    scheduling_api.book(args["date"], args["time"], args["patient_name"])
    return f"Appointment confirmed for {args['patient_name']} on {args['date']} at {args['time']}"
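The handlers above assume a `scheduling_api` client for your backend. Logic like this can be exercised locally with a stub before wiring it to the platform; the stub below is purely illustrative:

```python
class StubSchedulingAPI:
    """In-memory stand-in for a real scheduling backend."""
    def __init__(self):
        self.slots = {"2025-04-02": ["09:00", "11:30"]}
        self.bookings = []

    def get_slots(self, date):
        return self.slots.get(date, [])

    def book(self, date, time, patient_name):
        self.bookings.append((date, time, patient_name))
        return True

scheduling_api = StubSchedulingAPI()

def check_availability(args):
    slots = scheduling_api.get_slots(args["date"])
    if slots:
        return f"Available times: {', '.join(slots)}"
    return "No availability on that date."

print(check_availability({"date": "2025-04-02"}))  # Available times: 09:00, 11:30
print(check_availability({"date": "2025-04-03"}))  # No availability on that date.
```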
Each step has its own prompt scope and tool scope. The model in the "identify" step cannot access scheduling tools. The model in the "schedule" step cannot access patient lookup. The model handles language; your code handles truth.
agent.add_step(
    name="identify",
    prompt="Ask for the caller's name and date of birth to look up their account.",
    tools=["lookup_patient"],
    transitions={"patient_found": "schedule", "not_found": "new_patient"}
)

agent.add_step(
    name="schedule",
    prompt="Help the caller book, reschedule, or cancel an appointment.",
    tools=["check_availability", "book_appointment", "cancel_appointment"],
    transitions={"billing_question": "transfer_front_desk", "done": "farewell"}
)
The AI kernel orchestrates speech recognition, language model inference, and speech synthesis from inside the media stack, with direct access to the audio stream. Because the kernel eliminates the network hops between audio and orchestration, typical response latency is 800-1200 ms, versus 2-4 seconds for bolt-on architectures.
Call state, conversation context, and step transitions live inside the platform. Your application does not reconstruct state from webhooks or manage distributed state across vendors.
The platform integrates with multiple LLM providers. Switch models without changing agent code. When models commoditize, your investment in agent architecture remains.
Modular tool packages for common tasks: datetime parsing, search integration, data lookups. Write your own skills for reusable business logic across agents.
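A skill can be thought of as an ordinary Python module that registers the same tool handlers on multiple agents. A plain-Python sketch of that packaging idea follows; the SDK's actual skill interface may differ:

```python
def datetime_skill():
    """A reusable bundle of tool handlers shared across agents."""
    from datetime import date

    def today(args):
        return date.today().isoformat()

    return {"get_today": today}

def build_agent(name, skills):
    # Merge each skill's tool table into the agent's own
    tools = {}
    for skill in skills:
        tools.update(skill())
    return {"name": name, "tools": tools}

support = build_agent("support", [datetime_skill])
sales = build_agent("sales", [datetime_skill])
# Both agents now expose the same get_today tool without duplicating logic
```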
pip install signalwire-agents. One package, no vendor chain to configure.
Set the prompt, add tools for your business logic, define conversation steps if needed.
Run the agent as a standard Python process. Call it from a browser or SIP client.
Container, VM, or serverless function. Your agent is a standard Python microservice. Connect a phone number and start taking calls.
Define multi-step call flows as configuration. The platform executes them. No server infrastructure required for standard patterns.
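Under that model a flow is just data. A hypothetical shape, where the field names are illustrative only and the platform's documented schema should be consulted for the real format:

```yaml
# Hypothetical flow definition: field names are illustrative, not the real schema
flow:
  - step: identify
    prompt: Ask for the caller's name and date of birth.
    tools: [lookup_patient]
    transitions:
      patient_found: schedule
      not_found: new_patient
  - step: schedule
    prompt: Help the caller book, reschedule, or cancel.
    tools: [check_availability, book_appointment]
    transitions:
      done: farewell
```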
For applications that need to react to events outside the call (supervisor intervention, CRM triggers), a persistent bidirectional connection provides live control over active calls.
Transfer, inject prompts, and modify state on any active call via REST calls. Common operations without maintaining a persistent connection.
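A mid-call prompt injection, for example, reduces to one authenticated HTTP request. The endpoint path and payload fields below are assumptions for illustration, not the documented API:

```python
import json

# Hypothetical endpoint and field names, for illustration only
call_id = "abc-123"
url = f"https://YOURSPACE.signalwire.com/api/calls/{call_id}/actions"
payload = {
    "action": "inject_prompt",
    "text": "The supervisor notes this caller is a VIP; offer expedited shipping.",
}

body = json.dumps(payload)
# In a real client you would POST `body` to `url` with your project credentials,
# e.g. requests.post(url, data=body, auth=(project_id, api_token))
print(body)
```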
Yes. The platform is model-agnostic. You can use OpenAI, Anthropic, Google, or self-hosted models. Switch providers without changing your agent code.
AI processing: speech-to-text, language model inference, text-to-speech, and orchestration. Transport (PSTN, SIP) is billed separately at carrier rates. One invoice. No per-component billing for AI.
That stack requires your application to orchestrate audio between three services, manage state across all of them, and handle failures independently. SignalWire's AI kernel orchestrates all of those components from inside the media stack. Fewer hops, fewer failure modes, one vendor to call.
Yes. Declarative YAML flows and the Python SDK work on the same platform. Start with YAML for standard patterns, switch to Python when you need custom logic. Both approaches deploy the same way.
Install the Python SDK and build your first voice AI agent in an afternoon.