DevOps for Voice AI

Your Voice Agent Deserves a Test Suite

Every prompt tweak is a live experiment on real callers. Declarative agents are state machines you can test, diff, version, and deploy through CI/CD.

  • git diff to see what changed
  • git revert to roll back
  • pytest to verify each step
  • git log to audit every change
The Problem

Prompt Blobs Are Not Software

You Cannot Unit Test a Prompt

A 2,000-word prompt blob has no defined interface, no isolation boundary, and no deterministic behavior. Every deployment is a bet.

Diffing Prose Tells You Nothing

Someone changed the prompt. The diff shows 47 lines of modified prose. What behavior changed? Nobody knows until a caller reports it.

Rollback Means Finding an Old Prompt

The agent broke after the last update. The old version is somewhere in a dashboard, a Slack message, or a Google Doc. Good luck.

Auditing Means Guessing

An auditor in a regulated industry asks: which instructions were active during this call? With a monolithic prompt, the answer is all of them. Or none. Depends on the model's mood.

Build a Voice AI Agent

from signalwire_agents import AgentBase
from signalwire_agents.core.function_result import SwaigFunctionResult

class SupportAgent(AgentBase):
    def __init__(self):
        super().__init__(name="Support Agent", route="/support")
        # The prompt is assembled from named sections, not one monolithic blob
        self.prompt_add_section("Instructions",
            body="You are a customer support agent. "
                 "Greet the caller and resolve their issue.")
        self.add_language("English", "en-US", "rime.spore:mistv2")

    # Tools are plain methods with typed, documented parameters
    @AgentBase.tool(name="check_order")
    def check_order(self, order_id: str):
        """Check the status of a customer order.

        Args:
            order_id: The order ID to look up
        """
        return SwaigFunctionResult(f"Order {order_id}: shipped, ETA April 2nd")

if __name__ == "__main__":
    agent = SupportAgent()
    agent.run()
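
Because each tool is an ordinary Python method, it can be unit tested without placing a call. A minimal test sketch, assuming the agent above lives in support_agent.py (the __main__ guard keeps the import from starting a server), that the decorated method stays directly callable, and that SwaigFunctionResult exposes the text it was constructed with as .response:

# test_support_agent.py
from support_agent import SupportAgent

def test_check_order_reports_status():
    agent = SupportAgent()
    # Call the tool directly; no LLM or live call involved
    result = agent.check_order(order_id="12345")
    # .response is assumed to hold the constructor text
    assert "12345" in result.response
    assert "shipped" in result.response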

Prompt and Pray vs. Testable Agents

Prompt and Pray

  • Cannot unit test a probabilistic prompt blob
  • Diffing 2,000 words of prose to find what changed
  • No defined interface to regression test against
  • Rolling back means finding an old prompt in a dashboard
  • Auditing means guessing which instruction applied

Declarative Agents

  • Each step is an isolated unit with defined inputs and outputs
  • Structured diffs show exactly what changed
  • Step contracts define expected tool calls and transitions
  • Rollback with git revert
  • Audit with git log; every step and transition is recorded

Capability Comparison

Capability              Prompt Blob                          Declarative Agent
Version control         Blob in a dashboard                  Structured artifact in git
Diff between versions   Manual prose comparison              git diff
Code review             Read a 2,000-word prompt             Review a step change
Unit testing            Not possible (probabilistic)         Step-level isolation
Integration testing     Manual QA calls                      Automated conversation simulation
Regression testing      Hope nothing broke                   CI/CD pipeline on every push
Rollback                Find the old prompt somewhere        git revert
Audit trail             What was the prompt on March 3rd?    git log
A/B testing             Two prompt blobs, no metrics         Two versioned configs, metrics per version

Meaningful Diffs in Pull Requests

The change is clear: a new tool was added to the authenticated step. Reviewers evaluate whether the tool belongs in this step and whether transitions still make sense.

 - name: authenticated
   prompt: "Help the customer with their account."
-  tools: [check_balance, update_address]
+  tools: [check_balance, update_address, schedule_service]
   transitions:
     farewell: "Issue resolved"
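
A reviewer can pin that expectation down as a test. A minimal sketch, assuming the step lives in an agent.yaml file under a top-level steps list (the file name and top-level key are illustrative; the step fields mirror the diff above):

# test_step_contracts.py
import yaml

def test_authenticated_step_contract():
    with open("agent.yaml") as f:
        steps = yaml.safe_load(f)["steps"]  # top-level key is an assumption
    step = next(s for s in steps if s["name"] == "authenticated")
    # The new tool must be available in this step...
    assert "schedule_service" in step["tools"]
    # ...and the step must still transition to farewell
    assert "farewell" in step["transitions"]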

Ship Voice AI Like Software

1. Define your agent as config

Write agent definitions in YAML or generate them from the Python SDK. Both produce structured, versionable artifacts.
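
The structured definition is just data that can be written out as a reviewable artifact. A minimal Python sketch (requires PyYAML); the step fields mirror the diff shown earlier, while the steps key and the agent.yaml file name are illustrative assumptions:

# build_agent_yaml.py -- emit the agent definition as a versionable artifact
import yaml

agent_def = {
    "steps": [
        {
            "name": "authenticated",
            "prompt": "Help the customer with their account.",
            "tools": ["check_balance", "update_address", "schedule_service"],
            "transitions": {"farewell": "Issue resolved"},
        },
        {
            "name": "farewell",
            "prompt": "Thank the caller and close out the call.",
            "tools": [],
            "transitions": {},
        },
    ]
}

with open("agent.yaml", "w") as f:
    yaml.safe_dump(agent_def, f, sort_keys=False)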

2. Write tests for each step

Test tool availability, transition rules, and conversation flows. Each step is an isolated unit with deterministic boundaries.
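
A transition-rule test from that suite might look like this sketch (same assumed agent.yaml layout as in step 1):

# test_transitions.py
import yaml

def test_transitions_target_defined_steps():
    with open("agent.yaml") as f:
        steps = yaml.safe_load(f)["steps"]
    names = {step["name"] for step in steps}
    for step in steps:
        # Every transition target must be a step that actually exists
        for target in step.get("transitions", {}):
            assert target in names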

3. Review in pull requests

Agent changes go through code review. Structured diffs make it clear what behavior changed and why.

4. Deploy through CI/CD

Validate, test, and deploy through your existing pipeline. No manual dashboard updates. No copy-paste into a web form.

💡
Each step has explicit inputs, tools, and transitions. You are not testing whether the AI does the right thing in all scenarios. You are testing whether this step calls the right tool and transitions to the right next step. That is a tractable testing problem.

FAQ

Can I use my existing CI/CD pipeline?

Yes. Agent definitions are files in your repository. They go through the same pipeline as application code: commit, review, test, merge, deploy.

What about the probabilistic nature of LLMs?

You test the structure, not the prose. Each step has defined tools and transitions. The model handles natural language within bounded constraints that your tests verify.

How do I A/B test agent versions?

Deploy two versioned configurations. Route a percentage of calls to each version. Compare metrics per version: resolution rate, handle time, transfer rate, customer satisfaction.
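
The routing half can be as small as a deterministic bucketing function. A sketch; the version ids and the 10/90 split are illustrative, not a SignalWire API:

# ab_routing.py
import zlib

def pick_agent_version(call_id: str) -> str:
    # CRC-based bucketing is stable across processes, so a given
    # call always lands on the same version
    bucket = zlib.crc32(call_id.encode()) % 100
    return "agent-v2" if bucket < 10 else "agent-v1"

Tag each call's metrics with the version it was routed to, and the per-version comparison falls out of your existing analytics.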

What is the difference between YAML and SDK testing?

YAML agents validate schema and structure. SDK agents add Python testing tools: IDE support, debuggers, mock frameworks. Both produce testable, diffable artifacts.


Ship Voice AI With Confidence

Version control, CI/CD, and automated testing for every AI agent you deploy.