
Edge Cases Define Your Voice AI Success

Design voice AI for the unexpected

Your voice AI can perform at or above human accuracy most of the time, but a single awkward loop, nonsensical response, or missed edge case will define its reputation. This is how voice AI success is measured: not by how often the system gets things right, but by how it behaves when something goes wrong.

When Taco Bell ended its voice AI drive-through pilot, the problem was service quality, not accuracy: the system wasn't failing on incorrect orders or mispronunciations. The main issues were edge cases: loops of repeated questions, an inability to handle odd requests, and system exploits by trolls.

As in many similar failures, the root cause appears to be a loss of state or context. For example, the drive-through bot repeatedly asked for a drink order when one had already been selected.


Taco Bell notably didn’t indicate that the AI had lower accuracy than human order takers.

In other projects, AI agents perform as well as or better than humans overall. Across hundreds of interactions, the AI will succeed at least as often as a human.

The difference lies in when something goes wrong. When a human encounters an edge case, they might not always perform well, but they rarely fail in the same sloppy, hilarious fashion that the Taco Bell AI examples show. A human might fail to get an edge case right, but the failure will at least look human.

The lesson for AI implementations is to focus on those edge cases. If the AI falls apart when faced with the unexpected, then no one will remember that it succeeded more often than humans.

You can’t just aim for high overall accuracy. You must anticipate the edge cases that will define your users' experience.

How to identify AI voice edge cases

To find the edge cases, think about your assumptions, and find ways to break them. What if:

  • Someone orders zero items, or a million?

  • They change their mind mid-sentence?

  • They ask to cancel something they never ordered?

  • Someone speaks with a heavy accent, or there's background noise or an interruption?

These scenarios will expose where your AI gets confused or stuck.

Monitor your real usage patterns and look for the interactions that didn't go as planned. These failed conversations are your edge cases in the wild, and they're more valuable than any theoretical scenario you can imagine.

The customer who somehow ended up with seventeen orders of the same item, the conversation that looped for three minutes, the person who got charged for something they never ordered—these are the moments that will define your AI's reputation.
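One practical way to surface these moments is to scan your transcripts for agent repetition. This is a minimal sketch, assuming you can extract the agent's utterances from each logged conversation; the threshold and normalization are illustrative choices, not a prescribed method.

```python
from collections import Counter

def find_loops(agent_turns: list[str], threshold: int = 3) -> set[str]:
    """Flag prompts the agent repeated `threshold` or more times in
    one conversation -- a strong signal of a stuck loop."""
    counts = Counter(turn.strip().lower() for turn in agent_turns)
    return {prompt for prompt, n in counts.items() if n >= threshold}

# A transcript where the bot keeps asking for a drink order:
transcript = [
    "What would you like to drink?",
    "Got it. Anything else?",
    "What would you like to drink?",
    "What would you like to drink?",
]
print(find_loops(transcript))  # {'what would you like to drink?'}
```

Run something like this nightly over your call logs and review every flagged conversation by hand; each one is an edge case you didn't anticipate.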

Fortunately, fixing these AI edge cases isn't much different from what software designers already do with non-AI systems.

The real problem: General purpose AI in specific use cases

We don’t have insider details on how Taco Bell built its voice agent, but the outcome is familiar enough to guess what happened: many voice AI pilots are deploying a general-purpose AI agent instead of one purpose-built for their needs. They’re trying to control the agent exclusively through a system prompt, so the non-deterministic nature of LLMs will sometimes cause chaos.

Design for the edge case

People implementing Voice AI need to plan for these edge cases. AI is amazing, but it’s not magic. Your voice AI needs guardrails to keep the conversation in line with your expectations. Identifying the edge cases is the key to knowing what guardrails to implement.

AI needs training, it needs validation, and it needs to know how to follow business rules. You can’t put an out-of-the-box voice agent in production without customizing how it behaves. And the customization is deeper than you’d expect.

Avoiding business logic failures

Your AI system needs to validate inputs and results just as rigorously as any traditional software system. Without proper business logic validation, edge cases will expose your AI's willingness to agree to impossible requests.

Consider a restaurant reservation system that happily books tables at 3 AM when the restaurant is closed. The AI understood the request perfectly and processed it flawlessly, but it ignored basic business constraints that any human would recognize.
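The fix is to check the request against business constraints before the AI confirms anything. A minimal sketch, with hypothetical opening hours (a real system would load these from configuration):

```python
from datetime import time

# Hypothetical business hours for illustration.
OPEN, CLOSE = time(11, 0), time(22, 0)

def within_hours(requested: time) -> bool:
    """Reject reservation times outside opening hours, no matter how
    confidently the LLM parsed the request."""
    return OPEN <= requested <= CLOSE

assert not within_hours(time(3, 0))  # the 3 AM booking is refused
assert within_hours(time(19, 30))    # 7:30 PM goes through
```

The AI can still handle the conversation ("Sorry, we're closed at that time; our hours are 11 AM to 10 PM"), but the decision to accept or refuse lives in deterministic code.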

Order-taking systems face similar challenges. An AI might accept impossible combinations like "chicken bacon milkshake" or "salad with extra ranch, hold the dressing." These requests seem absurd to humans, but an AI without proper guardrails might try to accommodate them, creating confusion in the kitchen and frustration for customers.

Size limits matter too. Your voice AI should prevent unreasonable orders, whether they stem from misunderstandings ("I'll have a thousand tacos") or malice (someone trying to crash your system with massive orders).

Similarly, product boundaries need enforcement. A Taco Bell AI agent shouldn't let customers order Big Macs, no matter how politely they ask.

Trying to solve these problems with prompts will lead to failure. For one thing, LLMs are designed to provide answers and will often answer even if you ask them not to. But even when you avoid that problem, your instructions can be lost when the LLM loses context.

Context and state management

One common failure is losing track of context. For example, an AI agent may repeatedly ask for a drink order even after the customer has already chosen one. These loops happen when the system doesn’t properly maintain state across turns in the conversation.

If you’re storing state in the LLM, you’re at the mercy of the context window. The longer a conversation goes on, the more likely the LLM will forget what’s happened previously.

An LLM doesn’t work like a human brain; even the most recent parts of a conversation can fall out of context, looking like the AI has short-term memory loss. When you lose state, you frustrate customers.

Combine AI with traditional programming

The solution is to design for the edge case. After transcribing the audio input:

  • Clean it before sending it to the LLM.

  • Scrub it of common issues.

  • Evaluate it to see if the request fits your desired inputs.

You don’t blindly trust user input in your applications, so don’t trust it in your AI application either.

Don't rely solely on LLM context for state management. Explicitly maintain state in your application, and periodically reset the LLM's memory, supplying your own authoritative context in each new prompt.
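In practice, that means rebuilding the prompt from application-owned state on every turn. This is a generic sketch, not SignalWire's actual API; the prompt wording and state shape are illustrative assumptions.

```python
import json

# Application-owned state: the LLM never has to remember this.
order_state = {
    "items": [{"item": "bean burrito", "qty": 2}],
    "drink": "large cola",
}

def build_prompt(state: dict) -> str:
    """Rebuild the system prompt from authoritative state each turn,
    instead of trusting the LLM's context window to retain it."""
    lines = [
        "You are a drive-through order taker.",
        f"Current order: {json.dumps(state['items'])}",
    ]
    if state.get("drink"):
        lines.append(f"Drink already chosen: {state['drink']}. Do not ask again.")
    return "\n".join(lines)

print(build_prompt(order_state))
```

Because the facts live in `order_state`, a long conversation can never push them out of the model's context window.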

For example, this blackjack game demo works by using a state machine and updating key variables as the game progresses.


The state machine controls the flow through the application. If you’ve gone over 21, the state machine won’t allow you to hit again. If you’re out of chips, the state machine prevents a new hand from starting. Basically, it enforces rules and prevents the system from making impossible moves.

You can use this same concept to manage your own application logic. Already asked the customer if they want anything to drink? Advance them past that gate in the state machine so the AI doesn’t ask again.
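The drink-question gate can be sketched as a tiny state machine. The stage names and transition here are illustrative, assuming a simple linear ordering flow:

```python
from enum import Enum, auto

class Stage(Enum):
    TAKING_ITEMS = auto()
    ASKING_DRINK = auto()
    CONFIRMING = auto()

class OrderFlow:
    """Minimal state machine: once the drink gate is passed, the agent
    is never permitted to ask the drink question again."""
    def __init__(self) -> None:
        self.stage = Stage.TAKING_ITEMS
        self.drink: str | None = None

    def set_drink(self, drink: str) -> None:
        self.drink = drink
        self.stage = Stage.CONFIRMING  # advance past the drink gate

    def may_ask_drink(self) -> bool:
        return self.drink is None and self.stage != Stage.CONFIRMING

flow = OrderFlow()
assert flow.may_ask_drink()      # no drink chosen yet: asking is allowed
flow.set_drink("large cola")
assert not flow.may_ask_drink()  # gate passed: the question is off-limits
```

Before the agent speaks, your code consults `may_ask_drink()`; if it returns `False`, the looping failure mode simply cannot occur.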

The path forward: Design voice AI for the unexpected

If you want deterministic results from your AI voice agent, you must combine AI capabilities with traditional programming approaches. AI excels at understanding natural language and generating appropriate responses, but it struggles with consistency and business rule enforcement.

The solution is to use it strategically. Let the AI handle what it does best: parsing natural language, understanding intent, and generating conversational responses. But rely on traditional programming for state management, input validation, access control, and business logic enforcement.

This hybrid approach gives you the best of both worlds: the flexibility and natural interaction that AI provides, combined with the reliability and predictability that your business requires. Your customers get the smooth, conversational experience they want, while your business gets the consistent, secure results it needs.

Success with AI voice agents isn’t about deploying the most advanced AI models. It’s about thoughtfully combining AI with solid engineering practices. When your edge cases are handled gracefully, no one will remember that your AI wasn't perfect. They'll just remember that it worked.

Excited about the future of voice AI? Join our community of like-minded developers and experts on the SignalWire Discord server.
