
How a Voice AI Agent Powers Real-Time Browser Control with SignalWire

Build AI-controlled web UIs

At ClueCon 2025, Anthony Minessale, creator of FreeSWITCH and CEO of SignalWire, took the stage during the annual Dangerous Demos competition. What he showed was a virtual blackjack game controlled entirely by voice, built in under 700 lines of Python, and synced in real time with a web interface.

This demo was more than just a blackjack game. It revealed a new development model for voice AI agents, one where browser UI, media handling, conversation flow, and backend logic are all controlled by a stateless AI agent running inside a single, tightly integrated system.

This is a practical illustration of how SignalWire’s Programmable Unified Communications (PUC) stack allows developers to control state, manage logic, and synchronize front-end interfaces, all without handing control over to the language model itself.


Let’s break down how it worked.

What’s the problem with traditional voice AI agents?

Voice AI agents typically suffer from one or more of the following issues:

  • State drift when the AI is responsible for remembering everything

  • Latency caused by patching together third-party APIs, leading to multiple network hops

  • Uncoordinated UI, where visuals, if they exist at all, fall out of sync with the voice

  • Fragile architecture, where the developer manages telephony, AI, and frontend separately

Most voice AI agent solutions offload the entire experience to the LLM. That includes memory, state, flow control, and sometimes even UI logic. The result is that once the model loses context or misinterprets a command, the entire session goes sideways.

SignalWire solves this with a vertically integrated stack that runs media, AI orchestration, state storage, and browser interaction under one roof. And instead of making the AI responsible for everything, it’s treated like a tool inside a tightly controlled orchestration layer. The core logic lives in infrastructure, not prompts.

How the blackjack demo works

State management outside the LLM

All game state (chip counts, card totals, win/loss outcomes) is maintained in the application layer running on SignalWire. The AI doesn’t “remember” anything. At each step, the application passes updated variables into the prompt using variable expansion.

Instead of sending an entire history, the prompt is rewritten dynamically, so the AI only sees what it needs. This design keeps the voice AI agent stateless and predictable, while the platform enforces the actual game rules, using the global_data param to carry state between steps.

At each phase:

  • State is injected into the prompt (“You have 17 points. Hit or stand?”)

  • The AI reacts to the current step only

  • The system rewrites the prompt, not the whole history

This allows structured, stateful interactions with a stateless model, a best-practice approach for LLM orchestration in real-time applications.
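Here’s a minimal sketch of that pattern using the signalwire-agents SDK. The method names (set_global_data(), prompt_add_section(), update_global_data()), the %{global_data.*} expansion syntax, and the toy game logic are illustrative assumptions based on the SDK’s documented conventions, not the demo’s actual source:

```python
import random

from signalwire_agents import AgentBase, SwaigFunctionResult


class BlackjackAgent(AgentBase):
    def __init__(self):
        super().__init__(name="blackjack", route="/blackjack")

        # Seed the state the platform will carry between turns.
        # The LLM never stores this; it only sees the current snapshot.
        self.set_global_data({"chips": 1000, "player_total": 0})

        # %{...} is SWML variable expansion: the prompt is rewritten
        # with the latest global_data values before each AI turn.
        self.prompt_add_section(
            "Game state",
            body=("The player has %{global_data.chips} chips and a hand "
                  "totaling %{global_data.player_total}. Ask: hit or stand?"),
        )

    @AgentBase.tool(
        name="hit",
        description="Deal the player one more card",
        parameters={},
    )
    def hit(self, args, raw_data):
        # The game rules live in application code, not in the prompt.
        # Assumption: the SWAIG request body carries current global_data.
        state = raw_data.get("global_data", {})
        new_total = state.get("player_total", 0) + random.choice(
            [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]
        )
        verdict = "You bust!" if new_total > 21 else f"You now have {new_total}."
        return (
            SwaigFunctionResult(verdict)
            # Write the new total back into platform-held state so the
            # next prompt rewrite reflects it.
            .update_global_data({"player_total": new_total})
        )
```

Because the model only ever sees the rewritten snapshot, a dropped or misheard turn can’t corrupt the score; the platform’s copy of global_data remains the single source of truth.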

Browser synchronization via events

Every game step is rendered in a live web app running in the browser. When the user places a bet or takes an action, a SignalWire event updates the visual interface.

The voice AI agent and the DOM are always in sync: when the AI says "you bust," the browser shows the busted hand at the same moment.

This works because the same AI agent that handles voice also controls the browser. Using the swml_user_event() function, it sends structured UI events to the frontend:

  • Show cards dealt

  • Update chip counts

  • Animate player hits

  • Reveal dealer hand

  • Reset the table

This allows the voice and browser UI to stay fully synchronized, with no external state store or polling layer.
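As a hedged sketch of what one of those responses might look like: the swml_user_event() helper is the one named above, but the event type and payload fields here are illustrative, not taken from the demo’s source.

```python
from signalwire_agents import SwaigFunctionResult


def card_dealt_result(card: str, new_total: int) -> SwaigFunctionResult:
    """Build one SWAIG response that speaks to the caller and, in the
    same round trip, pushes a matching event to the browser UI."""
    return (
        SwaigFunctionResult(f"You drew the {card}. That makes {new_total}.")
        # swml_user_event() delivers a structured payload to the
        # connected web client; the browser's handler updates the DOM
        # (render the card, bump the total) when the event arrives.
        .swml_user_event({
            "type": "card_dealt",        # illustrative event name
            "card": card,
            "player_total": new_total,
        })
    )
```

Because the speech text and the UI event travel in the same function result, there is no separate push channel or polling loop whose state could drift away from the voice leg.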

This synchronization pattern is not limited to games. It’s a generalized mechanism for AI-controlled web UIs, where voice AI agents can manipulate any visual component on a page in real time.

Single-service architecture with WebRTC

The entire application, including web UI, API, AI logic, and video calling, runs inside a single Python process using the SignalWire Agents SDK. This is a full-stack AI voice app served from one port, one process, one codebase.

A WebRTC video call is integrated with gameplay, offering an immersive dealer interaction. Static assets (cards, videos, audio) and the API live at the same domain and port. Protected endpoints use basic auth, while the public frontend is open for play.
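In sketch form, the whole stack boots from one entry point. The host, port, and serve() call below assume the SDK’s built-in server; exactly how the demo mounts its static assets alongside the agent is an implementation detail not shown here.

```python
if __name__ == "__main__":
    agent = BlackjackAgent()  # the agent sketched earlier

    # One process, one port: the SDK's server exposes the SWAIG/AI
    # endpoints (behind basic auth), and the same app serves the
    # public game frontend and its static assets.
    agent.serve(host="0.0.0.0", port=3000)
```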

Structured step system (state machine)

Each stage of the interaction, from betting, to dealing, to resolving, is expressed as a discrete step in the AI prompt lifecycle. Only the current step is sent to the AI, which keeps the model focused on one task at a time and prevents hallucination or flow drift.

It also mimics a state machine, while still allowing for flexible user input.

For example, a user can say “all in” instead of “I bet 900,” and the system interprets it correctly based on the current step’s logic.

The game flow is managed through explicit steps. Each step defines:

  • What functions the AI can call (hit, stand, place_bet)

  • What prompt bullets to show

  • What the valid next transitions are

This prevents hallucination and ensures that the AI follows the game rules while still understanding varied human language inputs.
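A sketch of what those step definitions might look like, assuming the SDK’s contexts-and-steps API (define_contexts(), add_step(), and the chained setters); the step names, prompt text, and transitions are illustrative rather than the demo’s actual flow:

```python
from signalwire_agents import AgentBase


class BlackjackFlow(AgentBase):
    """Continuation of the earlier sketch: the step system."""

    def __init__(self):
        super().__init__(name="blackjack", route="/blackjack")

        contexts = self.define_contexts()
        game = contexts.add_context("game")

        # Each step scopes what the AI sees, what it may call, and
        # where it may go next: a state machine with natural-language
        # input layered on top.
        game.add_step("betting") \
            .set_text("Ask how many chips the player wants to bet. Map "
                      "phrases like 'all in' to a concrete amount before "
                      "calling place_bet.") \
            .set_functions(["place_bet"]) \
            .set_valid_steps(["dealing"])

        game.add_step("dealing") \
            .set_text("Deal the opening hands and describe them.") \
            .set_functions(["deal"]) \
            .set_valid_steps(["player_turn"])

        game.add_step("player_turn") \
            .set_text("Offer hit or stand; current totals come from "
                      "global_data.") \
            .set_functions(["hit", "stand"]) \
            .set_valid_steps(["resolving"])
```

Constraining each step to a short function whitelist and an explicit list of valid next steps is what lets the platform, not the model, decide whether a transition is legal.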

Was this really built in an hour?

Anthony built the full demo in an hour using ~700 lines of Python. This was possible because with SignalWire, there’s no need to manually wire together telephony, audio streaming, web sockets, or middleware services. The telecom stack, AI orchestration, and real-time UI sync are part of SignalWire’s implementation of the PUC model.

A blueprint for AI voice agent apps with UI control

This game demo is actually a fun blueprint for how modern AI voice applications should behave.

Here’s what type of applications this architecture unlocks:

  • Virtual shopping assistants that show products live on a website as users speak

  • Healthcare AI agents that can see patient conditions via WebRTC and guide diagnostics

  • Support agents that view diagnostic screens, control web-based tools, and perform actions during the call

  • Scheduling bots that coordinate calendars in the browser while keeping the user engaged on voice

Start building stateful, synchronized, and scalable voice AI agents

The blackjack demo shows what happens when you don’t force AI to do the job of infrastructure. SignalWire lets you maintain application state inside the platform, synchronize voice with visual UI in real time, and enforce flows while keeping conversations natural.

This is what it means to build with Programmable Unified Communications: better performance, tighter control, and dramatically lower latency.

Ready to create fully working apps in just hours? To try it yourself, sign up for a free account or join us on Discord to get access to our community of like-minded developers.
