If We Build It, Tech Will Come

Why spending decades preparing for a culmination of technology is rewarding.


Anthony Minessale

When I was a kid, we had "Speak and Spell", and later the first Apple computers that could read back anything you typed. Everyone loved making them say funny things; I know I did. The best AI was the ghosts in Pac-Man.

When I was in my post-high-school years, Creative Labs had hardware-assisted applications that could speak, and the Internet was nascent but available at some tiny number of kbps. By the time we started FreeSWITCH in the mid-2000s, the Internet was getting fast enough to carry voice traffic reliably. CMU released its speech recognizer as open source. We wrote several hundred thousand lines of code to build a framework that could bring arbitrary text-to-speech and speech-to-text technologies into a phone call paradigm. All this code allowed others to write scripts that could interact with the real world. For years we had the infamous Original FreeSWITCH Pizza Demo, which could be written in JavaScript.

At that point, people would use a collection of sound files to simulate the canned questions and responses because it sounded better than the early TTS voices. So we wrote even more code to try to logically string words together to produce the classic "you have" … "three" … "messages" effect. By this time, the Internet was really ramping up.
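The word-stringing idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the FreeSWITCH implementation; the file names and the `message_count_prompt` helper are hypothetical.

```python
# Hypothetical recordings: one file per spoken fragment.
DIGIT_FILES = {n: f"digits/{n}.wav" for n in range(10)}


def message_count_prompt(count):
    """Return the ordered sound files for "you have" ... N ... "messages"."""
    if count not in DIGIT_FILES:
        raise ValueError("this sketch only covers counts from 0 to 9")
    # Pick the singular or plural recording so the phrase sounds natural.
    noun = "message.wav" if count == 1 else "messages.wav"
    return ["you_have.wav", DIGIT_FILES[count], noun]


if __name__ == "__main__":
    print(message_count_prompt(3))
```

Playing the returned files back to back gives the familiar stitched-together announcement.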

By the time we started SignalWire, Amazon, Google, and many others had started offering much fancier ASR and TTS interfaces that still fit into some of those original paradigms. Google Dialogflow emerged and tried to make conversational sense out of some of these paradigms. Our mission was and still is to hyper-scale FreeSWITCH in much the same way that YouTube hyper-scales streaming media or AWS scales many useful services.

We made a native interface to Dialogflow early on, but the evolution of this technology was somewhat slow and awkward. Our interface allowed you to have a voice conversation with it, but what it could do at the end of the conversation was limited. It took about a year for them to come up with a way to even transfer a call.

This is the moment I started to imagine SWML, the SignalWire Markup Language. It started out as "What if this Dialogflow thing could return a blob of JSON that described a collection of reactions and behaviors that encompassed all the things we can do?" With this idea, it would be trivial for it to hand off some data like: call these three SIP URLs and this one public phone number at the same time (or sequentially), connect the caller to the first one who answers, and maybe play a particular sound before or after connecting the call.
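To make the "blob of JSON" idea concrete, here is a minimal Python sketch that builds such a document. The key names (`sections`, `play`, `connect`, `parallel`, `serial`), the `build_call_plan` helper, and the example addresses are illustrative assumptions, not a statement of the documented SWML schema.

```python
import json


def build_call_plan(sip_urls, phone_number, parallel=True, pre_connect_sound=None):
    """Describe "ring these targets and connect the first to answer" as plain data."""
    targets = [{"to": url} for url in sip_urls] + [{"to": phone_number}]
    steps = []
    if pre_connect_sound:
        # Optional sound played to the caller before the connection attempt.
        steps.append({"play": {"url": pre_connect_sound}})
    # Ring every target at once, or try them one after another.
    mode = "parallel" if parallel else "serial"
    steps.append({"connect": {mode: targets}})
    return {"version": "1.0.0", "sections": {"main": steps}}


if __name__ == "__main__":
    plan = build_call_plan(
        ["sip:alice@example.com", "sip:bob@example.com", "sip:carol@example.com"],
        "+15555550100",  # placeholder number
        pre_connect_sound="https://example.com/please_hold.wav",
    )
    print(json.dumps(plan, indent=2))
```

The point of the sketch is that the whole behavior is data: whatever produces the JSON never needs to know how the calls are actually placed.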

Today, we have delivered SWML as the first part of the incremental release of our mission to build modern tools to access all of this technology.

Ironically, now that SWML exists, the idea of using it in Dialogflow has expired; instead, it stands on its own as a way to express nearly anything you might want to do at scale. It made possible a new Visual Call Flow Builder that can create any call forwarding or voice application with minor effort.

Most importantly, it created a window to expose new technology faster. We can quickly add more behaviors and useful tools to this new paradigm. Soon all of this will be combined with the ability to register users, direct calls to private accounts, and facilitate on-demand UCaaS infrastructure.

One of the new technologies on everyone's mind today is Generative Artificial Intelligence, or Generative AI. Nearly every year, there is some kind of hype about one thing or another, especially in the investment world. I think this time last year it was web3. Some other year it was security or NFTs or whatever.

This moment is the intersection of the Internet, compute, AI, and telecommunications that we've all been hoping for here at SignalWire. All those decades of preparation have given us a clear path to transforming communications into a digital experience that can scale and be modified with little effort, replacing things that used to take us months of coding to accomplish. The irony is that all our coding is actually the foundation of how we can bring this technology to the surface for everyone to use.

When we saw the possibilities that this technology opened up, it was evident that we could harness it and bring many of our ideas full circle. It's fun to talk to the AI and have it write a funny poem about Anakin Skywalker, but what could we use it for in the business of modernizing communications paradigms?

The obvious answer was to simplify all the complexity of building automated attendants and virtual agents. This is something we have been working on since the beginning.

With SWML and our scaled infrastructure, it was possible to create our newest product, SignalWire AI Agents. It allowed us to reduce a complex voice menu written in large amounts of code down to some carefully crafted prompts. And these new voice applications could understand nearly anything you said, any way you said it.

The most important thing is that we wanted to make sure these agents could interact with our customers' backends and use the full power of SWML, so they could play files, transfer calls, authenticate the user without exposing any information to the AI engine, and so on. This is why SWML also allowed us to create the SignalWire AI Gateway, or SWAIG. Calling a bot no longer has to be an annoying and frustrating experience.
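As a rough illustration of the gateway idea, the Python sketch below shows a backend that exposes named functions and returns only an outcome, so the sensitive data it checks against never leaves the backend. This is not SWAIG's actual interface; every name here (`verify_pin`, `handle_function_call`, the `response` field, the sample account) is a hypothetical chosen for the example.

```python
# Hypothetical backend data; it is never included in any returned value.
_ACCOUNT_PINS = {"1001": "4321"}


def verify_pin(args):
    """Compare the supplied PIN with the stored one and report only the outcome."""
    ok = _ACCOUNT_PINS.get(args.get("account")) == args.get("pin")
    return {"response": "verified" if ok else "not verified"}


# Registry of the functions the agent is allowed to request by name.
FUNCTIONS = {"verify_pin": verify_pin}


def handle_function_call(name, args):
    """Dispatch a requested function and hand back its minimal result."""
    handler = FUNCTIONS.get(name)
    if handler is None:
        return {"response": f"unknown function: {name}"}
    return handler(args)


if __name__ == "__main__":
    print(handle_function_call("verify_pin", {"account": "1001", "pin": "4321"}))
```

The design choice being illustrated is the narrow return value: the caller of `handle_function_call` learns whether the check passed, not what it was checked against.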

I talked a bit about this in one of my recent blog posts.

With ClueCon coming up in August, it will be a great chance for everyone to play with this technology and compare the ease of building applications with some of our new advancements.

If you want to try out our Agents, call us. We have one answering the phone for us: +1.650.382.0000