Using SignalWire Datasphere to Build Specialized AI

Using a RAG API for specialized knowledge

Len Graham

SignalWire Datasphere is a Retrieval-Augmented Generation (RAG) API that lets AI applications answer from your own documents instead of inventing responses. This guide shows how a voice agent can use Datasphere to retrieve relevant chunks from a vectorized PDF, then respond with grounded, context-aware answers, using functions like get_vector_data for retrieval and send_message to text instructions or a recap after the call. The bartender demo is the example, but the practical pattern is building specialized AI agents that need reliable, source-based responses for procedures, policies, and large reference documents.

What is SignalWire Datasphere?

SignalWire’s Datasphere is a new Retrieval-Augmented Generation (RAG) API for handling large amounts of data and creating AI applications based on specialized knowledge. Using structured documents like PDFs, developers can create their own libraries of information for truly knowledgeable AI applications.

This custom-built AI stack is perfect for an AI bartender who needs quick access to thousands of drink recipes, ingredient combinations, and bar tips. In this post, we’ll walk through a JSON example of how to use Datasphere to create an application, with the resulting AI agent being an expert bartender.

This is a specialized agent pattern: a voice agent that can (1) interpret a caller’s intent in natural language, (2) retrieve the right facts from a controlled knowledge base using Retrieval-Augmented Generation (RAG), and (3) answer using only what was retrieved, instead of guessing. That same pattern applies to real workflows where accuracy matters, like troubleshooting runbooks, policy and compliance lookups, product documentation support, employee onboarding, eligibility checks, and internal operations questions. In those cases, the point is not that the agent can talk but that it can reliably pull the right passage from your documents, summarize it for voice, and stay grounded in source material so you get consistent answers at scale.

Meet Kevin, the bartender

Meet Kevin, the digital employee designed to serve as your personal bartender. Kevin is equipped with smart functions and structured responses to enhance the user experience when callers need to make the perfect drink. Call Kevin at +1 (747) 337-4657 to see how the demo works in action!

SignalWire’s Datasphere takes large text documents, breaks them down into searchable chunks, and stores them as vectors. This allows Kevin to quickly and intelligently respond to user queries, such as "How do I make a Manhattan?" or "What can I do with gin and tonic water?"

By combining vectorization with customizable search algorithms, Datasphere lets Kevin deliver context-driven responses grounded in the source document, making your AI bartender a fast and reliable resource when making drinks.

Key features

Customizable parameters like verbosity, temperature, and top-p let you tune Kevin to interact in a professional, reliable manner that mirrors human interaction, which makes this kind of agent well suited to following strict processes and procedures.

Through the get_vector_data function, Kevin grounds his answers in retrieved data instead of hallucinating or making up drinks that don’t exist. Other features include:

  • Call recording: Kevin records conversations in high-quality stereo .wav format, ensuring that important details are captured for future reference.

  • Conversational abilities: Kevin engages in dynamic conversations, asking the right questions and providing accurate answers based on available data.
    • Customizable prompts: Kevin's responses follow a structured prompt system. He greets users, asks what drink they'd like to make, and provides clear instructions.

    • Post-prompt handling: After the conversation, Kevin summarizes the interaction and can send a message with the details to the user.

    • Language support: Kevin can communicate in multiple languages, powered by the OpenAI engine. In English, Kevin uses fillers like "one moment, please" to keep the conversation flowing naturally.

  • SWAIG Functions: Kevin is integrated with send_message and get_vector_data functions to deliver information and perform actions.
    • Send text messages: Kevin can send messages with drink instructions or summaries of the conversation directly to the user's phone.

    • Get vector data: The get_vector_data function is a powerful feature that allows Kevin to retrieve specific information from a vectorized PDF based on user queries. When a user asks a question, Kevin references the vectorized data and provides an accurate answer, ensuring he doesn't fabricate details. This function is especially useful for situations where AI needs to rely on predefined processes or detailed procedures.

Exploring the code snippet

The JSON block at the end of this section defines the get_vector_data function, which retrieves vectorized data from a specific document via a webhook call. The fields are described first, followed by the full snippet.

Function Name: get_vector_data

The function is responsible for fetching vectorized data based on a user’s question.

Fillers

Fillers are short phrases, keyed by language code (here en-US), that the system uses when this function is triggered. For example, "This is the get vector data function firing" will be spoken or displayed when the function starts running.

Data map

The data map is where the function sends a request to an external service (webhook) to retrieve data.

  • Webhook Configuration:
    • Method: The HTTP method used is POST, meaning the function will send data to the webhook.

    • URL: https://space_name.signalwire.... — the API endpoint to which the request is sent.

    • Headers: Metadata for the API call:
      • Content-Type: Specifies that the data sent in the body of the request is in JSON format.

      • Authorization: Uses a Basic authentication header. The Project_ID and API_KEY are joined with a colon (Project_ID:API_KEY) and the resulting string is base64-encoded to create the credentials for this API call.

  • Params: Parameters included in the POST request:
    • query_string: The user’s question that is passed in.

    • document_id: A specific ID (694ced7b-b656-417e-bc86-ce22549b4562) that identifies which document to search for the query.

    • count: Specifies the maximum number of results to return (in this case, 1).

  • Output:
    • Response: The function will extract specific data from the response of the webhook call. In this case, it returns the first chunk of text from the search results and its corresponding document_id.

    • Action: This field is an empty array, but you could use SignalWire Markup Language (SWML) here to send SMS, trigger a call, etc.
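
The webhook configuration above can be sketched as a plain HTTP request. This is a minimal illustration, not SignalWire's implementation: the helper names (basic_auth_header, build_search_request) are hypothetical, the credentials are placeholders, and sending the params as a JSON body is an assumption based on the Content-Type header. The request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def basic_auth_header(project_id: str, api_key: str) -> str:
    """Return an HTTP Basic Authorization value for 'project_id:api_key'."""
    raw = f"{project_id}:{api_key}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_search_request(space_name, project_id, api_key,
                         question, document_id, count=1):
    """Build (but do not send) a POST request mirroring the data map."""
    url = f"https://{space_name}.signalwire.com/api/datasphere/documents/search"
    # Assumption: the params are serialized as the JSON request body.
    body = {
        "query_string": question,
        "document_id": document_id,
        "count": count,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(project_id, api_key),
        },
        method="POST",
    )


# Placeholder values; substitute your own Space name and credentials.
request = build_search_request(
    "space_name", "my-project-id", "my-api-key",
    "How do I make a Manhattan?",
    "694ced7b-b656-417e-bc86-ce22549b4562",
)
print(request.get_method(), request.full_url)
```

Building the header in code avoids hand-encoding mistakes, such as encoding the Project_ID and API_KEY separately instead of as one colon-joined string.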

Purpose

The purpose field describes what the function is for: handling the question a user asks and retrieving relevant vectorized data from the specified document.

Argument

  • Type: Object — the function expects an object-type argument.

  • Properties:
    • user_question:
      • Type: string — the function accepts the user’s question as a string input.

      • Description: The user’s question.
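
To make the argument definition concrete, here is a small sketch of how a caller might check incoming arguments against it before running the function. The validate_args helper is hypothetical, and treating every listed property as required is an assumption; the schema in this post does not say which properties are mandatory.

```python
# The argument definition from the get_vector_data function.
ARGUMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "user_question": {
            "type": "string",
            "description": "The question the user will ask.",
        }
    },
}


def validate_args(args, schema=ARGUMENT_SCHEMA):
    """Return a list of problems; an empty list means args fit the schema."""
    if not isinstance(args, dict):
        return ["arguments must be an object"]
    problems = []
    for name, spec in schema["properties"].items():
        # Assumption: every listed property is treated as required.
        if name not in args:
            problems.append(f"missing property: {name}")
        elif spec["type"] == "string" and not isinstance(args[name], str):
            problems.append(f"{name} must be a string")
    return problems


print(validate_args({"user_question": "How do I make a Manhattan?"}))
print(validate_args({"user_question": 42}))
```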

{
  "function": "get_vector_data",
  "fillers": {
    "en-US": [
      "This is the get vector data function firing"
    ]
  },
  "data_map": {
    "webhooks": [
      {
        "method": "POST",
        "url": "https://space_name.signalwire.com/api/datasphere/documents/search",
        "headers": {
          "Content-Type": "application/json",
          "Authorization": "Basic OGVhMjI0YzktM--USE--Project_ID:API_KEY--TO-BASE64-ENCODE--NkYjFh"
        },
        "params": {
          "query_string": "${args.user_question}",
          "document_id": "694ced7b-b656-417e-bc86-ce22549b4562",
          "count": 1
        },
        "output": {
          "response": "Use this information to answer the user's query, only provide answers from this information and do not make up anything: ${chunks[0].text} and ${chunks[0].document_id}",
          "action": []
        }
      }
    ]
  },
  "purpose": "The question the user will ask",
  "argument": {
    "properties": {
      "user_question": {
        "type": "string",
        "description": "The question the user will ask."
      }
    },
    "type": "object"
  }
}
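
The ${...} placeholders in the snippet are filled in at run time: ${args.user_question} from the function arguments, and ${chunks[0].text} from the webhook response. The sketch below imitates that substitution with a tiny resolver so you can see what the final response string might look like. It is not SignalWire's expansion engine. The expand and resolve helpers are hypothetical, the response shape (a top-level chunks list) is inferred from the template, and the sample chunk text is invented.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")
# A path token is either a name (chunks, text) or a list index ([0]).
_TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


def resolve(path, context):
    """Walk a dotted/indexed path such as 'chunks[0].text' through context."""
    value = context
    for name, index in _TOKEN.findall(path):
        value = value[name] if name else value[int(index)]
    return value


def expand(template, context):
    """Replace each ${path} in template with the value found in context."""
    return _PLACEHOLDER.sub(
        lambda match: str(resolve(match.group(1), context)), template
    )


# Invented sample data standing in for the arguments and webhook response.
context = {
    "args": {"user_question": "How do I make a Manhattan?"},
    "chunks": [
        {
            "text": "(sample chunk text from the document)",
            "document_id": "694ced7b-b656-417e-bc86-ce22549b4562",
        }
    ],
}

print(expand("${args.user_question}", context))
print(expand("Answer only from: ${chunks[0].text} and ${chunks[0].document_id}",
             context))
```

Because the template reads only chunks[0], a count of 1 is enough here; raising count would only help if the response template also referenced the additional chunks.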

This is a fun example of how to create personalized recommendations that help users explore new information. As AI continues to evolve, the ability to access, manage, and retrieve accurate information is becoming more critical than ever. SignalWire’s Datasphere API provides the perfect foundation for smart applications like Kevin the AI Bartender to deliver fast, accurate responses and offer a truly interactive experience.

The technology behind Kevin showcases how innovative tools like Datasphere can reshape customer experiences, making them more intuitive and enjoyable for end-users. Start building your own AI voice agents today and experiment with the Datasphere API by signing up for a SignalWire Space and bringing questions to our community Discord!

Frequently asked questions

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a pattern where an AI response is generated using retrieved passages from a curated knowledge base, so the output is grounded in source material instead of relying only on a model’s general training.

How does a vector database help an AI agent answer accurately from documents?

A vector database stores document content as embeddings, which makes semantic retrieval possible. When a user asks a question, the system searches for the most relevant chunks and returns them to the agent so the answer stays tied to the actual document content.
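
As a rough illustration of semantic retrieval, the sketch below ranks chunks by cosine similarity to the query. The embedding here is a toy word-count vector chosen so the example runs without dependencies; real systems, presumably including Datasphere, use learned embeddings, and the sample chunks are invented.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(query, chunks, count=1):
    """Return the `count` chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(chunks,
                    key=lambda chunk: cosine(query_vector, embed(chunk)),
                    reverse=True)
    return ranked[:count]


# Invented sample chunks.
chunks = [
    "manhattan whiskey vermouth bitters",
    "gin tonic lime",
    "margarita tequila lime",
]
print(search("gin and tonic water", chunks))
```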

What is chunking, and why does chunk size affect answer quality?

Chunking is how a document is split into smaller sections for indexing and retrieval. Smaller chunks can improve precision, while larger chunks can preserve context, and the best choice depends on the structure of the source documents and the kinds of questions users ask.
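
The trade-off can be seen with a simple word-based chunker. This is a generic sketch, not how Datasphere splits documents: the chunk_words helper is hypothetical, and the overlap parameter is a common addition assumed here to show how context can be carried across chunk boundaries.

```python
def chunk_words(text, size, overlap=0):
    """Split text into chunks of up to `size` words, sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


text = "a b c d e f g"
print(chunk_words(text, size=3, overlap=1))  # smaller chunks, shared context
print(chunk_words(text, size=4))             # larger chunks, no overlap
```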

How do you prevent hallucinations when an AI agent answers technical or procedural questions?

Use a knowledge base and retrieval step, then constrain the agent to answer only from retrieved passages. When retrieval is the source of truth, the agent can refuse questions that are outside the indexed material instead of guessing.
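
One way to express that constraint in code is to build the model's instructions only from retrieved passages and to fall back to a refusal when retrieval returns nothing usable. The grounded_response helper and the refusal wording are hypothetical; the instruction text reuses the wording from the demo's response template, and the chunk shape (a dict with a text key) is assumed.

```python
REFUSAL = ("I can only answer from the indexed material, "
           "and I could not find that there.")

INSTRUCTION = ("Use this information to answer the user's query, only provide "
               "answers from this information and do not make up anything:")


def grounded_response(chunks):
    """Return instructions built from retrieved chunks, or a refusal."""
    usable = [(chunk.get("text") or "").strip() for chunk in chunks]
    usable = [text for text in usable if text]
    if not usable:
        # Nothing retrieved: refuse instead of letting the model guess.
        return REFUSAL
    passages = "\n".join(
        f"[{number}] {text}" for number, text in enumerate(usable, 1)
    )
    return INSTRUCTION + "\n" + passages


print(grounded_response([]))
print(grounded_response([{"text": "(sample passage from the document)"}]))
```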

How can a voice agent send a follow-up summary or instructions after a call?

A common pattern is to generate a short post-call summary and send it via SMS, for example instructions, a checklist, or the key steps discussed. In this demo, the agent includes a messaging function for follow-ups.
