Search & Knowledge

Add RAG-style knowledge search to your agents using local vector indexes (.swsearch files) or PostgreSQL with pgvector. Build indexes with the sw-search CLI and integrate them using the native_vector_search skill.

Knowledge search transforms your agent from a general-purpose assistant into a domain expert. By connecting your agent to documents—FAQs, product manuals, policies, API docs—it can answer questions based on your actual content rather than general knowledge.

This is called RAG (Retrieval-Augmented Generation): when asked a question, the agent first retrieves relevant documents, then uses them to generate an accurate response. The result is more accurate, verifiable answers grounded in your authoritative sources.

Good use cases:

  • Customer support with FAQ/knowledge base
  • Product information lookup
  • Policy and procedure questions
  • API documentation assistant
  • Internal knowledge management
  • Training and onboarding assistants

Not ideal for:

  • Real-time data (use APIs instead)
  • Transactional operations (use SWAIG functions)
  • Content that changes very frequently
  • Highly personalized information (use database lookups)

Search System Overview

Build Time:

Documents → sw-search CLI → .swsearch file (SQLite + vectors)

Runtime:

Agent → native_vector_search skill → SearchEngine → Results

Backends:

Backend     Description
SQLite      .swsearch files - Local, portable, no infrastructure
pgvector    PostgreSQL extension for production deployments
Remote      Network mode for centralized search servers

Building Search Indexes

Use the sw-search CLI to create search indexes:

# Basic usage - index a directory
sw-search ./docs --output knowledge.swsearch

# Multiple directories
sw-search ./docs ./examples --file-types md,txt,py

# Specific files
sw-search README.md ./docs/guide.md

# Mixed sources
sw-search ./docs README.md ./examples --file-types md,txt

Chunking Strategies

Strategy     Best For            Parameters
sentence     General text        --max-sentences-per-chunk 5
paragraph    Structured docs     (default)
sliding      Dense text          --chunk-size 100 --overlap-size 20
page         PDFs                (uses page boundaries)
markdown     Documentation       (header-aware, code detection)
semantic     Topic clustering    --semantic-threshold 0.6
topic        Long documents      --topic-threshold 0.2
qa           Q&A applications    (optimized for questions)

Markdown Chunking

sw-search ./docs \
  --chunking-strategy markdown \
  --file-types md \
  --output docs.swsearch

This strategy:

  • Chunks at header boundaries
  • Detects code blocks and extracts language
  • Adds “code” tags to chunks containing code
  • Preserves section hierarchy in metadata
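
The behaviors listed above can be sketched in a few lines of plain Python. This is only an illustration of header-aware chunking, not the sw-search implementation: the function name `chunk_markdown` and the chunk dictionary layout (`section`, `lines`, `tags`, `languages`) are assumptions made for this example.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE_RE = re.compile(r"^" + FENCE + r"(\w+)?")


def chunk_markdown(text):
    """Split Markdown at header boundaries (illustrative sketch only)."""
    chunks = []
    hierarchy = []   # current header path, e.g. ["Guide", "Install"]
    in_code = False

    def new_chunk():
        return {"section": list(hierarchy), "lines": [], "tags": [], "languages": []}

    current = new_chunk()
    for line in text.splitlines():
        fence = FENCE_RE.match(line)
        if fence:
            if not in_code:
                # Opening fence: tag the chunk and record the language, if any
                if "code" not in current["tags"]:
                    current["tags"].append("code")
                if fence.group(1):
                    current["languages"].append(fence.group(1))
            in_code = not in_code
            current["lines"].append(line)
            continue

        # Lines starting with "#" inside a code block are comments, not headers
        header = None if in_code else HEADER_RE.match(line)
        if header:
            if current["lines"]:
                chunks.append(current)
            level = len(header.group(1))
            del hierarchy[level - 1:]          # drop deeper or sibling headers
            hierarchy.append(header.group(2).strip())
            current = new_chunk()
        current["lines"].append(line)

    if current["lines"]:
        chunks.append(current)
    return chunks
```

Each chunk keeps the header path it was found under, which is what lets a search result report where in the document it came from.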

Sentence Chunking

sw-search ./docs \
  --chunking-strategy sentence \
  --max-sentences-per-chunk 10 \
  --output knowledge.swsearch
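
To make the effect of --max-sentences-per-chunk concrete, here is a minimal sketch of sentence-based grouping. It assumes a naive sentence splitter (break after `.`, `!`, or `?`); the real CLI may detect sentence boundaries differently, and `chunk_by_sentence` is a hypothetical helper name.

```python
import re


def chunk_by_sentence(text, max_sentences_per_chunk=5):
    """Group consecutive sentences into chunks of at most N sentences."""
    # Naive split: whitespace that follows sentence-ending punctuation
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", text.strip())
        if s.strip()
    ]
    return [
        " ".join(sentences[i:i + max_sentences_per_chunk])
        for i in range(0, len(sentences), max_sentences_per_chunk)
    ]


if __name__ == "__main__":
    sample = "One. Two! Three? Four. Five."
    for chunk in chunk_by_sentence(sample, max_sentences_per_chunk=2):
        print(chunk)
```

A larger value produces fewer, longer chunks with more context per result; a smaller value produces more, tighter chunks.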

Installing Search Dependencies

# Query-only (smallest footprint)
pip install "signalwire-agents[search-queryonly]"

# Build indexes + vector search
pip install "signalwire-agents[search]"

# Full features (PDF, DOCX processing)
pip install "signalwire-agents[search-full]"

# All features including NLP
pip install "signalwire-agents[search-all]"

# PostgreSQL pgvector support
pip install "signalwire-agents[pgvector]"

Using Search in Agents

Add the native_vector_search skill to enable search:

from signalwire_agents import AgentBase


class KnowledgeAgent(AgentBase):
    def __init__(self):
        super().__init__(name="knowledge-agent")
        self.add_language("English", "en-US", "rime.spore")

        self.prompt_add_section(
            "Role",
            "You are a helpful assistant with access to company documentation. "
            "Use the search_documents function to find relevant information."
        )

        # Add search skill with local index
        self.add_skill(
            "native_vector_search",
            index_file="./knowledge.swsearch",
            count=5,  # Number of results
            tool_name="search_documents",
            tool_description="Search the company documentation"
        )


if __name__ == "__main__":
    agent = KnowledgeAgent()
    agent.run()

Skill Configuration Options

self.add_skill(
    "native_vector_search",
    # Index source (choose one)
    index_file="./knowledge.swsearch",  # Local SQLite index
    # OR
    # remote_url="http://search-server:8001",  # Remote search server
    # index_name="default",

    # Search parameters
    count=5,                   # Results to return (1-20)
    similarity_threshold=0.0,  # Min score (0.0-1.0)
    tags=["docs", "api"],      # Filter by tags

    # Tool configuration
    tool_name="search_knowledge",
    tool_description="Search the knowledge base for information"
)

pgvector Backend

For production deployments, use PostgreSQL with pgvector:

self.add_skill(
    "native_vector_search",
    backend="pgvector",
    connection_string="postgresql://user:pass@localhost/db",
    collection_name="knowledge_base",
    count=5,
    tool_name="search_docs"
)
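
Since each backend expects a different set of options, it can help to validate them before calling add_skill. The helper below is a hypothetical sketch, not part of the SDK: the option names are taken from the examples above, while the backend labels "sqlite" and "remote" and the function name `search_skill_kwargs` are assumptions made for illustration.

```python
# Options each backend needs, based on the configuration examples above
REQUIRED_OPTIONS = {
    "sqlite": ("index_file",),
    "pgvector": ("connection_string", "collection_name"),
    "remote": ("remote_url", "index_name"),
}


def search_skill_kwargs(backend, **options):
    """Return keyword arguments for add_skill("native_vector_search", ...)."""
    if backend not in REQUIRED_OPTIONS:
        raise ValueError(f"Unknown backend: {backend!r}")

    missing = [name for name in REQUIRED_OPTIONS[backend] if name not in options]
    if missing:
        raise ValueError(f"{backend} backend is missing: {', '.join(missing)}")

    kwargs = dict(options)
    if backend == "pgvector":
        # Only pgvector is selected with an explicit backend= argument above
        kwargs["backend"] = "pgvector"
    return kwargs


if __name__ == "__main__":
    print(search_skill_kwargs("sqlite", index_file="./knowledge.swsearch", count=5))
```

Inside an agent this could be used as `self.add_skill("native_vector_search", **search_skill_kwargs(...))`, so a missing connection string fails at startup rather than on the first search.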

Search Flow

[Figure: Search Flow]

CLI Commands

Build Index

# Basic build
sw-search ./docs --output knowledge.swsearch

# With specific file types
sw-search ./docs --file-types md,txt,rst --output knowledge.swsearch

# With chunking strategy
sw-search ./docs --chunking-strategy markdown --output knowledge.swsearch

# With tags
sw-search ./docs --tags documentation,api --output knowledge.swsearch

Validate Index

sw-search validate knowledge.swsearch

Search Index

sw-search search knowledge.swsearch "how do I configure auth"

Complete Example

#!/usr/bin/env python3
# documentation_agent.py - Agent that searches documentation
from signalwire_agents import AgentBase
from signalwire_agents.core.function_result import SwaigFunctionResult


class DocumentationAgent(AgentBase):
    """Agent that searches documentation to answer questions"""

    def __init__(self):
        super().__init__(name="docs-agent")
        self.add_language("English", "en-US", "rime.spore")

        self.prompt_add_section(
            "Role",
            "You are a documentation assistant. When users ask questions, "
            "search the documentation to find accurate answers. Always cite "
            "the source document when providing information."
        )

        self.prompt_add_section(
            "Instructions",
            """
            1. When asked a question, use search_docs to find relevant information
            2. Review the search results carefully
            3. Synthesize an answer from the results
            4. Mention which document the information came from
            5. If nothing relevant is found, say so honestly
            """
        )

        # Add a simple search function for demonstration
        # In production, use native_vector_search skill with a .swsearch index:
        # self.add_skill("native_vector_search", index_file="./docs.swsearch")
        self.define_tool(
            name="search_docs",
            description="Search the documentation for information",
            parameters={
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            },
            handler=self.search_docs
        )

    def search_docs(self, args, raw_data):
        """Stub search function for demonstration"""
        query = args.get("query", "")
        return SwaigFunctionResult(
            f"Search results for '{query}': This is a demonstration. "
            "In production, use native_vector_search skill with a .swsearch index file."
        )


if __name__ == "__main__":
    agent = DocumentationAgent()
    agent.run()

This example uses a stub function for demonstration. In production, use the native_vector_search skill with a .swsearch index file built using sw-search.

Multiple Knowledge Bases

Add multiple search instances for different topics:

# Product documentation
self.add_skill(
    "native_vector_search",
    index_file="./products.swsearch",
    tool_name="search_products",
    tool_description="Search product catalog and specifications"
)

# Support articles
self.add_skill(
    "native_vector_search",
    index_file="./support.swsearch",
    tool_name="search_support",
    tool_description="Search support articles and troubleshooting guides"
)

# API documentation
self.add_skill(
    "native_vector_search",
    index_file="./api-docs.swsearch",
    tool_name="search_api",
    tool_description="Search API reference documentation"
)
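
When the list of knowledge bases grows, the repeated calls can be driven from data instead. The sketch below assumes the keyword-argument call style shown above; `register_knowledge_bases` is a hypothetical helper, and the duplicate-name check reflects the assumption that each search instance should expose a distinct tool_name.

```python
KNOWLEDGE_BASES = [
    {
        "index_file": "./products.swsearch",
        "tool_name": "search_products",
        "tool_description": "Search product catalog and specifications",
    },
    {
        "index_file": "./support.swsearch",
        "tool_name": "search_support",
        "tool_description": "Search support articles and troubleshooting guides",
    },
]


def register_knowledge_bases(agent, knowledge_bases):
    """Add one native_vector_search instance per knowledge base."""
    seen = set()
    for kb in knowledge_bases:
        name = kb["tool_name"]
        if name in seen:
            # Assumption: two instances with the same tool name would collide
            raise ValueError(f"Duplicate tool_name: {name}")
        seen.add(name)
        agent.add_skill("native_vector_search", **kb)
    return sorted(seen)
```

Inside an agent's `__init__`, this would be called as `register_knowledge_bases(self, KNOWLEDGE_BASES)`.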

Understanding Embeddings

Search works by converting text into numerical vectors (embeddings) that capture semantic meaning. Similar concepts have similar vectors, enabling “meaning-based” search rather than just keyword matching.

How it works:

  1. At index time: Each document chunk is converted to a vector and stored
  2. At query time: The search query is converted to a vector
  3. Matching: Chunks with vectors closest to the query vector are returned

This means “return policy” will match documents about “refund process” or “merchandise exchange” even if they don’t contain those exact words.
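
The matching step can be illustrated with toy numbers. The vectors below are hand-made three-dimensional stand-ins, not real embeddings, and cosine similarity is used as an assumed distance measure; this page does not state which metric the SDK uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def closest_chunks(query_vector, indexed_chunks, count=3):
    """Return the `count` (score, text) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in indexed_chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:count]


# Toy index: (chunk text, hand-made vector)
INDEXED_CHUNKS = [
    ("How the refund process works", [0.9, 0.1, 0.0]),
    ("Merchandise exchange rules", [0.8, 0.3, 0.1]),
    ("API token authentication", [0.0, 0.1, 0.9]),
]
QUERY_VECTOR = [1.0, 0.2, 0.0]  # stands in for the query "return policy"

if __name__ == "__main__":
    for score, text in closest_chunks(QUERY_VECTOR, INDEXED_CHUNKS, count=2):
        print(f"{score:.2f}  {text}")
```

The two refund-related chunks rank above the API chunk because their vectors point in nearly the same direction as the query, even though no words are shared.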

Embedding quality matters:

  • Better embeddings = better search results
  • The SDK uses efficient embedding models optimized for search
  • Different chunking strategies affect how well content is embedded

Index Management

When to Rebuild Indexes

Rebuild your search index when:

  • Source documents are added, removed, or significantly changed
  • You change chunking strategy
  • You want to add or modify tags
  • Search quality degrades

Rebuilding is fast for small document sets. For large collections, consider incremental updates.

Keeping Indexes Updated

For production systems, automate index rebuilding:

#!/bin/bash
# rebuild_index.sh - Run on document updates
set -euo pipefail  # Stop before the swap if the build fails

sw-search ./docs \
  --chunking-strategy markdown \
  --output knowledge.swsearch.new

# Atomic replacement
mv knowledge.swsearch.new knowledge.swsearch

echo "Index rebuilt at $(date)"
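
To avoid rebuilding when nothing has changed, a scheduler could first compare modification times. The following is a minimal sketch under that assumption; `needs_rebuild` is a hypothetical helper, and it only detects added or modified files, not deletions.

```python
import os


def needs_rebuild(docs_dir, index_path):
    """Return True if the index is missing or older than any source file."""
    if not os.path.exists(index_path):
        return True

    index_mtime = os.path.getmtime(index_path)
    for root, _dirs, files in os.walk(docs_dir):
        for name in files:
            # Any file modified after the index was written triggers a rebuild
            if os.path.getmtime(os.path.join(root, name)) > index_mtime:
                return True
    return False


if __name__ == "__main__":
    if needs_rebuild("./docs", "knowledge.swsearch"):
        print("Index is stale; run rebuild_index.sh")
    else:
        print("Index is up to date")
```

To catch deleted documents as well, a rebuild could also be forced on a fixed schedule.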

Index Size and Performance

Index size depends on:

  • Number of documents
  • Chunking strategy (more chunks = larger index)
  • Embedding dimensions

Rough sizing:

  • 100 documents (~50KB each) → ~10-20MB index
  • 1,000 documents → ~100-200MB index
  • 10,000+ documents → Consider pgvector for better performance
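
The rough figures above work out to about 100-200KB of index per ~50KB document. The sketch below simply extrapolates that ratio linearly, which is an assumption; actual sizes depend on chunking strategy and embedding dimensions. The function names and the "sqlite"/"pgvector" labels are hypothetical.

```python
def estimate_index_size_mb(num_documents):
    """Rough (low, high) index size in MB, assuming ~50KB documents."""
    low_kb_per_doc, high_kb_per_doc = 100, 200  # from 100 docs -> 10-20MB
    return (
        num_documents * low_kb_per_doc / 1000,
        num_documents * high_kb_per_doc / 1000,
    )


def suggest_backend(num_documents):
    """Apply the 10,000-document rule of thumb from the sizing list."""
    return "pgvector" if num_documents >= 10_000 else "sqlite"


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        low, high = estimate_index_size_mb(n)
        print(f"{n} documents: ~{low:.0f}-{high:.0f}MB, suggested backend: {suggest_backend(n)}")
```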

Query Optimization

Help the AI use search effectively by being specific in your prompt:

self.prompt_add_section(
    "Search Instructions",
    """
    When users ask questions:
    1. First search the documentation using search_docs
    2. Review all results before answering
    3. Cite which document your answer came from
    4. If results aren't relevant, try a different search query
    5. If no results help, acknowledge you couldn't find the answer
    """
)

Tuning Search Parameters

Adjust these parameters based on your content and use case:

count: Number of results to return

  • count=3: Focused answers, faster response
  • count=5: Good balance (default)
  • count=10: More comprehensive, but may include less relevant results

similarity_threshold: Minimum relevance score (0.0 to 1.0)

  • 0.0: Return all results regardless of relevance
  • 0.3: Filter out clearly irrelevant results
  • 0.5+: Only high-confidence matches (may miss relevant content)

tags: Filter by document categories

self.add_skill(
    "native_vector_search",
    index_file="./knowledge.swsearch",
    tags=["policies", "returns"],  # Only search these categories
    tool_name="search_policies"
)
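
To see how the three parameters interact, the sketch below applies them to a list of already-scored results. It is a model of the behavior described above, not the skill's code: the result dictionary shape (`text`, `score`, `tags`) is hypothetical, and matching a result when it carries any of the requested tags is an assumption.

```python
def select_results(results, count=5, similarity_threshold=0.0, tags=None):
    """Filter scored results by threshold and tags, then keep the top `count`."""
    kept = []
    for result in results:
        if result["score"] < similarity_threshold:
            continue  # below the minimum relevance score
        if tags and not set(tags) & set(result.get("tags", [])):
            continue  # assumption: keep a result if ANY requested tag matches
        kept.append(result)

    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:count]


if __name__ == "__main__":
    sample = [
        {"text": "Return window is stated in the policy", "score": 0.82, "tags": ["policies", "returns"]},
        {"text": "API rate limits", "score": 0.41, "tags": ["api"]},
        {"text": "Office holiday schedule", "score": 0.12, "tags": ["policies"]},
    ]
    for r in select_results(sample, count=3, similarity_threshold=0.3, tags=["policies"]):
        print(r["score"], r["text"])
```

Raising the threshold or narrowing the tags shrinks the candidate set before count is applied, so a high threshold can return fewer results than count asks for.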

Handling Poor Search Results

If search quality is low:

  1. Check chunking: Are chunks too large or too small?
  2. Review content: Is the source content well-written and searchable?
  3. Try different strategies: Markdown chunking for docs, sentence for prose
  4. Add metadata: Tags help filter irrelevant content
  5. Tune threshold: Too high filters good results, too low adds noise

Troubleshooting

“No results found”

  • Check that the index file exists and is readable
  • Verify the query is meaningful (not too short or generic)
  • Lower similarity_threshold if set too high
  • Ensure documents were actually indexed (check with sw-search validate)

Poor result relevance

  • Try different chunking strategies
  • Increase count to see more results
  • Review source documents for quality
  • Consider adding tags to filter by category

Slow search performance

  • For large indexes, use pgvector instead of SQLite
  • Reduce count if you don’t need many results
  • Consider a remote search server for shared access

Index file issues

  • Validate with sw-search validate knowledge.swsearch
  • Rebuild if corrupted
  • Check file permissions
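
A quick preflight check at agent startup can catch the most common file problems early. The sketch below assumes a .swsearch file is an ordinary SQLite database, as the overview describes, and only inspects the standard SQLite file header; it complements sw-search validate rather than replacing it. `check_index_file` is a hypothetical helper.

```python
import os

# Every SQLite 3 database file begins with this 16-byte header string
SQLITE_MAGIC = b"SQLite format 3\x00"


def check_index_file(path):
    """Return a list of problems found with the index file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    if not os.access(path, os.R_OK):
        return [f"{path} is not readable by the current user"]

    with open(path, "rb") as fh:
        header = fh.read(len(SQLITE_MAGIC))
    if header != SQLITE_MAGIC:
        return [f"{path} does not look like a SQLite database; consider rebuilding"]
    return []


if __name__ == "__main__":
    for problem in check_index_file("./knowledge.swsearch"):
        print("Index problem:", problem)
```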

Search Best Practices

Index Building

  • Use markdown chunking for documentation
  • Keep chunks reasonably sized (5-10 sentences)
  • Add meaningful tags for filtering
  • Rebuild indexes when source docs change
  • Test search quality after building
  • Version your indexes with your documentation

Agent Configuration

  • Set count=3-5 for most use cases
  • Use similarity_threshold to filter noise
  • Give descriptive tool_name and tool_description
  • Tell the AI when and how to use search in the prompt
  • Handle “no results” gracefully in your prompt

Production

  • Use pgvector for high-volume deployments
  • Consider remote search server for shared indexes
  • Monitor search latency and result quality
  • Automate index rebuilding when docs change
  • Log search queries to understand user needs
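
Monitoring latency and logging queries can start with a small wrapper around whatever function performs the search. This is a generic sketch using Python's standard logging module; `logged_search` and the logger name `search_metrics` are hypothetical, and it assumes the wrapped function takes the query as its first argument and returns a list.

```python
import functools
import logging
import time

logger = logging.getLogger("search_metrics")


def logged_search(search_fn):
    """Wrap a search function to log each query, result count, and latency."""
    @functools.wraps(search_fn)
    def wrapper(query, *args, **kwargs):
        start = time.perf_counter()
        results = search_fn(query, *args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "query=%r results=%d latency_ms=%.1f",
            query, len(results), elapsed_ms,
        )
        return results
    return wrapper


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    @logged_search
    def demo_search(query):
        # Stand-in for a real search call
        return ["result one", "result two"]

    demo_search("how do I configure auth")
```

Queries that repeatedly return zero results are a useful signal that content is missing from the index.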