Search and Knowledge

Knowledge search transforms your agent from a general-purpose assistant into a domain expert. Once connected to your documents (FAQs, product manuals, policies, API docs), the agent can answer questions from your actual content rather than from general knowledge.

This is called RAG (Retrieval-Augmented Generation): when asked a question, the agent first retrieves relevant documents, then uses them to generate a response. The answers are more accurate and easier to verify because they are grounded in your authoritative sources.
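
The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration under stated assumptions, not the SDK's implementation: `retrieve` ranks by shared words instead of vectors, `build_prompt` is a hypothetical helper standing in for the step where retrieved text is handed to the language model, and the sample documents are made up.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, count=2):
    """Toy retrieval: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    # Keep only documents that share at least one word, best matches first.
    matching = [item for item in scored if item[0] > 0]
    matching.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in matching[:count]]

def build_prompt(query, passages):
    """Hypothetical helper: place the retrieved passages ahead of the question."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

# Made-up sample content for illustration only.
documents = [
    "Refunds are issued within 14 days of a return request.",
    "The API rate limit is 100 requests per minute.",
    "Support is available Monday through Friday.",
]

question = "What is the API rate limit?"
passages = retrieve(question, documents, count=1)
print(build_prompt(question, passages))
```

In the real system the word-overlap ranking is replaced by vector search over a prebuilt index, but the two-step shape (retrieve, then generate from what was retrieved) is the same.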

When to Use Knowledge Search

Good use cases:

Customer support with FAQ/knowledge base
Product information lookup
Policy and procedure questions
API documentation assistant
Internal knowledge management
Training and onboarding assistants

Not ideal for:

Real-time data (use APIs instead)
Transactional operations (use SWAIG functions)
Content that changes very frequently
Highly personalized information (use database lookups)

Search System Overview

Build Time:

Documents -> sw-search CLI -> .swsearch file (SQLite + vectors)

Runtime:

Agent -> native_vector_search skill -> SearchEngine -> Results

Backends:

Backend	Description
SQLite	`.swsearch` files - Local, portable, no infrastructure
pgvector	PostgreSQL extension for production deployments
Remote	Network mode for centralized search servers

Figure: Search flow from agent query through vector search to document results.

Building Search Indexes

Use the sw-search CLI to create search indexes:

# Basic usage - index a directory
$ sw-search ./docs --output knowledge.swsearch

# Multiple directories
$ sw-search ./docs ./examples --file-types md,txt,py

# Specific files
$ sw-search README.md ./docs/guide.md

# Mixed sources
$ sw-search ./docs README.md ./examples --file-types md,txt

Chunking Strategies

Strategy	Best For	Parameters
`sentence`	General text	`--max-sentences-per-chunk 5`
`paragraph`	Structured docs	(default)
`sliding`	Dense text	`--chunk-size 100 --overlap-size 20`
`page`	PDFs	(uses page boundaries)
`markdown`	Documentation	(header-aware, code detection)
`semantic`	Topic clustering	`--semantic-threshold 0.6`
`topic`	Long documents	`--topic-threshold 0.2`
`qa`	Q&A applications	(optimized for questions)
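
To make the `sliding` parameters concrete, here is a minimal sketch of overlapping-window chunking. It assumes `--chunk-size` and `--overlap-size` are counted in words, which the table does not state, and the function name `sliding_chunks` is hypothetical rather than part of the SDK.

```python
def sliding_chunks(words, chunk_size=100, overlap_size=20):
    """Split a word list into windows that share overlap_size words with their neighbor."""
    if overlap_size >= chunk_size:
        raise ValueError("overlap_size must be smaller than chunk_size")
    step = chunk_size - overlap_size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks

# 250 placeholder words stand in for a document.
words = [f"w{i}" for i in range(250)]
chunks = sliding_chunks(words, chunk_size=100, overlap_size=20)
print([len(chunk) for chunk in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of a somewhat larger index.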

Markdown Chunking (Recommended for Docs)

$ sw-search ./docs \
>   --chunking-strategy markdown \
>   --file-types md \
>   --output docs.swsearch

This strategy:

Chunks at header boundaries
Detects code blocks and extracts language
Adds “code” tags to chunks containing code
Preserves section hierarchy in metadata
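
The four behaviors listed above can be approximated with a short sketch. This is an illustration of the idea only, not the code `sw-search` runs; the function name, the chunk dictionary layout, and the sample text are all assumptions.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence
HEADER = re.compile(r"^(#{1,6})\s+(.*)$")

def new_chunk(path):
    """Create an empty chunk that remembers its place in the section hierarchy."""
    return {"section": list(path), "lines": [], "tags": [], "languages": []}

def markdown_chunks(text):
    """Split markdown at headers; tag chunks containing fenced code and note the language."""
    chunks, path, in_code = [], [], False
    for line in text.splitlines():
        # Lines inside a code fence are never treated as headers.
        header = None if in_code else HEADER.match(line)
        if header:
            level, title = len(header.group(1)), header.group(2).strip()
            path = path[:level - 1] + [title]
            chunks.append(new_chunk(path))
            continue
        if not chunks:
            chunks.append(new_chunk([]))  # text before the first header
        current = chunks[-1]
        if line.startswith(FENCE):
            if not in_code:
                language = line[len(FENCE):].strip()
                if language:
                    current["languages"].append(language)
                if "code" not in current["tags"]:
                    current["tags"].append("code")
            in_code = not in_code
        current["lines"].append(line)
    return chunks

sample = "\n".join([
    "# Guide",
    "Intro text.",
    "## Install",
    FENCE + "bash",
    "# this comment is not a header",
    "pip install example",
    FENCE,
    "## Usage",
    "Call the agent.",
])

chunks = markdown_chunks(sample)
for chunk in chunks:
    print(chunk["section"], chunk["tags"], chunk["languages"])
```

Tracking whether the parser is inside a fence matters because shell and Python comments start with `#` and would otherwise be mistaken for headers.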

Sentence Chunking

$ sw-search ./docs \
>   --chunking-strategy sentence \
>   --max-sentences-per-chunk 10 \
>   --output knowledge.swsearch
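
As a rough picture of what `--max-sentences-per-chunk` controls, the sketch below groups sentences into fixed-size chunks. The regex splitter is deliberately naive and is an assumption; the actual tool may use a proper NLP sentence tokenizer.

```python
import re

def sentence_chunks(text, max_sentences_per_chunk=5):
    """Group sentences into chunks of at most max_sentences_per_chunk each."""
    # Naive splitter: break after ., !, or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + max_sentences_per_chunk])
        for i in range(0, len(sentences), max_sentences_per_chunk)
    ]

text = "One. Two! Three? Four. Five."
chunks = sentence_chunks(text, max_sentences_per_chunk=2)
print(chunks)
```

Larger values keep more context together in each chunk; smaller values give more precise matches but less surrounding text per result.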

Installing Search Dependencies

# Query-only (smallest footprint)
$ pip install "signalwire-sdk[search-queryonly]"

# Build indexes + vector search
$ pip install "signalwire-sdk[search]"

# Full features (PDF, DOCX processing)
$ pip install "signalwire-sdk[search-full]"

# All features including NLP
$ pip install "signalwire-sdk[search-all]"

# PostgreSQL pgvector support
$ pip install "signalwire-sdk[pgvector]"

Using Search in Agents

Add the native_vector_search skill to enable search:

Language	Syntax
Python	`self.add_skill("native_vector_search", index_file="./knowledge.swsearch", count=5, tool_name="search_documents")`
TypeScript	`await agent.addSkillByName('native_vector_search', { index_file: './knowledge.swsearch', count: 5, tool_name: 'search_documents' })`

from signalwire import AgentBase

class KnowledgeAgent(AgentBase):
    def __init__(self):
        super().__init__(name="knowledge-agent")
        self.add_language("English", "en-US", "rime.spore")

        self.prompt_add_section(
            "Role",
            "You are a helpful assistant with access to company documentation. "
            "Use the search_documents function to find relevant information."
        )

        # Add search skill with local index
        self.add_skill(
            "native_vector_search",
            index_file="./knowledge.swsearch",
            count=5,  # Number of results
            tool_name="search_documents",
            tool_description="Search the company documentation"
        )

if __name__ == "__main__":
    agent = KnowledgeAgent()
    agent.run()

Skill Configuration Options

self.add_skill(
    "native_vector_search",
    # Index source (choose one)
    index_file="./knowledge.swsearch",        # Local SQLite index
    # OR
    # remote_url="http://search-server:8001", # Remote search server
    # index_name="default",

    # Search parameters
    count=5,                    # Results to return (1-20)
    similarity_threshold=0.0,   # Min score (0.0-1.0)
    tags=["docs", "api"],       # Filter by tags

    # Tool configuration
    tool_name="search_knowledge",
    tool_description="Search the knowledge base for information"
)

pgvector Backend

For production deployments, use PostgreSQL with pgvector:

self.add_skill(
    "native_vector_search",
    backend="pgvector",
    connection_string="postgresql://user:pass@localhost/db",
    collection_name="knowledge_base",
    count=5,
    tool_name="search_docs"
)

CLI Commands

Build Index

# Basic build
$ sw-search ./docs --output knowledge.swsearch

# With specific file types
$ sw-search ./docs --file-types md,txt,rst --output knowledge.swsearch

# With chunking strategy
$ sw-search ./docs --chunking-strategy markdown --output knowledge.swsearch

# With tags
$ sw-search ./docs --tags documentation,api --output knowledge.swsearch

Validate Index

$ sw-search validate knowledge.swsearch

Search Index

$ sw-search search knowledge.swsearch "how do I configure auth"

Complete Example

#!/usr/bin/env python3
# documentation_agent.py - Agent that searches documentation
from signalwire import AgentBase
from signalwire.core.function_result import FunctionResult

class DocumentationAgent(AgentBase):
    """Agent that searches documentation to answer questions"""

    def __init__(self):
        super().__init__(name="docs-agent")
        self.add_language("English", "en-US", "rime.spore")

        self.prompt_add_section(
            "Role",
            "You are a documentation assistant. When users ask questions, "
            "search the documentation to find accurate answers. Always cite "
            "the source document when providing information."
        )

        self.prompt_add_section(
            "Instructions",
            """
            1. When asked a question, use search_docs to find relevant information
            2. Review the search results carefully
            3. Synthesize an answer from the results
            4. Mention which document the information came from
            5. If nothing relevant is found, say so honestly
            """
        )

        # Add a simple search function for demonstration
        # In production, use native_vector_search skill with a .swsearch index:
        # self.add_skill("native_vector_search", index_file="./docs.swsearch")
        self.define_tool(
            name="search_docs",
            description="Search the documentation for information",
            parameters={
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            },
            handler=self.search_docs
        )

    def search_docs(self, args, raw_data):
        """Stub search function for demonstration"""
        query = args.get("query", "")
        return FunctionResult(
            f"Search results for '{query}': This is a demonstration. "
            "In production, use native_vector_search skill with a .swsearch index file."
        )

if __name__ == "__main__":
    agent = DocumentationAgent()
    agent.run()

This example uses a stub function for demonstration. In production, use the native_vector_search skill with a .swsearch index file built using sw-search.

Multiple Knowledge Bases

Add multiple search instances for different topics:

# Product documentation
self.add_skill(
    "native_vector_search",
    index_file="./products.swsearch",
    tool_name="search_products",
    tool_description="Search product catalog and specifications"
)

# Support articles
self.add_skill(
    "native_vector_search",
    index_file="./support.swsearch",
    tool_name="search_support",
    tool_description="Search support articles and troubleshooting guides"
)

# API documentation
self.add_skill(
    "native_vector_search",
    index_file="./api-docs.swsearch",
    tool_name="search_api",
    tool_description="Search API reference documentation"
)
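
When the list of knowledge bases grows, the repeated calls can be driven from data instead. The sketch below uses only the `add_skill` keyword arguments already shown on this page; `register_knowledge_bases` and `RecordingAgent` are hypothetical names, and the stand-in agent exists only so the example runs without the SDK installed.

```python
# (index_file, tool_name, tool_description) for each knowledge base
KNOWLEDGE_BASES = [
    ("./products.swsearch", "search_products", "Search product catalog and specifications"),
    ("./support.swsearch", "search_support", "Search support articles and troubleshooting guides"),
    ("./api-docs.swsearch", "search_api", "Search API reference documentation"),
]

def register_knowledge_bases(agent, knowledge_bases):
    """Call add_skill once per index, giving each search tool its own name."""
    for index_file, tool_name, tool_description in knowledge_bases:
        agent.add_skill(
            "native_vector_search",
            index_file=index_file,
            tool_name=tool_name,
            tool_description=tool_description,
        )

class RecordingAgent:
    """Stand-in for AgentBase that records add_skill calls instead of loading skills."""

    def __init__(self):
        self.calls = []

    def add_skill(self, skill_name, **params):
        self.calls.append((skill_name, params))

agent = RecordingAgent()
register_knowledge_bases(agent, KNOWLEDGE_BASES)
print([params["tool_name"] for _, params in agent.calls])
```

In a real agent, the same helper would be called with `self` inside `__init__`. Distinct `tool_name` values and specific `tool_description` text are what let the AI choose the right index for a given question.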

Understanding Embeddings

Search works by converting text into numerical vectors (embeddings) that capture semantic meaning. Similar concepts have similar vectors, enabling “meaning-based” search rather than just keyword matching.

How it works:

At index time: Each document chunk is converted to a vector and stored
At query time: The search query is converted to a vector
Matching: Chunks with vectors closest to the query vector are returned

This means “return policy” will match documents about “refund process” or “merchandise exchange” even if they don’t contain those exact words.
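
The matching step can be illustrated with cosine similarity, a common way to measure how close two vectors are. Whether the search engine uses exactly this metric is an assumption, and the three-dimensional vectors below are hand-made for illustration; real embedding models produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 is similar, near 0.0 is unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made stand-ins for stored chunk embeddings (not real model output).
chunk_vectors = {
    "refund process": [0.9, 0.1, 0.0],
    "merchandise exchange": [0.8, 0.3, 0.1],
    "api rate limits": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.2, 0.0]  # stands in for the embedding of "return policy"

# Rank chunks from most to least similar to the query.
ranked = sorted(
    chunk_vectors,
    key=lambda name: cosine_similarity(query_vector, chunk_vectors[name]),
    reverse=True,
)
print(ranked)
```

The two refund-related chunks rank ahead of the unrelated one even though none of them shares a word with the query, which is the behavior keyword search cannot provide.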

Index Management

When to Rebuild Indexes

Rebuild your search index when:

Source documents are added, removed, or significantly changed
You change chunking strategy
You want to add or modify tags
Search quality degrades

Keeping Indexes Updated

For production systems, automate index rebuilding:

#!/bin/bash
# rebuild_index.sh - Run on document updates
set -euo pipefail  # Stop before the swap if the build fails

sw-search ./docs \
  --chunking-strategy markdown \
  --output knowledge.swsearch.new

# Atomic replacement (rename is atomic when both paths are on the same filesystem)
mv knowledge.swsearch.new knowledge.swsearch

echo "Index rebuilt at $(date)"
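
If the rebuild is driven from Python rather than a shell script, the same build-then-swap pattern might look like the sketch below. `rebuild_atomically` is a hypothetical helper, and the build step is passed in as a function so the example runs without `sw-search` installed.

```python
import os
import tempfile

def rebuild_atomically(build, target_path):
    """Build into target_path + '.new', then swap it into place in one step."""
    temp_path = target_path + ".new"
    try:
        build(temp_path)                    # may raise; the live index is untouched
        os.replace(temp_path, target_path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)            # do not leave a half-built file behind
        raise

# In real use, build could wrap the CLI call shown above, for example:
#   subprocess.run(["sw-search", "./docs", "--chunking-strategy", "markdown",
#                   "--output", path], check=True)
def fake_build(path):
    with open(path, "w") as handle:
        handle.write("new index")

def failing_build(path):
    raise RuntimeError("build failed")

with tempfile.TemporaryDirectory() as workdir:
    index_path = os.path.join(workdir, "knowledge.swsearch")
    with open(index_path, "w") as handle:
        handle.write("old index")

    rebuild_atomically(fake_build, index_path)
    with open(index_path) as handle:
        after_success = handle.read()

    try:
        rebuild_atomically(failing_build, index_path)
    except RuntimeError:
        pass
    with open(index_path) as handle:
        after_failure = handle.read()

print(after_success, "|", after_failure)
```

Because the live file is only ever replaced by a complete build, an agent reading the index never sees a partially written file.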

Index Size and Performance

Rough sizing:

100 documents (~50KB each) — approximately 10-20MB index
1,000 documents — approximately 100-200MB index
10,000+ documents — Consider pgvector for better performance
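
The figures above imply an index roughly 2 to 4 times the size of the source text (100 documents at 50KB each is about 5MB of source for a 10-20MB index). The sketch below turns that into a quick estimator. The ratio is inferred from these rough numbers, not a documented formula, and the 1,000-document line is assumed to use the same 50KB average.

```python
def estimate_index_mb(document_count, avg_document_kb=50, low_ratio=2, high_ratio=4):
    """Return a (low, high) index size estimate in MB, using 1 MB = 1000 KB."""
    source_mb = document_count * avg_document_kb / 1000
    return source_mb * low_ratio, source_mb * high_ratio

for count in (100, 1000):
    low, high = estimate_index_mb(count)
    print(f"{count} documents: about {low:.0f}-{high:.0f} MB")
```

Actual sizes depend on chunking strategy and embedding dimensions, so treat the output as an order-of-magnitude guide.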

Query Optimization

Writing Good Prompts for Search

1 self.prompt_add_section(
2     "Search Instructions",
3     """
4     When users ask questions:
5     1. First search the documentation using search_docs
6     2. Review all results before answering
7     3. Cite which document your answer came from
8     4. If results aren't relevant, try a different search query
9     5. If no results help, acknowledge you couldn't find the answer
10     """
11 )

Tuning Search Parameters

count: Number of results to return

count=3: Focused answers, faster response
count=5: Good balance (default)
count=10: More comprehensive, but may include less relevant results

similarity_threshold: Minimum relevance score (0.0 to 1.0). A higher score means a closer match, so raising the threshold makes filtering stricter.

0.0: Return all results regardless of relevance
0.3: Filter out clearly irrelevant results
0.5+: Only high-confidence matches (may miss relevant content)
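
How `count` and `similarity_threshold` interact can be shown with a small filter. This is a sketch of the concept only: `select_results` is a hypothetical function, the scores are made up, and the skill's internal ordering and boundary handling may differ.

```python
def select_results(scored_chunks, count=5, similarity_threshold=0.0):
    """Drop chunks scoring below the threshold, then keep the top `count` by score."""
    kept = [(score, text) for score, text in scored_chunks if score >= similarity_threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:count]

# Made-up (score, chunk title) pairs for illustration.
scored = [(0.82, "Return policy"), (0.41, "Shipping times"), (0.12, "Office hours")]

for threshold in (0.0, 0.3, 0.5):
    titles = [text for _, text in select_results(scored, count=5, similarity_threshold=threshold)]
    print(threshold, titles)
```

The threshold removes weak matches first and `count` then caps what remains, so a high threshold can return fewer results than `count` asks for.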

tags: Filter by document categories

self.add_skill(
    "native_vector_search",
    index_file="./knowledge.swsearch",
    tags=["policies", "returns"],  # Only search these categories
    tool_name="search_policies"
)

Troubleshooting

“No results found”

Check that the index file exists and is readable
Verify the query is meaningful (not too short or generic)
Lower similarity_threshold if set too high
Ensure documents were actually indexed (check with sw-search validate)
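
The first check in this list can be automated at agent startup. The sketch below assumes a `.swsearch` file is a plain SQLite database, as the overview suggests, and only inspects the standard SQLite file header; `check_index_file` is a hypothetical helper, and it does not replace `sw-search validate`.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite database file

def check_index_file(path):
    """Return a list of problems found with a local index file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    if not os.access(path, os.R_OK):
        return [f"{path} is not readable"]
    with open(path, "rb") as handle:
        if handle.read(len(SQLITE_MAGIC)) != SQLITE_MAGIC:
            return [f"{path} does not look like a SQLite database"]
    return []

# Demonstration with a throwaway SQLite file standing in for a real index.
with tempfile.TemporaryDirectory() as workdir:
    good_path = os.path.join(workdir, "knowledge.swsearch")
    connection = sqlite3.connect(good_path)
    connection.execute("CREATE TABLE demo (id INTEGER)")
    connection.commit()
    connection.close()

    good_report = check_index_file(good_path)
    missing_report = check_index_file(os.path.join(workdir, "missing.swsearch"))

print(good_report, missing_report)
```

Failing fast with a clear message at startup is easier to debug than an agent that silently returns no results on every call.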

Poor result relevance

Try different chunking strategies
Increase count to see more results
Review source documents for quality
Consider adding tags to filter by category

Slow search performance

For large indexes, use pgvector instead of SQLite
Reduce count if you don’t need many results
Consider a remote search server for shared access

Search Best Practices

Index Building

Use markdown chunking for documentation
Keep chunks reasonably sized (5-10 sentences)
Add meaningful tags for filtering
Rebuild indexes when source docs change
Test search quality after building
Version your indexes with your documentation

Agent Configuration

Set count=3-5 for most use cases
Use similarity_threshold to filter noise
Give descriptive tool_name and tool_description
Tell the AI when and how to use search in the prompt
Handle “no results” gracefully in your prompt

Production

Use pgvector for high-volume deployments
Consider remote search server for shared indexes
Monitor search latency and result quality
Automate index rebuilding when docs change
Log search queries to understand user needs