sw-search | SignalWire

The sw-search command builds vector search indexes from documents, searches existing indexes, validates index integrity, migrates between storage backends, and queries remote search servers. Built indexes are used with the native_vector_search skill to give agents searchable knowledge bases.

Requires the search extras: pip install "signalwire-sdk[search]". For PDF/DOCX support use [search-full]. For advanced NLP use [search-nlp].

Command Modes

sw-search operates in five modes based on the first argument:

$ sw-search <sources...> [build-options]           # Build mode (default)
$ sw-search search <file> <query> [search-options] # Search mode
$ sw-search validate <file> [--verbose]            # Validate mode
$ sw-search migrate <file> [migrate-options]       # Migrate mode
$ sw-search remote <url> <query> [remote-options]  # Remote search mode
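
The first-argument rule above can be pictured with a small dispatcher. This is a hypothetical sketch, not the SDK's actual argument parser; `SUBCOMMANDS` and `detect_mode` are names invented for illustration.

```python
# Hypothetical sketch of first-argument mode selection (not the SDK's parser).
SUBCOMMANDS = {"search", "validate", "migrate", "remote"}


def detect_mode(argv):
    """Return the mode implied by the first argument; build is the fallback."""
    if argv and argv[0] in SUBCOMMANDS:
        return argv[0]
    return "build"


print(detect_mode(["./docs", "--output", "knowledge.swsearch"]))  # build
print(detect_mode(["search", "knowledge.swsearch", "agent"]))     # search
```

Anything that is not one of the four keywords is treated as a source path in this sketch, which is why build mode needs no keyword of its own.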

Build Mode

Build a vector search index from files and directories.

$ sw-search ./docs --output knowledge.swsearch
$ sw-search ./docs ./examples README.md --file-types md,txt,py

Build Options

sources

string, required

One or more source files or directories to index.

--output

string

Output file path (.swsearch) or collection name for pgvector. Defaults to sources.swsearch for single-source builds.

--output-dir

string

Output directory. For --output-format json, creates one file per source document. Mutually exclusive with --output.

--output-format

string, defaults to index

Output format. Valid values:

"index" — Create a searchable .swsearch index (default)
"json" — Export chunks as JSON for review or external processing

--backend

string, defaults to sqlite

Storage backend. Valid values:

"sqlite" — Portable .swsearch file (default)
"pgvector" — PostgreSQL with pgvector extension

--connection-string

string

PostgreSQL connection string. Required when --backend pgvector.

--overwrite

flag

Overwrite an existing pgvector collection.

--file-types

string, defaults to md,txt,rst

Comma-separated file extensions to include when indexing directories.

--exclude

string

Comma-separated glob patterns to exclude (e.g., "**/test/**,**/__pycache__/**").
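
To see how --file-types and --exclude might combine, here is a minimal sketch of a per-file filter. It assumes fnmatch-style matching, where `*` also matches `/`; the tool's real glob rules may differ, and `should_index` is a hypothetical helper, not part of the SDK.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def should_index(path, file_types=("md", "txt", "rst"), exclude=()):
    """Keep a file if its extension is allowed and no exclude pattern matches.

    Assumption: patterns follow fnmatch rules, which may not match sw-search.
    """
    suffix = PurePosixPath(path).suffix.lstrip(".")
    if suffix not in file_types:
        return False
    return not any(fnmatchcase(path, pattern) for pattern in exclude)


patterns = ["**/test/**", "**/__pycache__/**"]
for candidate in ["docs/guide.md", "docs/test/notes.md", "docs/app.py"]:
    print(candidate, should_index(candidate, exclude=patterns))
```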

--languages

string, defaults to en

Comma-separated language codes for the indexed content.

--model

string, defaults to mini

Embedding model name or alias. Valid aliases:

"mini" — all-MiniLM-L6-v2 (384 dims, fastest, default)
"base" — all-mpnet-base-v2 (768 dims, balanced)
"large" — all-mpnet-base-v2 (768 dims, highest quality)

You can also pass a full model name (e.g., "sentence-transformers/all-mpnet-base-v2").
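
Alias handling can be sketched as a lookup with a pass-through for full model names. The mapping below copies the alias table above; `MODEL_ALIASES` and `resolve_model` are illustrative names, not SDK identifiers.

```python
# Alias table copied from the option description; names are illustrative only.
MODEL_ALIASES = {
    "mini": "all-MiniLM-L6-v2",
    "base": "all-mpnet-base-v2",
    "large": "all-mpnet-base-v2",
}


def resolve_model(name):
    """Map an alias to its model name; anything else is passed through as-is."""
    return MODEL_ALIASES.get(name, name)


print(resolve_model("mini"))
print(resolve_model("sentence-transformers/all-mpnet-base-v2"))
```

As listed, "base" and "large" resolve to the same model, so they produce the same 768-dimension embeddings.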

--tags

string

Comma-separated tags added to all chunks. Tags can be used to filter search results.

--index-nlp-backend

string, defaults to nltk

NLP backend for document processing. Valid values:

"nltk" — Fast, good quality (default)
"spacy" — Better quality, slower. Requires [search-nlp] extras.

--validate

flag

Validate the index after building.

--verbose

flag

Enable detailed output during build.

Chunking Strategies

--chunking-strategy

string, defaults to sentence

How documents are split into searchable chunks. Valid values:

"sentence" — Groups sentences together (default)
"sliding" — Fixed-size word windows with overlap
"paragraph" — Splits on double newlines
"page" — One chunk per page (best for PDFs)
"semantic" — Groups semantically similar sentences
"topic" — Detects topic boundaries
"qa" — Optimized for question-answering
"markdown" — Header-aware chunking with code block detection
"json" — Pre-chunked JSON input

Strategy-Specific Options

--max-sentences-per-chunk

int, defaults to 5

Maximum sentences per chunk. Used with sentence strategy.

--split-newlines

int

Split on this many consecutive newlines. Used with sentence strategy.

--chunk-size

int, defaults to 50

Chunk size in words. Used with sliding strategy.

--overlap-size

int, defaults to 10

Overlap size in words between consecutive chunks. Used with sliding strategy.
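
The interaction between --chunk-size and --overlap-size is easiest to see in code. The following is a minimal sketch of a word-window chunker under the documented defaults; it is an illustration of the general technique, not the SDK's implementation, and `sliding_chunks` is a hypothetical name.

```python
def sliding_chunks(text, chunk_size=50, overlap_size=10):
    """Split text into word windows of chunk_size that share overlap_size words."""
    if chunk_size <= 0 or not 0 <= overlap_size < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap_size < chunk_size")
    words = text.split()
    step = chunk_size - overlap_size  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in sliding_chunks(sample, chunk_size=4, overlap_size=1):
    print(chunk)
```

With the defaults, each window advances by 40 words, so every chunk repeats the last 10 words of the previous one.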

--semantic-threshold

float, defaults to 0.5

Similarity threshold for grouping sentences. Used with semantic strategy. Lower values produce larger chunks.
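
One way to read "lower values produce larger chunks": adjacent sentences stay in the same chunk while their similarity is at or above the threshold. The sketch below assumes cosine similarity between neighbouring sentence embeddings and uses toy 2-D vectors in place of real embeddings; the SDK's actual grouping logic may differ, and all names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_similarity(sentences, vectors, threshold=0.5):
    """Start a new chunk whenever neighbouring sentences fall below threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine(vectors[i - 1], vectors[i]) >= threshold:
            chunks[-1].append(sentences[i])
        else:
            chunks.append([sentences[i]])
    return chunks


sentences = ["A", "B", "C", "D"]
vectors = [[1, 0], [4, 3], [0, 1], [0, 2]]  # neighbour similarities: 0.8, 0.6, 1.0
print(group_by_similarity(sentences, vectors, threshold=0.5))
print(group_by_similarity(sentences, vectors, threshold=0.7))
```

In this toy data, a threshold of 0.5 keeps all four sentences together, while 0.7 splits them into two chunks.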

--topic-threshold

float, defaults to 0.3

Similarity threshold for detecting topic changes. Used with topic strategy. Lower values produce more fine-grained topic boundaries.

Use the markdown strategy for documentation with code blocks. It preserves header hierarchy, detects fenced code blocks, and adds language-specific tags for better search relevance.

Search Mode

Search an existing index with a natural language query.

$ sw-search search knowledge.swsearch "how to create an agent"
$ sw-search search knowledge.swsearch "API reference" --count 3 --verbose

Search Options

--count

int, defaults to 5

Number of results to return.

--distance-threshold

float, defaults to 0.0

Minimum similarity score. Results below this threshold are excluded.
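
Taken together, --count and --distance-threshold behave like a filter followed by a top-N cut. The sketch below assumes that scores are similarities where higher is better, as the description suggests; `select_results` and the sample data are hypothetical.

```python
def select_results(scored, count=5, threshold=0.0):
    """Drop results scoring below threshold, then keep the best `count`."""
    kept = [(score, text) for score, text in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:count]


hits = [(0.82, "Creating an agent"), (0.15, "Changelog"), (0.47, "API reference")]
print(select_results(hits, count=2, threshold=0.3))
```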

--tags

string

Comma-separated tags to filter results.

--query-nlp-backend

string, defaults to nltk

NLP backend for query processing. Valid values:

"nltk" — Fast, good quality (default)
"spacy" — Better quality, slower. Requires [search-nlp] extras.

--json

flag

Output results as JSON.

--no-content

flag

Show metadata only, hide chunk content.

--shell

flag

Start an interactive search shell. Load the index once and run multiple queries.

Validate Mode

Verify index integrity and display index metadata.

$ sw-search validate knowledge.swsearch
$ sw-search validate knowledge.swsearch --verbose

Output includes chunk count, file count, embedding model, dimensions, chunking strategy, and creation timestamp.

Migrate Mode

Migrate indexes between storage backends.

$ sw-search migrate --info ./docs.swsearch
$ sw-search migrate ./docs.swsearch --to-pgvector \
>   --connection-string "postgresql://user:pass@localhost/db" \
>   --collection-name docs_collection

Migrate Options

--info

flag

Show index information without migrating.

--to-pgvector

flag

Migrate a SQLite index to PostgreSQL pgvector.

--collection-name

string

Target collection name in PostgreSQL.

--batch-size

int, defaults to 100

Number of chunks per migration batch.
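
Batching of this kind usually means slicing the chunk list into fixed-size groups and writing one group at a time. A minimal sketch, assuming chunks are held in a list; `batches` is a hypothetical helper and does not reflect how the migrator is actually written.

```python
def batches(items, batch_size=100):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


chunk_ids = list(range(250))
print([len(batch) for batch in batches(chunk_ids)])  # [100, 100, 50]
```

A smaller batch size means more round trips to the database but less data held per write.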

Remote Mode

Search via a remote search API endpoint.

$ sw-search remote http://localhost:8001 "how to create an agent" --index-name docs

Remote Options

--index-name

string, required

Name of the index to search on the remote server.

--timeout

int, defaults to 30

Request timeout in seconds.

The --count, --distance-threshold, --tags, --json, --no-content, and --verbose options from search mode also apply to remote searches.

Examples

Build and Search Workflow

# Build from documentation with markdown-aware chunking
$ sw-search ./docs \
>   --chunking-strategy markdown \
>   --file-types md \
>   --output knowledge.swsearch \
>   --verbose

# Validate the index
$ sw-search validate knowledge.swsearch

# Search interactively
$ sw-search search knowledge.swsearch --shell

Full Configuration Build

$ sw-search ./docs ./examples README.md \
>   --output ./knowledge.swsearch \
>   --chunking-strategy sentence \
>   --max-sentences-per-chunk 8 \
>   --file-types md,txt,rst,py \
>   --exclude "**/test/**,**/__pycache__/**" \
>   --languages en,es,fr \
>   --model base \
>   --tags documentation,api \
>   --index-nlp-backend nltk \
>   --validate \
>   --verbose
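
When a build like this is launched from a script, the argument list can be assembled programmatically. The sketch below is a hypothetical helper (`build_command` is not part of the SDK) that only turns keyword arguments into flags documented on this page; it does not run sw-search itself.

```python
import shlex


def build_command(sources, **options):
    """Turn keyword options into an sw-search argument list.

    True becomes a bare flag, lists become comma-separated values,
    and False or None options are skipped.
    """
    cmd = ["sw-search", *sources]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)
        elif value is False or value is None:
            continue
        elif isinstance(value, (list, tuple)):
            cmd += [flag, ",".join(str(v) for v in value)]
        else:
            cmd += [flag, str(value)]
    return cmd


cmd = build_command(
    ["./docs", "README.md"],
    output="./knowledge.swsearch",
    file_types=["md", "txt"],
    max_sentences_per_chunk=8,
    validate=True,
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True), with sw-search installed.
```

Passing a list to `subprocess.run` rather than a single string avoids shell quoting problems with patterns such as `**/test/**`.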

PostgreSQL pgvector Backend

# Build directly to pgvector
$ sw-search ./docs \
>   --backend pgvector \
>   --connection-string "postgresql://user:pass@localhost:5432/knowledge" \
>   --output docs_collection \
>   --chunking-strategy markdown

# Search in pgvector collection
$ sw-search search docs_collection "how to create an agent" \
>   --backend pgvector \
>   --connection-string "postgresql://user:pass@localhost/knowledge"

JSON Export and Re-import

# Export chunks for review (one JSON file per source document)
$ sw-search ./docs --output-format json --output-dir ./chunks/

# Build index from exported JSON
$ sw-search ./chunks/ \
>   --chunking-strategy json \
>   --file-types json \
>   --output final.swsearch

Using with an Agent

After building an index, add it to an agent via the native_vector_search skill:

from signalwire import AgentBase

# Create the agent and give it a base prompt.
agent = AgentBase(name="search-agent")
agent.set_prompt_text("You are a helpful assistant.")

# Attach the index built by sw-search through the native_vector_search skill.
agent.add_skill("native_vector_search", {
    "index_path": "./knowledge.swsearch",
    "tool_name": "search_docs",
    "tool_description": "Search the documentation",
})

if __name__ == "__main__":
    agent.run()