sw-search
The sw-search command builds vector search indexes from documents, searches
existing indexes, validates index integrity, migrates between storage backends,
and queries remote search servers. Built indexes are used with the
native_vector_search skill to give agents searchable knowledge bases.
Requires the search extras: pip install "signalwire[search]".
For PDF/DOCX support use [search-full]. For advanced NLP use [search-nlp].
Command Modes
sw-search operates in one of five modes, selected by the first argument: build, search, validate, migrate, and remote. Each mode is described below.
Build Mode
Build a vector search index from files and directories.
Build Options
sources
One or more source files or directories to index.
--output
Output file path (.swsearch) or collection name for pgvector. Defaults to
sources.swsearch for single-source builds.
--output-dir
Output directory. For --output-format json, creates one file per source document.
Mutually exclusive with --output.
--output-format
Output format. Valid values:
"index" - Create a searchable .swsearch index (default)
"json" - Export chunks as JSON for review or external processing
--backend
Storage backend. Valid values:
"sqlite" - Portable .swsearch file (default)
"pgvector" - PostgreSQL with pgvector extension
--connection-string
PostgreSQL connection string. Required when --backend pgvector.
--overwrite
Overwrite an existing pgvector collection.
--file-types
Comma-separated file extensions to include when indexing directories.
--exclude
Comma-separated glob patterns to exclude (e.g., "**/test/**,**/__pycache__/**").
--languages
Comma-separated language codes for the indexed content.
--model
Embedding model name or alias. Valid aliases:
"mini" - all-MiniLM-L6-v2 (384 dims, fastest, default)
"base" - all-mpnet-base-v2 (768 dims, balanced)
"large" - all-mpnet-base-v2 (768 dims, highest quality)
You can also pass a full model name (e.g., "sentence-transformers/all-mpnet-base-v2").
--tags
Comma-separated tags added to all chunks. Tags can be used to filter search results.
--index-nlp-backend
NLP backend for document processing. Valid values:
"nltk" - Fast, good quality (default)
"spacy" - Better quality, slower. Requires [search-nlp] extras.
--validate
Validate the index after building.
--verbose
Enable detailed output during build.
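The build options above can be combined in a single invocation. The following is a minimal sketch, not part of the tool itself: `build_command` is a hypothetical helper, and it assumes that `sw-search` is on the PATH and that build mode takes the source paths as leading positional arguments. The file-type values `md` and `txt` are illustrative only.

```python
import shlex


def build_command(sources, output, model="mini", file_types=None,
                  tags=None, validate=False):
    """Assemble an sw-search build invocation as an argument list.

    Hypothetical helper: only flags documented above are used.
    """
    cmd = ["sw-search", *sources, "--output", output, "--model", model]
    if file_types:
        # --file-types takes a comma-separated list of extensions
        cmd += ["--file-types", ",".join(file_types)]
    if tags:
        # --tags takes a comma-separated list applied to every chunk
        cmd += ["--tags", ",".join(tags)]
    if validate:
        cmd.append("--validate")
    return cmd


cmd = build_command(
    ["./docs"],
    "docs.swsearch",
    file_types=["md", "txt"],
    tags=["docs"],
    validate=True,
)
# Print the command in a copy-pasteable, shell-quoted form
print(shlex.join(cmd))
```

To actually run the build, the list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell-quoting problems with paths that contain spaces.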
Chunking Strategies
--chunking-strategy
How documents are split into searchable chunks. Valid values:
"sentence" - Groups sentences together (default)
"sliding" - Fixed-size word windows with overlap
"paragraph" - Splits on double newlines
"page" - One chunk per page (best for PDFs)
"semantic" - Groups semantically similar sentences
"topic" - Detects topic boundaries
"qa" - Optimized for question-answering
"markdown" - Header-aware chunking with code block detection
"json" - Pre-chunked JSON input
Strategy-Specific Options
--max-sentences-per-chunk
Maximum sentences per chunk. Used with sentence strategy.
--split-newlines
Split on this many consecutive newlines. Used with sentence strategy.
--chunk-size
Chunk size in words. Used with sliding strategy.
--overlap-size
Overlap size in words between consecutive chunks. Used with sliding strategy.
--semantic-threshold
Similarity threshold for grouping sentences. Used with semantic strategy.
Lower values produce larger chunks.
--topic-threshold
Similarity threshold for detecting topic changes. Used with topic strategy.
Lower values produce more fine-grained topic boundaries.
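To illustrate how --chunk-size and --overlap-size interact in the sliding strategy, here is a minimal sketch of fixed-size word windows with overlap. It is not the tool's implementation: `sliding_chunks` is a hypothetical function, and the default values shown are placeholders rather than the CLI's actual defaults.

```python
def sliding_chunks(text, chunk_size=50, overlap_size=10):
    """Split text into windows of chunk_size words.

    Each window shares overlap_size words with the previous one.
    Defaults are illustrative, not the sw-search defaults.
    """
    if overlap_size >= chunk_size:
        raise ValueError("overlap_size must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap_size  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
chunks = sliding_chunks(text, chunk_size=4, overlap_size=2)
for chunk in chunks:
    print(chunk)
```

A larger overlap repeats more context across neighbouring chunks, at the cost of producing more chunks for the same document.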
Use the markdown strategy for documentation with code blocks. It preserves
header hierarchy, detects fenced code blocks, and adds language-specific tags
for better search relevance.
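The idea behind header-aware chunking can be sketched roughly as follows. This is an illustration under stated assumptions, not the markdown strategy's actual code: `markdown_chunks`, the `section` and `tags` keys, and the `code:<language>` tag format are all hypothetical.

```python
import re

FENCE_RE = re.compile(r"^`{3}(\w*)")       # opening or closing code fence
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)")  # ATX header, levels 1-6


def markdown_chunks(text):
    """Split markdown into one chunk per section.

    Tracks the header hierarchy and tags each chunk with the
    languages of the fenced code blocks it contains.
    """
    chunks = []
    headers = []   # current header path, e.g. ["Guide", "Install"]
    lines = []     # body lines of the section being collected
    tags = set()
    in_fence = False

    def flush():
        body = "\n".join(lines).strip()
        if body:
            chunks.append({
                "section": " > ".join(headers),
                "content": body,
                "tags": sorted(tags),
            })
        lines.clear()
        tags.clear()

    for line in text.splitlines():
        fence = FENCE_RE.match(line)
        if fence:
            if not in_fence and fence.group(1):
                tags.add(f"code:{fence.group(1)}")  # hypothetical tag format
            in_fence = not in_fence
            lines.append(line)
            continue
        # Lines starting with '#' inside a fence are code, not headers
        header = None if in_fence else HEADER_RE.match(line)
        if header:
            flush()
            level = len(header.group(1))
            del headers[level - 1:]  # drop headers at this level or deeper
            headers.append(header.group(2).strip())
            continue
        lines.append(line)
    flush()
    return chunks


fence = "`" * 3
sample = "\n".join([
    "# Guide",
    "Intro text.",
    "## Install",
    "Run this:",
    fence + "bash",
    "# not a header, inside a fence",
    "pip install example",
    fence,
])
chunks = markdown_chunks(sample)
for chunk in chunks:
    print(chunk["section"], chunk["tags"])
```

Keeping code fences intact inside a single chunk is the main point of the sketch: a naive split on `#` lines would break the shell comment out of its code block.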
Search Mode
Search an existing index with a natural language query.
Search Options
--count
Number of results to return.
--distance-threshold
Minimum similarity score. Results below this threshold are excluded.
--tags
Comma-separated tags to filter results.
--query-nlp-backend
NLP backend for query processing. Valid values:
"nltk" - Fast, good quality (default)
"spacy" - Better quality, slower. Requires [search-nlp] extras.
--json
Output results as JSON.
--no-content
Show metadata only, hide chunk content.
--shell
Start an interactive search shell. Load the index once and run multiple queries.
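The combined effect of --count, --distance-threshold, and --tags can be sketched as a simple post-filter over scored results. This is an assumption-laden illustration, not the tool's internals: `filter_results` and the result dictionary shape are hypothetical, the defaults are placeholders, and the sketch assumes a result matches when it carries at least one of the requested tags.

```python
def filter_results(results, count=5, distance_threshold=0.0, tags=None):
    """Return up to `count` results, best score first.

    Drops results scoring below the threshold and, when tags are
    given, results that carry none of the requested tags.
    """
    wanted = set(tags or [])
    kept = [
        r for r in results
        if r["score"] >= distance_threshold
        and (not wanted or wanted & set(r["tags"]))
    ]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:count]


results = [
    {"content": "Install guide", "score": 0.82, "tags": ["docs"]},
    {"content": "Changelog", "score": 0.35, "tags": ["docs"]},
    {"content": "API reference", "score": 0.91, "tags": ["api"]},
    {"content": "FAQ", "score": 0.60, "tags": ["docs", "faq"]},
]
top = filter_results(results, count=2, distance_threshold=0.5, tags=["docs"])
for r in top:
    print(r["score"], r["content"])
```

Here "Changelog" falls below the threshold and "API reference" lacks the requested tag, so only the two remaining results are returned.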
Validate Mode
Verify index integrity and display index metadata.
Output includes chunk count, file count, embedding model, dimensions, chunking strategy, and creation timestamp.
Migrate Mode
Migrate indexes between storage backends.
Migrate Options
--info
Show index information without migrating.
--to-pgvector
Migrate a SQLite index to PostgreSQL pgvector.
--collection-name
Target collection name in PostgreSQL.
--batch-size
Number of chunks per migration batch.
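Batching of the kind --batch-size controls can be pictured with a short sketch. The `batches` helper below is hypothetical and only illustrates the grouping; how the tool actually writes each batch to PostgreSQL is not shown here.

```python
from itertools import islice


def batches(chunks, batch_size):
    """Yield successive lists of at most batch_size chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(chunks)
    # islice pulls up to batch_size items; an empty list ends the loop
    while batch := list(islice(iterator, batch_size)):
        yield batch


grouped = list(batches(range(7), 3))
print(grouped)
```

Smaller batches keep memory use and transaction size down, while larger batches reduce the number of round trips to the database.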
Remote Mode
Search via a remote search API endpoint.
Remote Options
--index-name
Name of the index to search on the remote server.
--timeout
Request timeout in seconds.
The --count, --distance-threshold, --tags, --json, --no-content, and
--verbose options from search mode also apply to remote searches.
Examples
Build and Search Workflow
Full Configuration Build
PostgreSQL pgvector Backend
JSON Export and Re-import
Using with an Agent
After building an index, add it to an agent via the native_vector_search skill: