sw-search
The sw-search command builds vector search indexes from documents, searches
existing indexes, validates index integrity, migrates between storage backends,
and queries remote search servers. Built indexes are used with the
native_vector_search skill to give agents searchable knowledge bases.
Requires the search extras: pip install "signalwire-sdk[search]".
For PDF and DOCX support, use the [search-full] extras. For advanced NLP (the spaCy backend), use [search-nlp].
sw-search operates in five modes based on the first argument:
Build mode
Build a vector search index from files and directories.
One or more source files or directories to index.
Output file path (.swsearch) or collection name for pgvector. Defaults to
sources.swsearch for single-source builds.
Output directory. For --output-format json, creates one file per source document.
Mutually exclusive with --output.
Output format. Valid values:
  "index": Create a searchable .swsearch index (default)
  "json": Export chunks as JSON for review or external processing
Storage backend. Valid values:
  "sqlite": Portable .swsearch file (default)
  "pgvector": PostgreSQL with the pgvector extension
PostgreSQL connection string. Required when --backend pgvector.
Overwrite an existing pgvector collection.
Comma-separated file extensions to include when indexing directories.
Comma-separated glob patterns to exclude (e.g., "**/test/**,**/__pycache__/**").
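The extension and exclude filters can be pictured with a short Python sketch. It assumes globs are matched against slash-separated relative paths with fnmatch-style semantics; select_files is a hypothetical name, and sw-search's own matcher may treat ** differently.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_files(paths, file_types, exclude):
    """Keep paths whose extension is in file_types and that match none of
    the exclude globs. Both arguments are comma-separated strings."""
    # Normalize "md, .py" to {"md", "py"}.
    extensions = {e.strip().lstrip(".") for e in file_types.split(",") if e.strip()}
    patterns = [p.strip() for p in exclude.split(",") if p.strip()]
    selected = []
    for path in paths:
        if PurePosixPath(path).suffix.lstrip(".") not in extensions:
            continue
        # Assumed fnmatch semantics: "*" also matches "/", so "**/test/**"
        # matches any path containing a "/test/" segment (not a top-level "test/").
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            continue
        selected.append(path)
    return selected


files = ["docs/a.md", "docs/test/b.md", "src/__pycache__/c.py", "notes.txt", "src/app.py"]
print(select_files(files, "md,py", "**/test/**,**/__pycache__/**"))
# ['docs/a.md', 'src/app.py']
```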
Comma-separated language codes for the indexed content.
Embedding model name or alias. Valid aliases:
  "mini": all-MiniLM-L6-v2 (384 dims, fastest, default)
  "base": all-mpnet-base-v2 (768 dims, balanced)
  "large": all-mpnet-base-v2 (768 dims; the same model as "base")
You can also pass a full model name (e.g., "sentence-transformers/all-mpnet-base-v2").
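A minimal sketch of alias resolution, assuming aliases expand to the Hugging Face sentence-transformers names listed above and any other value is passed through unchanged. MODEL_ALIASES and resolve_model are hypothetical names, not part of the SDK.

```python
# Alias table copied from the list above. The "sentence-transformers/" prefix
# is an assumption about how the tool names these Hugging Face models.
MODEL_ALIASES = {
    "mini": ("sentence-transformers/all-MiniLM-L6-v2", 384),
    "base": ("sentence-transformers/all-mpnet-base-v2", 768),
    "large": ("sentence-transformers/all-mpnet-base-v2", 768),
}


def resolve_model(name):
    """Return (model name, dimensions) for an alias; pass any other name
    through unchanged with unknown (None) dimensions."""
    return MODEL_ALIASES.get(name, (name, None))


print(resolve_model("mini"))
# ('sentence-transformers/all-MiniLM-L6-v2', 384)
```

Because "base" and "large" resolve to the same model, indexes built with either alias have the same 768-dimension embeddings.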
Comma-separated tags added to all chunks. Tags can be used to filter search results.
NLP backend for document processing. Valid values:
  "nltk": Fast, good quality (default)
  "spacy": Better quality, slower. Requires the [search-nlp] extras.
Validate the index after building.
Enable detailed output during build.
How documents are split into searchable chunks. Valid values:
  "sentence": Groups sentences together (default)
  "sliding": Fixed-size word windows with overlap
  "paragraph": Splits on double newlines
  "page": One chunk per page (best for PDFs)
  "semantic": Groups semantically similar sentences
  "topic": Detects topic boundaries
  "qa": Optimized for question answering
  "markdown": Header-aware chunking with code block detection
  "json": Pre-chunked JSON input
Maximum sentences per chunk. Used with sentence strategy.
Split on this many consecutive newlines. Used with sentence strategy.
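A rough Python sketch of the sentence strategy under stated assumptions: text is first split wherever the configured number of consecutive newlines appears, then each block is cut into chunks of at most the maximum sentence count. A naive regex stands in for the NLTK or spaCy sentence splitter, and sentence_chunks is a hypothetical name.

```python
import re


def sentence_chunks(text, max_sentences, split_newlines):
    """Group sentences into chunks of at most max_sentences, forcing a chunk
    break wherever split_newlines or more consecutive newlines occur."""
    blocks = re.split(r"\n{%d,}" % split_newlines, text)
    chunks = []
    for block in blocks:
        # Naive splitter (stand-in for NLTK/spaCy): a sentence ends at
        # ., ! or ? followed by whitespace.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", block) if s.strip()]
        for i in range(0, len(sentences), max_sentences):
            chunks.append(" ".join(sentences[i:i + max_sentences]))
    return chunks


text = "Install the SDK. Add the extras. Build an index.\n\nSearch it."
print(sentence_chunks(text, max_sentences=2, split_newlines=2))
# ['Install the SDK. Add the extras.', 'Build an index.', 'Search it.']
```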
Chunk size in words. Used with sliding strategy.
Overlap size in words between consecutive chunks. Used with sliding strategy.
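The sliding strategy's window arithmetic, sketched in Python: each window starts chunk_size minus overlap_size words after the previous one, so consecutive chunks share overlap_size words. This illustrates the idea rather than reproducing the tool's implementation; the function name and the stop-at-end behavior are assumptions.

```python
def sliding_chunks(text, chunk_size, overlap_size):
    """Split text into windows of chunk_size words; each window repeats the
    last overlap_size words of the previous one."""
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("need 0 <= overlap_size < chunk_size")
    words = text.split()
    step = chunk_size - overlap_size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the last word
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(sliding_chunks(sample, chunk_size=4, overlap_size=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

With 10 words, a chunk size of 4, and an overlap of 1, the windows start at words 0, 3, and 6.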
Similarity threshold for grouping sentences. Used with semantic strategy.
Lower values produce larger chunks.
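To see why a lower semantic threshold yields larger chunks, here is a sketch that compares each sentence's embedding with the previous sentence's and starts a new chunk when cosine similarity falls below the threshold. Comparing only adjacent sentences is an assumption (the real strategy may compare against the whole chunk), and the toy two-dimensional vectors stand in for model embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences, embeddings, threshold):
    """Keep adding sentences to the current chunk while each one is at least
    `threshold` similar to the sentence before it."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur, sentence in zip(embeddings, embeddings[1:], sentences[1:]):
        if cosine(prev, cur) >= threshold:
            chunks[-1].append(sentence)
        else:
            chunks.append([sentence])
    return [" ".join(chunk) for chunk in chunks]


sentences = ["Install the SDK.", "Add the search extras.", "Agents answer calls."]
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]  # toy vectors, not real embeddings
for threshold in (0.9, 0.7, 0.5):
    print(threshold, len(semantic_chunks(sentences, embeddings, threshold)))
# 0.9 3
# 0.7 2
# 0.5 1
```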
Similarity threshold for detecting topic changes. Used with topic strategy.
Lower values produce more fine-grained topic boundaries.
Use the markdown strategy for documentation with code blocks. It preserves
header hierarchy, detects fenced code blocks, and adds language-specific tags
for better search relevance.
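A simplified sketch of header-aware markdown chunking, assuming one chunk per header section, the header path recorded as the chunk's context, and the fence's info string (for example bash) used directly as a tag. Lines inside fenced code blocks are never treated as headers. The chunk dictionary layout and tag format are hypothetical; sw-search's actual metadata may differ.

```python
import re

HEADER = re.compile(r"^(#{1,6})\s+(.+)$")
FENCE = re.compile(r"^```(\w*)")


def markdown_chunks(text):
    """Return one chunk per header section with its header path, content,
    and the languages of any fenced code blocks it contains."""
    chunks = []
    path = []  # header hierarchy, e.g. ["Install", "Extras"]
    section = {"headers": [], "lines": [], "languages": set()}
    in_code = False

    def flush():
        content = "\n".join(section["lines"]).strip()
        if content:
            chunks.append({"headers": section["headers"], "content": content,
                           "tags": sorted(section["languages"])})

    for line in text.splitlines():
        fence = FENCE.match(line)
        if fence:
            if not in_code and fence.group(1):
                section["languages"].add(fence.group(1))  # e.g. "bash"
            in_code = not in_code
            section["lines"].append(line)
            continue
        # A "#" line inside a code block is code, not a header.
        header = None if in_code else HEADER.match(line)
        if header:
            flush()
            level = len(header.group(1))
            path = path[:level - 1] + [header.group(2).strip()]
            section = {"headers": list(path), "lines": [], "languages": set()}
        else:
            section["lines"].append(line)
    flush()
    return chunks


doc = "# Install\nRun pip.\n## Extras\n```bash\n# not a header\npip install x\n```\n# Usage\nCall it.\n"
for chunk in markdown_chunks(doc):
    print(chunk["headers"], chunk["tags"])
```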
Search mode
Search an existing index with a natural language query.
Number of results to return.
Minimum similarity score. Results below this threshold are excluded.
Comma-separated tags to filter results.
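How the count, minimum score, and tag options combine can be sketched as a post-filter over scored results. This assumes higher scores mean more similar and that a result passes the tag filter if it shares at least one requested tag; both the any-tag semantics and the result dictionary shape are assumptions.

```python
def filter_results(results, count, distance_threshold=0.0, tags=""):
    """Apply a minimum score, an optional comma-separated tag filter, and a
    result limit to scored search results (higher score = more similar).
    The 0.0 default is illustrative, not the tool's default."""
    wanted = {t.strip() for t in tags.split(",") if t.strip()}
    kept = [
        r for r in results
        if r["score"] >= distance_threshold
        and (not wanted or wanted & set(r["tags"]))  # any requested tag matches
    ]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:count]


results = [
    {"id": 1, "score": 0.91, "tags": ["api"]},
    {"id": 2, "score": 0.42, "tags": ["api"]},
    {"id": 3, "score": 0.77, "tags": ["guide"]},
    {"id": 4, "score": 0.83, "tags": ["api", "guide"]},
]
print([r["id"] for r in filter_results(results, count=2, distance_threshold=0.5, tags="api")])
# [1, 4]
```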
NLP backend for query processing. Valid values:
  "nltk": Fast, good quality (default)
  "spacy": Better quality, slower. Requires the [search-nlp] extras.
Output results as JSON.
Show metadata only, hide chunk content.
Start an interactive search shell that loads the index once so you can run multiple queries against it.
Validate mode
Verify index integrity and display index metadata.
Output includes chunk count, file count, embedding model, dimensions, chunking strategy, and creation timestamp.
Migrate mode
Migrate indexes between storage backends.
Show index information without migrating.
Migrate a SQLite index to PostgreSQL pgvector.
Target collection name in PostgreSQL.
Number of chunks per migration batch.
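Batching splits the chunk list into fixed-size groups, so the batch size sets how many inserts the migration performs: 250 chunks with a batch size of 100 go over in three batches. A minimal sketch with a hypothetical batches helper, not the migrator's actual code:

```python
def batches(chunks, batch_size):
    """Yield consecutive slices of at most batch_size chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(chunks), batch_size):
        yield chunks[start:start + batch_size]


chunks = list(range(250))  # stand-ins for 250 indexed chunks
print([len(batch) for batch in batches(chunks, 100)])  # [100, 100, 50]
```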
Remote mode
Search via a remote search API endpoint.
Name of the index to search on the remote server.
Request timeout in seconds.
The --count, --distance-threshold, --tags, --json, --no-content, and
--verbose options from search mode also apply to remote searches.
After building an index, add it to an agent via the native_vector_search skill: