---

title: sw-search
slug: /reference/python/agents/cli/sw-search
description: Build, search, and validate vector search indexes for AI agent knowledge bases.
max-toc-depth: 3
---

For a complete index of all SignalWire documentation pages, fetch https://signalwire.com/docs/llms.txt

The `sw-search` command builds vector search indexes from documents, searches
existing indexes, validates index integrity, migrates between storage backends,
and queries remote search servers. Built indexes are used with the
`native_vector_search` skill to give agents searchable knowledge bases.

<Note>
  Requires the search extras: `pip install "signalwire-agents[search]"`.
  For PDF/DOCX support use `[search-full]`. For advanced NLP use `[search-nlp]`.
</Note>

## Command Modes

`sw-search` operates in five modes based on the first argument:

```bash
sw-search <sources...> [build-options]           # Build mode (default)
sw-search search <file> <query> [search-options] # Search mode
sw-search validate <file> [--verbose]            # Validate mode
sw-search migrate <file> [migrate-options]       # Migrate mode
sw-search remote <url> <query> [remote-options]  # Remote search mode
```

***

## Build Mode

Build a vector search index from files and directories.

```bash
sw-search ./docs --output knowledge.swsearch
sw-search ./docs ./examples README.md --file-types md,txt,py
```

### Build Options

<ParamField path="sources" type="string" required={true} toc={true}>
  One or more source files or directories to index.
</ParamField>

<ParamField path="--output" type="string" toc={true}>
  Output file path (`.swsearch`) or collection name for pgvector. Defaults to
  `sources.swsearch` for single-source builds.
</ParamField>

<ParamField path="--output-dir" type="string" toc={true}>
  Output directory. For `--output-format json`, creates one file per source document.
  Mutually exclusive with `--output`.
</ParamField>

<ParamField path="--output-format" type="string" default="index" toc={true}>
  Output format. Valid values:

  * `"index"` -- Create a searchable `.swsearch` index (default)
  * `"json"` -- Export chunks as JSON for review or external processing
</ParamField>

<ParamField path="--backend" type="string" default="sqlite" toc={true}>
  Storage backend. Valid values:

  * `"sqlite"` -- Portable `.swsearch` file (default)
  * `"pgvector"` -- PostgreSQL with pgvector extension
</ParamField>

<ParamField path="--connection-string" type="string" toc={true}>
  PostgreSQL connection string. Required when `--backend pgvector`.
</ParamField>

<ParamField path="--overwrite" type="flag" toc={true}>
  Overwrite an existing pgvector collection.
</ParamField>

<ParamField path="--file-types" type="string" default="md,txt,rst" toc={true}>
  Comma-separated file extensions to include when indexing directories.
</ParamField>

<ParamField path="--exclude" type="string" toc={true}>
  Comma-separated glob patterns to exclude (e.g., `"**/test/**,**/__pycache__/**"`).
</ParamField>
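To preview which paths an exclude pattern would filter out, you can approximate the matching with Python's `fnmatch`. This is illustrative only; `sw-search`'s exact glob semantics may differ (note that with `fnmatch`, `**/test/**` does not match a top-level `test/` directory, because `**/` must match at least a `/`):

```python
# Sketch: preview which paths an exclude pattern would drop.
# fnmatch is an approximation of sw-search's glob handling.
from fnmatch import fnmatch

paths = ["docs/guide.md", "docs/test/fixture.md", "src/__pycache__/mod.pyc"]
patterns = ["**/test/**", "**/__pycache__/**"]

# Keep only paths that match none of the exclude patterns.
kept = [p for p in paths if not any(fnmatch(p, pat) for pat in patterns)]
print(kept)  # ['docs/guide.md']
```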

<ParamField path="--languages" type="string" default="en" toc={true}>
  Comma-separated language codes for the indexed content.
</ParamField>

<ParamField path="--model" type="string" default="mini" toc={true}>
  Embedding model name or alias. Valid aliases:

  * `"mini"` -- `all-MiniLM-L6-v2` (384 dims, fastest, default)
  * `"base"` -- `all-mpnet-base-v2` (768 dims, balanced)
  * `"large"` -- `all-mpnet-base-v2` (768 dims, highest quality)

  You can also pass a full model name (e.g., `"sentence-transformers/all-mpnet-base-v2"`).
</ParamField>

<ParamField path="--tags" type="string" toc={true}>
  Comma-separated tags added to all chunks. Tags can be used to filter search results.
</ParamField>

<ParamField path="--index-nlp-backend" type="string" default="nltk" toc={true}>
  NLP backend for document processing. Valid values:

  * `"nltk"` -- Fast, good quality (default)
  * `"spacy"` -- Better quality, slower. Requires `[search-nlp]` extras.
</ParamField>

<ParamField path="--validate" type="flag" toc={true}>
  Validate the index after building.
</ParamField>

<ParamField path="--verbose" type="flag" toc={true}>
  Enable detailed output during build.
</ParamField>

## Chunking Strategies

<ParamField path="--chunking-strategy" type="string" default="sentence" toc={true}>
  How documents are split into searchable chunks. Valid values:

  * `"sentence"` -- Groups sentences together (default)
  * `"sliding"` -- Fixed-size word windows with overlap
  * `"paragraph"` -- Splits on double newlines
  * `"page"` -- One chunk per page (best for PDFs)
  * `"semantic"` -- Groups semantically similar sentences
  * `"topic"` -- Detects topic boundaries
  * `"qa"` -- Optimized for question-answering
  * `"markdown"` -- Header-aware chunking with code block detection
  * `"json"` -- Pre-chunked JSON input
</ParamField>
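As a rough illustration of the default `sentence` strategy (not the SDK's actual implementation, which uses the configured NLP backend for sentence detection), consecutive sentences are grouped into fixed-size chunks:

```python
# Illustrative sketch of sentence-based chunking.
# The real splitter uses the NLTK/spaCy backend, not this naive regex.
import re

def sentence_chunks(text, max_sentences=5):
    """Split text into sentences, then group them into chunks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]

doc = "One. Two. Three. Four. Five. Six. Seven."
print(sentence_chunks(doc))  # two chunks: five sentences, then two
```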

### Strategy-Specific Options

<ParamField path="--max-sentences-per-chunk" type="int" default="5" toc={true}>
  Maximum sentences per chunk. Used with `sentence` strategy.
</ParamField>

<ParamField path="--split-newlines" type="int" toc={true}>
  Split on this many consecutive newlines. Used with `sentence` strategy.
</ParamField>

<ParamField path="--chunk-size" type="int" default="50" toc={true}>
  Chunk size in words. Used with `sliding` strategy.
</ParamField>

<ParamField path="--overlap-size" type="int" default="10" toc={true}>
  Overlap size in words between consecutive chunks. Used with `sliding` strategy.
</ParamField>
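To see how `--chunk-size` and `--overlap-size` interact, here is a minimal sliding-window sketch (illustrative only; the SDK's splitter may differ in edge-case handling):

```python
# Sliding-window sketch: each chunk holds `chunk_size` words and starts
# `chunk_size - overlap_size` words after the previous one.
def sliding_chunks(text, chunk_size=50, overlap_size=10):
    words = text.split()
    step = chunk_size - overlap_size
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap_size, 1), step)
    ]

doc = " ".join(f"w{i}" for i in range(120))  # a 120-word document
chunks = sliding_chunks(doc)
print(len(chunks))  # 3 chunks: words 0-49, 40-89, 80-119
```

With the defaults, each chunk repeats the last 10 words of the previous one, so a sentence falling on a chunk boundary still appears intact in at least one chunk.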

<ParamField path="--semantic-threshold" type="float" default="0.5" toc={true}>
  Similarity threshold for grouping sentences. Used with `semantic` strategy.
  Lower values produce larger chunks.
</ParamField>

<ParamField path="--topic-threshold" type="float" default="0.3" toc={true}>
  Similarity threshold for detecting topic changes. Used with `topic` strategy.
  Lower values produce more fine-grained topic boundaries.
</ParamField>
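Both thresholds compare embeddings with cosine similarity: content stays in the current chunk while similarity stays at or above the threshold, and a drop below it starts a new chunk. A toy sketch with hand-made 2-D vectors (the real strategies use embeddings from the selected `--model`):

```python
# Toy sketch: group consecutive items whose cosine similarity to the
# previous item stays above a threshold. Vectors are hand-made here;
# the semantic/topic strategies use sentence-transformer embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def group_by_similarity(vectors, threshold=0.5):
    groups = [[0]]
    for i in range(1, len(vectors)):
        if cosine(vectors[i - 1], vectors[i]) >= threshold:
            groups[-1].append(i)   # similar enough: same chunk
        else:
            groups.append([i])     # similarity dropped: new chunk
    return groups

vecs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]  # third vector changes topic
print(group_by_similarity(vecs, threshold=0.5))  # [[0, 1], [2]]
```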

<Tip>
  Use the `markdown` strategy for documentation with code blocks. It preserves
  header hierarchy, detects fenced code blocks, and adds language-specific tags
  for better search relevance.
</Tip>
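A rough sketch of what header-aware chunking does, splitting on headings and keeping each heading with the body that follows it (the SDK's `markdown` strategy additionally tracks header hierarchy and fenced code blocks):

```python
# Sketch: split markdown on H1-H3 headings so each chunk carries its
# heading as context. Illustrative only; not the SDK's implementation.
import re

def markdown_chunks(text):
    parts = re.split(r"(?m)^(#{1,3} .+)$", text)
    chunks = []
    if parts[0].strip():               # preamble before the first heading
        chunks.append(parts[0].strip())
    for i in range(1, len(parts), 2):  # (heading, body) pairs
        chunks.append(f"{parts[i]}\n{parts[i + 1].strip()}")
    return chunks

doc = "# Intro\nHello.\n## Usage\nRun the tool.\n"
for chunk in markdown_chunks(doc):
    print(repr(chunk))
```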

***

## Search Mode

Search an existing index with a natural language query.

```bash
sw-search search knowledge.swsearch "how to create an agent"
sw-search search knowledge.swsearch "API reference" --count 3 --verbose
```

### Search Options

<ParamField path="--count" type="int" default="5" toc={true}>
  Number of results to return.
</ParamField>

<ParamField path="--distance-threshold" type="float" default="0.0" toc={true}>
  Minimum similarity score. Results below this threshold are excluded.
</ParamField>

<ParamField path="--tags" type="string" toc={true}>
  Comma-separated tags to filter results.
</ParamField>
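Conceptually, `--count`, `--distance-threshold`, and `--tags` narrow results the way this sketch does. The result records here are hypothetical, not the SDK's data model:

```python
# Sketch of result filtering: keep results that meet the similarity
# threshold and share at least one requested tag, then take the top N.
results = [  # hypothetical (score, tags) records, sorted best-first
    {"score": 0.82, "tags": {"api"}, "text": "POST /agents ..."},
    {"score": 0.55, "tags": {"guide"}, "text": "Getting started ..."},
    {"score": 0.20, "tags": {"api"}, "text": "Changelog ..."},
]

def filter_results(results, count=5, threshold=0.0, tags=None):
    hits = [
        r for r in results
        if r["score"] >= threshold and (not tags or r["tags"] & set(tags))
    ]
    return hits[:count]

print([r["score"] for r in filter_results(results, threshold=0.5, tags=["api"])])
# [0.82]
```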

<ParamField path="--query-nlp-backend" type="string" default="nltk" toc={true}>
  NLP backend for query processing.

  * `"nltk"` -- Fast, good quality (default)
  * `"spacy"` -- Better quality, slower. Requires `[search-nlp]` extras.
</ParamField>

<ParamField path="--json" type="flag" toc={true}>
  Output results as JSON.
</ParamField>

<ParamField path="--no-content" type="flag" toc={true}>
  Show metadata only, hide chunk content.
</ParamField>

<ParamField path="--shell" type="flag" toc={true}>
  Start an interactive search shell that loads the index once and accepts
  repeated queries.

</ParamField>

***

## Validate Mode

Verify index integrity and display index metadata.

```bash
sw-search validate knowledge.swsearch
sw-search validate knowledge.swsearch --verbose
```

Output includes chunk count, file count, embedding model, dimensions, chunking
strategy, and creation timestamp.

***

## Migrate Mode

Migrate indexes between storage backends.

```bash
sw-search migrate --info ./docs.swsearch
sw-search migrate ./docs.swsearch --to-pgvector \
  --connection-string "postgresql://user:pass@localhost/db" \
  --collection-name docs_collection
```

### Migrate Options

<ParamField path="--info" type="flag" toc={true}>
  Show index information without migrating.
</ParamField>

<ParamField path="--to-pgvector" type="flag" toc={true}>
  Migrate a SQLite index to PostgreSQL pgvector.
</ParamField>

<ParamField path="--collection-name" type="string" toc={true}>
  Target collection name in PostgreSQL.
</ParamField>

<ParamField path="--batch-size" type="int" default="100" toc={true}>
  Number of chunks per migration batch.
</ParamField>

***

## Remote Mode

Search via a remote search API endpoint.

```bash
sw-search remote http://localhost:8001 "how to create an agent" --index-name docs
```

### Remote Options

<ParamField path="--index-name" type="string" required={true} toc={true}>
  Name of the index to search on the remote server.
</ParamField>

<ParamField path="--timeout" type="int" default="30" toc={true}>
  Request timeout in seconds.
</ParamField>

The `--count`, `--distance-threshold`, `--tags`, `--json`, `--no-content`, and
`--verbose` options from search mode also apply to remote searches.
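For programmatic access, a remote search call reduces to a small JSON POST. The endpoint path and payload field names below are assumptions for illustration, not the documented server contract:

```python
# Sketch only: endpoint path and field names are assumed, not the
# SDK's documented remote-search contract.
import json
from urllib import request

def build_search_request(base_url, query, index_name, count=5):
    payload = {"query": query, "index_name": index_name, "count": count}
    return request.Request(
        f"{base_url}/search",                       # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("http://localhost:8001", "how to create an agent", "docs")
print(req.full_url)  # http://localhost:8001/search
# Send with: request.urlopen(req, timeout=30)
```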

***

## Examples

### Build and Search Workflow

```bash
# Build from documentation with markdown-aware chunking
sw-search ./docs \
  --chunking-strategy markdown \
  --file-types md \
  --output knowledge.swsearch \
  --verbose

# Validate the index
sw-search validate knowledge.swsearch

# Search interactively
sw-search search knowledge.swsearch --shell
```

### Full Configuration Build

```bash
sw-search ./docs ./examples README.md \
  --output ./knowledge.swsearch \
  --chunking-strategy sentence \
  --max-sentences-per-chunk 8 \
  --file-types md,txt,rst,py \
  --exclude "**/test/**,**/__pycache__/**" \
  --languages en,es,fr \
  --model base \
  --tags documentation,api \
  --index-nlp-backend nltk \
  --validate \
  --verbose
```

### PostgreSQL pgvector Backend

```bash
# Build directly to pgvector
sw-search ./docs \
  --backend pgvector \
  --connection-string "postgresql://user:pass@localhost:5432/knowledge" \
  --output docs_collection \
  --chunking-strategy markdown

# Search in pgvector collection
sw-search search docs_collection "how to create an agent" \
  --backend pgvector \
  --connection-string "postgresql://user:pass@localhost/knowledge"
```

### JSON Export and Re-import

```bash
# Export chunks for review
sw-search ./docs --output-format json --output all_chunks.json

# Build index from exported JSON
sw-search ./chunks/ \
  --chunking-strategy json \
  --file-types json \
  --output final.swsearch
```

### Using with an Agent

After building an index, add it to an agent via the `native_vector_search` skill:

```python
from signalwire_agents import AgentBase

agent = AgentBase(name="search-agent")
agent.set_prompt_text("You are a helpful assistant.")
agent.add_skill("native_vector_search", {
    "index_path": "./knowledge.swsearch",
    "tool_name": "search_docs",
    "tool_description": "Search the documentation",
})

if __name__ == "__main__":
    agent.run()
```