sw-search

The sw-search command builds vector search indexes from documents, searches existing indexes, validates index integrity, migrates between storage backends, and queries remote search servers. Built indexes are used with the native_vector_search skill to give agents searchable knowledge bases.

Requires the search extras: pip install "signalwire[search]". For PDF/DOCX support use [search-full]. For advanced NLP use [search-nlp].

Command Modes

sw-search operates in five modes based on the first argument:

$ sw-search <sources...> [build-options]              # Build mode (default)
$ sw-search search <file> <query> [search-options]    # Search mode
$ sw-search validate <file> [--verbose]               # Validate mode
$ sw-search migrate <file> [migrate-options]          # Migrate mode
$ sw-search remote <url> <query> [remote-options]     # Remote search mode
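
The first-argument rule above can be pictured with a short Python sketch. The name `detect_mode` and the keyword set are illustrative assumptions taken from the usage lines; this is not the tool's actual argument parser.

```python
# Hypothetical sketch of first-argument mode selection; not sw-search's real parser.
MODE_KEYWORDS = {"search", "validate", "migrate", "remote"}


def detect_mode(argv):
    """Return the mode name implied by the first command-line argument."""
    if argv and argv[0] in MODE_KEYWORDS:
        return argv[0]
    # Anything else is treated as a source path, which selects build mode.
    return "build"


print(detect_mode(["./docs", "--output", "knowledge.swsearch"]))
print(detect_mode(["search", "knowledge.swsearch", "how to create an agent"]))
```

In this sketch a source directory literally named `search` would be read as a mode keyword, so writing it as `./search` keeps it a path.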

Build Mode

Build a vector search index from files and directories.

$ sw-search ./docs --output knowledge.swsearch
$ sw-search ./docs ./examples README.md --file-types md,txt,py

Build Options

sources
string, required

One or more source files or directories to index.

--output
string

Output file path (.swsearch) or collection name for pgvector. Defaults to sources.swsearch for single-source builds.

--output-dir
string

Output directory. For --output-format json, creates one file per source document. Mutually exclusive with --output.

--output-format
string, defaults to index

Output format. Valid values:

  • "index" — Create a searchable .swsearch index (default)
  • "json" — Export chunks as JSON for review or external processing

--backend
string, defaults to sqlite

Storage backend. Valid values:

  • "sqlite" — Portable .swsearch file (default)
  • "pgvector" — PostgreSQL with pgvector extension

--connection-string
string

PostgreSQL connection string. Required when --backend pgvector.

--overwrite
flag

Overwrite an existing pgvector collection.

--file-types
string, defaults to md,txt,rst

Comma-separated file extensions to include when indexing directories.

--exclude
string

Comma-separated glob patterns to exclude (e.g., "**/test/**,**/__pycache__/**").
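
To make the interaction between --file-types and --exclude concrete, here is a minimal Python sketch of that kind of filtering. The helper `select_files` is hypothetical, and `fnmatchcase` only approximates glob matching; the tool's real pattern semantics may differ (for example, a top-level `test/x.md` does not match `**/test/**` here).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_files(paths, file_types="md,txt,rst", exclude=""):
    """Keep paths with an allowed extension that match no exclude pattern."""
    extensions = {e.strip().lstrip(".").lower() for e in file_types.split(",") if e.strip()}
    patterns = [p.strip() for p in exclude.split(",") if p.strip()]
    selected = []
    for path in paths:
        suffix = PurePosixPath(path).suffix.lstrip(".").lower()
        if suffix not in extensions:
            continue  # extension not listed in --file-types
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            continue  # path matched an --exclude pattern
        selected.append(path)
    return selected


candidates = ["docs/a.md", "docs/test/b.md", "docs/c.py", "docs/__pycache__/x.txt"]
print(select_files(candidates, exclude="**/test/**,**/__pycache__/**"))
```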

--languages
string, defaults to en

Comma-separated language codes for the indexed content.

--model
string, defaults to mini

Embedding model name or alias. Valid aliases:

  • "mini": all-MiniLM-L6-v2 (384 dims, fastest, default)
  • "base": all-mpnet-base-v2 (768 dims, balanced)
  • "large": all-mpnet-base-v2 (768 dims, highest quality)

You can also pass a full model name (e.g., "sentence-transformers/all-mpnet-base-v2").
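
Alias handling of this kind usually amounts to a lookup with a pass-through fallback. The sketch below transcribes the alias list into a table; `resolve_model` is a hypothetical helper, not a function exposed by the tool.

```python
# Alias table transcribed from the list above; resolve_model is a hypothetical helper.
MODEL_ALIASES = {
    "mini": "all-MiniLM-L6-v2",
    "base": "all-mpnet-base-v2",
    "large": "all-mpnet-base-v2",
}


def resolve_model(name):
    """Map a known alias to its model name; pass any other name through unchanged."""
    return MODEL_ALIASES.get(name, name)


print(resolve_model("mini"))
print(resolve_model("sentence-transformers/all-mpnet-base-v2"))
```

As listed, "base" and "large" resolve to the same model, so choosing between them would not change the embeddings under this table.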

--tags
string

Comma-separated tags added to all chunks. Tags can be used to filter search results.

--index-nlp-backend
string, defaults to nltk

NLP backend for document processing. Valid values:

  • "nltk" — Fast, good quality (default)
  • "spacy" — Better quality, slower. Requires [search-nlp] extras.

--validate
flag

Validate the index after building.

--verbose
flag

Enable detailed output during build.

Chunking Strategies

--chunking-strategy
string, defaults to sentence

How documents are split into searchable chunks. Valid values:

  • "sentence" — Groups sentences together (default)
  • "sliding" — Fixed-size word windows with overlap
  • "paragraph" — Splits on double newlines
  • "page" — One chunk per page (best for PDFs)
  • "semantic" — Groups semantically similar sentences
  • "topic" — Detects topic boundaries
  • "qa" — Optimized for question-answering
  • "markdown" — Header-aware chunking with code block detection
  • "json" — Pre-chunked JSON input
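
As one concrete illustration from the list above, the paragraph rule ("splits on double newlines") can be sketched in a few lines of Python. The function `split_paragraphs` is a hypothetical stand-in, and treating whitespace-only lines as blank is an assumption of this sketch.

```python
import re


def split_paragraphs(text):
    """Split text wherever a blank line (two or more newlines) separates blocks."""
    parts = re.split(r"\n\s*\n", text)
    # Drop empty fragments and trim surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


sample = "First paragraph.\n\nSecond paragraph,\nstill second.\n\n\nThird paragraph."
for chunk in split_paragraphs(sample):
    print(repr(chunk))
```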

Strategy-Specific Options

--max-sentences-per-chunk
int, defaults to 5

Maximum sentences per chunk. Used with sentence strategy.

--split-newlines
int

Split on this many consecutive newlines. Used with sentence strategy.
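
The effect of --max-sentences-per-chunk can be sketched as follows. This is a simplified illustration: it uses a naive regex splitter rather than an NLP backend, ignores --split-newlines, and `chunk_sentences` is a hypothetical name.

```python
import re


def chunk_sentences(text, max_sentences_per_chunk=5):
    """Group consecutive sentences into chunks of at most the given size."""
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + max_sentences_per_chunk])
        for i in range(0, len(sentences), max_sentences_per_chunk)
    ]


text = "One. Two. Three. Four. Five. Six. Seven."
for chunk in chunk_sentences(text, max_sentences_per_chunk=3):
    print(chunk)
```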

--chunk-size
int, defaults to 50

Chunk size in words. Used with sliding strategy.

--overlap-size
int, defaults to 10

Overlap size in words between consecutive chunks. Used with sliding strategy.
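
A word-window chunker with overlap can be sketched like this. The defaults mirror --chunk-size and --overlap-size; the function `sliding_chunks` is a hypothetical illustration rather than the tool's implementation.

```python
def sliding_chunks(text, chunk_size=50, overlap_size=10):
    """Split text into word windows where neighbours share overlap_size words."""
    if overlap_size >= chunk_size:
        raise ValueError("overlap_size must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap_size  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(120))
chunks = sliding_chunks(text)
print(len(chunks), [len(c.split()) for c in chunks])
```

With the defaults each window advances 40 words, so the last 10 words of one chunk reappear as the first 10 of the next.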

--semantic-threshold
float, defaults to 0.5

Similarity threshold for grouping sentences. Used with semantic strategy. Lower values produce larger chunks.
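
Why a lower threshold yields larger chunks can be seen in a toy sketch. It assumes grouping compares each sentence with the one before it using cosine similarity; the 2-D vectors stand in for real embeddings, and `group_by_similarity` is a hypothetical name, not the tool's algorithm.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_similarity(vectors, threshold=0.5):
    """Group consecutive items; start a new group when similarity drops below threshold."""
    groups = []
    for index, vector in enumerate(vectors):
        if groups and cosine(vectors[index - 1], vector) >= threshold:
            groups[-1].append(index)
        else:
            groups.append([index])
    return groups


# Toy 2-D "embeddings" standing in for real sentence vectors.
vectors = [(1.0, 0.0), (0.9, 0.1), (0.6, 0.8), (0.0, 1.0)]
print(group_by_similarity(vectors, threshold=0.9))  # [[0, 1], [2], [3]]
print(group_by_similarity(vectors, threshold=0.5))  # [[0, 1, 2, 3]]
```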

--topic-threshold
float, defaults to 0.3

Similarity threshold for detecting topic changes. Used with topic strategy. Lower values produce more fine-grained topic boundaries.

Use the markdown strategy for documentation with code blocks. It preserves header hierarchy, detects fenced code blocks, and adds language-specific tags for better search relevance.


Search Mode

Search an existing index with a natural language query.

$ sw-search search knowledge.swsearch "how to create an agent"
$ sw-search search knowledge.swsearch "API reference" --count 3 --verbose

Search Options

--count
int, defaults to 5

Number of results to return.

--distance-threshold
float, defaults to 0.0

Minimum similarity score. Results below this threshold are excluded.
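
How --count and --distance-threshold combine can be sketched as a filter followed by a cut-off. The result shape (dictionaries with a "score" key) and the helper `filter_results` are assumptions made for illustration only.

```python
def filter_results(results, count=5, distance_threshold=0.0):
    """Drop results scoring below the threshold, then keep the top `count`."""
    kept = [r for r in results if r["score"] >= distance_threshold]
    kept.sort(key=lambda r: r["score"], reverse=True)  # best match first
    return kept[:count]


# Hypothetical search hits; real output fields may differ.
hits = [
    {"id": "a", "score": 0.82},
    {"id": "b", "score": 0.41},
    {"id": "c", "score": 0.67},
    {"id": "d", "score": 0.12},
]
print([h["id"] for h in filter_results(hits, count=2, distance_threshold=0.4)])
```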

--tags
string

Comma-separated tags to filter results.

--query-nlp-backend
string, defaults to nltk

NLP backend for query processing.

  • "nltk" — Fast, good quality (default)
  • "spacy" — Better quality, slower. Requires [search-nlp] extras.

--json
flag

Output results as JSON.

--no-content
flag

Show metadata only, hide chunk content.

--shell
flag

Start an interactive search shell. Load the index once and run multiple queries.

Validate Mode

Verify index integrity and display index metadata.

$ sw-search validate knowledge.swsearch
$ sw-search validate knowledge.swsearch --verbose

Output includes chunk count, file count, embedding model, dimensions, chunking strategy, and creation timestamp.


Migrate Mode

Migrate indexes between storage backends.

$ sw-search migrate --info ./docs.swsearch
$ sw-search migrate ./docs.swsearch --to-pgvector \
    --connection-string "postgresql://user:pass@localhost/db" \
    --collection-name docs_collection

Migrate Options

--info
flag

Show index information without migrating.

--to-pgvector
flag

Migrate a SQLite index to PostgreSQL pgvector.

--collection-name
string

Target collection name in PostgreSQL.

--batch-size
int, defaults to 100

Number of chunks per migration batch.
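
Batching of this kind typically means slicing the chunk list into fixed-size groups and writing one group at a time. A minimal sketch, with `batches` as a hypothetical helper and plain integers standing in for chunks:

```python
def batches(items, batch_size=100):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


chunks = list(range(250))  # stand-in for 250 index chunks
print([len(batch) for batch in batches(chunks)])  # [100, 100, 50]
```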

Remote Mode

Search via a remote search API endpoint.

$ sw-search remote http://localhost:8001 "how to create an agent" --index-name docs

Remote Options

--index-name
string, required

Name of the index to search on the remote server.

--timeout
int, defaults to 30

Request timeout in seconds.

The --count, --distance-threshold, --tags, --json, --no-content, and --verbose options from search mode also apply to remote searches.


Examples

Build and Search Workflow

# Build from documentation with markdown-aware chunking
$ sw-search ./docs \
    --chunking-strategy markdown \
    --file-types md \
    --output knowledge.swsearch \
    --verbose

# Validate the index
$ sw-search validate knowledge.swsearch

# Search interactively
$ sw-search search knowledge.swsearch --shell

Full Configuration Build

$ sw-search ./docs ./examples README.md \
    --output ./knowledge.swsearch \
    --chunking-strategy sentence \
    --max-sentences-per-chunk 8 \
    --file-types md,txt,rst,py \
    --exclude "**/test/**,**/__pycache__/**" \
    --languages en,es,fr \
    --model base \
    --tags documentation,api \
    --index-nlp-backend nltk \
    --validate \
    --verbose
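
When a build like this is driven from a script, it can help to assemble the command as an argument list rather than a shell string. The sketch below uses only flags documented on this page; `build_command` is a hypothetical helper, and nothing is executed until the list is passed to `subprocess.run`.

```python
import shlex


def build_command(sources, **options):
    """Assemble an sw-search build command as an argument list."""
    command = ["sw-search", *sources]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            command.append(flag)  # boolean flags take no value
        elif value is not False and value is not None:
            command.extend([flag, str(value)])
    return command


command = build_command(
    ["./docs", "./examples", "README.md"],
    output="./knowledge.swsearch",
    chunking_strategy="sentence",
    max_sentences_per_chunk=8,
    file_types="md,txt,rst,py",
    validate=True,
)
print(shlex.join(command))
# To execute: subprocess.run(command, check=True), which requires sw-search on PATH.
```

Passing a list avoids shell quoting problems, which matters for values such as --exclude glob patterns.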

PostgreSQL pgvector Backend

# Build directly to pgvector
$ sw-search ./docs \
    --backend pgvector \
    --connection-string "postgresql://user:pass@localhost:5432/knowledge" \
    --output docs_collection \
    --chunking-strategy markdown

# Search in pgvector collection
$ sw-search search docs_collection "how to create an agent" \
    --backend pgvector \
    --connection-string "postgresql://user:pass@localhost/knowledge"

JSON Export and Re-import

# Export chunks for review (one JSON file per source document)
$ sw-search ./docs --output-format json --output-dir ./chunks/

# Build index from exported JSON
$ sw-search ./chunks/ \
    --chunking-strategy json \
    --file-types json \
    --output final.swsearch

Using with an Agent

After building an index, add it to an agent via the native_vector_search skill:

from signalwire import AgentBase

agent = AgentBase(name="search-agent")
agent.set_prompt_text("You are a helpful assistant.")
agent.add_skill("native_vector_search", {
    "index_path": "./knowledge.swsearch",
    "tool_name": "search_docs",
    "tool_description": "Search the documentation",
})

if __name__ == "__main__":
    agent.run()