build_index_from_sources

Build a complete search index from multiple source files and directories. This is the primary method for index construction. It handles file discovery, text extraction, chunking, embedding generation, and storage.

Parameters

sources
list[Path] (required)

List of Path objects pointing to files and/or directories to index.

output_file
str (required)

Output path for the .swsearch file (SQLite backend) or the collection name (pgvector backend).

file_types
list[str] (required)

File extensions to include when scanning directories (e.g., ["md", "txt", "py"]).

exclude_patterns
Optional[list[str]]

Glob patterns for files to exclude (e.g., ["**/node_modules/**"]).

languages
Optional[list[str]]

List of language codes to support. Defaults to ["en"].

tags
Optional[list[str]]

Global tags to add to every chunk in the index.

overwrite
bool (defaults to False)

For the pgvector backend, drop and recreate the collection if it already exists.
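
To make the interaction between file_types and exclude_patterns concrete, here is a minimal sketch of how directory scanning with these two filters could behave. It is not the library's implementation: discover_files is a hypothetical helper, and it assumes that extensions are compared case-insensitively, that explicitly listed files are filtered by extension too, and that glob patterns are matched against the full POSIX-style path. The actual matching rules may differ.

```python
from fnmatch import fnmatch
from pathlib import Path


def discover_files(sources, file_types, exclude_patterns=None):
    """Hypothetical helper: approximate which files a scan might pick up."""
    exclude_patterns = exclude_patterns or []
    # Normalize extensions so "md", ".md", and "MD" are treated the same (assumption).
    wanted = {ext.lower().lstrip(".") for ext in file_types}
    found = []
    for source in sources:
        # A file source is considered directly; a directory is walked recursively.
        candidates = [source] if source.is_file() else sorted(source.rglob("*"))
        for path in candidates:
            if not path.is_file():
                continue
            if path.suffix.lower().lstrip(".") not in wanted:
                continue
            # Assumption: patterns are matched against the whole path, so
            # "**/node_modules/**" excludes anything below a node_modules directory.
            if any(fnmatch(path.as_posix(), pattern) for pattern in exclude_patterns):
                continue
            found.append(path)
    return found
```

Running a helper like this before building can serve as a quick dry run to see whether the chosen extensions and patterns select the files you expect.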

Returns

None

Example

from pathlib import Path
from signalwire.search import IndexBuilder

# Configure the builder, then index two directories into one .swsearch file.
builder = IndexBuilder(chunking_strategy="markdown", verbose=True)
builder.build_index_from_sources(
    sources=[Path("./docs"), Path("./examples")],
    output_file="knowledge.swsearch",
    file_types=["md", "txt", "py"],
    exclude_patterns=["**/test_*"],  # skip test files
    tags=["documentation"],
)
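
Because the method returns None, it can be useful to check the inputs before calling it. The sketch below uses a hypothetical helper, split_sources, which is not part of the library, to separate existing paths from missing ones. How the builder itself treats a missing path is not stated here, so this is only a defensive pre-check.

```python
from pathlib import Path


def split_sources(candidates):
    """Hypothetical helper: separate existing paths from missing ones."""
    existing, missing = [], []
    for candidate in candidates:
        path = Path(candidate)
        (existing if path.exists() else missing).append(path)
    return existing, missing


existing, missing = split_sources(["./docs", "./examples"])
for path in missing:
    print(f"Skipping missing source: {path}")

# The surviving paths can then be passed as sources=existing to
# build_index_from_sources, as in the example above.
```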