---
title: build_index_from_sources
slug: /reference/python/agents/search/index-builder/build-index-from-sources
description: Build a complete search index from multiple source files and directories.
max-toc-depth: 3
---

Build a complete search index from multiple source files and directories.
This is the primary method for index construction. It handles file discovery,
text extraction, chunking, embedding generation, and storage.
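To make the stages concrete, here is a toy, self-contained sketch of chunking, tagging, embedding, and storage applied to text that has already been extracted. It is not the SDK's implementation. The paragraph-based chunking rule, the hashed bag-of-words `toy_embed` function, `build_toy_index`, and the `chunks` table schema are all hypothetical stand-ins. The real builder uses its configured `chunking_strategy`, a real embedding model, and the `.swsearch` format.

```python
import hashlib
import json
import math
import sqlite3
from typing import Optional


def toy_embed(text: str, dims: int = 16) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dims
    for word in text.lower().split():
        # Map each word to a stable bucket so the vector is deterministic.
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_toy_index(
    documents: dict[str, str],
    tags: Optional[list[str]] = None,
) -> sqlite3.Connection:
    """Hypothetical helper: chunk, tag, embed, and store already-extracted text."""
    conn = sqlite3.connect(":memory:")
    # Illustrative schema only; not the .swsearch layout.
    conn.execute(
        "CREATE TABLE chunks (filename TEXT, content TEXT, tags TEXT, embedding TEXT)"
    )
    global_tags = json.dumps(tags or [])
    for filename, text in documents.items():
        # Chunking: one chunk per blank-line-separated paragraph.
        for chunk in (part.strip() for part in text.split("\n\n")):
            if not chunk:
                continue
            conn.execute(
                "INSERT INTO chunks VALUES (?, ?, ?, ?)",
                # Every chunk receives the same global tags.
                (filename, chunk, global_tags, json.dumps(toy_embed(chunk))),
            )
    conn.commit()
    return conn
```

The point of the sketch is only the order of the stages and the fact that global tags are attached to every chunk, not the quality of the chunks or vectors.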

## **Parameters**

<ParamField path="sources" type="list[Path]" required={true} toc={true}>
  List of `Path` objects pointing to files and/or directories to index.
</ParamField>

<ParamField path="output_file" type="str" required={true} toc={true}>
  Output path for the `.swsearch` file (SQLite backend) or collection name (pgvector).
</ParamField>

<ParamField path="file_types" type="list[str]" required={true} toc={true}>
  File extensions to include when scanning directories (e.g., `["md", "txt", "py"]`).
</ParamField>

<ParamField path="exclude_patterns" type="Optional[list[str]]" toc={true}>
  Glob patterns for files to exclude (e.g., `["**/node_modules/**"]`).
</ParamField>

<ParamField path="languages" type="Optional[list[str]]" toc={true}>
  List of language codes to support. Defaults to `["en"]`.
</ParamField>

<ParamField path="tags" type="Optional[list[str]]" toc={true}>
  Global tags to add to every chunk in the index.
</ParamField>

<ParamField path="overwrite" type="bool" default="false" toc={true}>
  For the pgvector backend, drop and recreate the collection if it already exists.
</ParamField>
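The way `sources`, `file_types`, and `exclude_patterns` combine can be sketched with the standard library. `discover_files` is a hypothetical helper, not part of the SDK, and it assumes that explicitly listed files bypass the extension filter, that directories are scanned recursively, and that exclude patterns are matched against the full POSIX-style path with `fnmatch` semantics. The SDK's actual matching rules may differ.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Optional


def discover_files(
    sources: list[Path],
    file_types: list[str],
    exclude_patterns: Optional[list[str]] = None,
) -> list[Path]:
    """Hypothetical helper: resolve sources into the files an index build might read."""
    patterns = exclude_patterns or []
    # Normalize "md" or ".md" to the ".md" form that Path.suffix returns.
    suffixes = {"." + ext.lstrip(".") for ext in file_types}

    def is_excluded(path: Path) -> bool:
        # fnmatch's "*" also matches "/", so "**/test_*" matches files at any depth.
        return any(fnmatch(path.as_posix(), pattern) for pattern in patterns)

    found: set[Path] = set()
    for source in sources:
        if source.is_file():
            # Assumption: explicitly listed files skip the extension filter.
            candidates = [source]
        elif source.is_dir():
            # Recursive scan, keeping only the requested extensions.
            candidates = [p for p in source.rglob("*") if p.is_file() and p.suffix in suffixes]
        else:
            continue  # Assumption: missing paths are silently skipped.
        found.update(p for p in candidates if not is_excluded(p))
    return sorted(found)
```

Under these assumptions, the arguments in the example on this page would drop `examples/test_demo.py` through `**/test_*` and skip a `docs/image.png` because `png` is not listed in `file_types`.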

## **Returns**

`None`

## **Example**

```python {5}
from pathlib import Path
from signalwire.search import IndexBuilder

builder = IndexBuilder(chunking_strategy="markdown", verbose=True)
builder.build_index_from_sources(
    sources=[Path("./docs"), Path("./examples")],
    output_file="knowledge.swsearch",
    file_types=["md", "txt", "py"],
    exclude_patterns=["**/test_*"],
    tags=["documentation"],
)
```