---
title: DocumentProcessor
slug: /reference/python/agents/search/document-processor
description: Process and chunk documents for search indexing with multiple strategies.
max-toc-depth: 3
---


[createchunks]: /docs/server-sdks/reference/python/agents/search/document-processor/create-chunks

`DocumentProcessor` handles document text extraction and chunking for search index
construction. It supports multiple file formats (PDF, DOCX, HTML, Markdown, Excel,
PowerPoint, RTF) and provides several chunking strategies optimized for different
content types and search use cases.

```python
from signalwire.search import DocumentProcessor
```

<Warning>
  Full document processing requires additional dependencies. Install with
  `pip install "signalwire[search-full]"` for PDF, DOCX, and other format support.
  The quotes keep shells such as zsh from interpreting the square brackets.
</Warning>

## **Properties**

<ParamField path="chunking_strategy" type="str" toc={true}>
  The active chunking strategy, for example `sentence`, `sliding`, `semantic`, or `topic`.
</ParamField>

<ParamField path="max_sentences_per_chunk" type="int" toc={true}>
  Maximum sentences per chunk when using the `sentence` strategy.
</ParamField>

<ParamField path="chunk_size" type="int" toc={true}>
  Word count per chunk when using the `sliding` strategy.
</ParamField>

<ParamField path="chunk_overlap" type="int" toc={true}>
  Word overlap between chunks when using the `sliding` strategy.
</ParamField>

<ParamField path="split_newlines" type="int | None" toc={true}>
  Number of consecutive newlines that trigger a split before sentence tokenization
  in the `sentence` strategy. `None` when not explicitly set.
</ParamField>

<ParamField path="semantic_threshold" type="float" toc={true}>
  Similarity threshold for the `semantic` chunking strategy.
</ParamField>

<ParamField path="topic_threshold" type="float" toc={true}>
  Similarity threshold for the `topic` chunking strategy.
</ParamField>
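
To make the word-based and sentence-based properties more concrete, the following is a minimal, standalone sketch of what the `sliding` and `sentence` strategies could look like. It is not the SDK's implementation: the function names `sliding_chunks` and `sentence_chunks` are hypothetical, the regex sentence splitter is a simplifying assumption, and the real `DocumentProcessor` may tokenize and split differently. The parameters mirror the property names above.

```python
import re


def sliding_chunks(text, chunk_size, chunk_overlap):
    """Illustrative only: word windows of `chunk_size` words, where each
    window repeats the last `chunk_overlap` words of the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def sentence_chunks(text, max_sentences_per_chunk, split_newlines=None):
    """Illustrative only: group sentences into chunks, optionally splitting
    on runs of at least `split_newlines` newlines before tokenizing."""
    if split_newlines:
        blocks = re.split(r"\n{%d,}" % split_newlines, text)
    else:
        blocks = [text]
    chunks = []
    for block in blocks:
        # Naive splitter (assumption): a sentence ends at ., !, or ?
        # followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", block.strip())
        sentences = [s.strip() for s in parts if s.strip()]
        for i in range(0, len(sentences), max_sentences_per_chunk):
            chunks.append(" ".join(sentences[i:i + max_sentences_per_chunk]))
    return chunks


print(sliding_chunks("a b c d e f g h i j", chunk_size=4, chunk_overlap=1))
print(sentence_chunks("One. Two. Three.\n\nFour. Five.", 2, split_newlines=2))
```

In this sketch, a larger `chunk_overlap` yields more chunks that share more context with their neighbors, and setting `split_newlines` stops a chunk from spanning a paragraph break.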

## **Methods**

<CardGroup cols={2}>
  <Card title="create_chunks" href="/docs/server-sdks/reference/python/agents/search/document-processor/create-chunks">
    Split document content into chunks using the configured chunking strategy.
  </Card>
</CardGroup>