---
title: DocumentProcessor
slug: /reference/python/agents/search/document-processor
description: Process and chunk documents for search indexing with multiple strategies.
max-toc-depth: 3
---

[createchunks]: /docs/server-sdks/reference/python/agents/search/document-processor/create-chunks

`DocumentProcessor` handles document text extraction and chunking for search index construction. It supports multiple file formats (PDF, DOCX, HTML, Markdown, Excel, PowerPoint, RTF) and provides several chunking strategies optimized for different content types and search use cases.

```python
from signalwire.search import DocumentProcessor
```

Full document processing requires additional dependencies. Install with `pip install signalwire[search-full]` for PDF, DOCX, and other format support.

## **Properties**

- The active chunking strategy.
- Maximum sentences per chunk when using the `sentence` strategy.
- Word count per chunk when using the `sliding` strategy.
- Word overlap between chunks when using the `sliding` strategy.
- Number of consecutive newlines that trigger a split before sentence tokenization in the `sentence` strategy. `None` when not explicitly set.
- Similarity threshold for the `semantic` chunking strategy.
- Similarity threshold for the `topic` chunking strategy.

## **Methods**

Split document content into chunks using the configured chunking strategy. See [createchunks].
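
To illustrate how the word count and word overlap settings of the `sliding` strategy interact, here is a minimal standalone sketch of sliding-window chunking. It is not the SDK's implementation: the function name `sliding_chunks` and its parameters `chunk_size` and `overlap` are hypothetical, chosen only for this example, and words are split on whitespace as a simplifying assumption.

```python
from __future__ import annotations


def sliding_chunks(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words.

    Hypothetical helper for illustration; not part of the SignalWire SDK.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and less than chunk_size")

    words = text.split()           # naive whitespace tokenization (assumption)
    step = chunk_size - overlap    # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made up only of overlap words is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks


example = sliding_chunks("a b c d e f g h i j", chunk_size=4, overlap=1)
print(example)  # ['a b c d', 'd e f g', 'g h i j']
```

Each chunk starts `chunk_size - overlap` words after the previous one, so a larger overlap repeats more context across neighboring chunks at the cost of producing more chunks for the same text.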