create_chunks | SignalWire

Split document content into chunks using the configured chunking strategy. Each chunk includes metadata about its source file, section, and position within the original document.

The content parameter must be the document's actual text, not a file path. For binary formats, extract the text with the appropriate extraction method first and pass the result.

Parameters

content

str, required

Document text content to chunk.

filename

str, required

Name of the source file, used for metadata in each chunk.

file_type

str, required

File extension or type (e.g., "md", "py", "txt").
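
As a rough sketch of how the three arguments might be derived from a path on disk, the helper below reads a plain-text file and returns values in the shape described above. The name prepare_chunk_args is hypothetical and not part of the SignalWire SDK, and the helper assumes the file is UTF-8 text; binary formats would still need their own extraction step first.

```python
from pathlib import Path


def prepare_chunk_args(path):
    """Return (content, filename, file_type) for a plain-text file.

    Hypothetical helper for illustration; not part of the SDK.
    """
    p = Path(path)
    # Read the text itself: create_chunks expects content, not a path.
    content = p.read_text(encoding="utf-8")
    # Only the file name is kept for chunk metadata, not the directory.
    filename = p.name
    # Extension without the leading dot, lowercased ("README.md" gives "md").
    file_type = p.suffix.lstrip(".").lower()
    return content, filename, file_type
```

The returned tuple could then be unpacked into the call, for example processor.create_chunks(*prepare_chunk_args("README.md")).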

Returns

list[dict] — A list of chunk dictionaries, each containing:

content (str) — the chunk text
filename (str) — source filename
section (str | None) — section name or hierarchy path
start_line (int | None) — starting line number in the source
end_line (int | None) — ending line number in the source
metadata (dict) — additional metadata (file type, word count, chunk method, etc.)
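
To illustrate the shape of the return value, the sketch below turns chunk dictionaries into one-line summaries and handles the fields that may be None. The sample_chunks list is made-up stand-in data, not real create_chunks output, and describe_chunk is a hypothetical helper rather than an SDK function.

```python
def describe_chunk(chunk):
    """Build a one-line summary of a chunk dictionary (hypothetical helper)."""
    # section may be None when the chunk has no section.
    section = chunk["section"] or "(no section)"
    start, end = chunk["start_line"], chunk["end_line"]
    # Line numbers are optional, so guard against None before formatting.
    if start is None or end is None:
        lines = "lines unknown"
    else:
        lines = f"lines {start}-{end}"
    return f"{chunk['filename']} [{section}] {lines}, {len(chunk['content'])} chars"


# Stand-in data that mimics the documented keys; values are invented.
sample_chunks = [
    {"content": "Install with pip.", "filename": "README.md", "section": "Setup",
     "start_line": 3, "end_line": 5, "metadata": {}},
    {"content": "Loose text.", "filename": "README.md", "section": None,
     "start_line": None, "end_line": None, "metadata": {}},
]

for chunk in sample_chunks:
    print(describe_chunk(chunk))
```

Which keys appear inside metadata is not spelled out beyond the examples listed above, so reading them with dict.get() is the safer choice.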

Example

from signalwire.search import DocumentProcessor

processor = DocumentProcessor(chunking_strategy="paragraph")

with open("README.md") as f:
    content = f.read()

chunks = processor.create_chunks(content, "README.md", "md")
for chunk in chunks:
    print(f"[{chunk['section']}] {len(chunk['content'])} chars")