create_chunks


Split document content into chunks using the configured chunking strategy. Each chunk includes metadata about its source file, section, and position within the original document.

The content parameter must be the document's actual text, not a file path. For binary formats, extract the text with the appropriate extraction method first, then pass the result.
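As a minimal sketch of the text-versus-path distinction, the snippet below reads a plain-text file into a string, which is what the content parameter expects. `read_text_content` is a hypothetical helper introduced here for illustration, not part of the library, and the UTF-8 encoding is an assumption. Binary formats would still need a dedicated extraction step before this point.

```python
import tempfile
from pathlib import Path


def read_text_content(path):
    """Hypothetical helper: return the text of a plain-text file (assumes UTF-8)."""
    return Path(path).read_text(encoding="utf-8")


# Demo with a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.md"
    sample.write_text("# Title\n\nBody text.\n", encoding="utf-8")
    content = read_text_content(sample)

# `content` is now a str holding the document text, not a path.
```

The resulting string is what would be passed as the first argument to create_chunks.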

Parameters

content
str (required)

Document text content to chunk.

filename
str (required)

Name of the source file, used for metadata in each chunk.

file_type
str (required)

File extension or type (e.g., "md", "py", "txt").
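When the file type is not known in advance, it can be derived from the filename. The sketch below is one way to do that with the standard library. `infer_file_type` is a hypothetical helper, and falling back to "txt" for names without an extension is an assumption, not documented library behavior.

```python
from pathlib import Path


def infer_file_type(filename, default="txt"):
    """Hypothetical helper: derive a file_type string from a filename.

    Returns the last extension, lowercased and without the leading dot.
    Falls back to `default` (an assumed choice) when there is no extension.
    """
    suffix = Path(filename).suffix  # e.g. ".md" for "README.md", "" for "Makefile"
    return suffix.lstrip(".").lower() or default
```

For example, `infer_file_type("README.md")` gives `"md"`, which matches the form shown for file_type above.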

Returns

list[dict] — A list of chunk dictionaries, each containing:

  • content (str) — the chunk text
  • filename (str) — source filename
  • section (str | None) — section name or hierarchy path
  • start_line (int | None) — starting line number in the source
  • end_line (int | None) — ending line number in the source
  • metadata (dict) — additional metadata (file type, word count, chunk method, etc.)
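To illustrate working with this structure, the sketch below totals the characters per section over a list of chunk dictionaries. The sample data is made up to match the documented keys, the metadata dictionaries are left empty because the exact metadata key names are not listed here, and `chars_per_section` is a hypothetical helper. Note that section can be None, so the code substitutes a placeholder label.

```python
from collections import defaultdict

# Made-up sample shaped like the documented return value.
chunks = [
    {"content": "Intro text", "filename": "README.md", "section": None,
     "start_line": 1, "end_line": 1, "metadata": {}},
    {"content": "Install with pip.", "filename": "README.md", "section": "Setup",
     "start_line": 3, "end_line": 5, "metadata": {}},
    {"content": "Run the tests.", "filename": "README.md", "section": "Setup",
     "start_line": 6, "end_line": 8, "metadata": {}},
]


def chars_per_section(chunks):
    """Hypothetical helper: sum chunk lengths, grouped by section name."""
    totals = defaultdict(int)
    for chunk in chunks:
        # section is documented as str | None, so handle the None case.
        key = chunk["section"] or "(no section)"
        totals[key] += len(chunk["content"])
    return dict(totals)


totals = chars_per_section(chunks)
```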

Example

from signalwire.search import DocumentProcessor

processor = DocumentProcessor(chunking_strategy="paragraph")

# Read the document text; create_chunks expects content, not a path.
with open("README.md", encoding="utf-8") as f:
    content = f.read()

chunks = processor.create_chunks(content, "README.md", "md")
for chunk in chunks:
    # section may be None for chunks outside any named section.
    print(f"[{chunk['section']}] {len(chunk['content'])} chars")