---
title: Helper Functions & Constants
slug: /reference/python/agents/search/helpers
description: Query preprocessing, document preprocessing, model alias resolution, and constants.
max-toc-depth: 3
---

[searchservice]: /docs/server-sdks/reference/python/agents/search/search-service
[searchengine]: /docs/server-sdks/reference/python/agents/search/search-engine
[searchengine-search]: /docs/server-sdks/reference/python/agents/search/search-engine/search
[indexbuilder]: /docs/server-sdks/reference/python/agents/search/index-builder

The Search module provides standalone helper functions for query preprocessing, document content preprocessing, and embedding model alias resolution, along with constants for model configuration.

```python
from signalwire.search import (
    preprocess_query,
    preprocess_document_content,
    resolve_model_alias,
    MODEL_ALIASES,
    DEFAULT_MODEL,
)
```

These functions require search dependencies. Install with `pip install signalwire[search]`.

***

## Functions

### preprocess\_query

**preprocess\_query**(`query`, `language="en"`, `pos_to_expand=None`, `max_synonyms=5`, `debug=False`, `vector=False`, `query_nlp_backend="nltk"`, `model_name=None`, `preserve_original=True`) -> `dict[str, Any]`

Preprocess a search query with language detection, tokenization, stop word removal, POS tagging, synonym expansion, stemming, and optional vectorization.

This function is used internally by [`SearchService`][searchservice] and [`SearchEngine`][searchengine] but can also be called directly for custom search pipelines.

#### Parameters

* `query` -- Input query string.
* `language` -- Language code (e.g., `"en"`, `"es"`, `"fr"`) or `"auto"` for automatic detection.
* `pos_to_expand` -- POS tags to expand with synonyms. Defaults to `["NOUN", "VERB", "ADJ"]` when `None`.
* `max_synonyms` -- Maximum number of synonyms to add per word.
* `debug` -- Enable debug logging output.
* `vector` -- Include a vector embedding of the query in the output.
Set to `True` when passing the result to [`SearchEngine.search()`][searchengine-search].
* `query_nlp_backend` -- NLP backend for query processing. Valid values:
  * `"nltk"` -- fast, lightweight (default)
  * `"spacy"` -- better quality, requires spaCy models
* `model_name` -- Sentence transformer model name for vectorization. Must match the model used to build the index being searched. If not specified, the default model is used.
* `preserve_original` -- Keep the original query terms in the enhanced text alongside expanded synonyms and stems.

#### Returns

`dict[str, Any]` -- A dictionary containing:

* `input` (str) -- the original query string as passed in
* `enhanced_text` (str) -- the preprocessed query text with synonyms and stems
* `language` (str) -- detected or specified language code
* `POS` (dict) -- POS tag analysis results
* `vector` (list\[float]) -- embedding vector (only when `vector=True`)

#### Example

```python
from signalwire.search import preprocess_query

# Basic preprocessing
result = preprocess_query("How do I configure voice agents?")
print(result["enhanced_text"])

# With vectorization for search
result = preprocess_query(
    "How do I configure voice agents?",
    vector=True,
    language="auto",
)
query_vector = result["vector"]
enhanced_text = result["enhanced_text"]
```

***

### preprocess\_document\_content

**preprocess\_document\_content**(`content`, `language="en"`, `index_nlp_backend="nltk"`) -> `dict[str, Any]`

Preprocess document content for indexing. Uses less aggressive synonym expansion than query preprocessing to keep document representations focused.

This function is called internally by [`IndexBuilder`][indexbuilder] during index construction.

#### Parameters

* `content` -- Document text content to preprocess.
* `language` -- Language code for processing.
* `index_nlp_backend` -- NLP backend for processing: `"nltk"` or `"spacy"`.
#### Returns

`dict[str, Any]` -- A dictionary containing:

* `enhanced_text` (str) -- the preprocessed document text
* `keywords` (list\[str]) -- up to 20 extracted keywords (stop words removed)
* `language` (str) -- the language used for processing
* `pos_analysis` (dict) -- POS tag analysis

#### Example

```python
from signalwire.search import preprocess_document_content

result = preprocess_document_content(
    "SignalWire agents can be configured with custom prompts and tools.",
    language="en",
)
print(result["keywords"])
# ['signalwire', 'agents', 'configured', 'custom', 'prompts', 'tools']
```

***

### resolve\_model\_alias

**resolve\_model\_alias**(`model_name`) -> `str`

Resolve a short model alias to its full model name. If the input is not a known alias, it is returned unchanged.

#### Parameters

* `model_name` -- A model alias or full model name. Known aliases:
  * `"mini"` -- `sentence-transformers/all-MiniLM-L6-v2` (384 dims, fastest)
  * `"base"` -- `sentence-transformers/all-mpnet-base-v2` (768 dims, balanced)
  * `"large"` -- `sentence-transformers/all-mpnet-base-v2` (768 dims, same as base)

#### Returns

`str` -- The full sentence transformer model name.

#### Example

```python
from signalwire.search import resolve_model_alias

print(resolve_model_alias("mini"))
# "sentence-transformers/all-MiniLM-L6-v2"

print(resolve_model_alias("sentence-transformers/all-mpnet-base-v2"))
# "sentence-transformers/all-mpnet-base-v2" (unchanged)
```

***

## Constants

### MODEL\_ALIASES

```python
from signalwire.search import MODEL_ALIASES

print(MODEL_ALIASES)  # dict[str, str]
```

Dictionary mapping short model aliases to full sentence transformer model names.

| Alias     | Full Model Name                           | Dimensions |
| --------- | ----------------------------------------- | ---------- |
| `"mini"`  | `sentence-transformers/all-MiniLM-L6-v2`  | 384        |
| `"base"`  | `sentence-transformers/all-mpnet-base-v2` | 768        |
| `"large"` | `sentence-transformers/all-mpnet-base-v2` | 768        |

***

### DEFAULT\_MODEL

```python
from signalwire.search import DEFAULT_MODEL

print(DEFAULT_MODEL)
# "sentence-transformers/all-MiniLM-L6-v2"
```

The default embedding model used for new indexes. This is the `"mini"` model, chosen for its smaller size and faster inference. Use the `"base"` alias or specify a full model name when higher embedding quality is needed.
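
Because a query must be vectorized with the same model that built the index, it can help to compare model names after alias resolution. The snippet below is a minimal sketch of that check, not the library's implementation. The dictionary is a stand-in copy of the alias table on this page so the example runs without the search extras, and `pick_model` and `models_match` are hypothetical helpers introduced only for illustration.

```python
from typing import Optional

# Stand-in copy of the alias table documented on this page (an assumption for
# this sketch). In real code, import MODEL_ALIASES, DEFAULT_MODEL, and
# resolve_model_alias from signalwire.search instead of redefining them.
MODEL_ALIASES = {
    "mini": "sentence-transformers/all-MiniLM-L6-v2",
    "base": "sentence-transformers/all-mpnet-base-v2",
    "large": "sentence-transformers/all-mpnet-base-v2",
}
DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def resolve_model_alias(model_name: str) -> str:
    """Mirror the documented behavior: unknown names pass through unchanged."""
    return MODEL_ALIASES.get(model_name, model_name)


def pick_model(requested: Optional[str] = None) -> str:
    """Hypothetical helper: use the default model when none is requested."""
    return resolve_model_alias(requested) if requested else DEFAULT_MODEL


def models_match(index_model: str, query_model: Optional[str] = None) -> bool:
    """Hypothetical helper: compare index and query models after resolving aliases."""
    return pick_model(index_model) == pick_model(query_model)


print(models_match("base", "large"))  # True: both aliases resolve to the same model
print(models_match("base"))           # False: the query falls back to the default model
```

Under the table above, `"base"` and `"large"` resolve to the same model, so an index built with either alias can be queried with the other. An index built with `"base"` would not match a query that omits `model_name` and falls back to the default.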