
Helper Functions & Constants


The Search module provides standalone helper functions for query preprocessing, document content preprocessing, and embedding model alias resolution, along with constants for model configuration.

from signalwire.search import (
    preprocess_query,
    preprocess_document_content,
    resolve_model_alias,
    MODEL_ALIASES,
    DEFAULT_MODEL,
)

These functions require search dependencies. Install with pip install signalwire[search].


Functions

preprocess_query

preprocess_query(query, language="en", pos_to_expand=None, max_synonyms=5, debug=False, vector=False, query_nlp_backend="nltk", model_name=None, preserve_original=True) -> dict[str, Any]

Preprocess a search query with language detection, tokenization, stop word removal, POS tagging, synonym expansion, stemming, and optional vectorization. This function is used internally by SearchService and SearchEngine but can also be called directly for custom search pipelines.

Parameters

query
str (required)

Input query string.

language
str (default: "en")

Language code (e.g., "en", "es", "fr") or "auto" for automatic detection.

pos_to_expand
Optional[list[str]]

POS tags to expand with synonyms. Defaults to ["NOUN", "VERB", "ADJ"].

max_synonyms
int (default: 5)

Maximum number of synonyms to add per word.

debug
bool (default: False)

Enable debug logging output.

vector
bool (default: False)

Include a vector embedding of the query in the output. Set to True when passing the result to SearchEngine.search().

query_nlp_backend
str (default: "nltk")

NLP backend for query processing. Valid values:

  • "nltk": fast and lightweight (default)
  • "spacy": better quality, requires spaCy models

model_name
Optional[str]

Sentence transformer model name for vectorization. Must match the model used to build the index being searched. If not specified, uses the default model.

preserve_original
bool (default: True)

Keep the original query terms in the enhanced text alongside expanded synonyms and stems.

Returns

dict[str, Any] — A dictionary containing:

  • input (str) — the original query string as passed in
  • enhanced_text (str) — the preprocessed query text with synonyms and stems
  • language (str) — detected or specified language code
  • POS (dict) — POS tag analysis results
  • vector (list[float]) — embedding vector (only when vector=True)
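
Because the vector key is present only when vector=True, code that consumes the result should not index it unconditionally. The sketch below shows one way to read the documented keys defensively. The helper name summarize_result and the hand-written sample dictionary are assumptions made for illustration; the sample is a stand-in, not real library output.

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> str:
    """Build a one-line summary from a preprocess_query-style result dict."""
    text = result["enhanced_text"]
    language = result["language"]
    # "vector" is only present when vectorization was requested.
    vector = result.get("vector")
    dims = len(vector) if vector is not None else 0
    return f"[{language}] {text!r} ({dims} dims)"


# Hypothetical stand-in with the documented keys (no "vector" key).
sample = {
    "input": "voice agents",
    "enhanced_text": "voice agent",
    "language": "en",
    "POS": {},
}
print(summarize_result(sample))
```

Using result.get("vector") rather than result["vector"] lets the same helper handle both the vectorized and non-vectorized cases.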

Example

from signalwire.search import preprocess_query

# Basic preprocessing
result = preprocess_query("How do I configure voice agents?")
print(result["enhanced_text"])

# With vectorization for search
result = preprocess_query(
    "How do I configure voice agents?",
    vector=True,
    language="auto",
)
query_vector = result["vector"]
enhanced_text = result["enhanced_text"]
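
To give a rough feel for how stop word removal and the max_synonyms cap interact, here is a toy sketch. It is not the library's implementation: the function expand_query, the stop word set, and the synonym table are all hypothetical, and the real pipeline also performs language detection, POS tagging, and stemming, which are omitted here.

```python
import re

# Toy tables for illustration only; the library's real word lists differ.
STOP_WORDS = {"how", "do", "i", "the", "a", "to"}
SYNONYMS = {
    "configure": ["set up", "customize", "adjust"],
    "voice": ["speech", "audio"],
}


def expand_query(query: str, max_synonyms: int = 5) -> str:
    """Drop stop words, then append up to max_synonyms synonyms per kept term."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    parts = []
    for term in kept:
        parts.append(term)  # the original term is always kept in this sketch
        parts.extend(SYNONYMS.get(term, [])[:max_synonyms])
    return " ".join(parts)


print(expand_query("How do I configure voice agents?", max_synonyms=2))
```

Lowering max_synonyms narrows the expanded text toward the original terms; with max_synonyms=0 this sketch returns only the non-stop-word tokens.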

preprocess_document_content

preprocess_document_content(content, language="en", index_nlp_backend="nltk") -> dict[str, Any]

Preprocess document content for indexing. Uses less aggressive synonym expansion than query preprocessing to keep document representations focused.

This function is called internally by IndexBuilder during index construction.

Parameters

content
str (required)

Document text content to preprocess.

language
str (default: "en")

Language code for processing.

index_nlp_backend
str (default: "nltk")

NLP backend for processing. "nltk" or "spacy".

Returns

dict[str, Any] — A dictionary containing:

  • enhanced_text (str) — the preprocessed document text
  • keywords (list[str]) — up to 20 extracted keywords (stop words removed)
  • language (str) — the language used for processing
  • pos_analysis (dict) — POS tag analysis

Example

from signalwire.search import preprocess_document_content

result = preprocess_document_content(
    "SignalWire agents can be configured with custom prompts and tools.",
    language="en",
)
print(result["keywords"])
# ['signalwire', 'agents', 'configured', 'custom', 'prompts', 'tools']
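
The keyword behavior described above (stop words removed, at most 20 keywords) can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the library's algorithm: extract_keywords and its small stop word set are hypothetical, and de-duplicating in first-seen order is a choice made here, not documented behavior.

```python
import re

# Hypothetical stop word set; the library relies on its NLP backend instead.
STOP_WORDS = {"can", "be", "with", "and", "the", "a", "of", "to"}


def extract_keywords(content: str, limit: int = 20) -> list[str]:
    """Return up to `limit` distinct lowercase non-stop-word tokens, in order."""
    seen = set()
    keywords = []
    for token in re.findall(r"[a-z0-9]+", content.lower()):
        if token in STOP_WORDS or token in seen:
            continue
        seen.add(token)
        keywords.append(token)
        if len(keywords) == limit:
            break
    return keywords


print(extract_keywords(
    "SignalWire agents can be configured with custom prompts and tools."
))
```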

resolve_model_alias

resolve_model_alias(model_name) -> str

Resolve a short model alias to its full model name. If the input is not a known alias, it is returned unchanged.

Parameters

model_name
str (required)

A model alias or full model name. Known aliases:

  • "mini": sentence-transformers/all-MiniLM-L6-v2 (384 dims, fastest)
  • "base": sentence-transformers/all-mpnet-base-v2 (768 dims, balanced)
  • "large": sentence-transformers/all-mpnet-base-v2 (768 dims, same as base)

Returns

str — The full sentence transformer model name.

Example

from signalwire.search import resolve_model_alias

print(resolve_model_alias("mini"))
# "sentence-transformers/all-MiniLM-L6-v2"

print(resolve_model_alias("sentence-transformers/all-mpnet-base-v2"))
# "sentence-transformers/all-mpnet-base-v2" (unchanged)
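
The pass-through behavior for unknown names amounts to a dictionary lookup that falls back to the input. The sketch below reproduces it without the library installed. LOCAL_ALIASES is a hand-copied mirror of the aliases listed on this page and resolve_alias_sketch is a hypothetical name; the library's own function may be implemented differently.

```python
# Local copy of the documented aliases, for illustration only.
LOCAL_ALIASES = {
    "mini": "sentence-transformers/all-MiniLM-L6-v2",
    "base": "sentence-transformers/all-mpnet-base-v2",
    "large": "sentence-transformers/all-mpnet-base-v2",
}


def resolve_alias_sketch(model_name: str) -> str:
    """Return the full name for a known alias, or the input unchanged."""
    return LOCAL_ALIASES.get(model_name, model_name)


print(resolve_alias_sketch("mini"))
print(resolve_alias_sketch("my-org/custom-model"))  # not an alias: returned as-is
```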

Constants

MODEL_ALIASES

from signalwire.search import MODEL_ALIASES

print(MODEL_ALIASES)  # dict[str, str]

Dictionary mapping short model aliases to full sentence transformer model names.

| Alias | Full Model Name | Dimensions |
|---|---|---|
| "mini" | sentence-transformers/all-MiniLM-L6-v2 | 384 |
| "base" | sentence-transformers/all-mpnet-base-v2 | 768 |
| "large" | sentence-transformers/all-mpnet-base-v2 | 768 |

DEFAULT_MODEL

from signalwire.search import DEFAULT_MODEL

print(DEFAULT_MODEL)  # "sentence-transformers/all-MiniLM-L6-v2"

The default embedding model used for new indexes. This is the "mini" model, chosen for its smaller size and faster inference. Use the "base" alias or specify a full model name when higher embedding quality is needed.
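
Since the documented models produce vectors of different sizes (384 versus 768 dimensions), and a query vector must come from the same model that built the index, a cheap sanity check is to compare the vector length against the index model's expected dimensions. The following is a sketch only: MODEL_DIMS is a hand-copied table from this page and dims_match is a hypothetical helper, not part of the library. A matching length does not prove the models are identical; it only catches the obvious mismatch.

```python
# Dimensions copied from the table on this page, for illustration only.
MODEL_DIMS = {
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "sentence-transformers/all-mpnet-base-v2": 768,
}


def dims_match(vector: list[float], index_model: str) -> bool:
    """Return True if the vector length equals the index model's dimensions.

    Raises KeyError for a model name that is not in MODEL_DIMS.
    """
    return len(vector) == MODEL_DIMS[index_model]


query_vector = [0.0] * 384  # stand-in for a "mini" embedding
print(dims_match(query_vector, "sentence-transformers/all-MiniLM-L6-v2"))
print(dims_match(query_vector, "sentence-transformers/all-mpnet-base-v2"))
```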