---

title: spider
slug: /reference/python/agents/skills/spider
description: Fast web scraping and crawling.
---

Fast web scraping and crawling. Fetches web pages and extracts content optimized
for token efficiency.

**Tools:** `scrape_url`, `crawl_site`, `extract_structured_data`

**Requirements:** `lxml`

**Multi-instance:** Yes

<ParamField path="delay" type="float" default="0.1" toc={true}>
  Delay between requests in seconds.
</ParamField>

<ParamField path="concurrent_requests" type="int" default="5" toc={true}>
  Number of concurrent requests allowed (1–20).
</ParamField>

<ParamField path="timeout" type="int" default="5" toc={true}>
  Request timeout in seconds (1–60).
</ParamField>

<ParamField path="max_pages" type="int" default="1" toc={true}>
  Maximum number of pages to scrape (1–100).
</ParamField>

<ParamField path="max_depth" type="int" default="0" toc={true}>
  Maximum crawl depth. `0` means single page only (0–5).
</ParamField>

<ParamField path="extract_type" type="str" default="fast_text" toc={true}>
  Content extraction method: `"fast_text"`, `"clean_text"`, `"full_text"`, `"html"`, or `"custom"`.
</ParamField>

<ParamField path="max_text_length" type="int" default="10000" toc={true}>
  Maximum text length to return (100–100000).
</ParamField>

<ParamField path="clean_text" type="bool" default="True" toc={true}>
  Whether to clean extracted text by collapsing whitespace.
</ParamField>

<ParamField path="selectors" type="dict" default="{}" toc={true}>
  Custom CSS or XPath selectors for structured data extraction. Keys are field names, values are selector strings.
</ParamField>
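To illustrate what a `selectors` mapping does, here is a minimal sketch of XPath-based field extraction using the standard library's `ElementTree` (the skill itself uses `lxml` and also accepts CSS selectors; the HTML snippet, field names, and selector strings below are made up for this example):

```python
import xml.etree.ElementTree as ET

# Hypothetical page content and selectors, for illustration only.
page = ET.fromstring(
    "<html><body>"
    "<h1 class='title'>Spider Docs</h1>"
    "<span class='price'>$9.99</span>"
    "</body></html>"
)

# Keys are field names, values are selector strings (XPath here).
selectors = {
    "title": ".//h1[@class='title']",
    "price": ".//span[@class='price']",
}

# Each field resolves to the text of the first matching element.
extracted = {name: page.find(xpath).text for name, xpath in selectors.items()}
```

The skill's `extract_structured_data` tool applies the configured selectors in a similar way, returning one value per field name.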

<ParamField path="follow_patterns" type="list" default="[]" toc={true}>
  URL patterns (regex strings) to follow when crawling. Only links matching at least one pattern are followed.
</ParamField>
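As a sketch of how `follow_patterns` filters links during a crawl (the URLs and patterns here are hypothetical), a link is followed only if at least one regex matches it:

```python
import re

# Hypothetical patterns restricting a crawl to docs and reference pages.
follow_patterns = [r"^https://example\.com/docs/", r"/reference/"]

links = [
    "https://example.com/docs/intro",
    "https://example.com/blog/news",
    "https://example.com/reference/python/",
]

# Keep only links matching at least one pattern; others are skipped.
followed = [
    url for url in links
    if any(re.search(pattern, url) for pattern in follow_patterns)
]
```

With an empty list (the default), no pattern filtering is applied.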

<ParamField path="user_agent" type="str" toc={true}>
  User agent string sent with each request.
</ParamField>

<ParamField path="headers" type="dict" default="{}" toc={true}>
  Additional HTTP headers to include with each request.
</ParamField>

<ParamField path="follow_robots_txt" type="bool" default="True" toc={true}>
  Whether to respect `robots.txt` rules when crawling.
</ParamField>

<ParamField path="cache_enabled" type="bool" default="True" toc={true}>
  Whether to cache scraped pages in memory to avoid re-fetching.
</ParamField>

```python
from signalwire_agents import AgentBase

class MyAgent(AgentBase):
    def __init__(self):
        super().__init__(name="assistant", route="/assistant")
        self.set_prompt_text("You are a helpful assistant.")
        self.add_skill("spider", {
            "timeout": 10,
            "concurrent_requests": 3,
            "max_pages": 5
        })

agent = MyAgent()
agent.serve()
```