Fast web scraping and crawling. Fetches web pages and extracts their content in a form optimized for token efficiency.
Tools: scrape_url, crawl_site, extract_structured_data
Requirements: lxml
Multi-instance: Yes
Delay between requests in seconds.
Number of concurrent requests allowed (1–20).
Request timeout in seconds (1–60).
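The page does not say how the delay, concurrency, and timeout settings interact. The sketch below shows one plausible reading using `asyncio`: a semaphore caps the number of requests in flight, each slot pauses for the delay before taking its next URL, and `asyncio.wait_for` enforces the per-request timeout. The function name `fetch_all`, its parameter names, and the injected `fetch` coroutine are hypothetical, not the skill's API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical sketch: fetch_all and its parameters are illustrative names,
# not the skill's documented API.
async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    delay: float,
    concurrency: int,
    timeout: float,
) -> Dict[str, str]:
    """Fetch every URL with at most `concurrency` requests in flight."""
    if not 1 <= concurrency <= 20:
        raise ValueError("concurrency must be between 1 and 20")
    if not 1 <= timeout <= 60:
        raise ValueError("timeout must be between 1 and 60 seconds")
    semaphore = asyncio.Semaphore(concurrency)
    results: Dict[str, str] = {}

    async def worker(url: str) -> None:
        async with semaphore:  # waits while `concurrency` requests are active
            try:
                # Cancel any single request that exceeds the timeout.
                results[url] = await asyncio.wait_for(fetch(url), timeout)
            except asyncio.TimeoutError:
                results[url] = ""  # assumption: a timed-out page yields no content
            # Assumption: each slot waits `delay` seconds before its next request.
            await asyncio.sleep(delay)

    await asyncio.gather(*(worker(u) for u in urls))
    return results
```

Under this reading the delay applies per slot, so with several slots the overall request rate is higher than one request per delay interval; a strictly global delay would need a shared lock around the sleep.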
Maximum number of pages to scrape (1–100).
Maximum crawl depth (0–5). A value of 0 scrapes only the starting page.
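A minimal sketch of how a page limit and a depth limit typically bound a breadth-first crawl. It assumes a hypothetical `get_links(url)` callable that returns a page's outgoing links; `crawl`, `max_pages`, and `max_depth` are illustrative names, not the skill's parameters.

```python
from collections import deque
from typing import Callable, Iterable, List

# Hypothetical sketch: crawl, get_links, max_pages, and max_depth are illustrative names.
def crawl(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int,
    max_depth: int,
) -> List[str]:
    """Breadth-first crawl from `start`, returning URLs in the order visited."""
    if not 1 <= max_pages <= 100 or not 0 <= max_depth <= 5:
        raise ValueError("max_pages must be 1-100 and max_depth 0-5")
    visited: List[str] = []
    seen = {start}
    queue = deque([(start, 0)])  # (url, depth) pairs; the start page is depth 0
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)  # a real crawler would scrape the page here
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With a depth limit of 0 the function returns only the start URL, which corresponds to single-page scraping; the page limit stops the crawl early regardless of depth.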
Content extraction method: "fast_text", "clean_text", "full_text", "html", or "custom".
Maximum text length to return (100–100000).
Whether to clean extracted text by collapsing whitespace.
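Assuming the length limit counts characters and that cleaning runs before truncation (this page states neither), whitespace collapsing and length limiting might look like the following. `postprocess` is a hypothetical helper.

```python
import re

# Hypothetical helper; assumes the length limit counts characters.
def postprocess(text: str, max_length: int, clean: bool) -> str:
    """Optionally collapse whitespace, then truncate to `max_length` characters."""
    if not 100 <= max_length <= 100_000:
        raise ValueError("max_length must be between 100 and 100000")
    if clean:
        # Replace every run of spaces, tabs, and newlines with a single space.
        text = re.sub(r"\s+", " ", text).strip()
    return text[:max_length]
```

Cleaning first means the character budget is not spent on runs of blank lines and indentation.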
Custom CSS or XPath selectors for structured data extraction. Keys are field names, values are selector strings.
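A sketch of the field-to-selector mapping. To stay dependency-free it uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed markup, so it is a stand-in rather than how the skill evaluates selectors. With lxml installed, `lxml.html` parses real-world HTML and its elements support full XPath 1.0 through `xpath()`; CSS selectors additionally need the `cssselect` package. `extract_fields` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical helper: maps each field name to the text of every matching element.
def extract_fields(markup: str, selectors: Dict[str, str]) -> Dict[str, List[str]]:
    """Evaluate each selector (ElementTree XPath subset) against the markup."""
    root = ET.fromstring(markup)  # requires well-formed XML/XHTML
    result: Dict[str, List[str]] = {}
    for field, path in selectors.items():
        # itertext() gathers text from the element and all of its descendants.
        result[field] = ["".join(el.itertext()).strip() for el in root.findall(path)]
    return result
```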
URL patterns (regex strings) to follow when crawling. Only links matching at least one pattern are followed.
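A sketch of link filtering with regex patterns. It assumes each pattern is tested with `re.search` (a match anywhere in the URL) and that an empty pattern list means every link is followed; the page confirms neither. `filter_links` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Hypothetical helper: keeps links that match at least one pattern.
def filter_links(links: Iterable[str], patterns: List[str]) -> List[str]:
    """Return the links matching any pattern, preserving their order."""
    if not patterns:
        return list(links)  # assumption: no patterns means no filtering
    compiled = [re.compile(p) for p in patterns]
    return [url for url in links if any(rx.search(url) for rx in compiled)]
```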
User agent string sent with each request.
Additional HTTP headers to include with each request.
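A sketch of attaching the user agent and additional headers to a request with the standard library's `urllib.request.Request`. `build_request` is hypothetical, and giving the dedicated user-agent setting precedence over a `User-Agent` entry among the extra headers is an assumption. `Request` stores header names via `str.capitalize()`, so lookups use forms such as `User-agent`.

```python
import urllib.request
from typing import Dict, Optional

# Hypothetical helper: build_request is not part of the skill's documented API.
def build_request(
    url: str, user_agent: str, extra_headers: Optional[Dict[str, str]] = None
) -> urllib.request.Request:
    """Return a Request carrying the user agent plus any additional headers."""
    # Assumption: the user-agent setting overrides a User-Agent extra header.
    headers = {k: v for k, v in (extra_headers or {}).items() if k.lower() != "user-agent"}
    headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen(req, timeout=...)` would send these headers with the request.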
Whether to respect robots.txt rules when crawling.
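One way to honor robots.txt is the standard library's `urllib.robotparser`. This sketch parses robots.txt text supplied by the caller instead of downloading it, so it runs offline; in practice `RobotFileParser.set_url()` followed by `read()` retrieves the file. `make_robots_checker` is a hypothetical name.

```python
from typing import Callable
from urllib.robotparser import RobotFileParser

# Hypothetical helper: returns a predicate telling whether a URL may be fetched.
def make_robots_checker(robots_txt: str, user_agent: str) -> Callable[[str], bool]:
    """Build an allow/deny check from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return lambda url: parser.can_fetch(user_agent, url)
```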
Whether to cache scraped pages in memory to avoid re-fetching.
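A minimal sketch of an in-memory page cache keyed by URL, assuming a hypothetical synchronous `fetch(url)` callable. It has no size limit or expiry; whether the skill bounds its cache is not stated here.

```python
from typing import Callable, Dict

# Hypothetical sketch: PageCache is an illustrative name, not the skill's API.
class PageCache:
    """Fetch each URL at most once and serve later requests from memory."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._pages: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if url not in self._pages:
            self._pages[url] = self._fetch(url)  # cache miss: fetch and store
        return self._pages[url]
```

For a bounded cache, decorating the fetch function with `functools.lru_cache(maxsize=...)` is a standard-library alternative.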