SpiderSkill

Fast web scraping and crawling. Extracts text, markdown, or structured data from any public URL, optionally following links up to a bounded depth. Uses cheerio for parsing and enforces an SSRF guard on crawl hops.

Class: SpiderSkill

Tools: scrape_url, crawl_site, extract_structured_data (each is prefixed with <tool_name>_ when tool_name is set).

Required packages: cheerio

Env vars: SWML_ALLOW_PRIVATE_URLS=true relaxes the SSRF guard for local testing.

Multi-instance: yes — set a distinct tool_name per instance.
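To illustrate what an SSRF guard of this kind typically checks, the sketch below rejects URLs whose host is `localhost` or a loopback, private, or link-local IPv4 literal, unless an override flag is passed. The function names and the exact address ranges are assumptions made for this illustration. The skill's real guard is not documented on this page and may differ, for example by resolving DNS names before checking.

```typescript
// Hypothetical sketch of an SSRF check; not the SDK's actual implementation.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false; // not an IPv4 literal
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // link-local
  );
}

// allowPrivate would come from the environment, e.g.
// process.env.SWML_ALLOW_PRIVATE_URLS === "true"
function isUrlAllowed(rawUrl: string, allowPrivate: boolean): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable URLs are rejected
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  if (allowPrivate) return true;
  const host = url.hostname.toLowerCase();
  return host !== "localhost" && !isPrivateIPv4(host);
}
```

With `allowPrivate` set to false, `isUrlAllowed("http://127.0.0.1/admin", false)` is rejected while a public HTTPS URL passes.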

tool_name
string

Prefix prepended to each emitted tool name (e.g., tool_name="news" gives news_scrape_url, news_crawl_site, news_extract_structured_data). Required when registering multiple instances on the same agent.
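The naming rule can be sketched as follows. `emittedToolNames` is a hypothetical helper written for this illustration, not part of the SDK. It only mirrors the prefixing behavior described above.

```typescript
// Base tool names listed for SpiderSkill on this page.
const BASE_TOOLS = ["scrape_url", "crawl_site", "extract_structured_data"];

// Hypothetical helper: mirrors the documented "<tool_name>_" prefix rule.
function emittedToolNames(toolName?: string): string[] {
  return toolName ? BASE_TOOLS.map((t) => `${toolName}_${t}`) : [...BASE_TOOLS];
}

// Two instances on one agent need distinct prefixes so their tools do not collide.
const newsTools = emittedToolNames("news");
const docsTools = emittedToolNames("docs");
console.log(newsTools); // ["news_scrape_url", "news_crawl_site", "news_extract_structured_data"]
console.log(docsTools); // ["docs_scrape_url", "docs_crawl_site", "docs_extract_structured_data"]
```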

delay
number. Defaults to 0.1

Delay between requests in seconds (minimum 0).

concurrent_requests
integer. Defaults to 5

Number of concurrent requests allowed (range 1-20).

timeout
integer. Defaults to 5

Per-request timeout in seconds (range 1-60).

max_pages
integer. Defaults to 1

Maximum number of pages to scrape (range 1-100).

max_depth
integer. Defaults to 0

Maximum crawl depth. 0 restricts to a single page; range 0-5.
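How max_pages and max_depth bound a crawl can be sketched with a breadth-first traversal over an in-memory link graph. Everything here (the `boundedCrawl` function, the `LinkGraph` type, the sample paths) is hypothetical and stands in for real fetching. The sketch assumes the start page is depth 0, consistent with max_depth 0 meaning a single page.

```typescript
// Hypothetical link graph: URL -> outgoing links (stands in for real HTTP fetches).
type LinkGraph = Record<string, string[]>;

function boundedCrawl(graph: LinkGraph, start: string, maxPages: number, maxDepth: number): string[] {
  const visited = new Set<string>([start]);
  const order: string[] = [];
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];

  while (queue.length > 0 && order.length < maxPages) {
    const { url, depth } = queue.shift()!;
    order.push(url); // "scrape" the page
    if (depth >= maxDepth) continue; // do not follow links past the depth limit
    for (const link of graph[url] ?? []) {
      if (!visited.has(link)) {
        visited.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return order;
}

const site: LinkGraph = {
  "/": ["/docs/", "/blog/"],
  "/docs/": ["/docs/a", "/docs/b"],
  "/blog/": ["/blog/1"],
};

console.log(boundedCrawl(site, "/", 1, 0)); // ["/"]
console.log(boundedCrawl(site, "/", 5, 1)); // ["/", "/docs/", "/blog/"]
```

In the second call the page budget of 5 is never reached, because the depth limit of 1 stops the crawl after the start page's direct links.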

extract_type
string. Defaults to fast_text

Content extraction method. One of "fast_text", "clean_text", "full_text", "html", "markdown", "structured", "custom". Only fast_text, markdown, and structured are wired through the handlers in the TypeScript port; the others fall back to fast_text.

max_text_length
integer. Defaults to 3000

Maximum extracted text length in characters (range 100-100000).
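The numeric ranges documented above can be collected into a small pre-flight check. This is a sketch only: `RANGES` and `checkRanges` are hypothetical names, and how the skill itself reacts to out-of-range values (clamping, throwing, or ignoring) is not stated on this page.

```typescript
// Ranges as documented on this page; delay has only a lower bound.
const RANGES: Record<string, [number, number]> = {
  delay: [0, Infinity],
  concurrent_requests: [1, 20],
  timeout: [1, 60],
  max_pages: [1, 100],
  max_depth: [0, 5],
  max_text_length: [100, 100000],
};

// Hypothetical helper: returns one message per out-of-range numeric option.
function checkRanges(config: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [key, [min, max]] of Object.entries(RANGES)) {
    const value = config[key];
    if (value === undefined) continue; // option not set, so the default applies
    if (typeof value !== "number" || Number.isNaN(value) || value < min || value > max) {
      problems.push(`${key} must be a number between ${min} and ${max}`);
    }
  }
  return problems;
}

console.log(checkRanges({ max_pages: 5, max_depth: 1 })); // []
console.log(checkRanges({ max_depth: 9 })); // ["max_depth must be a number between 0 and 5"]
```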

clean_text
boolean. Defaults to true

Whether to clean extracted text (trim whitespace, collapse runs, etc.).
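A minimal sketch of the cleanup and truncation that clean_text and max_text_length describe, assuming cleaning means trimming and collapsing whitespace runs. `prepareText` is a hypothetical helper, and the skill's exact cleaning rules (the "etc." above) are not specified here.

```typescript
// Hypothetical helper combining clean_text and max_text_length.
function prepareText(raw: string, maxTextLength = 3000, cleanText = true): string {
  let text = raw;
  if (cleanText) {
    text = text
      .replace(/[ \t]+/g, " ")    // collapse runs of spaces and tabs
      .replace(/\s*\n\s*/g, "\n") // drop whitespace around line breaks, collapse blank lines
      .trim();
  }
  return text.slice(0, maxTextLength); // cap the result at max_text_length characters
}

console.log(JSON.stringify(prepareText("  Hello   world \n\n\n  again  ")));
// "Hello world\nagain"
```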

selectors
Record<string, string>. Defaults to {}

Map of name → CSS selector used for structured extraction.
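For example, a selectors map for a product page might look like the following. The field names and CSS selectors are invented for illustration. The commented line shows how the object would be passed to the constructor, following the Example section on this page.

```typescript
// Hypothetical selectors: each key is a field name, each value is a CSS selector.
const productSelectors: Record<string, string> = {
  title: "h1.product-title",
  price: "span.price",
  description: "#description p",
};

// Options object using parameters documented on this page.
const structuredConfig = {
  tool_name: "products",
  extract_type: "structured",
  selectors: productSelectors,
};

// await agent.addSkill(new SpiderSkill(structuredConfig));
```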

follow_patterns
string[]. Defaults to []

URL patterns to follow when crawling.
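One plausible reading, shown in the sketch below, treats each pattern as a regular expression tested against the full URL, with an empty list meaning no restriction. Both points are assumptions, since this page does not say whether patterns are regexes, globs, or substrings, and `shouldFollow` is a hypothetical helper.

```typescript
// Hypothetical helper: decides whether a discovered link matches any follow pattern.
function shouldFollow(url: string, followPatterns: string[]): boolean {
  if (followPatterns.length === 0) return true; // assumption: no patterns means follow everything
  return followPatterns.some((p) => new RegExp(p).test(url));
}

console.log(shouldFollow("https://example.com/docs/intro", ["/docs/"])); // true
console.log(shouldFollow("https://example.com/blog/post", ["/docs/"])); // false
```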

user_agent
string

User-Agent header for outbound requests. Defaults to a Chrome-compatible UA string.

headers
Record<string, string>. Defaults to {}

Additional HTTP headers sent with each request.

follow_robots_txt
boolean. Defaults to false

Whether to respect robots.txt. Defaults to false to match the Python SDK’s runtime behavior.

cache_enabled
boolean. Defaults to true

Whether to cache scraped pages in memory.
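An in-memory page cache of this kind can be sketched as a Map keyed by URL. `PageCache` is a hypothetical class for illustration. The skill's actual cache key, eviction policy, and lifetime are not described on this page.

```typescript
// Hypothetical in-memory cache keyed by URL; not the SDK's implementation.
class PageCache {
  private pages = new Map<string, string>();
  private enabled: boolean;

  constructor(enabled = true) {
    this.enabled = enabled;
  }

  // Returns the cached body, or calls `load` and stores its result when caching is on.
  get(url: string, load: (url: string) => string): string {
    if (this.enabled) {
      const hit = this.pages.get(url);
      if (hit !== undefined) return hit;
    }
    const body = load(url);
    if (this.enabled) this.pages.set(url, body);
    return body;
  }
}

let fetches = 0;
const fakeFetch = (url: string): string => {
  fetches += 1; // count how often the "network" is hit
  return `<html>${url}</html>`;
};

const cache = new PageCache(true);
cache.get("https://example.com/", fakeFetch);
cache.get("https://example.com/", fakeFetch);
console.log(fetches); // 1
```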

Example

import { AgentBase, SpiderSkill } from '@signalwire/sdk';

const agent = new AgentBase({ name: 'assistant', route: '/assistant' });
agent.setPromptText('You are a research assistant.');

// Register the skill: markdown output, at most 5 pages, crawl depth 1.
await agent.addSkill(new SpiderSkill({
  extract_type: 'markdown',
  max_pages: 5,
  max_depth: 1,
  follow_patterns: ['/docs/'],
}));

agent.run();