# ai-sdk-agents-universal-scraper-tool

A TypeScript package providing robust web scraping and crawling tools for AI SDK agents. Supports multiple providers (Exa, Firecrawl, Cheerio) with automatic fallback on rate limits.
## Installation

```bash
npm install ai-sdk-agents-universal-scraper-tool
```

## Usage

### Basic Scraping

```typescript
import { scrapeTool } from "ai-sdk-agents-universal-scraper-tool";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "Scrape and summarize the content from https://example.com",
  tools: {
    scrapeTool,
  },
});
```

### Custom Configuration

```typescript
import { createScrapeTool } from "ai-sdk-agents-universal-scraper-tool";

// Create a tool with default provider and settings
const customScrapeTool = createScrapeTool({
  defaultProvider: "exa",
  defaultMaxChars: 5000,
  defaultMarkdown: true,
});
```

### Crawling

```typescript
import {
  crawlTool,
  crawlBatchTool,
} from "ai-sdk-agents-universal-scraper-tool";

// Crawl a single page with subpage discovery
const crawlResult = await crawlTool.execute({
  url: "https://example.com",
  maxSubpages: 5,
  maxDepth: 2,
});

// Crawl multiple URLs in batch
const batchResult = await crawlBatchTool.execute({
  urls: ["https://example.com", "https://another.com"],
  maxSubpages: 3,
});
```

## Features

- Multiple Providers: Supports Exa, Firecrawl, and Cheerio
- Automatic Fallback: Automatically falls back to alternative providers on rate limits
- Flexible Output: Returns markdown, HTML, or plain text
- Subpage Discovery: Crawl tools can discover and crawl linked subpages
- Configurable: Customize defaults for provider, max characters, format, and more
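The automatic-fallback behavior can be pictured with a small self-contained sketch. This is illustrative only, not the package's actual internals; `RateLimitError`, `Provider`, and `scrapeWithFallback` are hypothetical names:

```typescript
// Illustrative sketch of falling back across providers on rate limits.
// The provider functions below are stand-ins, not real integrations.
class RateLimitError extends Error {}

type Provider = (url: string) => string;

// Try each provider in order; on a rate-limit error, fall back to the next one.
function scrapeWithFallback(url: string, providers: Provider[]): string {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return provider(url);
    } catch (err) {
      if (err instanceof RateLimitError) {
        lastError = err;
        continue; // fall back to the next provider
      }
      throw err; // non-rate-limit errors are not swallowed
    }
  }
  throw lastError ?? new Error("no providers configured");
}

// Example: the first provider is rate-limited, so the second one answers.
const result = scrapeWithFallback("https://example.com", [
  () => { throw new RateLimitError("429"); },
  (url) => `content of ${url}`,
]);
```

The key design point is that only rate-limit errors trigger fallback; other failures surface immediately so the agent can report them.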
## Development

- Clone the repository
- Install dependencies:

  ```bash
  pnpm install
  ```

- Create a `.env.local` file with your API keys:

  ```
  # Exa API key (optional)
  EXA_API_KEY=your_exa_api_key

  # Firecrawl API key (optional)
  FIRECRAWL_API_KEY=your_firecrawl_api_key
  ```

Note: The tools will automatically fall back to Cheerio (no API key required) if other providers are unavailable.
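Since both keys are optional, provider selection can be thought of as a simple preference order over whichever keys are present. The sketch below is an assumption about that logic, not the package's real code; `pickProvider` is a hypothetical helper:

```typescript
// Hypothetical sketch: choose a provider based on which API keys are set.
// The package's actual selection logic may differ.
type ProviderName = "exa" | "firecrawl" | "cheerio";

function pickProvider(env: Record<string, string | undefined>): ProviderName {
  if (env.EXA_API_KEY) return "exa";
  if (env.FIRECRAWL_API_KEY) return "firecrawl";
  return "cheerio"; // Cheerio needs no API key, so it is always available
}

// With no keys configured, Cheerio is the fallback:
const chosen = pickProvider({});
```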
Test your tool locally:

```bash
pnpm test
```

Build the package:

```bash
pnpm build
```

Before publishing, update the package name in `package.json` to your desired package name. The package automatically builds before publishing:

```bash
pnpm publish
```

## Project Structure

```
.
├── src/
│   ├── index.ts          # Tool exports
│   ├── scraper-tool.ts   # Scraping tool implementation
│   ├── crawler-tool.ts   # Crawling tool implementation
│   ├── scraper.ts        # Scraping logic
│   ├── crawler.ts        # Crawling logic
│   └── *.test.ts         # Test files
├── dist/                 # Build output (generated)
├── package.json
├── tsconfig.json
└── README.md
```
## API

- `scrapeTool` - Default scraping tool instance
- `createScrapeTool()` - Create a custom scraping tool with defaults
- `crawlTool` - Default crawling tool instance (single URL)
- `createCrawlTool()` - Create a custom crawling tool with defaults
- `crawlBatchTool` - Default batch crawling tool instance (multiple URLs)
- `createCrawlBatchTool()` - Create a custom batch crawling tool with defaults
## Providers

### Exa

- Requires `EXA_API_KEY` environment variable
- Supports live crawling with configurable preferences
- High-quality content extraction

### Firecrawl

- Requires `FIRECRAWL_API_KEY` environment variable
- Supports caching with configurable max age
- Good for structured content extraction

### Cheerio

- No API key required (local processing)
- Fast and reliable fallback option
- Supports subpage crawling with configurable concurrency
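Crawling subpages with bounded concurrency can be sketched as processing discovered URLs in fixed-size batches. This is a simplification under assumed names (`chunk` and `crawlAll` are illustrative, not the package's API):

```typescript
// Illustrative only: crawl subpage URLs in batches of `concurrency`.
// The package's real crawler scheduling is not shown here.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function crawlAll(
  urls: string[],
  fetchPage: (url: string) => Promise<string>,
  concurrency = 3,
): Promise<string[]> {
  const results: string[] = [];
  // Pages within a batch run in parallel; batches run one after another,
  // so at most `concurrency` requests are in flight at a time.
  for (const batch of chunk(urls, concurrency)) {
    results.push(...(await Promise.all(batch.map(fetchPage))));
  }
  return results;
}

// Five URLs at concurrency 2 become three batches:
const batches = chunk(["a", "b", "c", "d", "e"], 2);
```

Batch-by-batch execution is a deliberately simple concurrency cap; it trades a little throughput (a slow page stalls its whole batch) for predictable load on the target site.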
## License

ISC