ScraperAPI LlamaIndex Tools Integration

This tool connects to ScraperAPI, a web scraping API that handles proxies, browsers, and CAPTCHAs, enabling your LlamaIndex agent to scrape web pages and extract structured data from Amazon, Google, eBay, Walmart, and Redfin.

Installation

pip install llama-index-tools-scraperapi

Usage

import asyncio
import os
from llama_index.tools.scraperapi import ScraperAPIToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

async def main():
    scraper_tool = ScraperAPIToolSpec(
        api_key=os.environ["SCRAPERAPI_API_KEY"],
    )
    agent = FunctionAgent(
        tools=scraper_tool.to_tool_list(),
        llm=OpenAI(model="gpt-4.1"),
    )

    response = await agent.run(
        "Scrape https://example.com and summarize the content"
    )
    print(response)

asyncio.run(main())
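The direct method calls in the sections below (for example `tool.scrape(...)`) are written without `await`, which suggests they block. If so, one way to scrape several pages concurrently from async code like `main()` is to offload each call to a worker thread with the standard library's `asyncio.to_thread` (Python 3.9+). This is a minimal sketch under that assumption. `scrape_many` and `FakeTool` are hypothetical names used only for illustration; pass a real `ScraperAPIToolSpec` in place of `FakeTool()`.

```python
import asyncio
from typing import Any, Dict, List


async def scrape_many(tool: Any, urls: List[str], max_concurrency: int = 5) -> Dict[str, str]:
    """Scrape several URLs concurrently by running blocking tool.scrape calls in threads.

    Hypothetical helper: assumes tool.scrape(url) is synchronous and returns a str.
    The semaphore caps in-flight requests; 5 is a placeholder, not a ScraperAPI limit.
    Duplicate URLs collapse into a single dict key.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def scrape_one(url: str) -> str:
        async with semaphore:
            # asyncio.to_thread keeps the event loop free while the request blocks
            return await asyncio.to_thread(tool.scrape, url)

    results = await asyncio.gather(*(scrape_one(url) for url in urls))
    # gather returns results in input order, so zip pairs each URL with its content
    return dict(zip(urls, results))


class FakeTool:
    """Offline stand-in for ScraperAPIToolSpec, used only to run the sketch."""

    def scrape(self, url: str) -> str:
        return f"# content of {url}"


if __name__ == "__main__":
    pages = asyncio.run(scrape_many(FakeTool(), ["https://example.com", "https://example.org"]))
    for url, content in pages.items():
        print(url, "->", content)
```

Check your ScraperAPI plan's concurrency limit before raising `max_concurrency`.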

Scrape a Web Page

import os
from llama_index.tools.scraperapi import ScraperAPIToolSpec

tool = ScraperAPIToolSpec(api_key=os.environ["SCRAPERAPI_API_KEY"])

# Returns markdown content by default
content = tool.scrape("https://example.com")
print(content)

# Get plain text instead
content = tool.scrape("https://example.com", output_format="text")

# Enable JS rendering for dynamic pages
content = tool.scrape("https://example.com", render=True)
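JS rendering drives a headless browser, so it is typically slower than a plain request and may consume more API credits (check your plan). A common pattern is to try a plain request first and only retry with `render=True` when the result looks empty. This is a minimal sketch, assuming `tool.scrape(url, render=...)` behaves as in the examples above and returns a string; `scrape_with_render_fallback` is a hypothetical helper and the `min_chars` threshold is an arbitrary heuristic.

```python
from typing import Any


def scrape_with_render_fallback(tool: Any, url: str, min_chars: int = 200) -> str:
    """Hypothetical helper: plain scrape first, JS-rendered scrape only if needed.

    Assumes tool.scrape(url, render=...) returns the page content as a str.
    min_chars is an arbitrary cutoff for "this page looks empty".
    """
    content = tool.scrape(url)
    if len(content.strip()) >= min_chars:
        return content
    # Too little content: the page probably builds its body with JavaScript.
    return tool.scrape(url, render=True)
```

Usage: `content = scrape_with_render_fallback(tool, "https://example.com")`.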

Amazon

# Product details by ASIN
product = tool.amazon_product(asin="B07FTKQ97Q")

# Search products
results = tool.amazon_search(query="wireless headphones")

# All seller offers for a product
offers = tool.amazon_offers(asin="B07FTKQ97Q")
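`amazon_product` and `amazon_offers` take an ASIN rather than a URL. If you start from a product link, a small helper can pull the ASIN out first. This is a hypothetical sketch that assumes the common `/dp/<ASIN>` and `/gp/product/<ASIN>` URL shapes and a 10-character uppercase alphanumeric ASIN; Amazon URLs come in other forms it will not recognize.

```python
import re
from typing import Optional

# Matches /dp/<ASIN> or /gp/product/<ASIN>, where the ASIN is 10 uppercase
# letters or digits followed by "/", "?", "#", or the end of the URL.
_ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN embedded in an Amazon product URL, or None if not found."""
    match = _ASIN_RE.search(url)
    return match.group(1) if match else None
```

Usage: `product = tool.amazon_product(asin=extract_asin(url))`, after checking that the result is not `None`.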

Google

# Web search (structured SERP)
results = tool.google_search(query="Python web scraping tutorial")

# Shopping results
shopping = tool.google_shopping(query="laptop")

# News articles
news = tool.google_news(query="AI", tbs="w")  # past week

# Maps / places search
places = tool.google_maps_search(query="pizza", latitude=40.7128, longitude=-74.0060)

# Job listings
jobs = tool.google_jobs(query="python developer", gl="us")

eBay

# Product details
product = tool.ebay_product(product_id="166619046796")

# Search with filters
results = tool.ebay_search(query="vintage watch", sort_by="price_lowest", condition="used")

Walmart

# Product details
product = tool.walmart_product(product_id="5253396052")

# Search
results = tool.walmart_search(query="laptop")

# Browse category
items = tool.walmart_category(category="3944_1089430_37807")

# Product reviews
reviews = tool.walmart_reviews(product_id="5253396052", sort="helpful")

Redfin

# Search listings
listings = tool.redfin_search(url="https://www.redfin.com/city/30749/CA/San-Francisco")

# Agent details
agent_profile = tool.redfin_agent(url="https://www.redfin.com/real-estate-agents/agent-name")

# For-sale listing
listing = tool.redfin_forsale(url="https://www.redfin.com/CA/San-Francisco/123-Main-St")

# For-rent listing
rental = tool.redfin_forrent(url="https://www.redfin.com/CA/San-Francisco/456-Oak-Ave")

Geo-targeted Scraping

tool = ScraperAPIToolSpec(
    api_key=os.environ["SCRAPERAPI_API_KEY"],
    country_code="uk",
)

# All requests will use UK proxies by default
content = tool.scrape("https://example.co.uk")

# Override per request
content = tool.scrape("https://example.de", country_code="de")
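The instance-level `country_code` acts as a default that a per-request argument overrides. The sketch below illustrates that precedence rule against ScraperAPI's plain HTTP endpoint (`https://api.scraperapi.com/` with `api_key`, `url`, `render`, `country_code`, and `device_type` query parameters). It only builds the request URL and is not this package's implementation; `build_request_url` is a hypothetical name.

```python
from typing import Any, Dict
from urllib.parse import urlencode

SCRAPERAPI_ENDPOINT = "https://api.scraperapi.com/"


def build_request_url(api_key: str, url: str, defaults: Dict[str, Any], **overrides: Any) -> str:
    """Merge instance-level defaults with per-request overrides (overrides win).

    Options whose final value is None are omitted; booleans are sent as
    lowercase "true"/"false" strings.
    """
    options = {**defaults, **overrides}
    params: Dict[str, str] = {"api_key": api_key, "url": url}
    for key, value in options.items():
        if value is None:
            continue  # unset option: let the API use its own default
        params[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return SCRAPERAPI_ENDPOINT + "?" + urlencode(params)
```

With `defaults={"country_code": "uk"}` and `country_code="de"`, the request carries `country_code=de`, mirroring the override example above.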

Available Tools

Scraping:

scrape: Scrape any web page and return content as markdown, text, or JSON.

Amazon (Structured Data):

amazon_product: Get product details by ASIN.
amazon_search: Search Amazon products.
amazon_offers: Get all seller offers for a product.

Google (Structured Data):

google_search: Google SERP search results.
google_shopping: Google Shopping product results.
google_news: Google News articles.
google_maps_search: Google Maps places search.
google_jobs: Google Jobs listings.

eBay (Structured Data):

ebay_product: Get product details by product ID.
ebay_search: Search eBay listings.

Redfin (Structured Data):

redfin_search: Search Redfin listings.
redfin_agent: Get agent profile details.
redfin_forsale: Get for-sale listing details.
redfin_forrent: Get for-rent listing details.

Walmart (Structured Data):

walmart_product: Get product details by product ID.
walmart_search: Search Walmart products.
walmart_category: Browse a Walmart category.
walmart_reviews: Get product reviews.
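Passing every tool listed above to an agent gives the LLM more choices than many tasks need. In LlamaIndex, a tool spec lists its method names in `spec_functions`, and `to_tool_list()` accepts a `spec_functions` argument to build tools for only a subset. Assuming `ScraperAPIToolSpec` inherits that behaviour from LlamaIndex's `BaseToolSpec` and stores plain string names, a hypothetical helper can pick tools by the prefixes used in the groupings above:

```python
from typing import Iterable, List


def select_tools_by_prefix(names: Iterable[str], prefixes: Iterable[str]) -> List[str]:
    """Keep only tool names that start with one of the given prefixes.

    Hypothetical helper; prefixes such as "amazon_" mirror the groupings above.
    The order of the input names is preserved.
    """
    prefix_tuple = tuple(prefixes)
    return [name for name in names if name.startswith(prefix_tuple)]
```

Under those assumptions, `scraper_tool.to_tool_list(spec_functions=select_tools_by_prefix(scraper_tool.spec_functions, ["amazon_", "scrape"]))` would give an agent only the Amazon tools plus `scrape`.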

Error Handling

All API errors raise ScraperAPIError, so you can handle them specifically:

import os
from llama_index.tools.scraperapi import ScraperAPIToolSpec, ScraperAPIError

tool = ScraperAPIToolSpec(api_key=os.environ["SCRAPERAPI_API_KEY"])

try:
    result = tool.scrape("https://example.com")
except ScraperAPIError as e:
    print(f"Scraping failed: {e}")
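Some failures, such as timeouts or temporary upstream errors, may succeed on a second attempt. Below is a minimal retry sketch with exponential backoff. It assumes that re-issuing the same request is acceptable (each attempt may consume API credits) and retries only on the exception types you pass, such as `ScraperAPIError`. `call_with_retries` is a hypothetical name, and the attempt count and delays are arbitrary placeholders.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call func(), retrying on the given exception types with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts and
    re-raises the last exception once all attempts are used up. Exceptions not
    listed in retry_on propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Usage: `result = call_with_retries(lambda: tool.scrape("https://example.com"), retry_on=(ScraperAPIError,))`.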

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | ScraperAPI key |
| `render` | `bool` | `False` | Enable JS rendering by default |
| `country_code` | `str` | `None` | Default geo-targeting country code |
| `device_type` | `str` | `None` | `"desktop"` or `"mobile"` |
| `timeout` | `int` | `70` | Request timeout in seconds |
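The only constrained option in the table is `device_type`, which accepts `"desktop"` or `"mobile"`. If you assemble these settings from environment variables or a config file, a hypothetical dataclass mirroring the documented defaults can catch bad values before the tool is built. It is not part of the package: the field names follow the table, and the non-empty `api_key` and positive `timeout` checks are added assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

VALID_DEVICE_TYPES = {"desktop", "mobile"}


@dataclass(frozen=True)
class ScraperSettings:
    """Hypothetical settings holder mirroring the Configuration table defaults."""

    api_key: str
    render: bool = False
    country_code: Optional[str] = None
    device_type: Optional[str] = None
    timeout: int = 70

    def __post_init__(self) -> None:
        # Validate eagerly so misconfiguration fails before any request is sent.
        if not self.api_key:
            raise ValueError("api_key is required")
        if self.device_type is not None and self.device_type not in VALID_DEVICE_TYPES:
            raise ValueError(f"device_type must be one of {sorted(VALID_DEVICE_TYPES)}")
        if self.timeout <= 0:
            raise ValueError("timeout must be a positive number of seconds")

    def as_kwargs(self) -> Dict[str, Any]:
        """Return the settings as keyword arguments for the tool constructor."""
        return asdict(self)
```

Assuming the constructor accepts exactly the keyword arguments in the table, `ScraperAPIToolSpec(**settings.as_kwargs())` would build the tool from validated settings.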
