
scraperapi/llama-index-tools-scraperapi


ScraperAPI LlamaIndex Tools Integration

This tool connects to ScraperAPI, a web scraping API that handles proxies, browsers, and CAPTCHAs, enabling your LlamaIndex agent to scrape web pages and extract structured data from Amazon, Google, eBay, Walmart, and Redfin.

Installation

pip install llama-index-tools-scraperapi

Usage

import asyncio
import os
from llama_index.tools.scraperapi import ScraperAPIToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

async def main():
    scraper_tool = ScraperAPIToolSpec(
        api_key=os.environ["SCRAPERAPI_API_KEY"],
    )
    agent = FunctionAgent(
        tools=scraper_tool.to_tool_list(),
        llm=OpenAI(model="gpt-4.1"),
    )

    response = await agent.run(
        "Scrape https://example.com and summarize the content"
    )
    print(response)

asyncio.run(main())

Scrape a Web Page

import os

from llama_index.tools.scraperapi import ScraperAPIToolSpec

tool = ScraperAPIToolSpec(api_key=os.environ["SCRAPERAPI_API_KEY"])

# Returns markdown content by default
content = tool.scrape("https://example.com")
print(content)

# Get plain text instead
content = tool.scrape("https://example.com", output_format="text")

# Enable JS rendering for dynamic pages
content = tool.scrape("https://example.com", render=True)
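Each scrape option becomes part of a single HTTP request to ScraperAPI. Below is a minimal sketch of that request URL. It assumes ScraperAPI's standard endpoint (https://api.scraperapi.com/) and its api_key, url, render, country_code and output_format query parameters. We have not checked how ScraperAPIToolSpec builds its requests internally, and build_scrape_url is a hypothetical helper that is not part of the package. Its only purpose is to show which options get sent with a call.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: ScraperAPI's standard endpoint, not read from this package's source.
SCRAPERAPI_ENDPOINT = "https://api.scraperapi.com/"


def build_scrape_url(
    api_key: str,
    url: str,
    output_format: Optional[str] = None,
    render: bool = False,
    country_code: Optional[str] = None,
) -> str:
    """Build a ScraperAPI request URL (hypothetical helper for illustration)."""
    params = {"api_key": api_key, "url": url}
    if render:
        params["render"] = "true"  # ask ScraperAPI to execute JavaScript
    if country_code:
        params["country_code"] = country_code  # geo-targeted proxy
    if output_format:
        params["output_format"] = output_format  # e.g. "markdown" or "text"
    # urlencode percent-encodes the target URL so its own "?" and "&" survive
    return SCRAPERAPI_ENDPOINT + "?" + urlencode(params)


request_url = build_scrape_url(
    "YOUR_KEY", "https://example.com", output_format="text", render=True
)
# Decode the query string again to inspect exactly what would be sent
print(parse_qs(urlsplit(request_url).query))
```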

Amazon

# Product details by ASIN
product = tool.amazon_product(asin="B07FTKQ97Q")

# Search products
results = tool.amazon_search(query="wireless headphones")

# All seller offers for a product
offers = tool.amazon_offers(asin="B07FTKQ97Q")
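amazon_product and amazon_offers take a bare ASIN, but an agent often starts from a product URL. The sketch below extracts the ASIN with a regular expression. It assumes the common 10-character alphanumeric ASIN format and product URLs that carry the ASIN after /dp/ or /gp/product/. extract_asin is a hypothetical helper and is not part of the package.

```python
import re

# ASINs are 10-character alphanumeric codes, e.g. "B07FTKQ97Q".
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")
# Assumption: product URLs carry the ASIN after /dp/ or /gp/product/.
ASIN_IN_URL_RE = re.compile(
    r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)", re.IGNORECASE
)


def extract_asin(value: str) -> str:
    """Return the ASIN from a bare ASIN or an Amazon product URL (hypothetical helper)."""
    candidate = value.strip()
    if ASIN_RE.match(candidate.upper()):
        return candidate.upper()
    match = ASIN_IN_URL_RE.search(candidate)
    if match:
        return match.group(1).upper()
    raise ValueError(f"No ASIN found in {value!r}")


asin = extract_asin("https://www.amazon.com/Some-Product/dp/B07FTKQ97Q?th=1")
print(asin)
# product = tool.amazon_product(asin=asin)
```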

Google

# Web search (structured SERP)
results = tool.google_search(query="Python web scraping tutorial")

# Shopping results
shopping = tool.google_shopping(query="laptop")

# News articles
news = tool.google_news(query="AI", tbs="w")  # past week

# Maps / places search
places = tool.google_maps_search(query="pizza", latitude=40.7128, longitude=-74.0060)

# Job listings
jobs = tool.google_jobs(query="python developer", gl="us")

eBay

# Product details
product = tool.ebay_product(product_id="166619046796")

# Search with filters
results = tool.ebay_search(query="vintage watch", sort_by="price_lowest", condition="used")

Walmart

# Product details
product = tool.walmart_product(product_id="5253396052")

# Search
results = tool.walmart_search(query="laptop")

# Browse category
items = tool.walmart_category(category="3944_1089430_37807")

# Product reviews
reviews = tool.walmart_reviews(product_id="5253396052", sort="helpful")

Redfin

# Search listings
listings = tool.redfin_search(url="https://www.redfin.com/city/30749/CA/San-Francisco")

# Agent details
agent = tool.redfin_agent(url="https://www.redfin.com/real-estate-agents/agent-name")

# For-sale listing
listing = tool.redfin_forsale(url="https://www.redfin.com/CA/San-Francisco/123-Main-St")

# For-rent listing
rental = tool.redfin_forrent(url="https://www.redfin.com/CA/San-Francisco/456-Oak-Ave")

Geo-targeted Scraping

tool = ScraperAPIToolSpec(
    api_key=os.environ["SCRAPERAPI_API_KEY"],
    country_code="uk",
)

# All requests will use UK proxies by default
content = tool.scrape("https://example.co.uk")

# Override per request
content = tool.scrape("https://example.de", country_code="de")
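The precedence described above, where the constructor value acts as a default and a per-request argument wins, can be modeled as a plain dictionary merge. merge_request_options is a hypothetical helper that illustrates the documented behavior. It is not code from the package, and it assumes None means "not set".

```python
from typing import Any, Dict


def merge_request_options(defaults: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Per-request values win over constructor defaults; None means 'not given'."""
    merged = {k: v for k, v in defaults.items() if v is not None}
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged  # a new dict; `defaults` is left untouched


# Constructor-level defaults, as in ScraperAPIToolSpec(country_code="uk")
defaults = {"country_code": "uk", "render": False, "device_type": None}

print(merge_request_options(defaults))                     # uses the UK default
print(merge_request_options(defaults, country_code="de"))  # override for one request
```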

Available Tools

Scraping:

  • scrape: Scrape any web page and return content as markdown, text, or JSON.

Amazon (Structured Data):

  • amazon_product: Get product details by ASIN.
  • amazon_search: Search Amazon products.
  • amazon_offers: Get all seller offers for a product.

Google (Structured Data):

  • google_search: Google web search results (structured SERP).
  • google_shopping: Google Shopping product results.
  • google_news: Google News articles.
  • google_maps_search: Google Maps places search.
  • google_jobs: Google Jobs listings.

eBay (Structured Data):

  • ebay_product: Get product details by product ID.
  • ebay_search: Search eBay listings.

Redfin (Structured Data):

  • redfin_search: Search Redfin listings.
  • redfin_agent: Get agent profile details.
  • redfin_forsale: Get for-sale listing details.
  • redfin_forrent: Get for-rent listing details.

Walmart (Structured Data):

  • walmart_product: Get product details by product ID.
  • walmart_search: Search Walmart products.
  • walmart_category: Browse a Walmart category.
  • walmart_reviews: Get product reviews.
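If an agent only needs one platform, you may want to give it a subset of these 19 tools rather than all of them. The sketch below filters the tool names listed above by platform prefix. tools_for is a hypothetical helper. The commented-out last line assumes ScraperAPIToolSpec inherits LlamaIndex's BaseToolSpec.to_tool_list, which accepts a spec_functions list of method names.

```python
from typing import Iterable, List

# Tool names as listed in "Available Tools".
ALL_TOOLS = [
    "scrape",
    "amazon_product", "amazon_search", "amazon_offers",
    "google_search", "google_shopping", "google_news",
    "google_maps_search", "google_jobs",
    "ebay_product", "ebay_search",
    "redfin_search", "redfin_agent", "redfin_forsale", "redfin_forrent",
    "walmart_product", "walmart_search", "walmart_category", "walmart_reviews",
]


def tools_for(platforms: Iterable[str], include_scrape: bool = True) -> List[str]:
    """Return tool names for the given platforms (hypothetical helper)."""
    prefixes = tuple(f"{p}_" for p in platforms)
    return [
        name
        for name in ALL_TOOLS
        if name.startswith(prefixes) or (include_scrape and name == "scrape")
    ]


print(tools_for(["amazon"]))
# Assumption: BaseToolSpec.to_tool_list(spec_functions=...) is available here
# agent_tools = scraper_tool.to_tool_list(spec_functions=tools_for(["amazon"]))
```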

Error Handling

All API errors raise ScraperAPIError, so you can handle them specifically:

import os

from llama_index.tools.scraperapi import ScraperAPIToolSpec, ScraperAPIError

tool = ScraperAPIToolSpec(api_key=os.environ["SCRAPERAPI_API_KEY"])

try:
    result = tool.scrape("https://example.com")
except ScraperAPIError as e:
    print(f"Scraping failed: {e}")
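Because all API errors share one exception type, you can layer a retry policy on top. The sketch below retries a call with exponential backoff. with_retries is a hypothetical helper and is not part of the package. The demo uses ConnectionError from a stand-in function so it runs without an API key. Note that retrying every ScraperAPIError would also retry failures that will likely never succeed, such as a rejected API key, so you may want to inspect the error before retrying.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call `call`, retrying with exponential backoff on the given exceptions."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller's except block handle it
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


# Stand-in for tool.scrape that fails twice before succeeding
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(with_retries(flaky, retry_on=(ConnectionError,), base_delay=0.0))
# Real use (assumed):
# result = with_retries(lambda: tool.scrape("https://example.com"), retry_on=(ScraperAPIError,))
```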

Configuration

Parameter     Type  Default   Description
api_key       str   required  ScraperAPI key
render        bool  False     Enable JavaScript rendering for all requests by default
country_code  str   None      Default geo-targeting country code (e.g. "uk")
device_type   str   None      "desktop" or "mobile"
timeout       int   70        Request timeout in seconds
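You can mirror the table in a small dataclass to validate settings before constructing the tool, for example when they come from environment variables or a config file. ScraperConfig is hypothetical. Its checks encode only what the table states (api_key is required, device_type is "desktop" or "mobile"), plus the assumption that timeout must be positive. The commented-out last line assumes the constructor accepts exactly these keyword arguments, as the table suggests.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ScraperConfig:
    """Mirror of the documented constructor options (hypothetical, for validation)."""

    api_key: str
    render: bool = False
    country_code: Optional[str] = None
    device_type: Optional[str] = None
    timeout: int = 70  # seconds

    def __post_init__(self) -> None:
        if not self.api_key:
            raise ValueError("api_key is required")
        if self.device_type not in (None, "desktop", "mobile"):
            raise ValueError('device_type must be "desktop" or "mobile"')
        if self.timeout <= 0:  # assumption: the table does not state a lower bound
            raise ValueError("timeout must be a positive number of seconds")


config = ScraperConfig(api_key="YOUR_KEY", device_type="mobile")
print(asdict(config))
# tool = ScraperAPIToolSpec(**asdict(config))
```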
