Commit d017581 ("docs: add readme for firecrawl shopping scraper"), 1 file changed, 152 additions.
# Firecrawl Shopping Scraper

A product extraction agent built with the **Upsonic AI Agent Framework** and **FirecrawlTools**. Point it at any shopping website and it scrapes the page, extracts product names, prices, and descriptions, and returns the results as a clean, sorted table.

The example targets [books.toscrape.com](http://books.toscrape.com), a publicly available, scraping-safe demo bookstore, but the same pattern works for any publicly accessible e-commerce site.

## Features

- **Single-page scraping**: Fetches a shop page and converts it to clean Markdown via Firecrawl
- **LLM-powered extraction**: The agent reads the Markdown and pulls out structured product data without custom parsers or CSS selectors
- **Minimal tool surface**: Only `scrape_url` is enabled, so the agent cannot accidentally crawl, search, or batch-scrape
- **Sorted output**: Products are returned as a Markdown table ordered by price descending, with a summary line showing total count and price range
- **Extensible**: Switch to `crawl_website` for multi-page crawling or `extract_data` for schema-driven JSON extraction
## Prerequisites

- Python 3.10+
- A Firecrawl API key (sign up for free at [firecrawl.dev](https://firecrawl.dev))
- An Anthropic API key (or swap the model for any Upsonic-supported provider)
## Installation

1. Navigate to this directory:

   ```bash
   cd examples/firecrawl_shopping_scraper
   ```

2. Create and activate a virtual environment:

   ```bash
   # With uv (recommended)
   uv venv && source .venv/bin/activate

   # With pip
   python3 -m venv .venv && source .venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   # With uv
   uv pip install -r requirements.txt

   # With pip
   pip install -r requirements.txt
   ```

4. Set up your environment variables:

   ```bash
   cp .env.example .env
   ```

   Then open `.env` and fill in your keys:

   ```bash
   FIRECRAWL_API_KEY=fc-your-key-here
   ANTHROPIC_API_KEY=your-anthropic-key-here
   ```
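Both keys must be present in the process environment before the agent starts. A minimal preflight check, sketched with only the standard library (the key names match `.env.example`; the `missing_keys` helper is illustrative, not part of the project):

```python
import os

REQUIRED_KEYS = ("FIRECRAWL_API_KEY", "ANTHROPIC_API_KEY")

def missing_keys(env=None):
    """Return the required keys that are absent or empty in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]

# Example with an incomplete environment:
print(missing_keys({"FIRECRAWL_API_KEY": "fc-demo"}))  # ['ANTHROPIC_API_KEY']
```

Calling `missing_keys()` with no argument checks the real environment, so you can fail fast with a clear message instead of a mid-run API error.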
## Usage

Run the agent:

```bash
python main.py
# or
uv run main.py
```

Example output:

```
Found 20 products · Price range: £10.00 - £59.69

| #  | Book Title                                    | Price  | Rating |
|----|-----------------------------------------------|--------|--------|
| 1  | Libertarianism for Beginners                  | £59.69 | Two    |
| 2  | It's Only the Himalayas                       | £52.29 | Two    |
| 3  | The Black Maria                               | £52.15 | One    |
| 4  | Starving Hearts (Triangular Trade Trilogy...) | £13.99 | Two    |
...
```
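If you want to post-process the agent's answer rather than read it, the rows are plain Markdown and easy to parse. A small sketch, assuming the four-column layout shown above (`parse_table` is an illustrative helper, not part of the project):

```python
def parse_table(markdown: str):
    """Parse data rows of a '| # | Title | Price | Rating |' Markdown table."""
    products = []
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header, the separator row, and anything that isn't a data row
        if len(cells) != 4 or not cells[0].isdigit():
            continue
        rank, title, price, rating = cells
        products.append((title, float(price.lstrip("£")), rating))
    return products

sample = """
| # | Book Title | Price | Rating |
|---|------------|-------|--------|
| 1 | Libertarianism for Beginners | £59.69 | Two |
| 2 | It's Only the Himalayas | £52.29 | Two |
"""
print(parse_table(sample)[0])  # ('Libertarianism for Beginners', 59.69, 'Two')
```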
To target a different shop, change the URL in the task description inside `main.py`:

```python
task = Task(
    description="""
    Scrape https://your-target-shop.com and extract all visible products.
    For each product return name, price, and a short description (1-2 sentences).
    Format as a Markdown table sorted by price descending.
    """
)
```
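If you scrape several shops, it may be cleaner to build the description from a variable than to edit the string in place. A convenience sketch (`build_task_description` is a hypothetical helper, not part of the project):

```python
def build_task_description(url: str) -> str:
    """Fill the scraping instructions in with the target shop's URL."""
    return f"""
    Scrape {url} and extract all visible products.
    For each product return name, price, and a short description (1-2 sentences).
    Format as a Markdown table sorted by price descending.
    """

description = build_task_description("https://your-target-shop.com")
```

Pass the result as `description=` when constructing the `Task`.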
## Project Structure

```
firecrawl_shopping_scraper/
├── main.py           # Agent setup and task definition
├── requirements.txt  # Python dependencies
├── .env.example      # Environment variable template
└── README.md         # This file
```
## How It Works

1. **FirecrawlTools is configured** with only `scrape_url` enabled. This keeps the agent focused and prevents it from issuing unnecessary crawl or search calls.

2. **The task description** tells the agent what page to scrape and exactly what to extract. No custom parser is needed; the LLM reads the Markdown Firecrawl returns and identifies product blocks by structure and context.

3. **Firecrawl fetches the page** and returns it as clean Markdown, stripping navigation, ads, and boilerplate so the LLM gets a compact, structured representation of the content.

4. **The agent extracts and formats** each product row into a Markdown table, sorts by price descending, and prepends a summary line.
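Step 4 is ordinary post-processing that the LLM performs in-context. As a reference for what "sort and prepend a summary" means, a plain-Python equivalent would be (an illustrative sketch, not code from `main.py`):

```python
def summarize(products):
    """Sort (name, price) pairs by price descending and build the summary line."""
    ordered = sorted(products, key=lambda p: p[1], reverse=True)
    prices = [price for _, price in ordered]
    summary = (f"Found {len(ordered)} products · "
               f"Price range: £{min(prices):.2f} - £{max(prices):.2f}")
    return summary, ordered

summary, ordered = summarize([("A", 13.99), ("B", 59.69), ("C", 10.00)])
print(summary)  # Found 3 products · Price range: £10.00 - £59.69
```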
### Extending the example

To crawl multiple pages instead of just the homepage, enable `crawl_website`:

```python
firecrawl = FirecrawlTools(
    enable_scrape=False,
    enable_crawl=True,
    enable_crawl_management=True,
)

task = Task(
    description="""
    Crawl http://books.toscrape.com up to 5 pages and extract every product:
    name, price, and rating. Return a single Markdown table sorted by price descending.
    """
)
```
To get structured JSON output directly from Firecrawl's LLM extraction layer, enable `extract_data`:

```python
firecrawl = FirecrawlTools(
    enable_scrape=False,
    enable_extract=True,
)

task = Task(
    description="""
    Use extract_data on http://books.toscrape.com/* with this schema:
    {"products": [{"name": "string", "price": "string", "rating": "string"}]}
    Return the raw result.
    """
)
```
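Before feeding the extracted JSON downstream, it is worth checking that it actually matches the shape requested in the prompt. A minimal sketch using only the standard library (the expected keys mirror the schema above; `valid_products` is an illustrative helper):

```python
def valid_products(payload) -> bool:
    """Check payload looks like {"products": [{"name", "price", "rating"}, ...]}."""
    if not isinstance(payload, dict) or not isinstance(payload.get("products"), list):
        return False
    required = {"name", "price", "rating"}
    return all(isinstance(p, dict) and required <= p.keys()
               for p in payload["products"])

print(valid_products({"products": [{"name": "A", "price": "£5", "rating": "One"}]}))  # True
```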
