Commit 39f7c15

Refactor into production-grade pip-installable library

- Remove hardcoded proxy credentials, use META_ADS_PROXY env var
- Add pyproject.toml with package metadata, CLI entry point, and dev deps
- Move CLI from main.py into meta_ads_collector/cli.py with __main__.py
- Move example.py to examples/basic_usage.py
- Add custom exception hierarchy (MetaAdsError, AuthenticationError, etc.)
- Extract magic numbers into constants.py module
- Add input validation for all public API parameters
- Add 62 unit tests covering models, client, collector, CLI, and exceptions
- Add README.md, LICENSE (MIT), .env.example
- Add GitHub Actions CI (Python 3.9-3.13) and PyPI publish workflow
- Add py.typed marker for PEP 561 type checking support
- Delete committed ads.json sample data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 93870fb commit 39f7c15

22 files changed

Lines changed: 1244 additions & 81 deletions

.env.example

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# Meta Ads Collector Configuration
+# Copy this file to .env and fill in your values.
+
+# Proxy in format host:port:user:pass or host:port (optional)
+# META_ADS_PROXY=host:port:user:pass
+
+# Request timeout in seconds (default: 30)
+# META_ADS_TIMEOUT=30
+
+# Delay between requests in seconds (default: 2.0)
+# META_ADS_DELAY=2.0
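
The three variables documented in `.env.example` could be read with standard-library code along the following lines. This is a minimal sketch, not the commit's implementation: the helper names `parse_proxy` and `load_settings` are hypothetical, and the `http://` scheme used for the proxy URL is an assumption, since the diff does not show how the library interprets the `host:port:user:pass` string.

```python
import os
from typing import Mapping, Optional


def parse_proxy(value: str) -> str:
    """Convert "host:port" or "host:port:user:pass" into a proxy URL.

    Hypothetical helper; the http:// scheme is an assumption.
    """
    parts = value.strip().split(":")
    if len(parts) == 2:
        host, port = parts
        auth = ""
    elif len(parts) == 4:
        host, port, user, password = parts
        auth = f"{user}:{password}@"
    else:
        raise ValueError("proxy must be host:port or host:port:user:pass")
    if not host or not port.isdigit():
        raise ValueError("proxy needs a host and a numeric port")
    return f"http://{auth}{host}:{port}"


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    raw_proxy = env.get("META_ADS_PROXY")
    return {
        "proxy": parse_proxy(raw_proxy) if raw_proxy else None,
        # Defaults mirror the values documented in .env.example.
        "timeout": int(env.get("META_ADS_TIMEOUT", "30")),
        "delay": float(env.get("META_ADS_DELAY", "2.0")),
    }
```

Passing the environment as a mapping keeps the function easy to test. Note that a password containing a colon would not survive the simple `split(":")` used here.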

.github/workflows/ci.yml

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e ".[dev]"
+
+      - name: Lint with ruff
+        run: ruff check .
+
+      - name: Type check with mypy
+        run: mypy meta_ads_collector/ --ignore-missing-imports
+
+      - name: Run tests
+        run: pytest --cov=meta_ads_collector --cov-report=term-missing

.github/workflows/publish.yml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+name: Publish to PyPI
+
+on:
+  release:
+    types: [published]
+
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    permissions:
+      id-token: write
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install build tools
+        run: pip install build
+
+      - name: Build package
+        run: python -m build
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Yossef
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
+# meta-ads-collector
+
+A Python library and CLI tool for collecting ads from the [Meta Ad Library](https://www.facebook.com/ads/library/) (Facebook/Instagram).
+
+Supports searching, filtering, pagination, and exporting ads to JSON, CSV, or JSONL.
+
+## Installation
+
+```bash
+pip install meta-ads-collector
+```
+
+Or install from source:
+
+```bash
+git clone https://github.com/Yossef/meta-ads-collector.git
+cd meta-ads-collector
+pip install -e ".[dev]"
+```
+
+## Quick Start
+
+### Python API
+
+```python
+from meta_ads_collector import MetaAdsCollector
+
+with MetaAdsCollector(proxy="host:port:user:pass") as collector:
+    for ad in collector.search(
+        query="real estate",
+        country="US",
+        ad_type=MetaAdsCollector.AD_TYPE_HOUSING,
+        max_results=10,
+    ):
+        print(f"{ad.page.name}: {ad.id}")
+        print(f" Impressions: {ad.impressions}")
+        print(f" Spend: {ad.spend}")
+```
+
+### CLI
+
+```bash
+# Search for ads and export to JSON
+meta-ads-collector --query "real estate" --country US --output ads.json
+
+# Political ads from Egypt as CSV
+meta-ads-collector --country EG --ad-type political --output egypt.csv
+
+# Limit results and use exact phrase matching
+meta-ads-collector --query "buy now" --search-type exact --max-results 500 --output results.jsonl
+```
+
+## Configuration
+
+### Proxy
+
+Set the `META_ADS_PROXY` environment variable or pass `--proxy` on the CLI:
+
+```bash
+export META_ADS_PROXY="host:port:user:pass"
+```
+
+Or create a `.env` file (see `.env.example`).
+
+### CLI Options
+
+| Flag | Description | Default |
+|------|-------------|---------|
+| `-q, --query` | Search query string | `""` (all ads) |
+| `-c, --country` | ISO 3166-1 alpha-2 country code | `US` |
+| `-t, --ad-type` | `all`, `political`, `housing`, `employment`, `credit` | `all` |
+| `-s, --status` | `active`, `inactive`, `all` | `active` |
+| `--search-type` | `keyword`, `exact`, `page` | `keyword` |
+| `--sort-by` | `relevancy`, `impressions` | `impressions` |
+| `-o, --output` | Output file path (`.json`, `.csv`, `.jsonl`) | **required** |
+| `-n, --max-results` | Max ads to collect | unlimited |
+| `--proxy` | Proxy (`host:port:user:pass`) | `META_ADS_PROXY` env |
+| `--no-proxy` | Disable proxy | `false` |
+| `--delay` | Delay between requests (seconds) | `2.0` |
+| `--timeout` | Request timeout (seconds) | `30` |
+| `-v, --verbose` | Debug logging | `false` |
+
+## Python API Reference
+
+### MetaAdsCollector
+
+```python
+from meta_ads_collector import MetaAdsCollector
+
+collector = MetaAdsCollector(
+    proxy="host:port:user:pass",  # optional
+    rate_limit_delay=2.0,  # seconds between requests
+    timeout=30,  # request timeout
+)
+```
+
+**Methods:**
+
+- `search(...)` - Iterator yielding `Ad` objects
+- `collect(...)` - Returns `list[Ad]`
+- `collect_to_json(output_path, ...)` - Save to JSON file
+- `collect_to_csv(output_path, ...)` - Save to CSV file
+- `collect_to_jsonl(output_path, ...)` - Save to JSONL file
+- `get_stats()` - Collection statistics
+
+### Ad Model
+
+Each `Ad` object contains:
+
+- `id`, `page` (PageInfo), `is_active`, `ad_status`
+- `creatives` (list of AdCreative with body, title, image_url, video_url, etc.)
+- `impressions` (ImpressionRange), `spend` (SpendRange), `currency`
+- `publisher_platforms`, `languages`, `funding_entity`, `disclaimer`
+- `delivery_start_time`, `delivery_stop_time`
+- `age_gender_distribution`, `region_distribution`
+
+### Exceptions
+
+All exceptions inherit from `MetaAdsError`:
+
+| Exception | When |
+|-----------|------|
+| `AuthenticationError` | Session init or token extraction fails |
+| `RateLimitError` | API rate limit hit |
+| `SessionExpiredError` | Session expired and refresh failed |
+| `ProxyError` | Invalid proxy format or unreachable proxy |
+| `InvalidParameterError` | Invalid parameter value passed to API |
+
+## Export Formats
+
+- **JSON** - Full metadata + ads array, pretty-printed
+- **CSV** - Flattened schema (24 columns), one row per ad
+- **JSONL** - One JSON object per line, streaming-friendly
+
+## Development
+
+```bash
+# Install with dev dependencies
+pip install -e ".[dev]"
+
+# Run tests
+pytest
+
+# Lint
+ruff check .
+
+# Type check
+mypy meta_ads_collector/
+```
+
+## License
+
+[MIT](LICENSE)
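
The README states that every exception inherits from `MetaAdsError`. A caller can therefore treat one failure mode specially and let the rest propagate. The sketch below illustrates this with local stand-in classes, since the commit's `exceptions.py` is not shown in this diff. The `call_with_retry` helper, its parameters, and the exponential backoff policy are hypothetical and not part of the library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Stand-ins that mirror the names in the README table; the real classes
# live in meta_ads_collector.exceptions and may carry extra attributes.
class MetaAdsError(Exception):
    """Base class for all library errors."""


class AuthenticationError(MetaAdsError): ...
class RateLimitError(MetaAdsError): ...
class SessionExpiredError(MetaAdsError): ...
class ProxyError(MetaAdsError): ...
class InvalidParameterError(MetaAdsError): ...


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry func on RateLimitError with exponential backoff (hypothetical)."""
    for attempt in range(attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Only `RateLimitError` is retried here. Any other `MetaAdsError` subclass propagates immediately, and a caller can still catch the whole family with a single `except MetaAdsError`.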

example.py renamed to examples/basic_usage.py

Lines changed: 3 additions & 2 deletions
@@ -6,6 +6,7 @@
 """
 
 import logging
+import os
 from meta_ads_collector import MetaAdsCollector, Ad
 
 # Configure logging
@@ -14,8 +15,8 @@
     format="%(asctime)s - %(levelname)s - %(message)s"
 )
 
-# Your proxy configuration
-PROXY = "REDACTED_PROXY_CREDENTIALS"
+# Load proxy from environment variable (set META_ADS_PROXY in your .env)
+PROXY = os.environ.get("META_ADS_PROXY")
 
 
 def example_basic_search():

meta_ads_collector/__init__.py

Lines changed: 20 additions & 2 deletions
@@ -1,16 +1,34 @@
-"""Meta Ads Library Collector - Scrapes ads from Facebook Ad Library"""
+"""Meta Ads Library Collector - Collect ads from the Facebook Ad Library."""
 
-from .models import Ad, AdCreative, AudienceDistribution, SpendRange, ImpressionRange
+from .models import Ad, AdCreative, AudienceDistribution, SpendRange, ImpressionRange, SearchResult
 from .client import MetaAdsClient
 from .collector import MetaAdsCollector
+from .exceptions import (
+    MetaAdsError,
+    AuthenticationError,
+    RateLimitError,
+    SessionExpiredError,
+    ProxyError,
+    InvalidParameterError,
+)
 
 __version__ = "1.0.0"
 __all__ = [
+    # Models
     "Ad",
     "AdCreative",
     "AudienceDistribution",
     "SpendRange",
     "ImpressionRange",
+    "SearchResult",
+    # Client & Collector
     "MetaAdsClient",
     "MetaAdsCollector",
+    # Exceptions
+    "MetaAdsError",
+    "AuthenticationError",
+    "RateLimitError",
+    "SessionExpiredError",
+    "ProxyError",
+    "InvalidParameterError",
 ]

meta_ads_collector/__main__.py

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+"""Allow running the package directly: python -m meta_ads_collector"""
+
+import sys
+
+from .cli import main
+
+sys.exit(main())
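
The `sys.exit(main())` call in `__main__.py` passes whatever `main()` returns to the interpreter as the process exit status. The commit does not show `main()` itself, so the following is only a sketch of the usual shape of such a function: the `argv` parameter, the extension check, and the exit code `2` are assumptions made for illustration.

```python
import argparse
import sys
from typing import Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Return a process exit code instead of calling sys.exit() directly.

    Hypothetical stand-in for meta_ads_collector.cli.main.
    """
    parser = argparse.ArgumentParser(prog="meta-ads-collector")
    parser.add_argument("-o", "--output", required=True)
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]

    if not args.output.endswith((".json", ".csv", ".jsonl")):
        print(f"unsupported output format: {args.output}", file=sys.stderr)
        return 2
    return 0


# In __main__.py the entry point then reduces to:
#     sys.exit(main())
```

Returning an integer rather than exiting inside `main()` keeps the function callable from tests, and the same function can serve both `python -m meta_ads_collector` and a console-script entry point.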

main.py renamed to meta_ads_collector/cli.py

Lines changed: 12 additions & 11 deletions
@@ -1,18 +1,19 @@
 #!/usr/bin/env python3
 """
-Meta Ads Library Collector - Main Entry Point
+Meta Ads Library Collector - CLI Entry Point
 
 Usage:
-    python main.py --query "real estate" --country US --max-results 100 --output ads.json
-    python main.py --query "climate" --ad-type political --output political_ads.csv
+    meta-ads-collector --query "real estate" --country US --max-results 100 --output ads.json
+    python -m meta_ads_collector --query "climate" --ad-type political --output political_ads.csv
 """
 
 import argparse
 import logging
+import os
 import sys
 from pathlib import Path
 
-from meta_ads_collector import MetaAdsCollector
+from . import MetaAdsCollector
 
 
 def setup_logging(verbose: bool = False) -> None:
@@ -33,16 +34,16 @@ def parse_args() -> argparse.Namespace:
         epilog="""
 Examples:
   # Search for real estate ads in the US
-  python main.py --query "real estate" --country US --output ads.json
+  meta-ads-collector --query "real estate" --country US --output ads.json
 
   # Collect political ads from Egypt
-  python main.py --country EG --ad-type political --output egypt_political.json
+  meta-ads-collector --country EG --ad-type political --output egypt_political.json
 
   # Export to CSV with a limit
-  python main.py --query "loans" --max-results 500 --output loans.csv
+  meta-ads-collector --query "loans" --max-results 500 --output loans.csv
 
   # Use exact phrase matching
-  python main.py --query "buy now" --search-type exact --output buy_now.json
+  meta-ads-collector --query "buy now" --search-type exact --output buy_now.json
         """,
     )
 
@@ -114,8 +115,8 @@ def parse_args() -> argparse.Namespace:
     # Connection options
     parser.add_argument(
         "--proxy",
-        default="REDACTED_PROXY_CREDENTIALS",
-        help="Proxy in format host:port:user:pass (default: configured proxy)",
+        default=os.environ.get("META_ADS_PROXY"),
+        help="Proxy in format host:port:user:pass (or set META_ADS_PROXY env var)",
     )
     parser.add_argument(
         "--timeout",
@@ -180,7 +181,7 @@ def map_search_type(search_type: str) -> str:
 def map_sort(sort_by: str):
     """Map CLI sort to API constant."""
     mapping = {
-        "relevancy": MetaAdsCollector.SORT_RELEVANCY,  # None = server default
+        "relevancy": MetaAdsCollector.SORT_RELEVANCY,
         "impressions": MetaAdsCollector.SORT_IMPRESSIONS,
     }
     return mapping.get(sort_by, MetaAdsCollector.SORT_IMPRESSIONS)
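
The `--proxy` change replaces a hardcoded credential with `default=os.environ.get("META_ADS_PROXY")`, so the environment variable is read once when the parser is built and an explicit flag still overrides it. The sketch below isolates that precedence. `build_parser` and `resolve_proxy` are hypothetical names, and the handling of `--no-proxy` is an assumption based only on the flag's description in the README.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser(env: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Build a parser whose --proxy default comes from META_ADS_PROXY."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="meta-ads-collector")
    # The default is evaluated here, at parser construction time.
    parser.add_argument("--proxy", default=env.get("META_ADS_PROXY"))
    parser.add_argument("--no-proxy", action="store_true")
    return parser


def resolve_proxy(args: argparse.Namespace) -> Optional[str]:
    """Assumed precedence: --no-proxy, then --proxy, then the env default."""
    return None if args.no_proxy else args.proxy
```

With `META_ADS_PROXY` unset and no flag given, the resolved proxy is `None`, which matches the `.env.example` description of the proxy as optional.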
