# meta-ads-collector

A Python library and CLI tool for collecting ads from the [Meta Ad Library](https://www.facebook.com/ads/library/) (Facebook/Instagram).

Supports searching, filtering, pagination, and exporting ads to JSON, CSV, or JSONL.

## Installation

```bash
pip install meta-ads-collector
```

Or install from source:

```bash
git clone https://github.com/Yossef/meta-ads-collector.git
cd meta-ads-collector
pip install -e ".[dev]"
```

## Quick Start

### Python API

```python
from meta_ads_collector import MetaAdsCollector

with MetaAdsCollector(proxy="host:port:user:pass") as collector:
    for ad in collector.search(
        query="real estate",
        country="US",
        ad_type=MetaAdsCollector.AD_TYPE_HOUSING,
        max_results=10,
    ):
        print(f"{ad.page.name}: {ad.id}")
        print(f" Impressions: {ad.impressions}")
        print(f" Spend: {ad.spend}")
```

### CLI

```bash
# Search for ads and export to JSON
meta-ads-collector --query "real estate" --country US --output ads.json

# Political ads from Egypt as CSV
meta-ads-collector --country EG --ad-type political --output egypt.csv

# Limit results and use exact phrase matching
meta-ads-collector --query "buy now" --search-type exact --max-results 500 --output results.jsonl
```

## Configuration

### Proxy

Set the `META_ADS_PROXY` environment variable or pass `--proxy` on the CLI:

```bash
export META_ADS_PROXY="host:port:user:pass"
```

Or create a `.env` file (see `.env.example`).

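The README gives the proxy format as `host:port:user:pass` but does not say how the library turns it into something an HTTP client can use. The following is a minimal sketch, assuming the common conversion to a `scheme://user:pass@host:port` URL. `proxy_to_url` is a hypothetical helper for illustration, not part of `meta-ads-collector`, and accepting a bare `host:port` is also an assumption.

```python
from urllib.parse import quote


def proxy_to_url(proxy: str, scheme: str = "http") -> str:
    """Convert 'host:port' or 'host:port:user:pass' into a proxy URL.

    Hypothetical helper: the library's own parsing may differ.
    """
    # maxsplit=3 keeps any ':' inside the password in the last field.
    parts = proxy.split(":", 3)
    if len(parts) == 2:
        host, port = parts
        auth = ""
    elif len(parts) == 4:
        host, port, user, password = parts
        # Percent-encode credentials so characters like '@' stay valid in a URL.
        auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    else:
        raise ValueError(
            f"expected host:port or host:port:user:pass, got {proxy!r}"
        )
    if not port.isdigit():
        raise ValueError(f"port must be numeric, got {port!r}")
    return f"{scheme}://{auth}{host}:{port}"


print(proxy_to_url("proxy.example.com:8080:alice:s3cret"))
# http://alice:s3cret@proxy.example.com:8080
```

Because the format is colon-delimited, this sketch cannot handle IPv6 literal hosts; those would need a different input format.
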
### CLI Options

| Flag | Description | Default |
|------|-------------|---------|
| `-q, --query` | Search query string | `""` (all ads) |
| `-c, --country` | ISO 3166-1 alpha-2 country code | `US` |
| `-t, --ad-type` | `all`, `political`, `housing`, `employment`, `credit` | `all` |
| `-s, --status` | `active`, `inactive`, `all` | `active` |
| `--search-type` | `keyword`, `exact`, `page` | `keyword` |
| `--sort-by` | `relevancy`, `impressions` | `impressions` |
| `-o, --output` | Output file path (`.json`, `.csv`, `.jsonl`) | **required** |
| `-n, --max-results` | Max ads to collect | unlimited |
| `--proxy` | Proxy (`host:port:user:pass`) | `META_ADS_PROXY` env |
| `--no-proxy` | Disable proxy | `false` |
| `--delay` | Delay between requests (seconds) | `2.0` |
| `--timeout` | Request timeout (seconds) | `30` |
| `-v, --verbose` | Debug logging | `false` |

## Python API Reference

### MetaAdsCollector

```python
from meta_ads_collector import MetaAdsCollector

collector = MetaAdsCollector(
    proxy="host:port:user:pass",  # optional
    rate_limit_delay=2.0,         # seconds between requests
    timeout=30,                   # request timeout
)
```

**Methods:**

- `search(...)` - Iterator yielding `Ad` objects
- `collect(...)` - Returns `list[Ad]`
- `collect_to_json(output_path, ...)` - Save to JSON file
- `collect_to_csv(output_path, ...)` - Save to CSV file
- `collect_to_jsonl(output_path, ...)` - Save to JSONL file
- `get_stats()` - Collection statistics

### Ad Model

Each `Ad` object contains:

- `id`, `page` (PageInfo), `is_active`, `ad_status`
- `creatives` (list of AdCreative with body, title, image_url, video_url, etc.)
- `impressions` (ImpressionRange), `spend` (SpendRange), `currency`
- `publisher_platforms`, `languages`, `funding_entity`, `disclaimer`
- `delivery_start_time`, `delivery_stop_time`
- `age_gender_distribution`, `region_distribution`

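The type names `ImpressionRange` and `SpendRange` suggest that these fields hold a lower and an upper bound rather than a single number. The README does not list their attributes, so the sketch below uses a hypothetical stand-in class, `Range`, with assumed attribute names `lower_bound` and `upper_bound`, to show one way of reducing such a range to a single estimate for sorting or aggregation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in for ImpressionRange / SpendRange.

    The attribute names are assumptions; check the library's models
    before relying on them.
    """

    lower_bound: Optional[int] = None
    upper_bound: Optional[int] = None

    def midpoint(self) -> Optional[float]:
        """Return the middle of the range, or the only known bound."""
        if self.lower_bound is None and self.upper_bound is None:
            return None
        if self.upper_bound is None:
            return float(self.lower_bound)
        if self.lower_bound is None:
            return float(self.upper_bound)
        return (self.lower_bound + self.upper_bound) / 2


print(Range(1000, 5000).midpoint())  # 3000.0
print(Range(1000, None).midpoint())  # 1000.0
```

A midpoint is only a rough estimate; for open-ended ranges the sketch falls back to the single known bound rather than guessing.
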
### Exceptions

All exceptions inherit from `MetaAdsError`:

| Exception | When |
|-----------|------|
| `AuthenticationError` | Session init or token extraction fails |
| `RateLimitError` | API rate limit hit |
| `SessionExpiredError` | Session expired and refresh failed |
| `ProxyError` | Invalid proxy format or unreachable proxy |
| `InvalidParameterError` | Invalid parameter value passed to API |

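Since `RateLimitError` is a distinct subclass of `MetaAdsError`, callers can retry on it while letting other errors propagate. The sketch below shows one possible pattern, exponential backoff. It defines stand-in exception classes so that it runs without the library installed; in real use you would import the library's own exceptions instead (the import path is not given in this README). `with_backoff` is a hypothetical helper, not part of the package.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Stand-ins mirroring the documented hierarchy (assumption: the real
# classes are importable from the package).
class MetaAdsError(Exception):
    pass


class RateLimitError(MetaAdsError):
    pass


def with_backoff(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1x, 2x, 4x the base delay
    raise AssertionError("unreachable")


# Demo: a callable that is rate limited twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("slow down")
    return "ok"


result = with_backoff(flaky, sleep=delays.append)  # record delays, don't wait
print(result, delays)  # ok [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the helper testable; the default is the real `time.sleep`.
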
## Export Formats

- **JSON** - Full metadata + ads array, pretty-printed
- **CSV** - Flattened schema (24 columns), one row per ad
- **JSONL** - One JSON object per line, streaming-friendly

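Because JSONL holds one JSON object per line, a consumer can process a large export without loading the whole file. The following is a minimal sketch using only the standard library. The sample records and their keys (`id`, `currency`) are made up for the demo and do not represent the library's actual output schema.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


# Demo with a tiny, made-up file standing in for a real export.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "results.jsonl"
    sample.write_text(
        '{"id": "1", "currency": "USD"}\n{"id": "2", "currency": "EGP"}\n',
        encoding="utf-8",
    )
    ids = [record["id"] for record in iter_jsonl(sample)]

print(ids)  # ['1', '2']
```
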
## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check .

# Type check
mypy meta_ads_collector/
```

## License

[MIT](LICENSE)