# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.0] - 2026-02-08

### Added

#### Core
- `MetaAdsCollector` high-level interface with `search()`, `collect()`, and export methods
- `MetaAdsClient` low-level HTTP client with session management and GraphQL request handling
- Browser fingerprint randomization across Chrome versions, platforms, viewports, and DPR values
- Dynamic `doc_id` extraction from Ad Library page HTML with hardcoded fallbacks
- Token extraction (LSD, CSRF, session IDs) with verification and fallback generation
- Automatic session refresh on 403 responses with configurable max refresh attempts
- Session staleness detection with 30-minute max age
- Challenge/verification handling for Facebook's bot detection
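
For readers new to these mechanics, here is a minimal sketch of two of the session behaviors listed above: a 30-minute staleness check and refresh-and-retry on HTTP 403 with a cap on refresh attempts. `SessionState` and `send_with_refresh` are hypothetical names, not part of `MetaAdsClient`; the clock and the `send`/`refresh` callables are injected assumptions so the logic can run without network access.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical constant mirroring the 30-minute max age described above.
MAX_SESSION_AGE_SECONDS = 30 * 60


@dataclass
class SessionState:
    """Hypothetical session record that knows when it was created."""

    clock: Callable[[], float] = time.monotonic
    created_at: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Stamp the creation time using the injected clock.
        self.created_at = self.clock()

    def is_stale(self, max_age: float = MAX_SESSION_AGE_SECONDS) -> bool:
        # Stale once the session is at least max_age seconds old.
        return self.clock() - self.created_at >= max_age


def send_with_refresh(
    send: Callable[[], int],
    refresh: Callable[[], None],
    max_refreshes: int = 3,
) -> int:
    """Call send(); on a 403, refresh the session and retry, at most max_refreshes times.

    Returns the last HTTP status code observed (hypothetical helper).
    """
    status = send()
    refreshes = 0
    while status == 403 and refreshes < max_refreshes:
        refresh()
        refreshes += 1
        status = send()
    return status
```

A caller would typically check `is_stale()` before each request and rebuild the session proactively, leaving the 403 path for sessions that expire early.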

#### Search & Collection
- Keyword search, exact phrase search, and page-level search modes
- Page-level collection by URL (`collect_by_page_url`), by name (`collect_by_page_name`), and by ID (`collect_by_page_id`)
- Typeahead page search (`search_pages`) for resolving page names to IDs
- URL parser for extracting page IDs from Ad Library URLs, profile URLs, and numeric paths
- Pagination with cursor-based traversal
- Configurable page size, max results, sort order, and country
- Ad type filtering: all, political, housing, employment, credit
- Status filtering: active, inactive, all
- Ad enrichment via detail/snapshot endpoint (`enrich_ad`)
- Stream mode yielding lifecycle events alongside ads (`stream`)
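
Cursor-based traversal combined with a max-results cap can be sketched as a plain generator. This assumes a hypothetical `fetch_page(cursor)` callable returning `(items, next_cursor)`, with `next_cursor` set to `None` on the last page; it illustrates the pattern and is not the collector's actual pagination code.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical fetcher: takes a cursor (None for the first page) and returns
# (items, next_cursor); next_cursor is None once no pages remain.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def paginate(fetch_page: FetchPage, max_results: Optional[int] = None) -> Iterator[dict]:
    """Yield items page by page until the cursor runs out or max_results is reached."""
    if max_results is not None and max_results <= 0:
        return
    cursor: Optional[str] = None
    yielded = 0
    while True:
        items, cursor = fetch_page(cursor)
        for item in items:
            yield item
            yielded += 1
            # Stop immediately so no extra page is requested after the cap.
            if max_results is not None and yielded >= max_results:
                return
        if cursor is None:
            return
```

Because it is lazy, callers that stop iterating early also stop issuing page requests.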

#### Filtering
- `FilterConfig` dataclass with 11 filter fields
- Impression range filters (min/max using conservative bound logic)
- Spend range filters (min/max)
- Date range filters (start_date, end_date)
- Media type filter (image, video, meme, none)
- Publisher platform filter (facebook, instagram, messenger, audience_network)
- Language filter
- Boolean filters: has_video, has_image
- AND logic across all filters with missing-data-inclusive policy
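
The "AND logic with missing-data-inclusive policy" can be read as: every active filter must pass, and a filter passes when the ad simply lacks that field. A minimal sketch of that reading, using three of the fields above on hypothetical `AdView` and `MiniFilter` classes (not the real `Ad` or `FilterConfig`):

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AdView:
    """Hypothetical, simplified ad record; None means the field was not returned."""

    language: Optional[str] = None
    platforms: Optional[Sequence[str]] = None
    has_video: Optional[bool] = None


@dataclass
class MiniFilter:
    """Hypothetical filter with three fields; None disables that check."""

    language: Optional[str] = None
    platform: Optional[str] = None
    has_video: Optional[bool] = None

    def matches(self, ad: AdView) -> bool:
        # Each check passes if the filter is unset OR the ad lacks the data
        # (missing-data-inclusive); all checks must pass (AND logic).
        checks = [
            self.language is None or ad.language is None or ad.language == self.language,
            self.platform is None or ad.platforms is None or self.platform in ad.platforms,
            self.has_video is None or ad.has_video is None or ad.has_video == self.has_video,
        ]
        return all(checks)
```

The inclusive policy favors recall: ads with incomplete metadata are kept rather than silently dropped.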

#### Deduplication
- `DeduplicationTracker` with two modes: in-memory and persistent (SQLite)
- `has_seen()` and `mark_seen()` for ad ID tracking
- `get_last_collection_time()` and `update_collection_time()` for incremental collection
- Context manager protocol with automatic save on exit
- `count()` and `clear()` utility methods
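
A SQLite-backed seen-set with save-on-exit can be sketched with the standard `sqlite3` module. `SeenStore` is a hypothetical class that borrows the method names above for readability; it is not the library's `DeduplicationTracker`, and the table layout is an assumption.

```python
import sqlite3


class SeenStore:
    """Hypothetical SQLite-backed set of seen ad IDs."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS seen (ad_id TEXT PRIMARY KEY)")

    def has_seen(self, ad_id: str) -> bool:
        row = self._conn.execute("SELECT 1 FROM seen WHERE ad_id = ?", (ad_id,)).fetchone()
        return row is not None

    def mark_seen(self, ad_id: str) -> None:
        # INSERT OR IGNORE makes repeated marks idempotent.
        self._conn.execute("INSERT OR IGNORE INTO seen (ad_id) VALUES (?)", (ad_id,))

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM seen").fetchone()[0]

    def __enter__(self) -> "SeenStore":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # "Save on exit": commit pending inserts, then close the connection.
        self._conn.commit()
        self._conn.close()
```

Passing a file path instead of `":memory:"` gives the persistent mode; the primary key keeps lookups indexed as the table grows.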

#### Media Downloads
- `MediaDownloader` for downloading images, videos, and thumbnails from ad creatives
- `MediaDownloadResult` frozen dataclass with success/failure details
- File extension detection from URL path and Content-Type headers
- Retry with exponential backoff on download failures
- Skip-existing-file optimization
- `collect_with_media()` convenience method on the collector
- `download_ad_media()` for single-ad media downloads
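
Retry with exponential backoff (used here for downloads and below for webhooks) doubles the wait after each failed attempt. A generic sketch follows; `retry_with_backoff` is a hypothetical helper, and the `base_delay * 2**attempt` schedule without jitter is an assumption, not necessarily the library's exact policy.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, sleeping base_delay * 2**attempt between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

With the defaults, a call that keeps failing waits 1 s, then 2 s, and then re-raises on the third failure. Injecting `sleep` keeps the helper testable without real delays.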

#### Events & Webhooks
- `EventEmitter` with synchronous callback dispatch and exception isolation
- 7 lifecycle event types: collection_started, ad_collected, page_fetched, error_occurred, rate_limited, session_refreshed, collection_finished
- `Event` dataclass with event_type, data payload, and UTC timestamp
- Convenience callback registration via `callbacks` parameter on collector init
- `WebhookSender` for POSTing ad data to external HTTP endpoints
- Retry with exponential backoff on webhook failures
- Optional batch mode for webhook sends
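
"Exception isolation" means one failing callback cannot stop the others or the collection run. A minimal sketch of that dispatch loop follows. `MiniEvent` and `MiniEmitter` are hypothetical stand-ins for `Event` and `EventEmitter`, and the `on`/`emit` method names are assumptions.

```python
import logging
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, DefaultDict, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class MiniEvent:
    """Hypothetical event record: type, payload, and a UTC timestamp."""

    event_type: str
    data: Dict[str, Any]
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


Callback = Callable[[MiniEvent], None]


class MiniEmitter:
    """Hypothetical synchronous emitter with per-callback exception isolation."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[Callback]] = defaultdict(list)

    def on(self, event_type: str, callback: Callback) -> None:
        self._callbacks[event_type].append(callback)

    def emit(self, event_type: str, data: Dict[str, Any]) -> MiniEvent:
        event = MiniEvent(event_type, data)
        # Iterate over a copy so callbacks may register new listeners safely.
        for callback in list(self._callbacks.get(event_type, [])):
            try:
                callback(event)
            except Exception:
                # Log with traceback and continue with the next callback.
                logger.exception("callback for %r raised", event_type)
        return event
```

Dispatch is synchronous, so a slow callback still delays collection; only its exceptions are contained.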

#### Async Support
- `AsyncMetaAdsClient` using httpx for non-blocking HTTP
- `AsyncMetaAdsCollector` mirroring the sync API with `async for` generators
- Async `search()`, `collect()`, `collect_to_json()`, `collect_to_csv()`, `search_pages()`
- Optional dependency: `pip install meta-ads-collector[async]`

#### Proxy Support
- Single proxy configuration (host:port or host:port:user:pass)
- `ProxyPool` with round-robin selection across multiple proxies
- Per-proxy failure tracking with configurable max failures threshold
- Dead proxy cooldown with automatic revival
- `ProxyPool.from_file()` for loading proxies from text files
- Proxy URL format detection (plain, URL, SOCKS5)
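
Round-robin selection, a failure threshold, and cooldown-based revival fit together as shown in this sketch. `RoundRobinProxyPool` is hypothetical, not the library's `ProxyPool`. The default threshold and cooldown values are assumptions, and a revived proxy is assumed to restart with a clean failure count.

```python
import time
from typing import Callable, Dict, List, Optional


class RoundRobinProxyPool:
    """Hypothetical pool: round-robin over live proxies, cooling down failing ones."""

    def __init__(self, proxies: List[str], max_failures: int = 3, cooldown: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not proxies:
            raise ValueError("need at least one proxy")
        self._proxies = list(proxies)
        self._max_failures = max_failures
        self._cooldown = cooldown
        self._clock = clock
        self._failures: Dict[str, int] = {p: 0 for p in self._proxies}
        self._dead_until: Dict[str, float] = {}
        self._index = 0

    def _revive_expired(self) -> None:
        now = self._clock()
        for proxy, until in list(self._dead_until.items()):
            if now >= until:
                # Cooldown elapsed: revive with a clean failure count.
                del self._dead_until[proxy]
                self._failures[proxy] = 0

    def get(self) -> Optional[str]:
        """Return the next live proxy, or None if every proxy is cooling down."""
        self._revive_expired()
        for _ in range(len(self._proxies)):
            proxy = self._proxies[self._index]
            self._index = (self._index + 1) % len(self._proxies)
            if proxy not in self._dead_until:
                return proxy
        return None

    def mark_failure(self, proxy: str) -> None:
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            self._dead_until[proxy] = self._clock() + self._cooldown

    def mark_success(self, proxy: str) -> None:
        self._failures[proxy] = 0
```

Returning `None` when every proxy is cooling down leaves the caller to decide whether to wait, fall back to a direct connection, or abort.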

#### Export
- JSON export with metadata envelope (query, country, stats, timestamps)
- CSV export with 25-column flattened schema
- JSONL export (one JSON object per line)
- Export methods: `collect_to_json()`, `collect_to_csv()`, `collect_to_jsonl()`
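
JSONL writes one JSON object per line. That lets a file be appended to incrementally and read back line by line without loading everything at once. A minimal sketch with a hypothetical `write_jsonl` helper (not `collect_to_jsonl()` itself):

```python
import io
import json
from typing import IO, Any, Dict, Iterable


def write_jsonl(records: Iterable[Dict[str, Any]], fp: IO[str]) -> int:
    """Write one JSON object per line; return the number of records written."""
    count = 0
    for record in records:
        # json.dumps never emits raw newlines, so each record stays on one line;
        # ensure_ascii=False keeps non-ASCII ad text readable in the file.
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_jsonl([{"id": "1"}, {"id": "2", "body": "héllo"}], buf)
    print(buf.getvalue(), end="")
```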

#### Logging & Reporting
- `setup_logging()` with text or JSON format selection
- `JSONFormatter` producing single-line JSON log records
- Optional file handler with automatic directory creation
- `CollectionReport` dataclass with throughput metrics
- `format_report()` for human-readable summary text
- `format_report_json()` for machine-readable JSON output
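
A single-line JSON log formatter can be built by subclassing the standard `logging.Formatter`. `OneLineJSONFormatter` and its field names are hypothetical; the library's `JSONFormatter` may emit different keys.

```python
import json
import logging
from datetime import datetime, timezone


class OneLineJSONFormatter(logging.Formatter):
    """Hypothetical formatter emitting each log record as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style args
        }
        if record.exc_info:
            # The traceback is multi-line text; json.dumps escapes its newlines.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(OneLineJSONFormatter())
    log = logging.getLogger("demo")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("collected %d ads", 5)
```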

#### Data Models
- `Ad` dataclass with 30+ fields covering all Ad Library data
- `AdCreative` with body, title, description, link URL, image/video URLs, CTA
- `PageInfo` with ID, name, profile picture, URL, likes, verification status
- `PageSearchResult` for typeahead search results
- `ImpressionRange` and `SpendRange` with lower/upper bounds
- `AudienceDistribution` for demographic and regional data
- `SearchResult` for paginated result sets
- `Ad.from_graphql_response()` parser handling multiple response formats

#### CLI
- Full CLI with 35+ flags via argparse
- All search parameters, filtering, proxy, dedup, media, enrichment, webhook, logging, and reporting flags
- `python -m meta_ads_collector` entry point
- `meta-ads-collector` console script
- Page search mode (`--search-pages`)
- Page collection modes (`--page-url`, `--page-name`)

#### Exceptions
- `MetaAdsError` base exception
- `AuthenticationError` for session/token failures
- `RateLimitError` with `retry_after` attribute
- `SessionExpiredError` for unrecoverable session failures
- `ProxyError` for proxy configuration issues
- `InvalidParameterError` with param name, value, and allowed values

#### Infrastructure
- PEP 561 `py.typed` marker for type checking support
- CI pipeline with Python 3.9-3.13 matrix testing
- Automated PyPI publishing on GitHub release
- 642 tests covering all modules

[1.0.0]: https://github.com/Yossef/meta-ads-collector/releases/tag/v1.0.0