All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- curl_cffi is now the sole HTTP dependency. Removed `requests` and `httpx` entirely. Facebook blocks any HTTP client without Chrome-like TLS fingerprints, so `curl_cffi` (with `impersonate="chrome"`) is the only backend that actually works. Both `requests` and `httpx` were dead-weight fallbacks that failed with HTTP 403.
- Simplified installation. `pip install meta-ads-collector` now includes everything; the `[stealth]` and `[async]` extras are no longer needed.
- Async client simplified. Removed the `_AsyncResponse` wrapper and the httpx fallback code paths. The async client now uses `curl_cffi.AsyncSession` directly.
- Renamed `ProxyPool.get_requests_proxies()` to `get_proxy_dict()`.
- `requests` dependency (was required, now removed)
- `httpx` dependency and `[async]` optional extra
- `[stealth]` optional extra (curl_cffi is now always installed)
- `types-requests` from dev dependencies
- Async client: Rewrote to use `curl_cffi.AsyncSession` for TLS fingerprint impersonation, fixing 403 blocks from Facebook. Falls back to `httpx.AsyncClient` when curl_cffi is not installed.
- Async client: Added 403 verification challenge handling (same as sync client), enabling the async client to work without proxies.
- Political ads parsing: Fixed `'str' object has no attribute 'get'` crash when parsing political ads where `spend` is a string (e.g., `"$9K-$10K"`) instead of a dict.
- Impression text parsing: Handle `impressions_with_index` format (`{"impressions_text": ">1M", "impressions_index": 39}`) returned for political ads.
- Publisher platform: Added `publisher_platform` (singular) key lookup alongside plural `publisher_platforms`.
- Delivery dates: Added `start_date`/`end_date` key lookups for delivery times.
- Reach parsing: Added `_parse_reach` classmethod to handle `reach`/`reach_estimate` in string and dict formats.
- Audience distributions: Added `isinstance(item, dict)` guards for demographic and region distribution parsing.
- Estimated audience size: Added `isinstance(dict)` check before calling `.get()`.
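
The parsing fixes above share one pattern: check the value's type before treating it as a dict. The following is a minimal sketch of that pattern for the spend case, not the library's actual parser. The function names `parse_spend` and `_to_number` and the dict keys `lower_bound`/`upper_bound` are assumptions made for illustration.

```python
from __future__ import annotations

import re
from typing import Any

# Multipliers for the abbreviated amounts seen in display strings.
_SUFFIX = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_AMOUNT = re.compile(r"(\d+(?:\.\d+)?)\s*([KMB])?", re.IGNORECASE)


def _to_number(token: str) -> int | None:
    """Convert a token such as "$9K" or "1.5M" to an integer, or None."""
    match = _AMOUNT.search(token.replace(",", ""))
    if match is None:
        return None
    value = float(match.group(1))
    suffix = (match.group(2) or "").upper()
    return int(value * _SUFFIX.get(suffix, 1))


def parse_spend(raw: Any) -> tuple[int | None, int | None]:
    """Return (lower, upper) bounds from either a dict or a display string."""
    if isinstance(raw, dict):
        # Hypothetical key names; the real payload keys may differ.
        return raw.get("lower_bound"), raw.get("upper_bound")
    if isinstance(raw, str):
        parts = raw.split("-", 1)
        lower = _to_number(parts[0])
        upper = _to_number(parts[1]) if len(parts) == 2 else None
        return lower, upper
    # Any other type (None, list, number) is treated as missing data.
    return None, None
```

With this guard in place, a string such as `"$9K-$10K"` yields a pair of bounds instead of raising `AttributeError`.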
- Python 3.9 compatibility: Added `from __future__ import annotations` to `models.py` and modernized all type annotations from `Optional[X]` to `X | None`.
- Async transport: The async client now prefers `curl_cffi.AsyncSession` over `httpx.AsyncClient` when both are installed, matching the sync client's TLS impersonation behavior.
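
For readers unfamiliar with the Python 3.9 point, here is a small illustration of why the `__future__` import matters. The class name `ExampleRange` is made up for this sketch and is not taken from `models.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExampleRange:
    # With the __future__ import, annotations are stored as strings and never
    # evaluated, so the "X | None" syntax is accepted on Python 3.9 as well.
    lower: int | None = None
    upper: int | None = None
```

The import only defers annotations. A runtime expression such as `isinstance(value, int | None)` still requires Python 3.10 or later.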
- Version bump for PyPI release.
1.0.0 - 2026-02-08
- `MetaAdsCollector` high-level interface with `search()`, `collect()`, and export methods
- `MetaAdsClient` low-level HTTP client with session management and GraphQL request handling
- Browser fingerprint randomization across Chrome versions, platforms, viewports, and DPR values
- Dynamic `doc_id` extraction from Ad Library page HTML with hardcoded fallbacks
- Token extraction (LSD, CSRF, session IDs) with verification and fallback generation
- Automatic session refresh on 403 responses with configurable max refresh attempts
- Session staleness detection with 30-minute max age
- Challenge/verification handling for Facebook's bot detection
- Keyword search, exact phrase search, and page-level search modes
- Page-level collection by URL (`collect_by_page_url`), by name (`collect_by_page_name`), and by ID (`collect_by_page_id`)
- Typeahead page search (`search_pages`) for resolving page names to IDs
- URL parser for extracting page IDs from Ad Library URLs, profile URLs, and numeric paths
- Pagination with cursor-based traversal
- Configurable page size, max results, sort order, and country
- Ad type filtering: all, political, housing, employment, credit
- Status filtering: active, inactive, all
- Ad enrichment via detail/snapshot endpoint (`enrich_ad`)
- Stream mode yielding lifecycle events alongside ads (`stream`)
- `FilterConfig` dataclass with 11 filter fields
- Impression range filters (min/max using conservative bound logic)
- Spend range filters (min/max)
- Date range filters (start_date, end_date)
- Media type filter (image, video, meme, none)
- Publisher platform filter (facebook, instagram, messenger, audience_network)
- Language filter
- Boolean filters: has_video, has_image
- AND logic across all filters with missing-data-inclusive policy
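
The combination of AND logic and a missing-data-inclusive policy can be sketched as follows. This is an illustration under stated assumptions, not the `FilterConfig` implementation: the names `ExampleAd`, `ExampleFilter`, and `passes` are hypothetical, and the reading of "conservative bound logic" used here (reject only when the ad's range cannot overlap the filter) is a guess.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExampleAd:
    impressions_lower: int | None = None
    impressions_upper: int | None = None
    media_type: str | None = None


@dataclass
class ExampleFilter:
    min_impressions: int | None = None
    max_impressions: int | None = None
    media_type: str | None = None


def passes(ad: ExampleAd, f: ExampleFilter) -> bool:
    """AND across every configured filter; a missing ad value never rejects."""
    checks = []
    # Assumed "conservative" rule: compare the minimum against the ad's upper
    # bound and the maximum against its lower bound.
    if f.min_impressions is not None and ad.impressions_upper is not None:
        checks.append(ad.impressions_upper >= f.min_impressions)
    if f.max_impressions is not None and ad.impressions_lower is not None:
        checks.append(ad.impressions_lower <= f.max_impressions)
    if f.media_type is not None and ad.media_type is not None:
        checks.append(ad.media_type == f.media_type)
    # all([]) is True, so an ad with no comparable data is kept.
    return all(checks)
```

Under this policy an ad with no impression data passes an impression filter, which favors recall over precision when the source omits fields.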
- `DeduplicationTracker` with two modes: in-memory and persistent (SQLite)
- `has_seen()` and `mark_seen()` for ad ID tracking
- `get_last_collection_time()` and `update_collection_time()` for incremental collection
- Context manager protocol with automatic save on exit
- `count()` and `clear()` utility methods
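
A rough sketch of how SQLite-backed ID tracking with a context manager might look is shown below. The class `ExampleDedupTracker`, its table layout, and its constructor argument are assumptions for illustration; only the method names `has_seen`, `mark_seen`, and `count` come from the entries above.

```python
from __future__ import annotations

import sqlite3


class ExampleDedupTracker:
    """Track seen ad IDs in SQLite; ":memory:" keeps everything in RAM."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS seen (ad_id TEXT PRIMARY KEY)"
        )

    def has_seen(self, ad_id: str) -> bool:
        row = self._conn.execute(
            "SELECT 1 FROM seen WHERE ad_id = ?", (ad_id,)
        ).fetchone()
        return row is not None

    def mark_seen(self, ad_id: str) -> None:
        # INSERT OR IGNORE makes repeated marks of the same ID harmless.
        self._conn.execute(
            "INSERT OR IGNORE INTO seen (ad_id) VALUES (?)", (ad_id,)
        )

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM seen").fetchone()[0]

    def __enter__(self) -> "ExampleDedupTracker":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Commit pending writes on exit, then release the connection.
        self._conn.commit()
        self._conn.close()
```

Passing a file path instead of `":memory:"` would give the persistent mode in this sketch, since the same table is reopened on the next run.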
- `MediaDownloader` for downloading images, videos, and thumbnails from ad creatives
- `MediaDownloadResult` frozen dataclass with success/failure details
- File extension detection from URL path and Content-Type headers
- Retry with exponential backoff on download failures
- Skip-existing-file optimization
- `collect_with_media()` convenience method on the collector
- `download_ad_media()` for single-ad media downloads
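
Retry with exponential backoff, as mentioned for download failures, generally follows the shape below. This is a generic sketch: `retry_with_backoff` and its parameters are hypothetical names, and the delays and attempt counts the library actually uses are not stated in this changelog.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so that tests can record delays without actually waiting.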
- `EventEmitter` with synchronous callback dispatch and exception isolation
- 7 lifecycle event types: collection_started, ad_collected, page_fetched, error_occurred, rate_limited, session_refreshed, collection_finished
- `Event` dataclass with event_type, data payload, and UTC timestamp
- Convenience callback registration via `callbacks` parameter on collector init
- `WebhookSender` for POSTing ad data to external HTTP endpoints
- Retry with exponential backoff on webhook failures
- Optional batch mode for webhook sends
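
"Synchronous callback dispatch and exception isolation" means a failing listener should not prevent the remaining listeners from running. The sketch below shows one way to get that behavior. `ExampleEvent`, `ExampleEmitter`, and the `on`/`emit` method names are assumptions, not the library's API.

```python
from __future__ import annotations

import logging
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable

logger = logging.getLogger(__name__)


@dataclass
class ExampleEvent:
    event_type: str
    data: dict[str, Any] = field(default_factory=dict)
    # Timezone-aware UTC timestamp taken when the event is created.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ExampleEmitter:
    def __init__(self) -> None:
        self._listeners = defaultdict(list)

    def on(self, event_type: str, callback: Callable[[ExampleEvent], None]) -> None:
        self._listeners[event_type].append(callback)

    def emit(self, event_type: str, data: dict[str, Any] | None = None) -> None:
        event = ExampleEvent(event_type, data or {})
        # Iterate over a copy so callbacks may register further listeners.
        for callback in list(self._listeners.get(event_type, [])):
            try:
                callback(event)
            except Exception:
                # Isolation: log the failure and continue with the next listener.
                logger.exception("listener for %s failed", event_type)
```

Because dispatch is synchronous, a slow callback still delays collection; isolation covers exceptions only.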
- `AsyncMetaAdsClient` with `curl_cffi.AsyncSession`
- `AsyncMetaAdsCollector` mirroring the sync API with `async for` generators
- Async `search()`, `collect()`, `collect_to_json()`, `collect_to_csv()`, `search_pages()`
- Single proxy configuration (host:port or host:port:user:pass)
- `ProxyPool` with round-robin selection across multiple proxies
- Per-proxy failure tracking with configurable max failures threshold
- Dead proxy cooldown with automatic revival
- `ProxyPool.from_file()` for loading proxies from text files
- Proxy URL format detection (plain, URL, SOCKS5)
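
Round-robin selection, per-proxy failure counting, and cooldown-based revival fit together roughly as in the sketch below. It is not the `ProxyPool` source: the class `ExampleProxyPool`, its method names, and its default thresholds are invented for illustration.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


class ExampleProxyPool:
    def __init__(
        self,
        proxies: Iterable[str],
        max_failures: int = 3,
        cooldown: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._proxies = list(proxies)
        self._max_failures = max_failures
        self._cooldown = cooldown
        self._clock = clock
        self._failures = {proxy: 0 for proxy in self._proxies}
        self._dead_until = {}
        self._index = 0

    def get_next(self) -> str | None:
        """Return the next live proxy in rotation, or None if all are dead."""
        now = self._clock()
        for _ in range(len(self._proxies)):
            proxy = self._proxies[self._index]
            self._index = (self._index + 1) % len(self._proxies)
            dead_until = self._dead_until.get(proxy)
            if dead_until is not None:
                if now < dead_until:
                    continue  # Still cooling down: skip this proxy.
                # Cooldown elapsed: revive the proxy with a clean record.
                del self._dead_until[proxy]
                self._failures[proxy] = 0
            return proxy
        return None

    def mark_failure(self, proxy: str) -> None:
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            self._dead_until[proxy] = self._clock() + self._cooldown

    def mark_success(self, proxy: str) -> None:
        self._failures[proxy] = 0
```

The injectable `clock` exists so the cooldown can be exercised in tests without waiting for real time to pass.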
- JSON export with metadata envelope (query, country, stats, timestamps)
- CSV export with 25-column flattened schema
- JSONL export (one JSON object per line)
- Export methods: `collect_to_json()`, `collect_to_csv()`, `collect_to_jsonl()`
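
JSONL export writes one JSON object per line, which lets consumers stream the file without loading it whole. A minimal sketch, assuming each record is already a plain dict, follows. `write_jsonl` is a hypothetical helper, not one of the export methods listed above.

```python
from __future__ import annotations

import json
from typing import Any, Iterable


def write_jsonl(records: Iterable[dict[str, Any]], path: str) -> int:
    """Write each record as one JSON line and return the number written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Because the function accepts any iterable, it can be fed a generator and will write records as they arrive.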
- `setup_logging()` with text or JSON format selection
- `JSONFormatter` producing single-line JSON log records
- Optional file handler with automatic directory creation
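
A formatter that emits single-line JSON records can be built on the standard `logging.Formatter`. The sketch below is one possible shape. The class name `ExampleJSONFormatter` and the chosen field names are assumptions, not the fields `JSONFormatter` actually emits.

```python
import json
import logging
from datetime import datetime, timezone


class ExampleJSONFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Tracebacks span several lines; json.dumps escapes the newlines.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)
```

Attaching it is the usual `handler.setFormatter(ExampleJSONFormatter())` call on any `logging.Handler`.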
- `CollectionReport` dataclass with throughput metrics
- `format_report()` for human-readable summary text
- `format_report_json()` for machine-readable JSON output
- `Ad` dataclass with 30+ fields covering all Ad Library data
- `AdCreative` with body, title, description, link URL, image/video URLs, CTA
- `PageInfo` with ID, name, profile picture, URL, likes, verification status
- `PageSearchResult` for typeahead search results
- `ImpressionRange` and `SpendRange` with lower/upper bounds
- `AudienceDistribution` for demographic and regional data
- `SearchResult` for paginated result sets
- `Ad.from_graphql_response()` parser handling multiple response formats
- Full CLI with 35+ flags via argparse
- All search parameters, filtering, proxy, dedup, media, enrichment, webhook, logging, and reporting flags
- `python -m meta_ads_collector` entry point
- `meta-ads-collector` console script
- Page search mode (`--search-pages`)
- Page collection modes (`--page-url`, `--page-name`)
- `MetaAdsError` base exception
- `AuthenticationError` for session/token failures
- `RateLimitError` with retry_after attribute
- `SessionExpiredError` for unrecoverable session failures
- `ProxyError` for proxy configuration issues
- `InvalidParameterError` with param name, value, and allowed values
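
A shared base exception lets callers catch every library error in one `except` clause while still reacting to specific cases such as rate limiting. The sketch below illustrates the idea with made-up class names (`ExampleAdsError`, `ExampleRateLimitError`). The real constructors may take different arguments.

```python
from __future__ import annotations


class ExampleAdsError(Exception):
    """Base class so callers can catch every library error at once."""


class ExampleRateLimitError(ExampleAdsError):
    """Raised when the server asks the client to slow down."""

    def __init__(self, message: str, retry_after: float | None = None) -> None:
        super().__init__(message)
        # Seconds to wait before retrying, if the server provided a hint.
        self.retry_after = retry_after


def wait_hint(error: ExampleAdsError, default: float = 60.0) -> float:
    """Pick a wait time: the server's hint when present, else a default."""
    retry_after = getattr(error, "retry_after", None)
    return default if retry_after is None else retry_after
```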
- PEP 561 `py.typed` marker for type checking support
- CI pipeline with Python 3.9-3.13 matrix testing
- Automated PyPI publishing on GitHub release
- 642 tests covering all modules