promisingcoder
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
Lines changed: 50 additions & 0 deletions
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
Lines changed: 25 additions & 0 deletions
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
Lines changed: 29 additions & 0 deletions
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
Lines changed: 11 additions & 3 deletions
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
Lines changed: 5 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 41 additions & 7 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 130 additions & 0 deletions
diff --git a/MANIFEST.in b/MANIFEST.in
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,50 @@
+---
+name: Bug Report
+about: Report a bug in meta-ads-collector
+title: "[Bug] "
+labels: bug
+assignees: ""
+---
+
+## Describe the bug
+
+A clear and concise description of what the bug is.
+
+## To reproduce
+
+Steps to reproduce the behavior:
+
+1. Install version `...`
+2. Run this code / command:
+
+```python
+# Your code here
+```
+
+or
+
+```bash
+# Your CLI command here
+```
+
+3. See error
+
+## Expected behavior
+
+What you expected to happen.
+
+## Actual behavior
+
+What actually happened. Include the full error message or traceback if applicable.
+
+## Environment
+
+- OS: [e.g., Windows 11, Ubuntu 22.04, macOS 14]
+- Python version: [e.g., 3.12.1]
+- meta-ads-collector version: [e.g., 1.0.0]
+- Using proxy: [yes/no]
+- Using async: [yes/no]
+
+## Additional context
+
+Add any other context about the problem here (log output, screenshots, etc.).
@@ -0,0 +1,25 @@
+---
+name: Feature Request
+about: Suggest a new feature for meta-ads-collector
+title: "[Feature] "
+labels: enhancement
+assignees: ""
+---
+
+## Problem
+
+A clear and concise description of the problem this feature would solve.
+
+Example: "I want to be able to ..."
+
+## Proposed solution
+
+Describe the solution you'd like to see.
+
+## Alternatives considered
+
+Describe any alternative solutions or workarounds you've considered.
+
+## Additional context
+
+Any other context, examples, or references that would help understand the request.
@@ -0,0 +1,29 @@
+## Summary
+
+Brief description of what this PR does.
+
+## Changes
+
+- Change 1
+- Change 2
+
+## Type of change
+
+- [ ] Bug fix (non-breaking change that fixes an issue)
+- [ ] New feature (non-breaking change that adds functionality)
+- [ ] Breaking change (fix or feature that would cause existing functionality to change)
+- [ ] Documentation update
+- [ ] Refactoring (no functional changes)
+
+## Checklist
+
+- [ ] My code follows the project's style guidelines (`ruff check .` passes)
+- [ ] I have added tests that cover my changes
+- [ ] All new and existing tests pass (`pytest` passes)
+- [ ] Type checking passes (`mypy meta_ads_collector/ --ignore-missing-imports`)
+- [ ] I have updated the documentation if needed
+- [ ] I have updated `CHANGELOG.md` if this is a user-facing change
+
+## Test plan
+
+How was this tested?
@@ -10,6 +10,7 @@ jobs:
   test:
     runs-on: ubuntu-latest
     strategy:
+      fail-fast: false
       matrix:
         python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
 
@@ -24,13 +25,20 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install -e ".[dev]"
+          pip install -e ".[dev,async]"
 
       - name: Lint with ruff
         run: ruff check .
 
       - name: Type check with mypy
         run: mypy meta_ads_collector/ --ignore-missing-imports
 
-      - name: Run tests
-        run: pytest --cov=meta_ads_collector --cov-report=term-missing
+      - name: Run tests with coverage
+        run: pytest --cov=meta_ads_collector --cov-report=term-missing --cov-report=xml
+
+      - name: Upload coverage
+        if: matrix.python-version == '3.12'
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-report
+          path: coverage.xml
@@ -24,5 +24,10 @@ jobs:
       - name: Build package
         run: python -m build
 
+      - name: Verify package
+        run: |
+          pip install dist/*.whl
+          python -c "import meta_ads_collector; print(meta_ads_collector.__version__)"
+
       - name: Publish to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1
@@ -1,25 +1,40 @@
 # Python
 __pycache__/
 *.py[cod]
+*$py.class
+*.so
 *.egg-info/
+*.egg
 dist/
 build/
-*.egg
+sdist/
+wheels/
 
 # Virtual environments
 venv/
 .venv/
 env/
+.env/
 
-# Output data
-*.json
-*.csv
-*.jsonl
-output/
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+coverage.xml
+*.cover
+
+# Type checking
+.mypy_cache/
+
+# Linting
+.ruff_cache/
 
 # IDE
 .vscode/
 .idea/
+*.swp
+*.swo
+*~
 
 # Claude Code
 .claude/
@@ -28,6 +43,25 @@ output/
 NUL
 Thumbs.db
 .DS_Store
+Desktop.ini
 
-# Environment
+# Environment files
 .env
+.env.local
+.env.*.local
+
+# Output data
+*.json
+!.github/**/*.json
+*.csv
+*.jsonl
+output/
+ad_media/
+
+# SQLite state files
+*.db
+*.sqlite
+*.sqlite3
+
+# Log files
+*.log
@@ -0,0 +1,130 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [1.0.0] - 2026-02-08
+
+### Added
+
+#### Core
+- `MetaAdsCollector` high-level interface with `search()`, `collect()`, and export methods
+- `MetaAdsClient` low-level HTTP client with session management and GraphQL request handling
+- Browser fingerprint randomization across Chrome versions, platforms, viewports, and DPR values
+- Dynamic `doc_id` extraction from Ad Library page HTML with hardcoded fallbacks
+- Token extraction (LSD, CSRF, session IDs) with verification and fallback generation
+- Automatic session refresh on 403 responses with configurable max refresh attempts
+- Session staleness detection with 30-minute max age
+- Challenge/verification handling for Facebook's bot detection
+
+#### Search & Collection
+- Keyword search, exact phrase search, and page-level search modes
+- Page-level collection by URL (`collect_by_page_url`), by name (`collect_by_page_name`), and by ID (`collect_by_page_id`)
+- Typeahead page search (`search_pages`) for resolving page names to IDs
+- URL parser for extracting page IDs from Ad Library URLs, profile URLs, and numeric paths
+- Pagination with cursor-based traversal
+- Configurable page size, max results, sort order, and country
+- Ad type filtering: all, political, housing, employment, credit
+- Status filtering: active, inactive, all
+- Ad enrichment via detail/snapshot endpoint (`enrich_ad`)
+- Stream mode yielding lifecycle events alongside ads (`stream`)
+
+#### Filtering
+- `FilterConfig` dataclass with 11 filter fields
+- Impression range filters (min/max using conservative bound logic)
+- Spend range filters (min/max)
+- Date range filters (start_date, end_date)
+- Media type filter (image, video, meme, none)
+- Publisher platform filter (facebook, instagram, messenger, audience_network)
+- Language filter
+- Boolean filters: has_video, has_image
+- AND logic across all filters with missing-data-inclusive policy
+
+#### Deduplication
+- `DeduplicationTracker` with two modes: in-memory and persistent (SQLite)
+- `has_seen()` and `mark_seen()` for ad ID tracking
+- `get_last_collection_time()` and `update_collection_time()` for incremental collection
+- Context manager protocol with automatic save on exit
+- `count()` and `clear()` utility methods
+
+#### Media Downloads
+- `MediaDownloader` for downloading images, videos, and thumbnails from ad creatives
+- `MediaDownloadResult` frozen dataclass with success/failure details
+- File extension detection from URL path and Content-Type headers
+- Retry with exponential backoff on download failures
+- Skip-existing-file optimization
+- `collect_with_media()` convenience method on the collector
+- `download_ad_media()` for single-ad media downloads
+
+#### Events & Webhooks
+- `EventEmitter` with synchronous callback dispatch and exception isolation
+- 7 lifecycle event types: collection_started, ad_collected, page_fetched, error_occurred, rate_limited, session_refreshed, collection_finished
+- `Event` dataclass with event_type, data payload, and UTC timestamp
+- Convenience callback registration via `callbacks` parameter on collector init
+- `WebhookSender` for POSTing ad data to external HTTP endpoints
+- Retry with exponential backoff on webhook failures
+- Optional batch mode for webhook sends
+
+#### Async Support
+- `AsyncMetaAdsClient` using httpx for non-blocking HTTP
+- `AsyncMetaAdsCollector` mirroring the sync API with `async for` generators
+- Async `search()`, `collect()`, `collect_to_json()`, `collect_to_csv()`, `search_pages()`
+- Optional dependency: `pip install meta-ads-collector[async]`
+
+#### Proxy Support
+- Single proxy configuration (host:port or host:port:user:pass)
+- `ProxyPool` with round-robin selection across multiple proxies
+- Per-proxy failure tracking with configurable max failures threshold
+- Dead proxy cooldown with automatic revival
+- `ProxyPool.from_file()` for loading proxies from text files
+- Proxy URL format detection (plain, URL, SOCKS5)
+
+#### Export
+- JSON export with metadata envelope (query, country, stats, timestamps)
+- CSV export with 25-column flattened schema
+- JSONL export (one JSON object per line)
+- Export methods: `collect_to_json()`, `collect_to_csv()`, `collect_to_jsonl()`
+
+#### Logging & Reporting
+- `setup_logging()` with text or JSON format selection
+- `JSONFormatter` producing single-line JSON log records
+- Optional file handler with automatic directory creation
+- `CollectionReport` dataclass with throughput metrics
+- `format_report()` for human-readable summary text
+- `format_report_json()` for machine-readable JSON output
+
+#### Data Models
+- `Ad` dataclass with 30+ fields covering all Ad Library data
+- `AdCreative` with body, title, description, link URL, image/video URLs, CTA
+- `PageInfo` with ID, name, profile picture, URL, likes, verification status
+- `PageSearchResult` for typeahead search results
+- `ImpressionRange` and `SpendRange` with lower/upper bounds
+- `AudienceDistribution` for demographic and regional data
+- `SearchResult` for paginated result sets
+- `Ad.from_graphql_response()` parser handling multiple response formats
+
+#### CLI
+- Full CLI with 35+ flags via argparse
+- All search parameters, filtering, proxy, dedup, media, enrichment, webhook, logging, and reporting flags
+- `python -m meta_ads_collector` entry point
+- `meta-ads-collector` console script
+- Page search mode (`--search-pages`)
+- Page collection modes (`--page-url`, `--page-name`)
+
+#### Exceptions
+- `MetaAdsError` base exception
+- `AuthenticationError` for session/token failures
+- `RateLimitError` with retry_after attribute
+- `SessionExpiredError` for unrecoverable session failures
+- `ProxyError` for proxy configuration issues
+- `InvalidParameterError` with param name, value, and allowed values
+
+#### Infrastructure
+- PEP 561 `py.typed` marker for type checking support
+- CI pipeline with Python 3.9-3.13 matrix testing
+- Automated PyPI publishing on GitHub release
+- 642 tests covering all modules
+
+[1.0.0]: https://github.com/Yossef/meta-ads-collector/releases/tag/v1.0.0
@@ -0,0 +1,6 @@
+include LICENSE
+include README.md
+include CHANGELOG.md
+include pyproject.toml
+include meta_ads_collector/py.typed
+recursive-include docs *.md