Skip to content

Latest commit

 

History

History
63 lines (50 loc) · 4.65 KB

File metadata and controls

63 lines (50 loc) · 4.65 KB

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

2.0.0 - 2025-02-09

Added

  • Async API: async_collect(), async_collect_v2(), and stream_collect_v2() methods on GMapsExtractor for non-blocking collection with httpx.AsyncClient.
  • AsyncGMapsClient: Standalone async HTTP client for direct Google Maps requests without FastAPI server dependency.
  • GMapsClient: Sync direct HTTP client that bypasses the localhost FastAPI server entirely. Core installs no longer require FastAPI or uvicorn.
  • GMapsSettings: Centralized, composable settings dataclass replacing fragile module-level global monkey-patching.
  • Event system: EventEmitter and EventType for lifecycle hooks (collection start, cell complete, business found, errors, completion).
  • ProgressReporter: Pluggable progress output that attaches to the event system, replacing hardcoded print statements.
  • Async collector and enrichment: async_collector.py and async_enrichment.py for fully async collection pipelines.
  • Optional FastAPI: FastAPI server is now an optional dependency (pip install gmaps-extractor[server]). Core library works without it.
  • use_server parameter: Explicit opt-in for the legacy server-based architecture on GMapsExtractor.
  • Async context manager: async with GMapsExtractor(...) as ext: for async workflows.
  • Streaming collection: stream_collect_v2() yields businesses as they are found via async generator.
  • Comprehensive test suite: 570+ tests covering parsers, decoders, geo modules, events, progress, client, settings, async client, async collector, and async enrichment.
  • Cookie lifecycle improvements: Auto-retry with cookie refresh on 429/consent redirects, proactive refresh after 500 requests, fresh SOCS cookie generation.
  • Request freshness: Rotating User-Agent pool, full browser-like headers, epoch timestamp in search protobuf, anti-cache headers.
  • PEP 561 type marker: py.typed marker file for typed package support.
  • CI/CD pipelines: GitHub Actions workflows for linting, testing (Python 3.9-3.12 matrix), and PyPI publishing with trusted publishers.
  • Ruff linter and formatter: Configured for Python 3.9+ with pycodestyle, pyflakes, isort, pep8-naming, pyupgrade, bugbear, and simplify rules.
  • Mypy type checking: Configured with strict mode for public API modules (extractor.py, exceptions.py).
  • Structured logging: Library uses logging.NullHandler by default; verbose=True adds a StreamHandler at INFO level without stacking handlers.

Changed

  • Breaking: GMapsExtractor defaults to use_server=False (direct HTTP requests). Set use_server=True for the legacy FastAPI server architecture.
  • Breaking: auto_start_server parameter is deprecated in favor of use_server.
  • Minimum httpx version bumped to >=0.25.0.
  • Updated pyproject.toml to PEP 621 format with proper dependency groups (server, dev, docs).
  • URLs updated to point to github.com/promisingcoder/GoogleMapsCollector.

Fixed

  • Verbose handler stacking: shutdown() now removes the logging handler added by verbose=True, preventing duplicate log lines on repeated instantiation.
  • Config race condition: Lazy imports in __init__.py defer collector module import until after config.apply(), ensuring user-configured values are captured.

1.0.0 - 2024-12-01

Added

  • Initial release as pip-installable library (gmaps-extractor on PyPI).
  • GMapsExtractor class with collect() and collect_v2() methods.
  • CollectionResult wrapper with iteration, indexing, and slicing support.
  • Custom exception hierarchy (GMapsExtractorError, ServerError, BoundaryError, ConfigurationError, RateLimitError, AuthenticationError).
  • Console scripts: gmaps-collect, gmaps-collect-v2, gmaps-enrich-reviews, gmaps-server.
  • V1 collector with parallel grid search and deduplication.
  • V2 collector with checkpoint/resume, adaptive rate limiting, parallel enrichment, JSONL streaming, and retry queue.
  • FastAPI server with endpoints for search, place details, and reviews.
  • Protobuf decoder for Google Maps internal pb parameter format.
  • Nominatim integration for geographic boundary resolution and subdivision.
  • Automatic cookie management with consent flow.
  • Config shim for pip-only installs via _config_defaults.py.