All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
2.0.0 - 2025-02-09
- Async API: async_collect(), async_collect_v2(), and stream_collect_v2() methods on GMapsExtractor for non-blocking collection with httpx.AsyncClient.
- AsyncGMapsClient: Standalone async HTTP client for direct Google Maps requests without FastAPI server dependency.
- GMapsClient: Sync direct HTTP client that bypasses the localhost FastAPI server entirely. Core installs no longer require FastAPI or uvicorn.
- GMapsSettings: Centralized, composable settings dataclass replacing fragile module-level global monkey-patching.
- Event system: EventEmitter and EventType for lifecycle hooks (collection start, cell complete, business found, errors, completion).
- ProgressReporter: Pluggable progress output that attaches to the event system, replacing hardcoded print statements.
- Async collector and enrichment: async_collector.py and async_enrichment.py for fully async collection pipelines.
- Optional FastAPI: FastAPI server is now an optional dependency (pip install gmaps-extractor[server]). Core library works without it.
- use_server parameter: Explicit opt-in for the legacy server-based architecture on GMapsExtractor.
- Async context manager: async with GMapsExtractor(...) as ext: for async workflows.
- Streaming collection: stream_collect_v2() yields businesses as they are found via async generator.
- Comprehensive test suite: 570+ tests covering parsers, decoders, geo modules, events, progress, client, settings, async client, async collector, and async enrichment.
- Cookie lifecycle improvements: Auto-retry with cookie refresh on 429/consent redirects, proactive refresh after 500 requests, fresh SOCS cookie generation.
- Request freshness: Rotating User-Agent pool, full browser-like headers, epoch timestamp in search protobuf, anti-cache headers.
- PEP 561 type marker: py.typed marker file for typed package support.
- CI/CD pipelines: GitHub Actions workflows for linting, testing (Python 3.9-3.12 matrix), and PyPI publishing with trusted publishers.
- Ruff linter and formatter: Configured for Python 3.9+ with pycodestyle, pyflakes, isort, pep8-naming, pyupgrade, bugbear, and simplify rules.
- Mypy type checking: Configured with strict mode for public API modules (extractor.py, exceptions.py).
- Structured logging: Library uses logging.NullHandler by default; verbose=True adds a StreamHandler at INFO level without stacking handlers.
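
The event system and pluggable progress reporting listed above can be pictured with a small sketch. This is not the library's implementation: SketchEvent, SketchEmitter, and ListReporter are hypothetical names, and the real EventEmitter, EventType, and ProgressReporter signatures may differ.

```python
from collections import defaultdict
from enum import Enum, auto
from typing import Any, Callable, DefaultDict, List


class SketchEvent(Enum):
    # Hypothetical event names modelled on the lifecycle hooks listed above.
    COLLECTION_START = auto()
    BUSINESS_FOUND = auto()
    COLLECTION_COMPLETE = auto()


class SketchEmitter:
    """Minimal publish/subscribe hub (illustrative, not the library's EventEmitter)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[SketchEvent, List[Callable[..., Any]]] = defaultdict(list)

    def on(self, event: SketchEvent, handler: Callable[..., Any]) -> None:
        # Register a callback for one event type.
        self._handlers[event].append(handler)

    def emit(self, event: SketchEvent, **payload: Any) -> None:
        # Call every registered callback with the event payload.
        for handler in list(self._handlers[event]):
            handler(**payload)


class ListReporter:
    """Hypothetical reporter that records messages instead of printing them."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def attach(self, emitter: SketchEmitter) -> None:
        emitter.on(SketchEvent.COLLECTION_START, lambda **p: self.lines.append("started"))
        emitter.on(SketchEvent.BUSINESS_FOUND, lambda **p: self.lines.append(f"found {p['name']}"))
        emitter.on(SketchEvent.COLLECTION_COMPLETE, lambda **p: self.lines.append(f"done: {p['total']}"))


emitter = SketchEmitter()
reporter = ListReporter()
reporter.attach(emitter)
emitter.emit(SketchEvent.COLLECTION_START)
emitter.emit(SketchEvent.BUSINESS_FOUND, name="Example Cafe")
emitter.emit(SketchEvent.COLLECTION_COMPLETE, total=1)
```

Because the reporter only subscribes to events, the collection code needs no print statements, and a different reporter can be attached without touching it.
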
- Breaking: GMapsExtractor defaults to use_server=False (direct HTTP requests). Set use_server=True for the legacy FastAPI server architecture.
- Breaking: auto_start_server parameter is deprecated in favor of use_server.
- Minimum httpx version bumped to >=0.25.0.
- Updated pyproject.toml to PEP 621 format with proper dependency groups (server, dev, docs).
- URLs updated to point to github.com/promisingcoder/GoogleMapsCollector.
- Verbose handler stacking: shutdown() now removes the logging handler added by verbose=True, preventing duplicate log lines on repeated instantiation.
- Config race condition: Lazy imports in __init__.py defer collector module import until after config.apply(), ensuring user-configured values are captured.
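
The handler-stacking fix can be sketched with the standard logging module alone. SketchExtractor and the logger name sketch_gmaps are hypothetical stand-ins, not the library's code; the point is only that a handler added at construction is removed again on shutdown.

```python
import logging

logger = logging.getLogger("sketch_gmaps")  # hypothetical logger name
logger.addHandler(logging.NullHandler())    # silent by default, as a library should be


class SketchExtractor:
    """Hypothetical stand-in showing the add/remove handler pairing."""

    def __init__(self, verbose: bool = False) -> None:
        self._handler = None
        if verbose:
            self._handler = logging.StreamHandler()
            self._handler.setLevel(logging.INFO)
            logger.addHandler(self._handler)
            logger.setLevel(logging.INFO)

    def shutdown(self) -> None:
        # Remove only the handler this instance added, so repeated
        # instantiation does not accumulate StreamHandlers.
        if self._handler is not None:
            logger.removeHandler(self._handler)
            self._handler = None


def stream_handler_count() -> int:
    # Count plain StreamHandlers; the NullHandler is not one.
    return sum(type(h) is logging.StreamHandler for h in logger.handlers)


for _ in range(3):
    ext = SketchExtractor(verbose=True)
    logger.info("collecting")  # reaches exactly one StreamHandler per iteration
    ext.shutdown()
```

Without the removal in shutdown(), the third iteration would log the same message three times, which is the duplicate-line symptom the entry describes.
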
1.0.0 - 2024-12-01
- Initial release as pip-installable library (gmaps-extractor on PyPI).
- GMapsExtractor class with collect() and collect_v2() methods.
- CollectionResult wrapper with iteration, indexing, and slicing support.
- Custom exception hierarchy (GMapsExtractorError, ServerError, BoundaryError, ConfigurationError, RateLimitError, AuthenticationError).
- Console scripts: gmaps-collect, gmaps-collect-v2, gmaps-enrich-reviews, gmaps-server.
- V1 collector with parallel grid search and deduplication.
- V2 collector with checkpoint/resume, adaptive rate limiting, parallel enrichment, JSONL streaming, and retry queue.
- FastAPI server with endpoints for search, place details, and reviews.
- Protobuf decoder for Google Maps internal pb parameter format.
- Nominatim integration for geographic boundary resolution and subdivision.
- Automatic cookie management with consent flow.
- Config shim for pip-only installs via _config_defaults.py.
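
As an illustration of what a result wrapper with iteration, indexing, and slicing support can look like, here is a minimal sketch. SketchResult is a hypothetical class, not the library's CollectionResult, and returning a wrapper (rather than a plain list) from a slice is an assumption made for this example.

```python
from typing import Any, Dict, Iterator, List, Union


class SketchResult:
    """Hypothetical list-like wrapper; not the library's CollectionResult."""

    def __init__(self, businesses: List[Dict[str, Any]]) -> None:
        self._businesses = list(businesses)

    def __len__(self) -> int:
        return len(self._businesses)

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        return iter(self._businesses)

    def __getitem__(self, key: Union[int, slice]) -> Union[Dict[str, Any], "SketchResult"]:
        # Assumption: a slice yields another wrapper, an int yields one record.
        if isinstance(key, slice):
            return SketchResult(self._businesses[key])
        return self._businesses[key]


result = SketchResult([{"name": "A"}, {"name": "B"}, {"name": "C"}])
names = [b["name"] for b in result]  # iteration
first = result[0]                    # indexing
tail = result[1:]                    # slicing
```
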