The Airbus Harvester is a component of the EODHP (Earth Observation Data Hub Platform) project, designed to regularly collect and process archive imagery metadata from Airbus APIs. This harvester supports both optical (Pléiades, Pléiades Neo, SPOT) and radar (SAR) datasets.
On each run, the harvester queries the relevant Airbus API endpoints, compares the current catalogue with the previous run, and identifies new, updated, or deleted items. It converts the API responses into STAC (SpatioTemporal Asset Catalog) format, storing the resulting STAC items in S3. To track changes efficiently, it maintains a hash of the metadata for each file.
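The hash-based change detection described above can be sketched as follows. This is a minimal illustration, not the harvester's actual code; the function names and the `{item_id: hash}` layout are assumptions for the example:

```python
import hashlib
import json


def metadata_hash(metadata: dict) -> str:
    """Stable SHA-256 hash of an item's metadata (keys sorted for determinism)."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_catalogue(previous: dict, current: dict):
    """Compare {item_id: hash} maps from two runs to find new, updated, and deleted items."""
    new = [k for k in current if k not in previous]
    deleted = [k for k in previous if k not in current]
    updated = [k for k in current if k in previous and current[k] != previous[k]]
    return new, updated, deleted
```

Only items whose hash has changed need to be re-converted and re-uploaded, which keeps repeated runs over a large, mostly static archive cheap.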
After updating the catalogue, the harvester sends a message to the "harvested" Pulsar topic, enabling downstream components in the EODHP pipeline to react to new or changed data. The harvester also updates the STAC catalogue and collection summaries, including temporal and spatial intervals, to provide a comprehensive and up-to-date view of the available Airbus imagery.
- Regularly harvests Airbus archive imagery metadata (optical and radar).
- Detects new, updated, and deleted items by comparing metadata hashes with previous runs.
- Converts API responses to STAC-compliant items and collections.
- Stores STAC items in S3.
- Publishes messages to a Pulsar topic for downstream processing.
- Maintains metadata hashes for efficient change tracking.
- Maintains an overarching STAC catalogue and collection.
- Python 3.13
- uv
- GNU Make
- AWS credentials (for S3 access)
- Access to Pulsar (for messaging)
- Access to Airbus APIs
Clone the repository and run the setup using the Makefile:

```bash
git clone https://github.com/EO-DataHub/airbus-harvester.git
cd airbus-harvester
make setup
```

This will:

- Install dependencies via `uv sync`
- Install pre-commit hooks

You can safely run `make setup` repeatedly; it will only update things if needed.
Configuration is managed via `config.json`.

You can specify which dataset to harvest by setting the `HARVESTER_CONFIG_KEY` environment variable (e.g., `SPOT`, `PNEO`, `PHR`, `SAR`).
Each dataset configuration in `config.json` controls how the harvester interacts with the corresponding Airbus API. You can adjust:

- API endpoints and authentication: Change the `url` and `auth_env` to point to different Airbus API environments or endpoints.
- Request parameters: Modify the `body` and `request_method` to control how data is requested (e.g., filtering by constellation, pagination settings).
- STAC mapping: Update `stac_properties_map` to map API response fields to STAC properties, or add new mappings as needed.
- External URLs: Add or change entries in `external_urls` to include additional links or assets in the output STAC items, and control whether they are proxied.
- Extensions and metadata: Specify which STAC extensions to include in the resulting items, and set collection-level metadata.
See `config_schema.json` for the config structure.
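To make the fields above concrete, here is a purely illustrative dataset entry. The key names (`url`, `auth_env`, `request_method`, `body`, `stac_properties_map`, `external_urls`) come from the description above, but every value shown is hypothetical — consult `config.json` and `config_schema.json` for the real structure:

```json
{
  "SAR": {
    "url": "https://api.example.com/sar/search",
    "auth_env": "AIRBUS_API_KEY",
    "request_method": "POST",
    "body": {"constellation": "SAR", "itemsPerPage": 100},
    "stac_properties_map": {"acquisitionDate": "datetime"},
    "external_urls": [{"url": "https://example.com/quicklook", "proxy": true}]
  }
}
```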
Environment Variables:

- `HARVESTER_CONFIG_KEY`: Selects the dataset config.
- `AIRBUS_API_KEY`: Your Airbus API key.
- `PULSAR_URL`: Pulsar broker URL.
- `PROXY_BASE_URL`: Base URL for asset href redirects via a proxy.
- `MINIMUM_MESSAGE_ENTRIES`: Minimum number of entries before sending a message (default: 100).
- `MAX_API_RETRIES`: Maximum API retry attempts (default: 5).
- `COMMERCIAL_CATALOGUE_ROOT`: Root path for catalogue storage (default: `commercial`).
- `TOPIC`: Optional suffix appended to the Pulsar output topic, used to separate large harvests such as this from more time-sensitive messages (default: None).
Run the harvester from the command line:

```bash
python -m airbus_harvester <workspace_name> <catalog> <s3_bucket>
```

Example:

```bash
python -m airbus_harvester default_workspace catalog catalogue-population-eodhp
```

- `catalog` is not used; it is included to preserve structure with other harvesters.
- `workspace_name` should be `default_workspace`, to harvest items into a public catalogue in the EODH.
- Code is in `airbus_harvester`.
- Formatting and linting: Ruff.
- Type checking: Pyright.
- Pre-commit checks are installed with `make setup`.
Useful Makefile targets:

- `make test`: Run tests continuously (via pytest-watcher)
- `make testonce`: Run tests once
- `make format`: Auto-format and fix lint issues
- `make check`: Run all checks (ruff, pyright, validate-pyproject)
- `make dockerbuild`: Build a Docker image
- `make dockerpush`: Push a Docker image
Run all tests with:

```bash
make testonce
```

Tests use pytest, moto for AWS mocking, and requests-mock.
- Authentication errors: Check your `AIRBUS_API_KEY` and AWS credentials.
- Pulsar connection issues: Ensure `PULSAR_URL` is set and reachable.
- S3 upload failures: Verify bucket permissions and region.
- API rate limits: Adjust `MAX_API_RETRIES` as needed.
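A retry loop with exponential backoff, of the kind `MAX_API_RETRIES` controls, can be sketched like this. The function and parameter names are hypothetical and the real harvester's retry behaviour may differ:

```python
import time


def call_with_retries(fetch, max_api_retries: int = 5, base_delay: float = 1.0):
    """Call a flaky API function, retrying up to max_api_retries times
    with exponential backoff between attempts."""
    for attempt in range(max_api_retries):
        try:
            return fetch()
        except Exception:
            # Re-raise once the retry budget is exhausted.
            if attempt == max_api_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Raising the retry count (or the base delay) trades a slower harvest for more resilience to transient rate limiting.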
Check logs for detailed error messages.
The release process is fully automated and handled through GitHub Actions.
On every push to `main` or when a new tag is created, the following checks and steps run automatically:
- QA checks (ruff, pyright)
- Security scanning
- Unit tests
- Docker image build and push to the configured registry
Versioned releases are handled through the Releases page in GitHub.
See `.github/workflows/actions.yaml` for details.
This project is licensed under the United Kingdom Research and Innovation BSD Licence. See LICENSE for details.