This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
env-agents is a semantics-centered framework for discovering, fetching, and harmonizing public environmental data via uniform adapters. It returns tidy, analysis-ready tables with rich, machine-readable metadata using ontology-aware adapters.
Version 2.0 Features:
- Meta-service pattern for systematic asset discovery (Earth Engine: 100+ assets)
- Metadata refresh system with freshness tracking for scraped services
- Rate limiting with exponential backoff for API-constrained services
- Uniform interface for both unitary and meta-services
- Enhanced error handling and type checking throughout the system
# Install in development mode
pip install -e .
# Install dependencies
pip install pandas>=2.0 pyarrow>=15 requests>=2.32 pydantic>=2.0 shapely>=2.0# Run tests
python -m pytest tests/
# Run specific test
python -m pytest tests/test_contract.py# List available adapters
ea list
# Show capabilities for all adapters
ea caps
# Use specific base directory
ea --base-dir /path/to/project caps# Run the smoke test example
cd examples/
python smoke_router.py- EnvRouter (
core/router.py): Central orchestration component that registers adapters, manages semantics, and provides unified fetch interface - BaseAdapter (
adapters/base.py): Abstract base class that all environmental data adapters inherit from - TermBroker (
core/term_broker.py): Semantic matching engine that maps native parameters to canonical variables - RegistryManager (
core/registry.py): Manages canonical variable registry and ontology mappings
- Adapter Registration: Adapters register with EnvRouter via
router.register(adapter) - Capabilities Discovery: Call
adapter.capabilities()oradapter.harvest()to discover available variables - Semantic Mapping: TermBroker maps native parameters to canonical variables using rule packs
- Data Fetching:
router.fetch(dataset, spec)returns standardized DataFrame with core columns - Metadata Attachment: Router attaches semantics (URIs, units, conversions) to returned data
All adapters return DataFrames with these core columns:
- Identity:
observation_id, dataset, source_url, source_version, license, retrieval_timestamp - Spatial:
geometry_type, latitude, longitude, geom_wkt, spatial_id, site_name, admin, elevation_m - Temporal:
time, temporal_coverage - Values:
variable, value, unit, depth_top_cm, depth_bottom_cm, qc_flag - Metadata:
attributes, provenance
The registry system (registry/ directory) manages semantic mappings:
registry_seed.json: Curated canonical variables with URIs and preferred unitsregistry_overrides.json: Accepted mappings from native to canonical variablesregistry_delta.json: Suggested mappings pending review (confidence 0.60-0.89)harvest/: Auto-harvested catalogs from each adapter
The TermBroker scores mappings (0-1) using multiple signals:
- Exact ID matches from service rule packs: +0.95
- Label hints from service rules: +0.70
- Generic label matching: +0.10-0.25
- Unit compatibility: +0.01-0.03
- Auto-accepts mappings ≥0.90, suggests 0.60-0.89 for review
- Inherit from
BaseAdapterinadapters/base.py - Set class constants:
DATASET, SOURCE_URL, SOURCE_VERSION, LICENSE - Implement required methods:
capabilities(): Return metadata about available variables_fetch_rows(): Return list of dicts matching core schema
- Optionally implement
harvest()for better semantic discovery - Add rule pack in
adapters/{service}/rules.pywith mappings and unit aliases
class MyAdapter(BaseAdapter):
DATASET = "MY_SERVICE"
SOURCE_URL = "https://api.example.com"
def capabilities(self):
return {
"variables": [...], # List of available parameters
"attributes_schema": {...}, # Native attribute schema
"rate_limits": {...} # API limits if any
}
def _fetch_rows(self, spec: RequestSpec):
# Fetch from upstream API
# Return list of dicts with core column keys
return [...]Add service-specific rule packs (adapters/{service}/rules.py):
CANONICAL_MAP = {
"native_param_id": "canonical:variable:name"
}
UNIT_ALIASES = {
"deg C": "degC",
"cubic feet per second": "ft3/s"
}
LABEL_HINTS = {
"temperature": ["temp", "temperature", "air_temp"]
}- All adapters must return data conforming to the core column schema
- Native parameter traceability is maintained via
attributes["terms"] - Observation IDs are computed deterministically from core identifying fields
- Registry refresh should be run periodically to discover new upstream parameters
- High-confidence semantic mappings (≥0.90) are auto-accepted to reduce curation overhead
- The system gracefully handles missing semantics by preserving native variable names
# Refresh capabilities and update semantic mappings
from env_agents.core.router_ext import refresh_capabilities
report = refresh_capabilities(router, auto_accept_threshold=0.9)
# Review pending mappings
ea list-unknowns
# Promote specific mappings
ea promote --dataset USGS_NWIS --native 00095 --canonical water:specific_conductance_us_cm