Memoize CRS construction in grid_mappings #646

Merged: dcherian merged 2 commits into main from memoize-crs-construction on Jun 12, 2026

@dcherian

Contributor

Summary

GridMapping.crs is built via pyproj.CRS.from_cf(var.attrs) inside _create_grid_mapping, which is called once per grid mapping every time .cf.grid_mappings is accessed. pyproj.CRS.from_cf re-parses the datum/ellipsoid from scratch on each call — cheap for EPSG-coded CRSs, but expensive for grid mappings carrying explicit ellipsoid parameters (semi_major_axis/semi_minor_axis/inverse_flattening), e.g. geostationary.

In a downstream tile server, the cold path of a metadata endpoint that probes every data variable's grid (guess_grid_metadata → ds.cf.grid_mappings) spent ~70s entirely in Datum.__new__ via from_cf (N data variables × repeated detection, all rebuilding the same CRS):

_guess_grid_mappings_and_crs   (caller)
  grid_mappings                cf_xarray/accessor.py
    _create_grid_mapping       cf_xarray/accessor.py
      from_cf                  pyproj/crs/crs.py
        _horizontal_datum_from_params   pyproj/_cf1x8.py   ← ~70s / 7032 samples

Change

Memoize the CRS construction on a hashable form of the grid-mapping variable's attrs (_crs_from_cf_attrs + _hashable_attrs). A dataset references the same grid mapping from many variables and the property may be accessed repeatedly, so each distinct grid mapping is now built once. This matches the existing caching pattern in this module (_parse_grid_mapping_attribute).

Only the generic from_cf path is cached (the demonstrated hotspot); the from_json_dict HEALPix / reduced-gaussian branches are left as-is.

Test

test_grid_mappings_crs_construction_is_cached spies on pyproj.CRS.from_cf and asserts it's called exactly once per distinct grid mapping (3 for hrrrds) across repeated Dataset and DataArray property accesses. Full test_accessor.py suite passes (209 passed, 1 skipped).
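The spy-based assertion can be illustrated with `unittest.mock`. The sketch below is not the test added in this PR: `FakeCRS`, `_cached`, `crs_for` and `count_constructions` are hypothetical names, with `FakeCRS` standing in for `pyproj.CRS` so the example runs without pyproj or a dataset.

```python
from functools import lru_cache
from unittest import mock


class FakeCRS:
    """Hypothetical stand-in for pyproj.CRS, used only to keep the sketch runnable."""

    def __init__(self, key):
        self.key = key

    @classmethod
    def from_cf(cls, attrs):
        return cls(tuple(sorted(attrs.items())))


@lru_cache(maxsize=None)
def _cached(key):
    # FakeCRS.from_cf is looked up at call time, so a patched spy is honoured.
    return FakeCRS.from_cf(dict(key))


def crs_for(attrs):
    """Memoized construction keyed on a hashable form of the attrs."""
    return _cached(tuple(sorted(attrs.items())))


def count_constructions(grid_mappings, repeats=3):
    """Access every grid mapping `repeats` times and report how often from_cf ran."""
    _cached.cache_clear()  # start from a cold cache so the count is deterministic
    with mock.patch.object(FakeCRS, "from_cf", wraps=FakeCRS.from_cf) as spy:
        for _ in range(repeats):
            for attrs in grid_mappings:
                crs_for(attrs)
        return spy.call_count
```

With three distinct grid mappings, one of them referenced twice, `count_constructions` reports three constructions regardless of `repeats`, which mirrors the "once per distinct grid mapping" assertion described above.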

🤖 Generated with Claude Code

dcherian and others added 2 commits June 11, 2026 23:08
pyproj.CRS.from_cf re-parses the datum/ellipsoid on every call, which is
expensive for grid mappings carrying explicit ellipsoid parameters (e.g.
geostationary). A dataset references the same grid mapping from many
variables and .cf.grid_mappings may be accessed repeatedly, so cache the
constructed CRS on the grid-mapping variable's attrs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dcherian enabled auto-merge (rebase) June 12, 2026 05:09
@dcherian merged commit 93511ee into main Jun 12, 2026
11 checks passed
@dcherian deleted the memoize-crs-construction branch June 12, 2026 05:11