
Avoid DataArray construction when scanning for cell_measures#647

Merged: dcherian merged 1 commit into main from avoid-dataarray-construction-in-attr-scan on Jun 12, 2026
@dcherian

Contributor

Summary

CFAccessor.cell_measures scans every variable's cell_measures attribute by iterating obj.coords.values() and obj.data_vars.values():

all_attrs = [
    ChainMap(da.attrs, da.encoding).get("cell_measures", "")
    for da in obj.coords.values()
]
...
all_attrs += [
    ChainMap(da.attrs, da.encoding).get("cell_measures", "")
    for da in obj.data_vars.values()
]

Iterating those mappings constructs a full DataArray (resolving its coordinates) for every variable just to read one attribute. Because cell_measures is reached from __getitem__ → _get_all_cell_measures, any ds.cf[[var]] call pays an O(nvars) DataArray-construction cost, and code that loops over variables calling ds.cf[[var]] (e.g. grid/style probing) ends up O(nvars²).
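To make the quadratic growth concrete, here is a toy model of the access pattern. It does not use xarray: `ToyDataset`, `scan_wrapped`, `scan_bare`, and `count_constructions` are hypothetical names introduced only for illustration, and the counter stands in for the cost of building one DataArray.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset; not xarray's implementation."""

    def __init__(self, nvars):
        # One attribute dict per variable, like Variable.attrs.
        self.raw = {f"v{i}": {"cell_measures": ""} for i in range(nvars)}
        self.constructions = 0

    def wrapped_values(self):
        # Models iterating data_vars.values(): one costly wrapper per variable.
        for attrs in self.raw.values():
            self.constructions += 1
            yield dict(attrs)

    def scan_wrapped(self):
        # Attribute scan that builds a wrapper for every variable.
        return [a.get("cell_measures", "") for a in self.wrapped_values()]

    def scan_bare(self):
        # Attribute scan that reads the stored attributes directly.
        return [a.get("cell_measures", "") for a in self.raw.values()]


def count_constructions(nvars, use_bare):
    """Run one scan per variable, as a loop of ds.cf[[var]] lookups would."""
    ds = ToyDataset(nvars)
    scan = ds.scan_bare if use_bare else ds.scan_wrapped
    for _name in ds.raw:
        scan()
    return ds.constructions
```

With 50 variables the wrapped scan performs 50 × 50 constructions, while the bare scan performs none.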

A downstream profile of a metadata endpoint over a many-variable dataset showed this cell_measures attr-scan as a top contributor to Dataset._construct_dataarray time:

ds.cf[[var]] → _getitem → _get_all_cell_measures → cell_measures
  → for da in obj.data_vars.values()  →  _construct_dataarray  (per variable)

Change

Read the cell_measures attribute off the bare Variable objects (obj.variables for a Dataset; obj.coords.variables + obj.variable for a DataArray), which carry .attrs/.encoding without constructing a DataArray. This mirrors how standard_names already reads from self._obj._variables.
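As a rough sketch of the approach described above (not the actual diff in this PR), the scan can be written against bare Variable objects. The helper name `scan_cell_measures` and the small example dataset are assumptions made for illustration; the sketch reads the public `Dataset.variables` mapping rather than the private `_variables`.

```python
from collections import ChainMap

import numpy as np
import xarray as xr


def scan_cell_measures(obj):
    """Collect raw cell_measures strings without building DataArrays.

    Hypothetical helper; only the attribute lookup mirrors the PR description.
    """
    if isinstance(obj, xr.Dataset):
        # Dataset.variables maps every name (coords and data_vars) to a Variable.
        variables = list(obj.variables.values())
    else:
        # DataArray: its coordinate Variables plus the array's own Variable.
        variables = list(obj.coords.variables.values()) + [obj.variable]
    return [
        ChainMap(var.attrs, var.encoding).get("cell_measures", "")
        for var in variables
    ]


# Made-up example data: one variable carrying a cell_measures attribute.
ds = xr.Dataset(
    data_vars={
        "air": (("lat", "lon"), np.zeros((2, 3)), {"cell_measures": "area: cell_area"}),
        "cell_area": (("lat", "lon"), np.ones((2, 3))),
    },
    coords={"lat": [10.0, 20.0], "lon": [0.0, 1.0, 2.0]},
)

found = [attr for attr in scan_cell_measures(ds) if attr]
found_da = [attr for attr in scan_cell_measures(ds["air"]) if attr]
```

Both the Dataset and the DataArray call return the same attribute string here, without iterating `coords.values()` or `data_vars.values()`.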

Behavior is unchanged (verified against airds for both the Dataset and DataArray paths). Full test_accessor.py passes (208 passed, 1 skipped).

🤖 Generated with Claude Code

cell_measures iterated obj.coords.values()/obj.data_vars.values() to read
the cell_measures attribute, constructing (and resolving coordinates for) a
DataArray per variable on every access. Read the attribute off the bare
Variables instead, matching standard_names.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dcherian merged commit 5d01fd2 into main on Jun 12, 2026
11 checks passed
@dcherian deleted the avoid-dataarray-construction-in-attr-scan branch on June 12, 2026 05:39