
from_template ergonomics: coregister your data onto the grid + neighborhood-friendly dask chunks #3562

Merged: brendancol merged 3 commits into main from issue-3561 on Jun 27, 2026

@brendancol

Contributor

Summary

Two ergonomic additions to from_template, so users go from a region name to an analysis-ready grid with less friction.

  • template.xrs.coregister(data): one call to put your own data on the grid. A raster DataArray is reprojected onto the template's exact CRS, bounds, and shape; a GeoDataFrame is rasterized onto it (the rasterize(coregister=True) path). The result shares the template's y/x coords, so layers stack cell-for-cell. This fills the gap left by the existing coregister paths, which covered vectors, points, and on-disk GeoTIFFs but not in-memory rasters.
  • Neighborhood-friendly dask chunks: chunks='auto' (the default for dask templates) now tiles into even, square blocks (~2048 cells per side) instead of one giant block sized by byte budget or thin, ragged edge slivers. Downstream tools such as slope and focal inherit a parallel, overlap-friendly graph. Small grids stay a single chunk; the block grows for very large grids so the default never trips the existing chunk-count cap (#3557, "from_template: guard against task-graph explosion on dask with very fine resolution").
  • Docstring "Recipes" section and a templates reference note showing the pick-a-grid, coregister, run-a-tool path.

Measured before/after for the chunking: from_template('conus', resolution=1000, backend='dask') went from a single 3105x5865 chunk to 6 balanced ~1553x1955 blocks; conus@250m went from 15 blocks with 291-px edge slivers to 66 even ~2070x2132 blocks.
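For readers who want to see how a tiling like this can be derived, here is a minimal sketch in plain Python. It is not the code from this PR: the rounding rule, the doubling growth rule, and the names `split_evenly` and `balanced_chunks` are assumptions chosen for illustration. The ~2048 target and the 1e6 cap are taken from the text in this thread. The sketch happens to reproduce the block counts quoted above for conus at 1000 m and 250 m, but it makes no claim about block counts at other resolutions.

```python
TARGET_BLOCK = 2048      # approximate cells per block side (from the PR text)
MAX_CHUNKS = 1_000_000   # chunk-count cap mentioned in the review


def split_evenly(n, parts):
    """Split n cells into `parts` block lengths that differ by at most one."""
    base, extra = divmod(n, parts)
    return tuple(base + 1 if i < extra else base for i in range(parts))


def balanced_chunks(shape, target=TARGET_BLOCK, max_chunks=MAX_CHUNKS):
    """Return dask-style ((y blocks...), (x blocks...)) for a 2D grid.

    Assumed rules (hypothetical, not confirmed by the PR):
    - the block count per axis is the rounded ratio of axis length to target
    - the target side doubles until the total block count fits under the cap
    """
    height, width = shape
    while True:
        ny = max(1, round(height / target))
        nx = max(1, round(width / target))
        if ny * nx <= max_chunks:
            break
        target *= 2
    return split_evenly(height, ny), split_evenly(width, nx)


# conus at 1000 m: 2 x 3 = 6 blocks of roughly 1553 x 1955
y_blocks, x_blocks = balanced_chunks((3105, 5865))
print(len(y_blocks) * len(x_blocks), y_blocks, x_blocks)

# A small grid stays a single chunk
print(balanced_chunks((500, 800)))
```

Splitting each axis evenly, rather than cutting fixed 2048-cell blocks and leaving a remainder, is what avoids the thin edge slivers: every block on an axis differs from its neighbors by at most one cell.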

Backend coverage

coregister runs on whatever backends the underlying reproject/rasterize support (numpy and dask verified; a dask source returns a dask result). The chunk change is dask-only. No new compute kernels.

Test plan

  • coregister: raster lands on grid (exact coords, constant field preserved), vector path equals rasterize(coregister=True), missing-CRS and bad-type errors, nearest resampling, dask source stays dask
  • chunks: default tiling balanced and square, small template stays one chunk, block grows for huge grids (stays under the cap), explicit chunks bypass the tiling
  • full test_templates.py and test_accessor.py suites pass; reproject and rasterize suites pass
  • integration: dask template -> coregister dask raster -> slope stays lazy and computes
  • flake8 clean

Closes #3561

https://claude.ai/code/session_01TEXdRkRqYJKvATjquz26Pa

…nks (#3561)

Add template.xrs.coregister(data): reproject a raster DataArray, or
rasterize a GeoDataFrame, onto the caller grid's exact CRS, bounds, and
shape. One call to land your own data on a template and run a tool. The
result shares the template's y/x coords, so layers stack cell-for-cell.

Make the default dask tiling neighborhood-friendly: chunks='auto' (the
default) now tiles into even, square-ish blocks (~2048 cells) instead of
one giant byte-sized block or thin ragged edges, so downstream slope/
focal/etc. parallelize cleanly through map_overlap. Small grids stay one
chunk; the block grows for huge grids so 'auto' never trips the chunk cap.

Document the pick-a-grid, coregister, run-a-tool path in the from_template
docstring and the templates reference page.

Closes #3561

Claude-Session: https://claude.ai/code/session_01TEXdRkRqYJKvATjquz26Pa

reproject always emits north-up (descending y, ascending x), so blindly
relabeling with the caller's coords would flip the data for a grid stored
in a different axis order. Flip to the caller's directions first, then
snap. from_template grids are already north-up, so this only matters for
arbitrary caller rasters, but it removes a silent-flip footgun.

Claude-Session: https://claude.ai/code/session_01TEXdRkRqYJKvATjquz26Pa
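The flip-then-snap order described in this commit can be illustrated with a small NumPy sketch. This is not the accessor code from the PR: `flip_then_snap` is a hypothetical helper, and it assumes the input array is already north-up (row 0 at the largest y, column 0 at the smallest x), as the commit message says reproject output is.

```python
import numpy as np


def flip_then_snap(data, target_y, target_x):
    """Orient north-up `data` to the target's axis directions, then adopt its coords.

    Hypothetical helper. `data` is assumed to be north-up:
    row 0 is the largest y, column 0 is the smallest x.
    """
    target_y = np.asarray(target_y)
    target_x = np.asarray(target_x)
    if data.shape != (target_y.size, target_x.size):
        raise ValueError("data shape does not match the target grid")
    if target_y[0] < target_y[-1]:   # target stores y ascending (south row first)
        data = data[::-1, :]
    if target_x[0] > target_x[-1]:   # target stores x descending (east column first)
        data = data[:, ::-1]
    # "Snap": hand back the target's own coordinate arrays so labels match exactly
    return data, target_y, target_x


# North-south gradient: 3.0 in the north row, 1.0 in the south row
north_up = np.array([[3.0, 3.0], [2.0, 2.0], [1.0, 1.0]])
ascending_y = np.array([10.0, 20.0, 30.0])   # south to north
x = np.array([0.0, 1.0])

oriented, y, _ = flip_then_snap(north_up, ascending_y, x)
north_row = oriented[np.argmax(y)]   # the row labeled with the largest y
print(north_row)
```

Relabeling without the flip would attach y=30 to the row holding 1.0, which is the silent flip the commit removes.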

@brendancol brendancol left a comment

Contributor Author


PR Review: from_template ergonomics (coregister + neighborhood dask chunks)

I read the full diff from the PR worktree and exercised both features, including a few cases the tests do not cover (non-constant gradient orientation, ascending-y target, huge-grid chunk cap).

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • accessor.py _reproject_onto_accessor: **kwargs is forwarded to reproject, which already gets bounds, width, height, and resampling as explicit args. A caller passing any of those through coregister(..., bounds=...) would hit a "multiple values for keyword argument" TypeError instead of a friendly message. Low odds, since those are exactly what coregister computes for you. Not worth guarding unless it comes up.
  • accessor.py ~line 895: the 2x2-cell minimum raises even when attrs['res'] is present and could anchor a 1x1 grid. Coregistering onto a single pixel is degenerate, so the clear error is fine, just slightly stricter than necessary.
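The first nit is easy to reproduce in isolation. The sketch below uses stand-in functions, not the real accessor or reproject: `reproject`, `coregister_unguarded`, `coregister_guarded`, the fixed bounds, and the `nodata` option are all hypothetical and exist only to show the Python behavior and one possible guard.

```python
RESERVED = ("bounds", "width", "height")   # arguments the wrapper computes itself


def reproject(data, bounds=None, width=None, height=None,
              resampling="bilinear", **options):
    """Hypothetical stand-in for the real reproject; it only echoes its arguments."""
    return {"bounds": bounds, "width": width, "height": height,
            "resampling": resampling, **options}


def coregister_unguarded(data, resampling="bilinear", **kwargs):
    # Grid arguments are derived here (fixed values for the sketch),
    # then kwargs is forwarded as-is.
    return reproject(data, bounds=(0, 0, 10, 10), width=10, height=10,
                     resampling=resampling, **kwargs)


def coregister_guarded(data, resampling="bilinear", **kwargs):
    # One possible guard: reject reserved names with a friendlier message.
    clashes = sorted(set(kwargs) & set(RESERVED))
    if clashes:
        raise TypeError(
            f"coregister() derives {', '.join(clashes)} from the target grid; "
            "do not pass them explicitly"
        )
    return coregister_unguarded(data, resampling=resampling, **kwargs)


try:
    coregister_unguarded(None, bounds=(1, 2, 3, 4))
except TypeError as exc:
    unguarded_message = str(exc)   # Python's generic "multiple values" error

try:
    coregister_guarded(None, bounds=(1, 2, 3, 4))
except TypeError as exc:
    guarded_message = str(exc)

# Other keyword options still pass straight through
result = coregister_guarded(None, nodata=0)
```

Whether the guard is worth the extra lines is the judgment call made in the nit; the sketch only shows that it is cheap if it ever becomes necessary.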

What looks good

  • Orientation is correct, which was the real risk. reproject always emits north-up (descending y, ascending x) no matter the source order, and the follow-up commit flips to the caller's axis directions before the coordinate snap, so an ascending-y target no longer flips silently. Verified with a north-south gradient: the north row stays north.
  • Raster path data is byte-identical to a raw reproject with the same derived bounds; the coordinate snap only removes float drift, it does not move data.
  • Vector path is exactly rasterize(coregister=True), so it inherits that path's CRS handling and tests.
  • The chunk default is self-limiting: the block grows for very large grids so chunks='auto' stays under the existing 1e6 cap (the #3557 contract holds). conus@1m lands at ~200k blocks instead of erroring.
  • Small grids still come back as a single chunk, so the change is invisible for the common small-template case.
  • CRS attrs (crs int, crs_wkt, grid_mapping_name) are carried from the template onto the result, so it is a real drop-in rather than keeping reproject's WKT-string crs.

Checklist

  • coregister output lands on the template grid (exact coords, CRS carried over)
  • No coordinate flip (orientation verified with a gradient and an ascending-y target)
  • Raster data identical to raw reproject; vector path identical to rasterize(coregister=True)
  • Dask source returns a dask result; default tiling balanced and square
  • chunks='auto' stays under the chunk cap at extreme resolution; explicit chunks bypass the tiling
  • Docstrings present (coregister, from_template Recipes) and a templates reference note
  • README feature matrix: N/A (no new top-level function)
  • Benchmark: N/A (thin wrapper over already-benchmarked reproject/rasterize; chunk change is in grid construction, not a hot path)

@brendancol brendancol merged commit a6357aa into main Jun 27, 2026
12 checks passed
