Background
PR #1699 fixed issue #1694 by resampling source pixel data when a SimpleSource's <SrcRect> size differs from its <DstRect> size. The fix reads the full <SrcRect> from the source file before resampling and clipping to the destination window.
Concern
For needs_resample sources, read_vrt now always reads the full SrcRect even when the caller requests a tiny window= (e.g., a 1x1 window). For a large SrcRect, this decodes and holds in memory far more pixel data than the requested window needs.
See xrspatial/geotiff/_vrt.py around the needs_resample branch (currently lines 552-556), where the read window is set to the full SrcRect:
    if needs_resample:
        read_r0 = sr.y_off
        read_c0 = sr.x_off
        read_r1 = sr.y_off + sr.y_size
        read_c1 = sr.x_off + sr.x_size
Proposal
Compute the subset of SrcRect that maps to the requested window (via the inverse of the nearest-neighbour mapping used by _resample_nearest) and read only that subset.
Care needed:
- The inverse mapping must include all source pixels that any output pixel within window could sample from. Off-by-one boundary errors here would drop pixels at the window edge.
- Both integer-ratio fast paths and the general non-integer path must agree on the subset.
- The clip math at the end of the needs_resample branch must adjust to work with a partial src_arr whose origin is no longer at (sr.y_off, sr.x_off).
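A rough one-axis sketch of the idea, not the actual _vrt.py code. It assumes a pixel-centre nearest-neighbour mapping, floor((dst + 0.5) * src_size / dst_size), which may not match what _resample_nearest really does. The helper names (nearest_src_index, src_subset_for_window, resample_window_1d) are made up for illustration. Because the forward mapping is monotonic, the subset can be bounded by evaluating it at the first and last output pixel of the window rather than inverting it analytically, which sidesteps most off-by-one risk. The same logic would apply independently to rows and columns.

```python
import numpy as np


def nearest_src_index(dst_idx, src_size, dst_size):
    """Assumed forward mapping: the centre of each dst pixel picks a src pixel.

    Integer form of floor((dst_idx + 0.5) * src_size / dst_size).
    """
    dst_idx = np.asarray(dst_idx, dtype=np.int64)
    idx = ((2 * dst_idx + 1) * src_size) // (2 * dst_size)
    return np.minimum(idx, src_size - 1)  # defensive clamp to the last src pixel


def src_subset_for_window(d0, d1, src_size, dst_size):
    """Half-open src range [s0, s1) sampled by dst indices [d0, d1)."""
    if not 0 <= d0 < d1 <= dst_size:
        raise ValueError("window must be a non-empty range inside the DstRect")
    # The mapping is monotonic, so the two end pixels bound the whole range.
    s0 = int(nearest_src_index(d0, src_size, dst_size))
    s1 = int(nearest_src_index(d1 - 1, src_size, dst_size)) + 1
    return s0, s1


def resample_window_1d(src, dst_size, d0, d1):
    """Resample only the part of src needed for dst indices [d0, d1)."""
    s0, s1 = src_subset_for_window(d0, d1, len(src), dst_size)
    partial = src[s0:s1]  # stands in for the reduced file read
    # Indices are taken relative to the partial array, whose origin is now s0.
    local = nearest_src_index(np.arange(d0, d1), len(src), dst_size) - s0
    return partial[local]


# Reading the subset must give the same pixels as full read + resample + slice.
src = np.arange(10)
full = src[nearest_src_index(np.arange(4), len(src), 4)]
assert np.array_equal(resample_window_1d(src, 4, 1, 3), full[1:3])
```

The final assertion is the property a regression test would want to check: for every window, the partial read gives exactly the same result as reading the full SrcRect, resampling, and slicing. The sketch does not cover the integer-ratio fast paths; whichever mapping they use would need the same check.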
Workaround
Until this is optimized, callers that pass a small window= on a VRT with large-SrcRect SimpleSources will see the full SrcRect read into memory. Reading without window= and slicing the result has the same cost.
Originally raised by Copilot in the review of #1699.