Describe the bug
read_vrt (also reachable via open_geotiff for .vrt paths) does not honour the nodata sentinel for integer source rasters. It sets attrs['nodata'] on the returned DataArray but leaves the integer sentinel value (e.g. 65535) intact in the pixel array. All other read paths (open_geotiff eager, read_geotiff_dask, read_geotiff_gpu, and the dask+cupy combination) promote integer rasters with a nodata sentinel to float64 and replace the sentinel with NaN.
The result is a silent backend divergence: code that follows the convention "attrs['nodata'] is set ⇒ values have been NaN-masked" (the convention codified in _apply_nodata_mask_gpu and used by all non-VRT reads) will treat the literal sentinel as a real value when the input route happens to be a VRT.
Repro
import os
import tempfile

import numpy as np
import xarray as xr
from xrspatial.geotiff import to_geotiff, open_geotiff, read_vrt, write_vrt

with tempfile.TemporaryDirectory() as d:
    arr = np.array([[1, 2, 3], [65535, 5, 6]], dtype=np.uint16)
    da = xr.DataArray(
        arr, dims=['y', 'x'],
        coords={'y': np.arange(2), 'x': np.arange(3)},
        attrs={'crs': 4326, 'nodata': 65535},
    )
    tif = os.path.join(d, 'src.tif')
    to_geotiff(da, tif, compression='none', nodata=65535)

    # Direct read: promoted to float64, sentinel replaced with NaN.
    print(open_geotiff(tif).dtype, open_geotiff(tif).values)
    # float64
    # [[ 1. 2. 3.]
    # [nan 5. 6.]]

    # Same file through a VRT: dtype stays uint16, sentinel left in place.
    vrt = os.path.join(d, 'src.vrt')
    write_vrt(vrt, [tif])
    via_vrt = read_vrt(vrt)
    print(via_vrt.dtype, via_vrt.attrs.get('nodata'), via_vrt.values)
    # uint16 65535.0
    # [[ 1 2 3]
    # [65535 5 6]]
Float-with-nodata via VRT works correctly. Only the integer branch diverges.
Expected behavior
read_vrt should mirror the post-decode nodata handling from open_geotiff: when the VRT band carries a nodata sentinel and the source dtype is integer, promote the assembled array to float64 and replace the sentinel with NaN. Float arrays already get NaN-masked inside _vrt._read_data, so the fix is symmetric on the integer branch.
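The expected masking step can be sketched with NumPy alone. This is only an illustration of the behaviour described above, not the existing xrspatial code: `mask_integer_nodata` is a hypothetical helper name, and where the call would live (inside `_read_data` or after decode in `read_vrt`) is left open.

```python
import numpy as np


def mask_integer_nodata(data, nodata):
    """Hypothetical helper: promote an integer array to float64 and
    replace the nodata sentinel with NaN. Non-integer arrays and a
    missing nodata value are returned unchanged."""
    if nodata is None or not np.issubdtype(data.dtype, np.integer):
        return data
    out = data.astype(np.float64)
    # Compare after promotion so the sentinel (stored as a float in
    # attrs['nodata']) and the pixels share one dtype.
    out[out == float(nodata)] = np.nan
    return out


arr = np.array([[1, 2, 3], [65535, 5, 6]], dtype=np.uint16)
masked = mask_integer_nodata(arr, 65535.0)
print(masked.dtype, int(np.isnan(masked).sum()))
```

Applied to the array from the repro, this yields a float64 array with a single NaN where the sentinel was, which matches what the direct `open_geotiff` read returns.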
Why this matters
Pipelines that read a VRT and feed it into NaN-aware spatial ops (slope, hillshade, etc.) silently treat the sentinel as elevation data when the source is integer-typed. The same operations would do the right thing when the same files are opened directly without the VRT indirection.
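A small NumPy illustration of the effect, using `np.gradient` as a stand-in for a slope-style operation (it is not the xrspatial `slope` implementation, and the elevation values are made up for the example):

```python
import numpy as np

# One row of elevations with a nodata gap encoded as 65535.
row = np.array([100, 102, 65535, 106, 108], dtype=np.uint16)

# VRT path today: the sentinel is treated as a real elevation,
# so the cells next to the gap get an enormous gradient.
raw_grad = np.gradient(row.astype(np.float64))

# Non-VRT paths: the sentinel is NaN, so the gap propagates as NaN
# instead of producing a plausible-looking but wrong number.
masked = np.where(row == 65535, np.nan, row.astype(np.float64))
masked_grad = np.gradient(masked)

print(raw_grad)
print(masked_grad)
```

In the unmasked case nothing signals that the result is wrong, whereas the NaN-masked case makes the gap visible downstream.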
Scope
xrspatial/geotiff/_vrt.py::_read_data (integer-source branch) or xrspatial/geotiff/__init__.py::read_vrt (post-decode masking), whichever is cleaner.
Additional context
Found during the metadata propagation sweep on the geotiff subpackage. Adjacent to #1548 and #1547 (read-path attrs/nodata parity).