geotiff: read_vrt leaves integer nodata sentinel in pixels, divergent from open_geotiff #1564

@brendancol

Describe the bug

read_vrt (also reachable via open_geotiff for .vrt paths) does not honour the nodata sentinel for integer source rasters. It sets attrs['nodata'] on the returned DataArray but leaves the integer sentinel value (e.g. 65535) intact in the pixel array. All other read paths (open_geotiff eager, read_geotiff_dask, read_geotiff_gpu, and the dask+cupy combination) promote integer rasters with a nodata sentinel to float64 and replace the sentinel with NaN.

The result is a silent backend divergence: code that follows the convention "attrs['nodata'] is set ⇒ values have been NaN-masked" (the convention codified in _apply_nodata_mask_gpu and used by all non-VRT reads) will treat the literal sentinel as a real value when the input route happens to be a VRT.
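
Until the read paths agree, downstream code can check whether that convention actually holds before relying on it. The following is a minimal sketch that assumes only NumPy; `sentinel_leaked` is a hypothetical helper for illustration, not part of xrspatial:

```python
import numpy as np


def sentinel_leaked(values, nodata):
    """Return True if `nodata` still appears literally in `values`.

    Hypothetical helper (not an xrspatial API). `values` is the pixel
    array and `nodata` is whatever attrs['nodata'] holds.
    """
    if nodata is None or np.isnan(nodata):
        # No sentinel declared, or a NaN sentinel (NaN never compares equal).
        return False
    # Compare in float64 so integer and float arrays are handled the same way.
    as_float = np.asarray(values, dtype=np.float64)
    return bool(np.any(as_float == float(nodata)))


# Shape of the VRT result from the repro: sentinel left in place.
vrt_like = np.array([[1, 2, 3], [65535, 5, 6]], dtype=np.uint16)
# Shape of the direct-read result: sentinel replaced with NaN.
masked = np.array([[1, 2, 3], [np.nan, 5, 6]], dtype=np.float64)

print(sentinel_leaked(vrt_like, 65535.0))  # True
print(sentinel_leaked(masked, 65535.0))    # False
```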

Repro

import numpy as np, xarray as xr, tempfile, os
from xrspatial.geotiff import to_geotiff, open_geotiff, read_vrt, write_vrt

with tempfile.TemporaryDirectory() as d:
    arr = np.array([[1, 2, 3], [65535, 5, 6]], dtype=np.uint16)
    da = xr.DataArray(
        arr, dims=['y','x'],
        coords={'y': np.arange(2), 'x': np.arange(3)},
        attrs={'crs': 4326, 'nodata': 65535},
    )
    tif = os.path.join(d, 'src.tif')
    to_geotiff(da, tif, compression='none', nodata=65535)

    direct = open_geotiff(tif)  # read once, reuse for both dtype and values
    print(direct.dtype, direct.values)
    # float64
    # [[ 1.  2.  3.]
    #  [nan  5.  6.]]

    vrt = os.path.join(d, 'src.vrt')
    write_vrt(vrt, [tif])
    via_vrt = read_vrt(vrt)
    print(via_vrt.dtype, via_vrt.attrs.get('nodata'), via_vrt.values)
    # uint16 65535.0
    # [[    1     2     3]
    #  [65535     5     6]]

Float-with-nodata via VRT works correctly. Only the integer branch diverges.

Expected behavior

read_vrt should mirror the post-decode nodata handling from open_geotiff: when the VRT band carries a nodata sentinel and the source dtype is integer, promote the assembled array to float64 and replace the sentinel with NaN. Float arrays already get NaN-masked inside _vrt._read_data, so the fix is symmetric on the integer branch.
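
The integer-branch handling described above can be sketched in plain NumPy. This is an illustration of the expected behavior under stated assumptions, not the actual xrspatial code: `mask_integer_nodata` is a hypothetical name, and the real fix may be structured differently inside `_read_data` or `read_vrt`.

```python
import numpy as np


def mask_integer_nodata(arr, nodata):
    """Promote an integer array to float64 and NaN-mask the sentinel.

    Hypothetical helper illustrating the expected post-decode step.
    Non-integer arrays and arrays without a sentinel pass through unchanged.
    """
    if nodata is None or not np.issubdtype(arr.dtype, np.integer):
        return arr
    out = arr.astype(np.float64)
    # Compare on the float64 copy so a float-typed sentinel such as
    # 65535.0 matches without depending on integer/float comparison rules.
    out[out == float(nodata)] = np.nan
    return out


assembled = np.array([[1, 2, 3], [65535, 5, 6]], dtype=np.uint16)
result = mask_integer_nodata(assembled, 65535.0)
print(result.dtype)  # float64
print(result)
```

With the array from the repro, the result has dtype float64 and NaN at position `[1, 0]`, which is what the direct `open_geotiff` read returns.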

Why this matters

Pipelines that read a VRT and feed it into NaN-aware spatial ops (slope, hillshade, etc.) silently treat the sentinel as elevation data when the source is integer-typed. The same operations would do the right thing when the same files are opened directly without the VRT indirection.
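
A small illustration of the effect, using `np.gradient` as a stand-in for a slope-style neighborhood operation (an assumption made for brevity; this is not how xrspatial's `slope` is implemented). The elevation values are invented for the example:

```python
import numpy as np

# One row of integer elevations with a nodata sentinel in the middle.
elev_raw = np.array([100, 101, 65535, 103, 104], dtype=np.uint16)

# What the non-VRT read paths would hand to downstream code.
elev_masked = elev_raw.astype(np.float64)
elev_masked[elev_raw == 65535] = np.nan

# Sentinel treated as real elevation: huge spurious gradients next to it.
grad_raw = np.gradient(elev_raw.astype(np.float64))
# Sentinel masked: the affected neighbors become NaN instead of wrong numbers.
grad_masked = np.gradient(elev_masked)

print(grad_raw)
print(grad_masked)
```

In the unmasked case the cells next to the sentinel report gradients in the tens of thousands, which look like valid output. In the masked case those cells are NaN, so the gap stays visible to later steps.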

Scope

The fix belongs in either xrspatial/geotiff/_vrt.py::_read_data (integer-source branch) or xrspatial/geotiff/__init__.py::read_vrt (post-decode masking), whichever is cleaner.

Additional context

Found during the metadata propagation sweep on the geotiff subpackage. Adjacent to #1548 and #1547 (read-path attrs/nodata parity).

Labels: bug
