Describe the bug
open_geotiff(path, overview_level=N) for N >= 1 drops attrs['nodata'], attrs['gdal_metadata'], and attrs['gdal_metadata_xml'] from the returned DataArray, even though the level-0 IFD carries them and the COG writer wrote them. The pixel data in the overview IFDs still contains the sentinel value (the writer rewrites NaN to sentinel before reducing), but the reader returns a DataArray with attrs['nodata'] == None and no NaN masking applied.
Downstream code reading the overview sees pixels at the sentinel value as ordinary data and folds them into statistics, threshold checks, and plots.
Related: #1640 added inheritance of transform and crs from level 0. #1613 / #1623 fixed the writer so the on-disk overview pixels carry the sentinel. The reader-side inheritance was added for the georef fields but not for nodata and the other tags that only appear on level 0.
All four backends are affected (numpy, dask, cupy, dask+cupy) because they all go through extract_geo_info_with_overview_inheritance, which only inherits CRS-side fields.
Reproduction
import numpy as np
import xarray as xr
from xrspatial.geotiff import open_geotiff, to_geotiff

arr = np.full((64, 64), 100.0, dtype=np.float32)
arr[0:16, 0:16] = -9999.0
da = xr.DataArray(arr, dims=['y', 'x'],
                  coords={'y': np.arange(64, dtype=np.float64),
                          'x': np.arange(64, dtype=np.float64)})
to_geotiff(da, 'cog.tif', cog=True, tile_size=16,
           overview_levels=[2], nodata=-9999.0,
           overview_resampling='nearest')

# Level 0: correct
d0 = open_geotiff('cog.tif', overview_level=0)
print(d0.attrs['nodata'])                 # -9999.0
print(int(np.isnan(d0.values).sum()))     # 256

# Level 1: nodata dropped
d1 = open_geotiff('cog.tif', overview_level=1)
print(d1.attrs.get('nodata'))             # None  <-- BUG
print(int(np.isnan(d1.values).sum()))     # 0     <-- pixels not masked
print(int((d1.values == -9999.0).sum()))  # 64    <-- sentinel survives
print(d1.values.mean())                   # -531 (should be 100)
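The level-1 numbers above can be checked without touching the file. The sketch below is only an approximation of the overview: it assumes a 2x nearest-neighbour reduction behaves like keeping every second pixel, which may not match the writer's exact pixel selection, but the 16x16 sentinel block shrinks to 8x8 either way.

```python
import numpy as np

# Rebuild the level-0 array from the reproduction.
arr = np.full((64, 64), 100.0, dtype=np.float32)
arr[0:16, 0:16] = -9999.0

# Assumption: stand-in for a 2x nearest-neighbour overview (every second pixel).
overview = arr[::2, ::2]

# Count sentinel pixels and take the mean as downstream code would see it.
sentinel_count = int((overview == -9999.0).sum())
unmasked_mean = float(overview.mean(dtype=np.float64))

# Mean after masking the sentinel, which is what the reader is expected to return.
masked = np.where(overview == -9999.0, np.nan, overview).astype(np.float64)
masked_mean = float(np.nanmean(masked))

print(sentinel_count, unmasked_mean, masked_mean)
```

With 64 of 1024 pixels at -9999.0, the unmasked mean works out to (960 * 100 - 64 * 9999) / 1024, which is where the -531 in the reproduction comes from.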
Expected behavior
When the overview IFD lacks its own GDAL_NODATA tag but the level-0 IFD carries it, the reader should inherit the sentinel from level 0 (the same way extract_geo_info_with_overview_inheritance already inherits crs_epsg, crs_wkt, etc.), apply the standard NaN-mask substitution, and set attrs['nodata']. The same applies to attrs['gdal_metadata'] and attrs['gdal_metadata_xml'] since those are also written only on level 0.
Scope
- Affects read_to_array (CPU eager), read_geotiff_dask, and read_geotiff_gpu (so all four backend permutations) when reading overview_level >= 1 from a COG produced by to_geotiff or any GDAL-compatible writer that only emits GDAL_NODATA on the level-0 IFD.
- The reader emits no warning, so the corruption can sit undetected in any pipeline that pulls a COG overview for fast preview / downsampled analysis.
- Workaround: read level 0 and downsample manually, or copy nodata from a separate read of level 0 and mask the sentinel pixels in the overview yourself.
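A minimal sketch of the second workaround, written against plain NumPy arrays and attribute dicts so it does not depend on the reader. The helper name apply_level0_nodata is hypothetical and is not part of xrspatial.

```python
import numpy as np

def apply_level0_nodata(values, attrs, level0_attrs):
    """Copy nodata from level-0 attrs and mask matching pixels as NaN.

    Hypothetical helper for illustration; returns (values, attrs) copies.
    """
    out_attrs = dict(attrs)
    nodata = level0_attrs.get("nodata")
    if nodata is None:
        # Level 0 has no sentinel either, so there is nothing to inherit.
        return values, out_attrs
    out_attrs["nodata"] = nodata
    # Only floating-point arrays can hold NaN; integer data is left as-is.
    if np.issubdtype(values.dtype, np.floating):
        values = np.where(values == nodata, np.nan, values)
    return values, out_attrs

# Small demonstration with a sentinel in the top-left corner.
raw = np.full((4, 4), 100.0, dtype=np.float32)
raw[0, 0] = -9999.0
fixed, fixed_attrs = apply_level0_nodata(raw, {}, {"nodata": -9999.0})
print(int(np.isnan(fixed).sum()), fixed_attrs["nodata"])
```

Against the reproduction, this would be called with d1.values, d1.attrs, and d0.attrs, and the result wrapped back into a DataArray via d1.copy(data=...).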
Fix sketch
In xrspatial/geotiff/_geotags.py::extract_geo_info_with_overview_inheritance, when the overview IFD lacks the relevant attribute and the level-0 IFD carries it, inherit:
- nodata (and the underlying ifd.nodata_str)
- gdal_metadata, gdal_metadata_xml
- extra_tags, image_description, extra_samples
- x_resolution, y_resolution, resolution_unit
- colormap
The existing helper already short-circuits when the overview has its own georef. Follow the same pattern: only inherit when the overview IFD does not have its own value.
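The "only inherit when missing" rule could look roughly like the sketch below. TagInfo and inherit_missing are hypothetical stand-ins, not the actual structures in _geotags.py, and only a few of the fields listed above are modelled.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional

@dataclass(frozen=True)
class TagInfo:
    """Hypothetical per-IFD metadata record; not the real xrspatial class."""
    nodata: Optional[float] = None
    gdal_metadata: Optional[dict] = None
    gdal_metadata_xml: Optional[str] = None
    colormap: Optional[list] = None

def inherit_missing(overview: TagInfo, level0: TagInfo) -> TagInfo:
    """Fill fields the overview lacks from level 0; never overwrite its own."""
    updates = {
        f.name: getattr(level0, f.name)
        for f in fields(TagInfo)
        # Compare against None explicitly so a legitimate nodata of 0.0 is kept.
        if getattr(overview, f.name) is None
        and getattr(level0, f.name) is not None
    }
    return replace(overview, **updates)

level0 = TagInfo(nodata=-9999.0, gdal_metadata={"SOURCE": "demo"})
bare_overview = TagInfo()
own_overview = TagInfo(nodata=0.0)

print(inherit_missing(bare_overview, level0).nodata)
print(inherit_missing(own_overview, level0).nodata)
```

The explicit None comparison matters here: a truthiness check would treat an overview whose own nodata is 0.0 as missing and overwrite it with the level-0 value.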
A 4-backend regression test should write a COG with nodata set, read each overview level on each backend, and assert that attrs['nodata'] is preserved and that pixels equal to the sentinel come back as NaN.