geotiff: overview reads silently drop attrs['nodata'] from level 0; pixel sentinel survives #1739

@brendancol

Describe the bug

open_geotiff(path, overview_level=N) for N >= 1 drops attrs['nodata'], attrs['gdal_metadata'], and attrs['gdal_metadata_xml'] from the returned DataArray, even though the COG writer emitted the corresponding tags and the level-0 IFD carries them. The overview IFDs' pixel data still contains the sentinel value (the writer rewrites NaN to the sentinel before reducing), but the reader returns a DataArray with attrs['nodata'] == None and applies no NaN masking.
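Here is a minimal NumPy sketch of that sequence. It assumes the writer's NaN-to-sentinel rewrite behaves like np.where, and it models 'nearest' 2x reduction as strided sampling. Neither is necessarily how to_geotiff implements it. Level 0 is masked because its IFD carries GDAL_NODATA. The level-1 read has no nodata to work with, so its sentinel pixels pass through unchanged.

```python
import numpy as np

SENTINEL = -9999.0

# Level-0 data as the caller hands it to the writer: NaN marks missing pixels.
level0 = np.full((64, 64), 100.0, dtype=np.float32)
level0[0:16, 0:16] = np.nan

# Writer side (assumed): NaN is rewritten to the sentinel before reducing.
on_disk0 = np.where(np.isnan(level0), SENTINEL, level0).astype(np.float32)

# 'nearest' 2x reduction modelled as strided sampling (assumption).
on_disk1 = on_disk0[::2, ::2]

# Reader, level 0: GDAL_NODATA is present, so the sentinel becomes NaN.
read0 = np.where(on_disk0 == SENTINEL, np.nan, on_disk0)

# Reader, level 1 today: no nodata is inherited, so nothing is masked.
read1 = on_disk1

print(int(np.isnan(read0).sum()))      # 256
print(int((read1 == SENTINEL).sum()))  # 64
```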

Downstream code reading the overview sees pixels at the sentinel value as ordinary data and folds them into statistics, threshold checks, and plots.

Related: #1640 added inheritance of transform and crs from level 0. #1613 / #1623 fixed the writer so the on-disk overview pixels carry the sentinel. The reader-side inheritance was added for the georef fields but not for nodata and the other tags that only appear on level 0.

All four backends are affected (numpy, dask, cupy, dask+cupy) because they all go through extract_geo_info_with_overview_inheritance, which only inherits CRS-side fields.

Reproduction

```python
import numpy as np
import xarray as xr
from xrspatial.geotiff import open_geotiff, to_geotiff

arr = np.full((64, 64), 100.0, dtype=np.float32)
arr[0:16, 0:16] = -9999.0

da = xr.DataArray(arr, dims=['y', 'x'],
                  coords={'y': np.arange(64, dtype=np.float64),
                          'x': np.arange(64, dtype=np.float64)})

to_geotiff(da, 'cog.tif', cog=True, tile_size=16,
           overview_levels=[2], nodata=-9999.0,
           overview_resampling='nearest')

# Level 0: correct
d0 = open_geotiff('cog.tif', overview_level=0)
print(d0.attrs['nodata'])               # -9999.0
print(int(np.isnan(d0.values).sum()))   # 256

# Level 1: nodata dropped
d1 = open_geotiff('cog.tif', overview_level=1)
print(d1.attrs.get('nodata'))            # None  <-- BUG
print(int(np.isnan(d1.values).sum()))    # 0     <-- pixels not masked
print(int((d1.values == -9999.0).sum())) # 64    <-- sentinel survives
print(d1.values.mean())                  # -531  (should be 100)
```
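The -531 follows directly from the repro numbers. The level-1 overview has 32 × 32 = 1024 pixels, and 64 of them are at -9999. The NumPy check below confirms this. It again treats 'nearest' 2x reduction as strided sampling, which is an assumption about the resampler.

```python
import numpy as np

nodata = -9999.0
arr = np.full((64, 64), 100.0, dtype=np.float32)
arr[0:16, 0:16] = nodata

# Assumed stand-in for the level-1 overview written with 'nearest' resampling.
overview = arr[::2, ::2]

# What downstream code sees today: the sentinel is averaged in as real data.
raw_mean = float(overview.mean())

# What it should see: sentinel pixels masked to NaN, then skipped.
masked = np.where(overview == nodata, np.nan, overview)
masked_mean = float(np.nanmean(masked))

print(raw_mean)     # -531.1875
print(masked_mean)  # 100.0
```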

Expected behavior

When the overview IFD lacks its own GDAL_NODATA tag but the level-0 IFD carries it, the reader should inherit the sentinel from level 0 (the same way extract_geo_info_with_overview_inheritance already inherits crs_epsg, crs_wkt, etc.), apply the standard NaN-mask substitution, and set attrs['nodata']. The same applies to attrs['gdal_metadata'] and attrs['gdal_metadata_xml'] since those are also written only on level 0.
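Here is a minimal sketch of the expected read path for overview_level >= 1. The function name is hypothetical. Its string arguments stand in for the GDAL_NODATA ASCII tag on each IFD, with None meaning the tag is absent. Promoting integer rasters to float64 so that NaN can be stored is an assumption made here, not confirmed reader behaviour.

```python
import numpy as np


def expected_overview_read(raw, overview_nodata_str, level0_nodata_str):
    """Hypothetical sketch: inherit GDAL_NODATA from level 0, then NaN-mask.

    raw: decoded overview pixels.
    overview_nodata_str / level0_nodata_str: GDAL_NODATA tag text, or None.
    Returns (values, attrs).
    """
    # Prefer the overview's own tag; fall back to level 0 only when absent.
    nodata_str = (overview_nodata_str if overview_nodata_str is not None
                  else level0_nodata_str)
    attrs = {}
    if nodata_str is None:
        return raw, attrs

    nodata = float(nodata_str)
    attrs["nodata"] = nodata

    # Assumption: non-float rasters are promoted to float64 so NaN fits.
    if np.issubdtype(raw.dtype, np.floating):
        values = raw.copy()
    else:
        values = raw.astype(np.float64)

    # Compare in the array's own dtype, i.e. the value the writer stored.
    sentinel = values.dtype.type(nodata)
    values[values == sentinel] = np.nan
    return values, attrs
```

For example, expected_overview_read(level1_pixels, None, "-9999") returns the masked pixels together with {"nodata": -9999.0}.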

Scope

  • Affects read_to_array (CPU eager), read_geotiff_dask, and read_geotiff_gpu (so all four backend permutations) when reading overview_level >= 1 from a COG produced by to_geotiff or any GDAL-compatible writer that only emits GDAL_NODATA on the level-0 IFD.
  • The reader emits no warning, so the corruption can sit undetected in any pipeline that pulls a COG overview for fast preview / downsampled analysis.
  • Workaround: read level 0 and downsample manually, or copy nodata from a separate read of level 0.
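The second workaround can be sketched with plain xarray. The helper name apply_level0_nodata is hypothetical. It assumes d0 and d1 are the DataArrays returned by open_geotiff(path, overview_level=0) and open_geotiff(path, overview_level=1), as in the reproduction.

```python
import numpy as np
import xarray as xr


def apply_level0_nodata(overview: xr.DataArray, level0: xr.DataArray) -> xr.DataArray:
    """Hypothetical workaround: copy attrs['nodata'] from a level-0 read and mask."""
    nodata = level0.attrs.get("nodata")
    if nodata is None or overview.attrs.get("nodata") is not None:
        # Nothing to inherit, or the overview already has its own nodata.
        return overview
    # where() keeps values where the condition holds and fills the rest with NaN.
    masked = overview.where(overview != nodata)
    masked.attrs = {**overview.attrs, "nodata": nodata}
    return masked
```

For example, d1_fixed = apply_level0_nodata(d1, d0). The helper leaves the overview untouched when it already carries a nodata value, which matches the inherit-only-when-absent rule.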

Fix sketch

In xrspatial/geotiff/_geotags.py::extract_geo_info_with_overview_inheritance, when the overview IFD lacks the relevant attribute and the level-0 IFD carries it, inherit:

  • nodata (and the underlying ifd.nodata_str)
  • gdal_metadata, gdal_metadata_xml
  • extra_tags, image_description, extra_samples
  • x_resolution, y_resolution, resolution_unit
  • colormap

The existing helper already short-circuits when the overview has its own georef. Follow the same pattern: only inherit when the overview IFD does not have its own value.
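Below is a sketch of the inherit-only-when-absent rule. It uses a hypothetical GeoInfo dataclass in place of whatever object _geotags.py actually returns. The field names follow the list in the fix sketch, and nodata_str is left out for brevity.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


# Hypothetical stand-in for the reader's per-IFD geo info; the real object in
# xrspatial/geotiff/_geotags.py may be shaped differently.
@dataclass(frozen=True)
class GeoInfo:
    crs_epsg: Optional[int] = None  # already handled by existing inheritance
    nodata: Optional[float] = None
    gdal_metadata: Optional[dict] = None
    gdal_metadata_xml: Optional[str] = None
    extra_tags: Optional[list] = None
    image_description: Optional[str] = None
    extra_samples: Optional[tuple] = None
    x_resolution: Optional[float] = None
    y_resolution: Optional[float] = None
    resolution_unit: Optional[int] = None
    colormap: Optional[tuple] = None


# Fields that are written only on the level-0 IFD.
LEVEL0_ONLY_FIELDS = (
    "nodata", "gdal_metadata", "gdal_metadata_xml",
    "extra_tags", "image_description", "extra_samples",
    "x_resolution", "y_resolution", "resolution_unit",
    "colormap",
)


def inherit_level0_fields(overview: GeoInfo, level0: GeoInfo) -> GeoInfo:
    """Fill each level-0-only field the overview lacks; never override its own value."""
    updates: dict[str, Any] = {}
    for name in LEVEL0_ONLY_FIELDS:
        if getattr(overview, name) is None and getattr(level0, name) is not None:
            updates[name] = getattr(level0, name)
    return replace(overview, **updates)
```

Returning a new instance via dataclasses.replace keeps the overview's own values intact, and it leaves the level-0 object unmodified.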

A 4-backend regression test should write a COG with nodata set, read each overview level on each backend, and assert that attrs['nodata'] is preserved and that pixels equal to the sentinel come back as NaN.
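The per-level check described above could be factored into a helper such as check_overview_nodata, a hypothetical name. This numpy-only sketch expects host-side pixels. For the dask and cupy backends, the test would first have to compute the result or copy it to host memory, and that step is not shown here.

```python
import numpy as np


def check_overview_nodata(values, attrs, expected_nodata, expected_nan_count):
    """Hypothetical assertion helper for one (backend, overview level) pair.

    values: host-side NumPy array of the decoded overview pixels.
    attrs: the DataArray's attrs dict.
    """
    # 1. The sentinel is carried over from level 0.
    assert attrs.get("nodata") == expected_nodata, attrs.get("nodata")
    # 2. No raw sentinel pixels survive decoding.
    assert not np.any(values == expected_nodata)
    # 3. Exactly the expected pixels were masked to NaN.
    assert int(np.isnan(values).sum()) == expected_nan_count
```

For the reproduction's level-1 read, expected_nan_count would be 64 under nearest 2x reduction, since 8 × 8 pixels survive from the 16 × 16 nodata block.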
