Skip to content

Reject mixed BitsPerSample per band in TIFF readers#1519

Merged
brendancol merged 2 commits intoxarray-contrib:mainfrom
brendancol:fix-1505-mixed-bps
May 8, 2026
Merged

Reject mixed BitsPerSample per band in TIFF readers#1519
brendancol merged 2 commits intoxarray-contrib:mainfrom
brendancol:fix-1505-mixed-bps

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1505

A TIFF whose BitsPerSample tag holds a different value per band (e.g. (16, 16, 16, 8) for RGB plus an 8-bit alpha) was silently decoded with the first band's width applied to every band, so the mismatched bands came back as garbage.

The fix adds a _dtypes.resolve_bits_per_sample(bps) helper and threads every reader site through it: stripped CPU, tiled CPU, GPU, VRT, the geo-info probe, and the palette colormap path. The helper accepts a scalar or a tuple/list whose entries all agree, and raises ValueError with the mixed values and a gdal_translate hint when they don't.

Decoding each band separately and promoting to a common dtype was the alternative; it's more code for a rare case, and silent promotion raises follow-up questions (whose nodata wins, what does the user actually want). An explicit error lets the user convert with GDAL or rasterio.

Test plan

  • pytest xrspatial/geotiff/tests/test_mixed_bps.py -- 9 new tests: the helper plus an end-to-end RGB+8-bit-alpha file built from raw IFD bytes
  • pytest xrspatial/geotiff/tests/ -- 699 passed; 3 pre-existing matplotlib palette failures are unrelated

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 8, 2026
@brendancol brendancol requested a review from Copilot May 8, 2026 14:33
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses issue #1505 by preventing silent data corruption when reading TIFFs whose BitsPerSample differs per band (e.g., 16-bit RGB + 8-bit alpha). It introduces a centralized resolve_bits_per_sample() helper and updates all relevant GeoTIFF reader entry points to either accept uniform per-band bit depths or raise a clear ValueError for mixed values.

Changes:

  • Added _dtypes.resolve_bits_per_sample() to validate/normalize scalar vs per-band BitsPerSample.
  • Threaded the helper through CPU strip/tile reading, COG HTTP reads, GPU reads, VRT writing, geo-info probing, and palette colormap extraction.
  • Added new tests covering the helper and an end-to-end mixed-BitsPerSample TIFF rejection case.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
xrspatial/geotiff/_dtypes.py Adds resolve_bits_per_sample() and the new mixed-BPS error path.
xrspatial/geotiff/_reader.py Uses resolve_bits_per_sample() in strip/tile/COG/read-to-array paths.
xrspatial/geotiff/__init__.py Uses resolve_bits_per_sample() in geo-info probing and GPU read metadata parsing.
xrspatial/geotiff/_vrt.py Uses resolve_bits_per_sample() when determining source metadata for VRT generation.
xrspatial/geotiff/_geotags.py Uses resolve_bits_per_sample() for palette colormap size calculation.
xrspatial/geotiff/tests/test_mixed_bps.py Adds unit + end-to-end coverage for mixed BitsPerSample handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +152 to +157
raise ValueError(
f"Mixed BitsPerSample per band is not supported: {tuple(bps)}. "
"xarray-spatial decodes all bands with a single dtype. "
"Convert the file to a uniform bit depth first, "
"e.g. `gdal_translate -ot UInt16 in.tif out.tif`."
)
Comment on lines +128 to +132
Parameters
----------
bps : int or tuple of int
Raw value from ``IFD.bits_per_sample``.

Comment thread xrspatial/geotiff/_geotags.py Outdated
brendancol added a commit to brendancol/xarray-spatial that referenced this pull request May 8, 2026
…top-level import

- The mixed-BitsPerSample ValueError used to hard-code
  ``gdal_translate -ot UInt16``. That's a sane default for 16-bit
  inputs but misleading (and lossy) when the file's widest band is
  32-bit, or when the data is float or signed. Pick a real GDAL type
  name based on the widest entry in the BitsPerSample tuple, refining
  on TIFF SampleFormat (1=uint, 2=int, 3=float). Falls back to
  ``<desired_type>`` for unrecognised widths instead of suggesting a
  lossy type. ``resolve_bits_per_sample`` now accepts an optional
  ``sample_format`` kwarg used only for that hint.
- ``resolve_bits_per_sample`` docstring/type previously said "int or
  tuple of int" but the implementation also accepts lists. Updated to
  "int or sequence of int".
- ``_geotags.py`` had a function-local ``from ._dtypes import
  resolve_bits_per_sample`` inside ``extract_geo_info``. Moved to
  the module's top-level imports for consistency with the rest of the
  module. ``_dtypes.py`` only imports numpy, so no circular risk.
- Two new tests verify the dynamic ``-ot`` suggestion picks UInt16
  vs UInt32 vs Int16 vs Float32 based on widest bps and SampleFormat.
@brendancol
Copy link
Copy Markdown
Contributor Author

@copilot resolve the merge conflicts in this pull request

brendancol added 2 commits May 8, 2026 08:13
A TIFF whose BitsPerSample tag holds a different value per band (e.g.
(16, 16, 16, 8) for RGB plus an 8-bit alpha) was silently decoded with
the first band's width applied to every band, so the mismatched bands
came back as garbage.

The fix adds a _dtypes.resolve_bits_per_sample(bps) helper and threads
every reader site through it: stripped CPU, tiled CPU, GPU, VRT, the
geo-info probe, and the palette colormap path. The helper accepts a
scalar or a tuple/list whose entries all agree, and raises ValueError
with the mixed values and a gdal_translate hint when they don't.

Closes xarray-contrib#1505
…top-level import

- The mixed-BitsPerSample ValueError used to hard-code
  ``gdal_translate -ot UInt16``. That's a sane default for 16-bit
  inputs but misleading (and lossy) when the file's widest band is
  32-bit, or when the data is float or signed. Pick a real GDAL type
  name based on the widest entry in the BitsPerSample tuple, refining
  on TIFF SampleFormat (1=uint, 2=int, 3=float). Falls back to
  ``<desired_type>`` for unrecognised widths instead of suggesting a
  lossy type. ``resolve_bits_per_sample`` now accepts an optional
  ``sample_format`` kwarg used only for that hint.
- ``resolve_bits_per_sample`` docstring/type previously said "int or
  tuple of int" but the implementation also accepts lists. Updated to
  "int or sequence of int".
- ``_geotags.py`` had a function-local ``from ._dtypes import
  resolve_bits_per_sample`` inside ``extract_geo_info``. Moved to
  the module's top-level imports for consistency with the rest of the
  module. ``_dtypes.py`` only imports numpy, so no circular risk.
- Two new tests verify the dynamic ``-ot`` suggestion picks UInt16
  vs UInt32 vs Int16 vs Float32 based on widest bps and SampleFormat.
@brendancol brendancol force-pushed the fix-1505-mixed-bps branch from b28a6cd to cb71350 Compare May 8, 2026 15:14
@brendancol brendancol merged commit dc194e5 into xarray-contrib:main May 8, 2026
1 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mixed BitsPerSample per band silently uses bps[0]

2 participants