Reject mixed BitsPerSample per band in TIFF readers#1519
Merged
brendancol merged 2 commits intoxarray-contrib:mainfrom May 8, 2026
Merged
Reject mixed BitsPerSample per band in TIFF readers#1519brendancol merged 2 commits intoxarray-contrib:mainfrom
brendancol merged 2 commits intoxarray-contrib:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses issue #1505 by preventing silent data corruption when reading TIFFs whose BitsPerSample differs per band (e.g., 16-bit RGB + 8-bit alpha). It introduces a centralized resolve_bits_per_sample() helper and updates all relevant GeoTIFF reader entry points to either accept uniform per-band bit depths or raise a clear ValueError for mixed values.
Changes:
- Added
_dtypes.resolve_bits_per_sample()to validate/normalize scalar vs per-bandBitsPerSample. - Threaded the helper through CPU strip/tile reading, COG HTTP reads, GPU reads, VRT writing, geo-info probing, and palette colormap extraction.
- Added new tests covering the helper and an end-to-end mixed-
BitsPerSampleTIFF rejection case.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
xrspatial/geotiff/_dtypes.py |
Adds resolve_bits_per_sample() and the new mixed-BPS error path. |
xrspatial/geotiff/_reader.py |
Uses resolve_bits_per_sample() in strip/tile/COG/read-to-array paths. |
xrspatial/geotiff/__init__.py |
Uses resolve_bits_per_sample() in geo-info probing and GPU read metadata parsing. |
xrspatial/geotiff/_vrt.py |
Uses resolve_bits_per_sample() when determining source metadata for VRT generation. |
xrspatial/geotiff/_geotags.py |
Uses resolve_bits_per_sample() for palette colormap size calculation. |
xrspatial/geotiff/tests/test_mixed_bps.py |
Adds unit + end-to-end coverage for mixed BitsPerSample handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+152
to
+157
| raise ValueError( | ||
| f"Mixed BitsPerSample per band is not supported: {tuple(bps)}. " | ||
| "xarray-spatial decodes all bands with a single dtype. " | ||
| "Convert the file to a uniform bit depth first, " | ||
| "e.g. `gdal_translate -ot UInt16 in.tif out.tif`." | ||
| ) |
Comment on lines
+128
to
+132
| Parameters | ||
| ---------- | ||
| bps : int or tuple of int | ||
| Raw value from ``IFD.bits_per_sample``. | ||
|
|
brendancol
added a commit
to brendancol/xarray-spatial
that referenced
this pull request
May 8, 2026
…top-level import - The mixed-BitsPerSample ValueError used to hard-code ``gdal_translate -ot UInt16``. That's a sane default for 16-bit inputs but misleading (and lossy) when the file's widest band is 32-bit, or when the data is float or signed. Pick a real GDAL type name based on the widest entry in the BitsPerSample tuple, refining on TIFF SampleFormat (1=uint, 2=int, 3=float). Falls back to ``<desired_type>`` for unrecognised widths instead of suggesting a lossy type. ``resolve_bits_per_sample`` now accepts an optional ``sample_format`` kwarg used only for that hint. - ``resolve_bits_per_sample`` docstring/type previously said "int or tuple of int" but the implementation also accepts lists. Updated to "int or sequence of int". - ``_geotags.py`` had a function-local ``from ._dtypes import resolve_bits_per_sample`` inside ``extract_geo_info``. Moved to the module's top-level imports for consistency with the rest of the module. ``_dtypes.py`` only imports numpy, so no circular risk. - Two new tests verify the dynamic ``-ot`` suggestion picks UInt16 vs UInt32 vs Int16 vs Float32 based on widest bps and SampleFormat.
Contributor
Author
|
@copilot resolve the merge conflicts in this pull request |
A TIFF whose BitsPerSample tag holds a different value per band (e.g. (16, 16, 16, 8) for RGB plus an 8-bit alpha) was silently decoded with the first band's width applied to every band, so the mismatched bands came back as garbage. The fix adds a _dtypes.resolve_bits_per_sample(bps) helper and threads every reader site through it: stripped CPU, tiled CPU, GPU, VRT, the geo-info probe, and the palette colormap path. The helper accepts a scalar or a tuple/list whose entries all agree, and raises ValueError with the mixed values and a gdal_translate hint when they don't. Closes xarray-contrib#1505
…top-level import - The mixed-BitsPerSample ValueError used to hard-code ``gdal_translate -ot UInt16``. That's a sane default for 16-bit inputs but misleading (and lossy) when the file's widest band is 32-bit, or when the data is float or signed. Pick a real GDAL type name based on the widest entry in the BitsPerSample tuple, refining on TIFF SampleFormat (1=uint, 2=int, 3=float). Falls back to ``<desired_type>`` for unrecognised widths instead of suggesting a lossy type. ``resolve_bits_per_sample`` now accepts an optional ``sample_format`` kwarg used only for that hint. - ``resolve_bits_per_sample`` docstring/type previously said "int or tuple of int" but the implementation also accepts lists. Updated to "int or sequence of int". - ``_geotags.py`` had a function-local ``from ._dtypes import resolve_bits_per_sample`` inside ``extract_geo_info``. Moved to the module's top-level imports for consistency with the rest of the module. ``_dtypes.py`` only imports numpy, so no circular risk. - Two new tests verify the dynamic ``-ot`` suggestion picks UInt16 vs UInt32 vs Int16 vs Float32 based on widest bps and SampleFormat.
b28a6cd to
cb71350
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #1505
A TIFF whose
BitsPerSampletag holds a different value per band (e.g.(16, 16, 16, 8)for RGB plus an 8-bit alpha) was silently decoded with the first band's width applied to every band, so the mismatched bands came back as garbage.The fix adds a
_dtypes.resolve_bits_per_sample(bps)helper and threads every reader site through it: stripped CPU, tiled CPU, GPU, VRT, the geo-info probe, and the palette colormap path. The helper accepts a scalar or a tuple/list whose entries all agree, and raisesValueErrorwith the mixed values and agdal_translatehint when they don't.Decoding each band separately and promoting to a common dtype was the alternative; it's more code for a rare case, and silent promotion raises follow-up questions (whose nodata wins, what does the user actually want). An explicit error lets the user convert with GDAL or rasterio.
Test plan
pytest xrspatial/geotiff/tests/test_mixed_bps.py-- 9 new tests: the helper plus an end-to-end RGB+8-bit-alpha file built from raw IFD bytespytest xrspatial/geotiff/tests/-- 699 passed; 3 pre-existing matplotlib palette failures are unrelated