
Fix read_geotiff_gpu byteswap on big-endian multi-byte TIFFs #1515

Merged
brendancol merged 2 commits into xarray-contrib:main from brendancol:fix-1508-gpu-byteswap
May 8, 2026

Conversation

@brendancol
Contributor

Summary

Closes #1508.

cupy.ndarray (13.x) has no .byteswap() method, so any big-endian multi-byte TIFF was hitting AttributeError inside the GPU decode pipeline. The outer try/except in read_geotiff_gpu ate the error and quietly fell back to CPU, so users got correct results but lost the GPU fast path on BE data.

This swaps both arr.byteswap() calls in _gpu_decode.py for a small helper that views the array as the swapped-order dtype and copies, which works on both NumPy and CuPy arrays.

What changed

  • _gpu_decode.py: added _xp_byteswap(arr) helper, called at the two BE byteswap sites.
  • tests/test_gpu_byteswap_1508.py: new GPU regression test. Reads a BE TIFF (uint16, int16, uint32, int32, plus uncompressed uint16) via read_geotiff_gpu, asserts the result is a CuPy DataArray (not a NumPy fallback from a swallowed crash), and matches the CPU read.
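
The view-then-copy helper is essentially a one-liner; a minimal NumPy sketch of the approach described above (CuPy arrays support the same `.view()`/`.copy()` calls), with illustrative demo values:

```python
import numpy as np

def _xp_byteswap(arr):
    # View the buffer as the opposite-byte-order dtype, then copy to
    # materialise the swapped bytes. Avoids ndarray.byteswap(), which
    # cupy.ndarray (13.x) does not implement; numpy and cupy arrays
    # both support .view() and .copy().
    return arr.view(arr.dtype.newbyteorder()).copy()

raw = np.array([1, 256], dtype="<u2")   # bytes: 01 00, 00 01
out = _xp_byteswap(raw)
# out carries the same values as raw.byteswap(): [256, 1]
```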

Out of scope

The outer try/except Exception in read_geotiff_gpu that turns GPU errors into silent CPU fallback is itself worth revisiting; that's how this bug went undetected. Not touching it here. Filing as a follow-up.

GPU predictor=2 on BE data is also broken, but in a separate way: the predictor kernel decodes the byte stream as native LE before the byteswap. That's a kernel-level issue, not a byteswap one. Also a follow-up.

Test plan

  • pytest xrspatial/geotiff/tests/test_gpu_byteswap_1508.py -q -> 5 passed
  • pytest xrspatial/geotiff/tests/ -q -> 695 passed, 4 skipped (3 pre-existing matplotlib palette failures, unrelated to this change)

cupy.ndarray (13.x) does not expose .byteswap(), so any BE multi-byte
TIFF hit AttributeError inside the GPU decode pipeline. The dispatcher
in read_geotiff_gpu caught it and silently fell back to CPU, so output
stayed correct but the GPU path was effectively dead for BE data.

Replace both arr.byteswap() calls with a small helper that views the
array as the swapped-order dtype and copies, which works on numpy and
cupy arrays alike.

Closes xarray-contrib#1508

Copilot AI left a comment


Pull request overview

Fixes the read_geotiff_gpu fast-path for big-endian, multi-byte GeoTIFFs on CuPy (which lacks cupy.ndarray.byteswap()), so BE tiled TIFFs decode on GPU instead of silently falling back to CPU.

Changes:

  • Added _xp_byteswap(arr) helper in xrspatial/geotiff/_gpu_decode.py and replaced two out.byteswap() call sites.
  • Added GPU regression tests to ensure BE multi-byte TIFFs remain CuPy-backed and match CPU output.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • xrspatial/geotiff/_gpu_decode.py: Introduces a cross-backend byteswap helper and uses it in the BE post-processing steps.
  • xrspatial/geotiff/tests/test_gpu_byteswap_1508.py: Adds a regression test to confirm GPU decoding is exercised for BE multi-byte TIFFs and matches CPU results.


Comment thread: xrspatial/geotiff/_gpu_decode.py (Outdated)
Comment on lines +63 to +69

    (as of cupy 13.x) does not. The view-then-copy trick works on both:
    re-interpret the buffer as the swapped-order dtype, then copy to
    materialise the swapped bytes as a real array in that dtype.
    """
    return arr.view(arr.dtype.newbyteorder()).copy()


Comment on lines +51 to +55

    assert isinstance(gpu_da.data, cupy.ndarray), (
        "expected cupy-backed DataArray, got "
        f"{type(gpu_da.data).__name__} -- the GPU path likely fell back "
        "to CPU again"
    )
…yteswap

The earlier implementation, ``arr.view(arr.dtype.newbyteorder()).copy()``,
left the result tagged with non-native byteorder (``>u2`` instead of
``<u2``). That's values-equivalent for arithmetic but breaks downstream
consumers that expect native dtypes -- numba ``@ngjit`` rejects non-native
arrays, which is the same class of bug PR xarray-contrib#1507 fixed for predictor=2 BE.

The new implementation reverses bytes through a uint8 view:

    u8 = arr.view('u1').reshape(*arr.shape, arr.itemsize)
    return u8[..., ::-1].copy().view(arr.dtype).reshape(arr.shape)

Result preserves ``arr.dtype`` and is native-endian, matching numpy's
``ndarray.byteswap()`` contract.

1-byte dtypes short-circuit to a no-op return.

Tests now assert ``gpu_da.data.dtype.isnative`` and equality against the
input native dtype, plus two pure-numpy tests of the helper itself.

The module-level ``pytest.skip`` for missing CUDA was widened to cover
both helper tests too; pulled it apart so the helper tests run without a
GPU and only the GPU end-to-end tests gate on cupy+CUDA+tifffile.
brendancol merged commit 8134de9 into xarray-contrib:main May 8, 2026
11 checks passed
brendancol added a commit that referenced this pull request May 8, 2026
* Fix GPU predictor=2 decode on big-endian TIFFs (#1517)

The per-dtype predictor=2 kernels view the raw decompressed byte buffer
as native uint16/uint32/uint64. On big-endian files the bytes are
MSB-first, so the prefix-sum runs on the wrong integer interpretation
and the differencing produces garbage (~93% pixel mismatch in the
reproducer from #1517).

Swap the buffer to native byte order before running the predictor=2
kernel, then skip the post-assembly byteswap that previously tried to
fix things up. Predictor=1 and predictor=3 paths keep the existing
post-decode swap.
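
A tiny NumPy reproduction of the ordering bug (illustrative values, not the real kernel): the predictor's cumulative sum must run on natively-interpreted integers, because carries that cross byte lanes cannot be undone by a swap afterwards.

```python
import numpy as np

# Two big-endian uint16 deltas whose true cumulative sums are [1, 256].
raw = np.array([1, 255], dtype=">u2").tobytes()   # bytes: 00 01 00 FF

# Correct order: byteswap to native first, then run the predictor cumsum.
good = np.cumsum(np.frombuffer(raw, dtype=">u2").astype("u2"), dtype="u2")

# Buggy order: decode as native LE, cumsum, byteswap afterwards. The
# misread deltas are [256, 65280]; their sum wraps across byte lanes,
# so swapping after the fact cannot recover the true values.
bad = np.cumsum(np.frombuffer(raw, dtype="<u2"), dtype="u2").byteswap()
# good -> [1, 256], bad -> [1, 0]
```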

Closes #1517. Builds on the byteswap helper added in #1515.

* Address Copilot review on #1524

- Rename _swap_bytes_inplace -> _swap_byte_lanes. The previous name
  implied a zero-copy operation but view[:, ::-1].copy() allocates a
  same-sized temporary. The docstring now states the temp explicitly so
  callers can size their working set, and a CUDA kernel is left as a
  follow-up if peak GPU memory becomes an issue.
- Add _block_cpu_fallback helper that monkeypatches the
  read_to_array binding inside xrspatial.geotiff. When the GPU tests
  call read_geotiff_gpu, any silent CPU fallback now raises an
  AssertionError instead of producing a passing-via-fallback result.
- Apply the guard to the four tests where the GPU decode path is what
  is being exercised (BE int32 reproducer, BE dtype matrix, LE pred2,
  BE pred3). The stripped-uint16 test still relies on the CPU
  fallback by design and stays unguarded.
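
The names _block_cpu_fallback and read_to_array come from the commit message; the wiring below is an illustrative guess at the guard, with the patch target made a parameter:

```python
import contextlib
from unittest import mock

@contextlib.contextmanager
def block_cpu_fallback(target="xrspatial.geotiff.read_to_array"):
    # Hypothetical sketch: replace the CPU read binding with a function
    # that raises, so any silent GPU -> CPU fallback inside the code
    # under test fails the test loudly instead of passing via fallback.
    def _boom(*args, **kwargs):
        raise AssertionError("GPU decode fell back to the CPU path")
    with mock.patch(target, _boom):
        yield
```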

* Make _swap_byte_lanes truly zero-temp

Replaces the view[:, ::-1].copy() form with in-place byte reversal:

- numpy: dispatch to ndarray.byteswap(inplace=True) via a uint16/32/64
  view of the original buffer, so the original allocation is mutated
  with no temp.
- cupy: launch _byte_swap_lanes_kernel, one thread per sample, swapping
  bytes through register-resident temporaries. Peak GPU memory for the
  swap is now O(1) instead of O(buffer size).

Adds explicit validation: bps must be 2, 4, or 8 (3/5/6/7 are rejected
rather than silently corrupting), and size must be a multiple of bps.

Tests cover bps=2/4/8 on numpy and cupy, the bps=1 no-op, the
validation errors, and address-equality assertions confirming both
paths really are in-place. The existing integration tests
(test_gpu_predictor2_big_endian_*, predictor3_big_endian) still pass
unchanged.

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GPU read of big-endian multi-byte TIFFs crashes on cupy.ndarray.byteswap()

2 participants