
Fix read_geotiff_gpu byteswap on big-endian multi-byte TIFFs #1515

Merged
brendancol merged 2 commits into xarray-contrib:main from brendancol:fix-1508-gpu-byteswap
May 8, 2026

Conversation

@brendancol
Contributor

Summary

Closes #1508.

cupy.ndarray (13.x) has no .byteswap() method, so any big-endian multi-byte TIFF was hitting AttributeError inside the GPU decode pipeline. The outer try/except in read_geotiff_gpu ate the error and quietly fell back to CPU, so users got correct results but lost the GPU fast path on BE data.

This swaps both arr.byteswap() calls in _gpu_decode.py for a small helper that views the array as the swapped-order dtype and copies, which works on both NumPy and CuPy arrays.

What changed

  • _gpu_decode.py: added _xp_byteswap(arr) helper, called at the two BE byteswap sites.
  • tests/test_gpu_byteswap_1508.py: new GPU regression test. Reads a BE TIFF (uint16, int16, uint32, int32, plus uncompressed uint16) via read_geotiff_gpu, asserts the result is a CuPy DataArray (not a NumPy fallback from a swallowed crash), and matches the CPU read.
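
The view-then-copy helper is essentially a one-liner; a minimal NumPy sketch of the approach described above (CuPy arrays support the same `.view()`/`.copy()` calls), with illustrative demo values:

```python
import numpy as np

def _xp_byteswap(arr):
    # View the buffer as the opposite-byte-order dtype, then copy to
    # materialise the swapped bytes. Avoids ndarray.byteswap(), which
    # cupy.ndarray (13.x) does not implement; numpy and cupy arrays
    # both support .view() and .copy().
    return arr.view(arr.dtype.newbyteorder()).copy()

raw = np.array([1, 256], dtype="<u2")   # bytes: 01 00, 00 01
out = _xp_byteswap(raw)
# out carries the same values as raw.byteswap(): [256, 1]
```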

Out of scope

The outer try/except Exception in read_geotiff_gpu that turns GPU errors into silent CPU fallback is itself worth revisiting; that's how this bug went undetected. Not touching it here. Filing as a follow-up.

GPU predictor=2 on BE data is also broken, but in a separate way: the predictor kernel decodes the byte stream as native LE before the byteswap. That's a kernel-level issue, not a byteswap one. Also a follow-up.

Test plan

  • pytest xrspatial/geotiff/tests/test_gpu_byteswap_1508.py -q -> 5 passed
  • pytest xrspatial/geotiff/tests/ -q -> 695 passed, 4 skipped (3 pre-existing matplotlib palette failures, unrelated to this change)

cupy.ndarray (13.x) does not expose .byteswap(), so any BE multi-byte
TIFF hit AttributeError inside the GPU decode pipeline. The dispatcher
in read_geotiff_gpu caught it and silently fell back to CPU, so output
stayed correct but the GPU path was effectively dead for BE data.

Replace both arr.byteswap() calls with a small helper that views the
array as the swapped-order dtype and copies, which works on numpy and
cupy arrays alike.

Closes xarray-contrib#1508

Copilot AI left a comment


Pull request overview

Fixes the read_geotiff_gpu fast-path for big-endian, multi-byte GeoTIFFs on CuPy (which lacks cupy.ndarray.byteswap()), so BE tiled TIFFs decode on GPU instead of silently falling back to CPU.

Changes:

  • Added _xp_byteswap(arr) helper in xrspatial/geotiff/_gpu_decode.py and replaced two out.byteswap() call sites.
  • Added GPU regression tests to ensure BE multi-byte TIFFs remain CuPy-backed and match CPU output.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • xrspatial/geotiff/_gpu_decode.py: Introduces a cross-backend byteswap helper and uses it in the BE post-processing steps.
  • xrspatial/geotiff/tests/test_gpu_byteswap_1508.py: Adds a regression test to confirm GPU decoding is exercised for BE multi-byte TIFFs and matches CPU results.


Comment thread: xrspatial/geotiff/_gpu_decode.py (Outdated)
Comment on lines +63 to +69

    (as of cupy 13.x) does not. The view-then-copy trick works on both:
    re-interpret the buffer as the swapped-order dtype, then copy to
    materialise the swapped bytes as a real array in that dtype.
    """
    return arr.view(arr.dtype.newbyteorder()).copy()


Comment on lines +51 to +55

    assert isinstance(gpu_da.data, cupy.ndarray), (
        "expected cupy-backed DataArray, got "
        f"{type(gpu_da.data).__name__} -- the GPU path likely fell back "
        "to CPU again"
    )
…yteswap

The earlier implementation, ``arr.view(arr.dtype.newbyteorder()).copy()``,
left the result tagged with non-native byteorder (``>u2`` instead of
``<u2``). That's values-equivalent for arithmetic but breaks downstream
consumers that expect native dtypes -- numba ``@ngjit`` rejects non-native
arrays, which is the same class of bug PR xarray-contrib#1507 fixed for predictor=2 BE.

The new implementation reverses bytes through a uint8 view:

    u8 = arr.view('u1').reshape(*arr.shape, arr.itemsize)
    return u8[..., ::-1].copy().view(arr.dtype).reshape(arr.shape)

Result preserves ``arr.dtype`` and is native-endian, matching numpy's
``ndarray.byteswap()`` contract.

1-byte dtypes short-circuit to a no-op return.

Tests now assert ``gpu_da.data.dtype.isnative`` and equality against the
input native dtype, plus two pure-numpy tests of the helper itself.

The module-level ``pytest.skip`` for missing CUDA was widened to cover
both helper tests too; pulled it apart so the helper tests run without a
GPU and only the GPU end-to-end tests gate on cupy+CUDA+tifffile.
brendancol merged commit 8134de9 into xarray-contrib:main May 8, 2026
11 checks passed
brendancol added a commit that referenced this pull request May 8, 2026
* Fix GPU predictor=2 decode on big-endian TIFFs (#1517)

The per-dtype predictor=2 kernels view the raw decompressed byte buffer
as native uint16/uint32/uint64. On big-endian files the bytes are
MSB-first, so the prefix-sum runs on the wrong integer interpretation
and the differencing produces garbage (~93% pixel mismatch in the
reproducer from #1517).

Swap the buffer to native byte order before running the predictor=2
kernel, then skip the post-assembly byteswap that previously tried to
fix things up. Predictor=1 and predictor=3 paths keep the existing
post-decode swap.
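
A tiny NumPy reproduction of the ordering bug (illustrative values, not the real kernel): the predictor's cumulative sum must run on natively-interpreted integers, because carries that cross byte lanes cannot be undone by a swap afterwards.

```python
import numpy as np

# Two big-endian uint16 deltas whose true cumulative sums are [1, 256].
raw = np.array([1, 255], dtype=">u2").tobytes()   # bytes: 00 01 00 FF

# Correct order: byteswap to native first, then run the predictor cumsum.
good = np.cumsum(np.frombuffer(raw, dtype=">u2").astype("u2"), dtype="u2")

# Buggy order: decode as native LE, cumsum, byteswap afterwards. The
# misread deltas are [256, 65280]; their sum wraps across byte lanes,
# so swapping after the fact cannot recover the true values.
bad = np.cumsum(np.frombuffer(raw, dtype="<u2"), dtype="u2").byteswap()
# good -> [1, 256], bad -> [1, 0]
```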

Closes #1517. Builds on the byteswap helper added in #1515.

* Address Copilot review on #1524

- Rename _swap_bytes_inplace -> _swap_byte_lanes. The previous name
  implied a zero-copy operation but view[:, ::-1].copy() allocates a
  same-sized temporary. The docstring now states the temp explicitly so
  callers can size their working set, and a CUDA kernel is left as a
  follow-up if peak GPU memory becomes an issue.
- Add _block_cpu_fallback helper that monkeypatches the
  read_to_array binding inside xrspatial.geotiff. When the GPU tests
  call read_geotiff_gpu, any silent CPU fallback now raises an
  AssertionError instead of producing a passing-via-fallback result.
- Apply the guard to the four tests where the GPU decode path is what
  is being exercised (BE int32 reproducer, BE dtype matrix, LE pred2,
  BE pred3). The stripped-uint16 test still relies on the CPU
  fallback by design and stays unguarded.
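
The names _block_cpu_fallback and read_to_array come from the commit message; the wiring below is an illustrative guess at the guard, with the patch target made a parameter:

```python
import contextlib
from unittest import mock

@contextlib.contextmanager
def block_cpu_fallback(target="xrspatial.geotiff.read_to_array"):
    # Hypothetical sketch: replace the CPU read binding with a function
    # that raises, so any silent GPU -> CPU fallback inside the code
    # under test fails the test loudly instead of passing via fallback.
    def _boom(*args, **kwargs):
        raise AssertionError("GPU decode fell back to the CPU path")
    with mock.patch(target, _boom):
        yield
```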

* Make _swap_byte_lanes truly zero-temp

Replaces the view[:, ::-1].copy() form with in-place byte reversal:

- numpy: dispatch to ndarray.byteswap(inplace=True) via a uint16/32/64
  view of the original buffer, so the original allocation is mutated
  with no temp.
- cupy: launch _byte_swap_lanes_kernel, one thread per sample, swapping
  bytes through register-resident temporaries. Peak GPU memory for the
  swap is now O(1) instead of O(buffer size).

Adds explicit validation: bps must be 2, 4, or 8 (3/5/6/7 are rejected
rather than silently corrupting), and size must be a multiple of bps.

Tests cover bps=2/4/8 on numpy and cupy, the bps=1 no-op, the
validation errors, and address-equality assertions confirming both
paths really are in-place. The existing integration tests
(test_gpu_predictor2_big_endian_*, predictor3_big_endian) still pass
unchanged.

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GPU read of big-endian multi-byte TIFFs crashes on cupy.ndarray.byteswap()

2 participants