Skip to content

Narrow to_geotiff GPU fallback catch (#1674)#1678

Merged
brendancol merged 2 commits into
mainfrom
issue-1674
May 12, 2026
Merged

Narrow to_geotiff GPU fallback catch (#1674)#1678
brendancol merged 2 commits into
mainfrom
issue-1674

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Fixes #1674.

Summary

to_geotiff(..., gpu=True) wrapped the GPU writer in except (ImportError, Exception), which is just except Exception. Any failure inside write_geotiff_gpu -- a real bug in the nvCOMP path, a CRS encoding error, a RuntimeError from a CuPy mismatch -- disappeared silently and the data was written via the CPU pipeline. The CPU and GPU writers do not guarantee bit-identical output, so the file the user received was not the file they asked for.

The new branch is narrow:

  • except ImportError: cupy or nvCOMP not installed. Fall back to the CPU writer and emit a GeoTIFFFallbackWarning. XRSPATIAL_GEOTIFF_STRICT=1 re-raises.
  • except RuntimeError: only fall back when the message names a GPU-availability signal (nvcomp, cuda, no device, no gpu, cuinit). Any other RuntimeError propagates so real bugs surface. Strict mode still re-raises on the availability-signal branch.
  • Anything else propagates unchanged.

Side fix: a function-scoped import warnings at the tile_size warning site shadowed the module-level warnings import for the rest of to_geotiff, which would have made the new fallback warnings.warn calls raise UnboundLocalError. Removed the redundant local import; the module-level one was already in scope.

Tests

xrspatial/geotiff/tests/test_to_geotiff_gpu_fallback_1674.py monkeypatches write_geotiff_gpu to raise synthetic exceptions so no real GPU is required:

  • RuntimeError("synthetic non-GPU error") propagates (the regression target).
  • ValueError propagates.
  • ImportError("no nvCOMP") falls back to CPU with a GeoTIFFFallbackWarning.
  • RuntimeError("CUDA not available") and four other availability-signal phrasings fall back with warning.
  • Under XRSPATIAL_GEOTIFF_STRICT=1, both the ImportError and the CUDA not available cases re-raise.
  • Auto-detected CuPy data (via patched _is_gpu_data) follows the same fallback rules.

Test plan

  • pytest xrspatial/geotiff/tests/test_to_geotiff_gpu_fallback_1674.py -v (11 passed)
  • pytest xrspatial/geotiff/tests/ -k "gpu or fallback or strict" (330 passed, 1 skipped)
  • pytest xrspatial/geotiff/tests/ -k "tile_size or strip" to guard the warnings-shadow fix (110 passed)

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 12, 2026
@brendancol brendancol requested a review from Copilot May 12, 2026 15:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens to_geotiff(..., gpu=True) error handling so that only clearly “GPU unavailable” failures trigger a CPU fallback, while real GPU writer bugs propagate to the caller (fixing #1674).

Changes:

  • Narrowed GPU fallback logic in to_geotiff to catch ImportError and selected RuntimeError messages, emitting GeoTIFFFallbackWarning and honoring strict mode.
  • Removed a function-scoped import warnings that could shadow the module-level import.
  • Added regression tests that monkeypatch write_geotiff_gpu to validate propagation vs. fallback behavior without requiring a real GPU.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/__init__.py Replaces overly broad GPU exception swallowing with targeted fallback + typed warnings, and fixes warnings shadowing.
xrspatial/geotiff/tests/test_to_geotiff_gpu_fallback_1674.py Adds regression coverage for the new fallback/strict-mode semantics via synthetic exceptions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1178 to +1185
if _geotiff_strict_mode():
raise
warnings.warn(
f"to_geotiff(gpu=True) fell back to CPU "
f"({type(e).__name__}: {e}).",
GeoTIFFFallbackWarning,
stacklevel=2,
)
Comment thread xrspatial/geotiff/__init__.py Outdated
Comment on lines +1174 to +1177
# cupy or nvCOMP not installed. Fall back to the CPU writer
# with a typed warning so callers see that gpu=True (or
# auto-detected CuPy data) was not honoured. Strict mode
# re-raises so CI can fail loudly on missing GPU stacks.
brendancol added a commit that referenced this pull request May 12, 2026
Two Copilot review notes on PR #1678:

1. The fallback warning at to_geotiff(..., gpu=True) dispatch hard-coded
   "to_geotiff(gpu=True) fell back to CPU" even when GPU was reached
   through the auto-detect branch (gpu is None and the data is CuPy).
   That tells the caller their explicit flag was dropped when they
   never set one, pointing them at the wrong knob. Capture
   auto_detected_gpu = (gpu is None) before resolving use_gpu and pick
   the wording in a single helper so the ImportError and
   RuntimeError branches stay in lockstep.

2. The inline comment on the ImportError branch claimed
   "cupy or nvCOMP not installed" but write_geotiff_gpu only raises
   ImportError for missing cupy. nvCOMP unavailability is handled
   inside _try_nvcomp_from_device_bufs (returns None, triggering CPU
   compression internally), so it never reaches this except block.
   Reworded the comment to match the actual code path.

Tests:
- Updated existing explicit-gpu assertions to pin the new template.
- Strengthened the auto-detect test with full text assertions.
- Added test_auto_detected_gpu_runtime_error_falls_back_with_warning
  to confirm both branches use the same template under auto-detect.
- Added test_explicit_gpu_false_then_true_uses_explicit_template
  to pin that template selection follows gpu is None, not use_gpu
  (so gpu=True on numpy data still gets the explicit wording).

Narrowing logic and strict-mode (XRSPATIAL_GEOTIFF_STRICT) behaviour
are unchanged.
`to_geotiff(..., gpu=True)` wrapped the GPU writer in
`except (ImportError, Exception)`, equivalent to `except Exception`.
Any failure inside `write_geotiff_gpu` -- a real bug in the nvCOMP
path, a CRS encoding error, a RuntimeError from a CuPy mismatch --
disappeared silently and the data was written via the CPU pipeline
even though the user asked for `gpu=True`. The CPU and GPU writers
do not guarantee bit-identical output, so the file the user got was
not the file they asked for.

Replace the broad catch with two narrow ones:

- `except ImportError`: cupy or nvCOMP not installed. Fall back to
  the CPU writer and emit a `GeoTIFFFallbackWarning` so the user
  sees the substitution. `XRSPATIAL_GEOTIFF_STRICT=1` re-raises.
- `except RuntimeError`: only fall back when the message names a
  GPU-availability signal (`nvcomp`, `cuda`, `no device`, `no gpu`,
  `cuinit`). Any other RuntimeError propagates so real bugs in the
  GPU writer surface to the caller. Strict mode still re-raises on
  the availability-signal branch.

Any other exception type propagates unchanged.

Side fix: the function-scoped `import warnings` at the tile_size
warning site shadowed the module-level `warnings` import for the
rest of `to_geotiff`, which made the new fallback `warnings.warn`
call raise UnboundLocalError. Removed the redundant local import;
the module-level one was already in scope.

Tests in test_to_geotiff_gpu_fallback_1674.py monkeypatch
`write_geotiff_gpu` to raise synthetic exceptions so no real GPU is
required. Coverage: ImportError + RuntimeError(gpu signal) fall back
with warning, non-gpu RuntimeError and ValueError propagate, strict
mode re-raises both fallback cases, auto-detected CuPy data warns on
fallback the same as an explicit gpu=True.
Two Copilot review notes on PR #1678:

1. The fallback warning at to_geotiff(..., gpu=True) dispatch hard-coded
   "to_geotiff(gpu=True) fell back to CPU" even when GPU was reached
   through the auto-detect branch (gpu is None and the data is CuPy).
   That tells the caller their explicit flag was dropped when they
   never set one, pointing them at the wrong knob. Capture
   auto_detected_gpu = (gpu is None) before resolving use_gpu and pick
   the wording in a single helper so the ImportError and
   RuntimeError branches stay in lockstep.

2. The inline comment on the ImportError branch claimed
   "cupy or nvCOMP not installed" but write_geotiff_gpu only raises
   ImportError for missing cupy. nvCOMP unavailability is handled
   inside _try_nvcomp_from_device_bufs (returns None, triggering CPU
   compression internally), so it never reaches this except block.
   Reworded the comment to match the actual code path.

Tests:
- Updated existing explicit-gpu assertions to pin the new template.
- Strengthened the auto-detect test with full text assertions.
- Added test_auto_detected_gpu_runtime_error_falls_back_with_warning
  to confirm both branches use the same template under auto-detect.
- Added test_explicit_gpu_false_then_true_uses_explicit_template
  to pin that template selection follows gpu is None, not use_gpu
  (so gpu=True on numpy data still gets the explicit wording).

Narrowing logic and strict-mode (XRSPATIAL_GEOTIFF_STRICT) behaviour
are unchanged.
@brendancol brendancol merged commit d3b5d78 into main May 12, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

to_geotiff(..., gpu=True) swallows all GPU writer exceptions and silently falls back to CPU

2 participants