Add lightweight GeoTIFF/COG reader and writer#1035

Open
brendancol wants to merge 30 commits into master from geotiff-reader-writer

Conversation

@brendancol
Contributor

@brendancol brendancol commented Mar 20, 2026

Summary

  • New xrspatial/geotiff/ subpackage: read and write GeoTIFF and COG files without GDAL
  • Only depends on numpy, numba, xarray, and the standard library
  • Stripped and tiled layouts; uncompressed, deflate, and LZW compression
  • GeoTIFF metadata: CRS (EPSG), affine transforms, GeoKeys, PixelIsArea/PixelIsPoint, nodata
  • COG writer with overview generation and IFD-first layout
  • HTTP range-request reader for remote COGs, mmap for local files
  • Reads uncompressed data 5-7x faster than rioxarray; compressed COGs are comparable
  • Writes are on par for uncompressed, 2-4x slower for compressed (GDAL's zlib runs in C)
  • Pixel-exact match against rioxarray on Landsat 8, Copernicus DEM, USGS 1-arc-second, and USGS 1-meter files
  • 154 tests: codecs, header parsing, geo metadata, 8 dtypes x 3 compressions, edge cases (corrupt files, NaN/Inf, odd shapes, PixelIsPoint), public API

Test plan

  • pytest xrspatial/geotiff/tests/ passes all 154 tests
  • Round-trip against rioxarray on real Landsat/Copernicus/USGS files
  • Read perf benchmark vs rioxarray (bench_vs_rioxarray.py)
  • Metadata fidelity: CRS, transform origin, pixel size, nodata, raster type all survive a read-rewrite cycle

Reads and writes GeoTIFF and Cloud Optimized GeoTIFF files using only
numpy, numba, xarray, and the standard library. No GDAL required.

What it does:
- Parses TIFF/BigTIFF headers and IFDs via struct
- Reads GeoTIFF tags: CRS (EPSG), affine transforms, GeoKeys,
  PixelIsArea vs PixelIsPoint
- Deflate (zlib), LZW (Numba JIT), horizontal and floating-point
  predictor codecs
- Strip and tiled layouts, windowed reads
- COG writer with overview generation and IFD-first layout
- HTTP range-request reader for remote COGs
- mmap for local file access (zero-copy)
- Nodata sentinel masking to NaN on read
- Metadata round-trip: CRS, transform, nodata, raster type, pixels

Read performance vs rioxarray/GDAL:
- 5-7x faster on uncompressed data
- On par for compressed COGs
- 100% pixel-exact across all tested formats

Write performance:
- On par for uncompressed
- 2-4x slower for compressed (zlib is C-native in GDAL)

Tested against Landsat 8, Copernicus DEM, USGS 1-arc-second, and
USGS 1-meter DEMs. 154 tests cover codecs, header parsing, geo
metadata, round-trips across 8 dtypes x 3 compressions, edge cases
(corrupt files, NaN/Inf, extreme shapes, PixelIsPoint), and the
public API.
@github-actions github-actions bot added the performance PR touches performance-sensitive code label Mar 20, 2026
…rite

Six features filling the main gaps for real-world use:

1. Multi-band write: 3D arrays (height, width, bands) now write as
   multi-band GeoTIFFs with correct BitsPerSample, SampleFormat, and
   PhotometricInterpretation (RGB for 3+ bands). Overviews work for
   multi-band too. read_geotiff returns all bands by default (band=None)
   with a 'band' dimension.

2. Integer nodata masking: uint8/uint16/int16 arrays with nodata values
   are promoted to float64 and masked with NaN on read, matching
   rioxarray behavior. Previously only float arrays were masked.

3. PackBits compression (tag 32773): simple RLE codec, both read and
   write. Common in older TIFF files.

4. JPEG decompression (tag 7): read support via Pillow for
   JPEG-compressed tiles/strips. Import is optional and lazy.

5. BigTIFF write: auto-detects when output exceeds ~4GB and switches
   to BigTIFF format (16-byte header, 20-byte IFD entries, 8-byte
   offsets). Prevents silent offset overflow corruption on large files.

6. Dask lazy reads: read_geotiff_dask() returns a dask-backed
   DataArray using windowed reads per chunk. Works for single-band
   and multi-band files with nodata masking per chunk.

178 tests passing.
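Item 3's PackBits codec is the classic TIFF byte-oriented RLE. A minimal decoder sketch of the scheme (not the PR's actual implementation, which also includes the encoder):

```python
def packbits_decode(data: bytes) -> bytes:
    """Decode PackBits RLE (TIFF compression tag 32773)."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n < 128:                    # literal run: copy the next n+1 bytes
            out += data[i:i + n + 1]
            i += n + 1
        elif n > 128:                  # replicate run: repeat next byte 257-n times
            out += bytes([data[i]]) * (257 - n)
            i += 1
        # n == 128 is a no-op per the TIFF 6.0 spec
    return bytes(out)
```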
Strip-based windowed reads now only decompress strips that overlap the
requested row range. Previously, all strips were decompressed into a
full image buffer and then sliced.

For a 4096x512 deflate file with 256-row strips, reading a 10x10
window from the top-left goes from 31 ms to 1.9 ms (16x). On a
100,000-row file the savings scale linearly with the number of
strips skipped.
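The strip-skipping logic above comes down to mapping the requested row range onto strip indices. A sketch, with the function name chosen here for illustration:

```python
def overlapping_strips(row_start: int, row_stop: int,
                       rows_per_strip: int, image_height: int) -> range:
    """Indices of TIFF strips that intersect rows [row_start, row_stop)."""
    n_strips = -(-image_height // rows_per_strip)      # ceiling division
    first = row_start // rows_per_strip
    last = min((row_stop - 1) // rows_per_strip, n_strips - 1)
    return range(first, last + 1)
```

For the 4096-row example with 256-row strips, a 10-row window at the top touches only strip 0, so the other 15 strips are never decompressed.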
Read and write Zstandard-compressed GeoTIFFs using the zstandard
package (lazy import, clear error if missing).

On a 2048x2048 float32 raster, ZSTD vs deflate:
- Write: 39 ms vs 420 ms (10.7x faster)
- Read:  14 ms vs  66 ms (4.7x faster)
- Size:  15.5 MB vs 15.5 MB (comparable)

9 new tests covering codec round-trips, stripped/tiled layouts,
uint16, predictor, multi-band, and the public API.
TIFF PlanarConfiguration=2 stores each band as a separate set of
strips or tiles (RRR...GGG...BBB) instead of interleaved
(RGBRGB...). The reader now detects this from the IFD and iterates
band-by-band through the strip/tile offset array, placing each
single-band chunk into the correct slice of the output.

Both strip and tile layouts are handled. Windowed reads and single-
band selection work correctly with planar files.

6 new tests: planar strips (RGB, 2-band), planar tiles, windowed
read, band selection, and public API.
Adds read support for non-byte-aligned pixel data, common in bilevel
masks (1-bit), palette images (4-bit), and medical/scientific sensors
(12-bit).

Changes:
- _dtypes.py: Map 1/2/4-bit to uint8 and 12-bit to uint16
- _compression.py: Add unpack_bits() for MSB-first bit unpacking
  at 1, 2, 4, and 12 bits per sample
- _reader.py: Add _decode_strip_or_tile() helper that handles the
  full decompress -> predictor -> unpack -> reshape pipeline,
  detecting sub-byte depths automatically. Both strip and tile
  readers refactored to use it.

7 new tests: 1-bit bilevel, 1-bit non-byte-aligned width, 4-bit,
4-bit odd width, 12-bit, and direct codec tests.
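For the whole-byte-divisor depths (1, 2, 4 bits), MSB-first unpacking can be done with vectorized shifts; a sketch of the idea (12-bit samples span byte boundaries and need a separate path, as does the actual unpack_bits() in _compression.py):

```python
import numpy as np

def unpack_subbyte(buf: bytes, bits: int, count: int) -> np.ndarray:
    """Unpack MSB-first packed samples at 1, 2, or 4 bits per sample (sketch)."""
    b = np.frombuffer(buf, dtype=np.uint8)
    if bits == 1:
        out = np.unpackbits(b)                         # already MSB-first
    else:
        per_byte = 8 // bits
        shifts = np.arange(per_byte - 1, -1, -1) * bits  # high bits first
        out = ((b[:, None] >> shifts) & ((1 << bits) - 1)).ravel()
    return out[:count].astype(np.uint8)                # trim row padding
```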
Reads TIFF files with Photometric=3 (Palette) and ColorMap tag (320).
The TIFF color table (uint16 R/G/B arrays) is converted to a
matplotlib ListedColormap and stored in da.attrs['cmap'].

New plot_geotiff() convenience function uses the embedded colormap
with BoundaryNorm so that integer class indices map to the correct
palette colors when plotted. Works out of the box:

    da = read_geotiff('landcover.tif')
    plot_geotiff(da)  # colors match the TIFF's palette

Also stores raw RGBA tuples in attrs['colormap_rgba'] for custom use.
Supports both 8-bit (256-color) and 4-bit (16-color) palettes.

5 new tests: 8-bit palette read, 4-bit palette, colormap object
verification, plot_geotiff smoke test, and non-palette attr check.
The .xrs accessor (registered on all DataArrays by xrspatial) now has
a plot() method that checks for an embedded TIFF colormap in attrs.
If present, it applies BoundaryNorm with the ListedColormap so that
integer class indices map to the correct palette colors.

    da = read_geotiff('landcover.tif')
    da.xrs.plot()  # palette colors applied automatically

For non-palette DataArrays, falls through to the standard da.plot().
The old plot_geotiff() function is kept as a thin wrapper.
Multiple threads reading the same file now share a single read-only
mmap instead of each opening their own. A module-level _MmapCache
protected by a threading.Lock manages reference counts per file path.
The mmap is closed when the last reader releases it.

Read-only mmap slicing (which is what the strip/tile readers do) is
thread-safe at the OS level -- no seek or file position involved.

Tested with 16 concurrent threads reading different windows from the
same deflate+tiled file, and a stress test of 400 reads across 8
threads. Zero errors, cache drains properly.

For dask lazy reads, this means all chunk-read tasks for the same
file share one mmap instead of opening/closing the file per chunk.
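A minimal sketch of a reference-counted mmap cache along these lines (names and structure are illustrative, not the PR's exact _MmapCache):

```python
import mmap
import os
import threading

class MmapCache:
    """Share one read-only mmap per file path across threads (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}                 # path -> (mmap, fd, refcount)

    def acquire(self, path):
        with self._lock:
            if path in self._entries:
                mm, fd, n = self._entries[path]
                self._entries[path] = (mm, fd, n + 1)
            else:
                fd = os.open(path, os.O_RDONLY)
                mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
                self._entries[path] = (mm, fd, 1)
            return self._entries[path][0]

    def release(self, path):
        with self._lock:
            mm, fd, n = self._entries[path]
            if n == 1:                     # last reader closes the mapping
                mm.close()
                os.close(fd)
                del self._entries[path]
            else:
                self._entries[path] = (mm, fd, n - 1)
```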
Writes now go to a temporary file in the same directory, then
os.replace() atomically swaps it over the target path. This gives:

- No interleaved output when multiple threads write the same path
- Readers never see a half-written file
- No corrupt file left behind if the process crashes mid-write
- Temp file cleaned up on any exception

os.replace is atomic on POSIX (single rename syscall) and
near-atomic on Windows (ReplaceFile).
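The write-to-temp-then-rename pattern can be sketched as follows (illustrative, not the writer's exact code):

```python
import os
import tempfile

def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write to a temp file in the target directory, then atomically rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Temp file must be on the same filesystem for os.replace to be atomic
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        os.replace(tmp, path)              # atomic swap over the target
    except BaseException:
        os.unlink(tmp)                     # no partial file left behind
        raise
```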
_make_overview() now accepts a method parameter instead of hardcoding
2x2 block averaging. Available methods:

- mean (default): nanmean of each 2x2 block, same as before
- nearest: top-left pixel of each block (no interpolation)
- min/max: nanmin/nanmax of each block
- median: nanmedian of each block
- mode: most frequent value per block (for classified rasters)
- cubic: scipy.ndimage.zoom with order=3 (requires scipy)

All methods work on both 2D and 3D (multi-band) arrays. Exposed via
overview_resampling= parameter on write() and write_geotiff().

12 new tests covering each method, NaN handling, multi-band, COG
round-trips with nearest and mode, the public API, and error on
invalid method names.
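The default mean method reduces each 2x2 block with nanmean; a self-contained sketch of that reduction (not the PR's _make_overview, which also handles the other methods and 3D arrays):

```python
import numpy as np

def overview_mean(a: np.ndarray) -> np.ndarray:
    """Downsample a 2D array by 2x using the nanmean of each 2x2 block."""
    h, w = a.shape
    # Pad odd edges with NaN so they contribute nothing to the block mean
    a = np.pad(a.astype('float64'), ((0, h % 2), (0, w % 2)),
               constant_values=np.nan)
    blocks = a.reshape(a.shape[0] // 2, 2, a.shape[1] // 2, 2)
    return np.nanmean(blocks, axis=(1, 3))   # all-NaN blocks stay NaN
```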
Adds support for TIFF resolution metadata used in print and
cartographic workflows:

- Tag 282 (XResolution): pixels per unit, stored as RATIONAL
- Tag 283 (YResolution): pixels per unit, stored as RATIONAL
- Tag 296 (ResolutionUnit): 1=none, 2=inch, 3=centimeter

Read: resolution values are stored in DataArray attrs as
x_resolution, y_resolution (float), and resolution_unit (string:
'none', 'inch', or 'centimeter').

Write: accepted as keyword args on write() and write_geotiff(),
or extracted automatically from DataArray attrs. Written as
RATIONAL tags (numerator/denominator pairs).

Also adds RATIONAL type serialization to the writer's tag encoder.

5 new tests: DPI round-trip, centimeter unit, no-resolution check,
DataArray attrs preservation, unit='none'.
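TIFF RATIONAL values are unsigned numerator/denominator pairs, so float resolutions have to be approximated as fractions. One stdlib way to do that (a sketch, not necessarily the writer's tag encoder):

```python
from fractions import Fraction

def to_rational(value: float, max_den: int = 1 << 31) -> tuple:
    """Approximate a float as (numerator, denominator) for a TIFF RATIONAL tag."""
    f = Fraction(value).limit_denominator(max_den)   # both must fit in uint32
    return f.numerator, f.denominator
```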
…ical CRS

GeoInfo and DataArray attrs now include all commonly-used GeoKeys
parsed from the GeoKeyDirectory:

- crs_name: full CRS name (e.g. "NAD83 / UTM zone 18N")
- geog_citation: geographic CRS name (e.g. "WGS 84", "NAD83")
- datum_code: EPSG geodetic datum code
- angular_units / angular_units_code: e.g. "degree" (9102)
- linear_units / linear_units_code: e.g. "metre" (9001)
- semi_major_axis, inv_flattening: ellipsoid parameters
- projection_code: EPSG projection method code
- vertical_crs, vertical_citation, vertical_units: for compound CRS

EPSG unit codes are resolved to human-readable names via lookup
tables (ANGULAR_UNITS, LINEAR_UNITS). Raw geokeys dict is still
available for anything not covered by a named field.

6 new tests covering geographic and projected CRS extraction,
real-file verification, no-CRS baseline, and unit lookups.
_HTTPSource now uses a module-level urllib3.PoolManager that reuses
TCP connections (and TLS sessions) across range requests to the same
host. For a COG with 64 tiles, this eliminates 63 TCP handshakes.

On localhost: 1.7x faster for 16 range requests. Over real networks
the benefit is much larger since each TLS handshake adds 50-200ms.

Falls back to stdlib urllib.request if urllib3 is not installed.
The pool is created lazily on first use with retry support
(2 retries, 0.1s backoff).
CRS can now be specified as WKT strings, PROJ strings, or EPSG
integers. pyproj (lazy import) resolves between them:

Read side:
- crs_wkt attr is populated by resolving EPSG -> WKT via pyproj
- Falls back gracefully if pyproj is not installed (EPSG still works)

Write side:
- crs= parameter on write_geotiff accepts int (EPSG), WKT string,
  or PROJ string. String inputs are resolved to EPSG via
  pyproj.CRS.from_user_input().to_epsg().
- DataArray with crs_wkt attr (no integer crs) is also handled:
  the WKT is resolved to EPSG for the GeoKeyDirectory.

This means files with user-defined CRS no longer lose their spatial
reference when round-tripped, as long as pyproj can resolve the
WKT/PROJ string to an EPSG code.

5 new tests: WKT from EPSG, write with WKT string, write with PROJ
string, crs_wkt attr round-trip, and no-CRS baseline.
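The write-side resolution logic reduces to one dispatch: integers pass through, strings go through pyproj. A sketch under that description (the function name here is illustrative):

```python
def resolve_crs_to_epsg(crs):
    """Resolve an EPSG int, WKT string, or PROJ string to an EPSG code.

    String inputs require pyproj (imported lazily); integers pass through.
    """
    if isinstance(crs, int):
        return crs
    import pyproj                          # optional dependency, lazy import
    epsg = pyproj.CRS.from_user_input(crs).to_epsg()
    if epsg is None:
        raise ValueError("could not resolve CRS to an EPSG code: %r" % (crs,))
    return epsg
```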
Band names, statistics, and other GDAL-specific metadata stored in
the GDALMetadata XML tag (42112) are now read, exposed, and written
back.

Read: the XML is parsed into a dict stored in attrs['gdal_metadata'].
Dataset-level items use string keys ('DataType'), per-band items use
(name, band_int) tuple keys (('STATISTICS_MAXIMUM', 0)). The raw XML
is also available in attrs['gdal_metadata_xml'].

Write: accepts gdal_metadata_xml on write(), or extracts from
DataArray attrs on write_geotiff(). If attrs has a gdal_metadata
dict but no raw XML, it's re-serialized automatically.

Round-trip verified on the USGS 1-meter DEM which has statistics,
layer type, and data type metadata -- all items survive intact.

7 new tests: XML parsing, dict serialization, file round-trip,
DataArray attrs preservation, no-metadata baseline, real-file read,
and real-file round-trip.
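The GDALMetadata tag holds XML of the form `<GDALMetadata><Item name="..." sample="N">value</Item>...</GDALMetadata>`. A stdlib parsing sketch matching the key convention described above (dataset-level items keyed by name, per-band items by (name, band) tuples):

```python
import xml.etree.ElementTree as ET

def parse_gdal_metadata(xml_text: str) -> dict:
    """Parse a GDALMetadata XML blob (TIFF tag 42112) into a dict (sketch)."""
    result = {}
    for item in ET.fromstring(xml_text).iter('Item'):
        name = item.get('name')
        sample = item.get('sample')        # present only on per-band items
        key = name if sample is None else (name, int(sample))
        result[key] = (item.text or '').strip()
    return result
```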
Any IFD tag that the writer doesn't explicitly manage (Software,
DateTime, ImageDescription, Copyright, custom private tags, etc.)
is now collected on read, stored in attrs['extra_tags'], and
re-emitted on write.

Read: extract_geo_info collects (tag_id, type_id, count, value)
tuples for all tags not in the _MANAGED_TAGS set (structural tags
that the writer builds from scratch: dimensions, compression,
offsets, geo tags, etc.). Stored in attrs['extra_tags'].

Write: extra_tags are appended to the IFD, skipping any tag_id
that was already written to avoid duplicates. The tag values are
serialized using the same type-aware encoder as built-in tags.

Tested with a hand-crafted TIFF containing Software (305) and
DateTime (306) tags. Both survive read -> write -> read intact.

3 new tests: read detection, round-trip preservation, and
no-extra-tags baseline.
The auto-detection now estimates total file size (header + IFDs +
overflow + pixel data) instead of only checking compressed pixel
data size, and compares against UINT32_MAX (4,294,967,295) instead
of a hardcoded 3.9 GB threshold.

Also adds a bigtiff= parameter to write() and write_geotiff():
- bigtiff=None (default): auto-detect based on estimated file size
- bigtiff=True: force BigTIFF even for small files
- bigtiff=False: force classic TIFF (user's responsibility if >4GB)

3 new tests: force BigTIFF via public API, small file stays classic,
force classic via bigtiff=False.
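The decision rule described above is small enough to state directly; a sketch:

```python
UINT32_MAX = 4_294_967_295

def use_bigtiff(estimated_size: int, bigtiff=None) -> bool:
    """Classic TIFF vs BigTIFF, per the auto-detection rule above (sketch)."""
    if bigtiff is not None:                # explicit user override wins
        return bool(bigtiff)
    # Classic TIFF offsets are uint32, so anything larger needs BigTIFF
    return estimated_size > UINT32_MAX
```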
Big-endian TIFFs (byte order marker 'MM') now byte-swap pixel data
to native order after decompression. Previously, the reader did
.view(dtype) with a native-order dtype, producing garbage values
for multi-byte types (uint16, int32, float32, float64).

Fix: _decode_strip_or_tile uses dtype.newbyteorder(file_byte_order)
for the view, then .astype(native_dtype) if a swap is needed.
Single-byte types (uint8) need no swap. The COG HTTP reader path
has the same fix.

Also fixed the test conftest: make_minimal_tiff(big_endian=True) now
actually writes pixel bytes in big-endian order.

7 new tests: float32, uint16, int32, float64, uint8 (no swap),
windowed read, and public API -- all with big-endian TIFFs.
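The fix pattern -- view with the file's byte order, then swap only when the file order differs from native -- can be sketched as:

```python
import numpy as np

def decode_pixels(raw: bytes, dtype, byte_order: str) -> np.ndarray:
    """View raw bytes in the file's byte order, then swap to native if needed.

    byte_order is '<' for little-endian ('II') or '>' for big-endian ('MM').
    """
    native = np.dtype(dtype)
    file_dtype = native.newbyteorder(byte_order)
    arr = np.frombuffer(raw, dtype=file_dtype)
    if file_dtype != native:               # multi-byte type in foreign order
        arr = arr.astype(native)           # byte-swap to native order
    return arr
```

Single-byte dtypes compare equal under newbyteorder, so uint8 naturally skips the swap.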
Read and write GeoTIFFs to/from cloud storage using fsspec as the
filesystem abstraction layer. Any URI with a :// scheme (that isn't
http/https) is routed through fsspec, which delegates to the
appropriate backend:

- s3://bucket/key.tif  (requires s3fs)
- gs://bucket/key.tif  (requires gcsfs)
- az://container/blob.tif  (requires adlfs)
- abfs://container/blob.tif  (requires adlfs)
- Any other fsspec-supported scheme (memory://, ftp://, etc.)

Read: _CloudSource uses fsspec.core.url_to_fs() then fs.open()
for full reads and range reads. Falls through to the same TIFF
parsing pipeline as local files.

Write: _write_bytes detects fsspec URIs and writes via fs.open()
instead of the local atomic-rename path (which doesn't apply to
cloud storage).

If fsspec or the backend library isn't installed, a clear ImportError
is raised with install instructions.

5 new tests using fsspec's memory:// filesystem for integration
testing without real cloud credentials.
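The routing rule -- http(s) to the range-request reader, any other ://-scheme to fsspec, everything else to the local mmap path -- can be sketched with the stdlib alone (the real code then hands fsspec URIs to fsspec.core.url_to_fs):

```python
from urllib.parse import urlparse

def route_uri(uri: str) -> str:
    """Classify a path the way the reader dispatches it (sketch).

    Returns 'http' for http(s) range reads, 'fsspec' for any other
    ://-scheme URI (s3, gs, az, memory, ...), else 'local'.
    """
    scheme = urlparse(uri).scheme if '://' in uri else ''
    if scheme in ('http', 'https'):
        return 'http'
    if scheme:
        return 'fsspec'
    return 'local'
```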
Reads GDAL .vrt files by parsing the XML and assembling pixel data
from the referenced source GeoTIFFs using windowed reads.

Supported VRT features:
- SimpleSource: direct pixel copy with source/destination rects
- ComplexSource: scaling (ScaleRatio) and offset (ScaleOffset)
- Source nodata masking
- Multiple bands
- GeoTransform and SRS/CRS propagation
- Relative and absolute source file paths
- Windowed reads (only fetches overlapping source regions)

Usage:
    da = read_geotiff('mosaic.vrt')       # auto-detected by extension
    da = read_vrt('mosaic.vrt')           # explicit function
    da = read_vrt('mosaic.vrt', window=(0, 0, 100, 100))  # windowed

read_geotiff auto-detects .vrt files and routes them through the
VRT reader. The DataArray gets coordinates from the VRT's
GeoTransform and CRS from the SRS tag.

New module: _vrt.py with parse_vrt() and read_vrt() functions.

8 new tests: single tile, 2x1 mosaic, 2x2 mosaic, windowed read,
CRS propagation, nodata, read_vrt API, and XML parser unit test.
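Windowed VRT reads hinge on intersecting the requested window with each source's destination rect so only overlapping regions are fetched. A sketch of that intersection, assuming (x_off, y_off, width, height) rects in destination pixel space:

```python
def intersect_windows(win, dst_rect):
    """Intersect a read window with a VRT source's DstRect (sketch).

    Both args are (x_off, y_off, width, height) tuples. Returns the
    overlapping rect, or None if the source lies outside the window.
    """
    x0 = max(win[0], dst_rect[0])
    y0 = max(win[1], dst_rect[1])
    x1 = min(win[0] + win[2], dst_rect[0] + dst_rect[2])
    y1 = min(win[1] + win[3], dst_rect[1] + dst_rect[3])
    if x1 <= x0 or y1 <= y0:
        return None                        # no overlap: skip this source
    return (x0, y0, x1 - x0, y1 - y0)
```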
1. Band-first DataArray (CRITICAL): write_geotiff now detects
   (band, y, x) dimension order and transposes to (y, x, band).
   Prevents silent data corruption from rasterio-style arrays.

2. HTTP COG sub-byte support (CRITICAL): the COG HTTP reader now
   routes through _decode_strip_or_tile like the local readers,
   so 1-bit/4-bit/12-bit COGs over HTTP work correctly.

3. Dask VRT support (USEFUL): read_geotiff_dask detects .vrt files
   and reads eagerly then chunks, since VRT windowed reads need
   the virtual dataset's source layout.

4. VRT writer (USEFUL): write_vrt() generates a VRT XML file from
   multiple source GeoTIFFs, computing the mosaic layout from their
   geo transforms. Supports relative paths and CRS/nodata.

5. ExtraSamples tag (USEFUL): RGBA writes now include tag 338 with
   value 2 (unassociated alpha). Multi-band with >3 bands also
   gets ExtraSamples for bands beyond RGB.

6. MinIsWhite (USEFUL): photometric=0 (MinIsWhite) single-band
   files are now inverted on read so 0=black, 255=white. Integer
   values are inverted via max-value, floats via negation.

7. Post-write validation (POLISH): after writing, the header bytes
   are parsed to verify the output is a valid TIFF. Emits a
   warning if the header is corrupt.

8. Float16/bool auto-promotion (POLISH): float16 arrays are
   promoted to float32, bool arrays to uint8, instead of raising
   ValueError.

275 tests passing. 7 new tests for the fixes plus updated edge
case tests.
Removes the rioxarray dependency from all example notebooks:

- multispectral.ipynb: rioxarray.open_rasterio -> read_geotiff
- classification-methods.ipynb: same
- viewshed_gpu.ipynb: same
- 25_GLCM_Texture.ipynb: rioxarray.open_rasterio COG read ->
  read_geotiff with window= and band= parameters. Also removes
  GDAL-specific env vars (AWS_NO_SIGN_REQUEST, etc.) since our
  reader doesn't use GDAL.

Also updates reproject/_crs_utils.py to check attrs['crs'] and
attrs['crs_wkt'] (xrspatial.geotiff convention) before falling
back to .rio.crs (rioxarray). This means DataArrays from
read_geotiff work directly with xrspatial.reproject without
needing rioxarray installed.

The rioxarray fallback is kept in _crs_utils.py for backwards
compatibility with users who pass rioxarray-decorated DataArrays.
Both are now required (not optional):
- matplotlib: needed for palette colormap (ListedColormap) and
  da.xrs.plot() with palette TIFFs
- zstandard: needed for ZSTD compression (tag 50000), increasingly
  common in modern COGs

This fixes the CI failures where these packages weren't installed.
read_geotiff_gpu() decodes tiled GeoTIFFs on the GPU and returns a
CuPy-backed DataArray that stays on device memory. No CPU->GPU
transfer needed for downstream xrspatial GPU operations (slope,
aspect, hillshade, etc.).

CUDA kernels implemented:
- LZW decode: one thread block per tile, LZW table in shared memory
  (20KB per block, fast on-chip SRAM)
- Predictor decode (pred=2): one thread per row, horizontal cumsum
- Float predictor (pred=3): one thread per row, byte-lane undiff +
  un-transpose
- Tile assembly: one thread per pixel, copies from decompressed
  tile buffer to output image

Supports LZW and uncompressed tiled TIFFs. Falls back to CPU for
unsupported compression types or stripped files.

100% pixel-exact match with CPU reader on all tested files (USGS
LZW+pred3 3612x3612, synthetic LZW tiled).

Performance: GPU LZW is comparable to CPU (~330ms vs 270ms for
3612x3612) because LZW is inherently sequential per-stream. The
value is in keeping data on GPU for end-to-end pipelines without
CPU->GPU transfer overhead. Future work: CUDA inflate (deflate)
kernel would unlock the parallel decompression win since deflate
tiles are much more common in COGs.
Implements RFC 1951 deflate decompression as a Numba @cuda.jit kernel
for GPU-accelerated TIFF tile decoding. One thread block per tile,
all tiles decompress in parallel.

Supports all three deflate block types:
- BTYPE=0: stored (no compression)
- BTYPE=1: fixed Huffman codes
- BTYPE=2: dynamic Huffman codes (most common in real files)

Uses a two-level Huffman decode:
- Fast path: 10-bit shared-memory lookup table (1024 entries)
- Slow path: overflow array scan for codes > 10 bits (up to 15)
  Fixes the infinite loop bug where 14-bit lit/len codes exceeded
  the original 10-bit table size.

Tested: 100% pixel-exact match on Copernicus deflate+pred3 COG
(3600x3600, 16 tiles) vs CPU zlib.

Performance: GPU inflate is ~20x slower than CPU zlib for this file
size (16 tiles). Deflate is inherently sequential per-stream, so
each thread block runs a long serial loop while most SMs sit idle.
The value is keeping data on GPU for end-to-end pipelines. For
files with hundreds of tiles, the parallelism would help more.
gpu_decode_tiles() now tries kvikio.nvcomp.DeflateManager for batch
deflate decompression before falling back to the Numba CUDA inflate
kernel. nvCOMP is NVIDIA's optimized batched compression library
that decompresses all tiles in a single GPU API call.

Fallback chain for GPU decompression:
1. nvCOMP via kvikio (if installed) -- optimized CUDA kernels
2. Numba @cuda.jit inflate kernel -- pure Python/Numba implementation
3. CPU zlib fallback -- if GPU decode raises any error

kvikio is an optional dependency (pip install kvikio-cu12 or
conda install -c rapidsai kvikio). When not installed, the Numba
kernels are used transparently.
Fixed the nvCOMP C API ctypes binding to pass opts structs by value
using proper ctypes.Structure subclasses. The previous byte-array
approach caused the struct to be misinterpreted by nvCOMP.

Working: nvCOMP ZSTD batch decompress (nvcompBatchedZstdDecompressAsync)
- 100% pixel-exact match on all tested files
- 1.5x end-to-end speedup on 8192x8192 ZSTD with 1024 tiles
  (GPU pipeline: 404ms vs CPU+transfer: 620ms)

Not working on Ampere: nvCOMP deflate returns nvcompErrorNotSupported
(status 11). Deflate GPU decompression requires Ada Lovelace or
newer GPU with HW decompression engine. Falls back to the Numba
CUDA inflate kernel on Ampere.

nvCOMP is auto-detected by searching for libnvcomp.so in
CONDA_PREFIX and sibling conda environments. When found, ZSTD
tiles are batch-decompressed in a single GPU API call.
When kvikio is installed, read_geotiff_gpu() can read compressed tile
bytes directly from NVMe SSD to GPU VRAM via GPUDirect Storage,
bypassing CPU memory entirely:

  Normal:   SSD -> CPU (mmap) -> cupy.asarray (CPU->GPU copy)
  With GDS: SSD -> GPU VRAM (direct DMA, no CPU involved)

The full pipeline for a ZSTD COG with GDS + nvCOMP:
  SSD --(GDS)--> GPU compressed tiles --(nvCOMP)--> GPU decompressed
  --> GPU predictor decode --> GPU tile assembly --> CuPy DataArray

Fallback chain in read_geotiff_gpu:
1. KvikIO GDS file read + nvCOMP batch decompress (fastest)
2. CPU mmap tile extract + nvCOMP batch decompress
3. CPU mmap tile extract + Numba CUDA kernels
4. CPU read_to_array + cupy.asarray transfer (slowest)

Also adds:
- gpu_decode_tiles_from_file(): accepts file path + offsets
  instead of pre-extracted bytes, enabling the GDS path
- _try_nvcomp_from_device_bufs(): nvCOMP on tiles already in GPU
  memory (from GDS), avoiding a device-to-host round-trip
- _apply_predictor_and_assemble(): shared GPU post-processing
  used by both GDS and mmap paths

KvikIO is optional: conda install -c rapidsai kvikio
GDS requires: NVMe SSD + NVIDIA kernel module (nvidia-fs)
- GDS tile read: added sync + verification after each pread to catch
  partial reads and CUDA errors early. Catches exception and tries
  to reset CUDA state before falling back.
- gpu_decode_tiles: unsupported GPU codecs (ZSTD without nvCOMP, etc.)
  now decompress on CPU then transfer to GPU instead of raising
  ValueError. This keeps the predictor + assembly on GPU.
- Fixes cudaErrorIllegalAddress from kvikio version mismatch
  (26.02 C lib vs 26.06 Python bindings) by catching the error
  gracefully instead of poisoning the GPU state.