Add lightweight GeoTIFF/COG reader and writer #1035
Open
brendancol wants to merge 30 commits into master from
Conversation
Reads and writes GeoTIFF and Cloud Optimized GeoTIFF files using only numpy, numba, xarray, and the standard library. No GDAL required.
What it does:
- Parses TIFF/BigTIFF headers and IFDs via struct
- Reads GeoTIFF tags: CRS (EPSG), affine transforms, GeoKeys, PixelIsArea vs PixelIsPoint
- Deflate (zlib), LZW (Numba JIT), horizontal and floating-point predictor codecs
- Strip and tiled layouts, windowed reads
- COG writer with overview generation and IFD-first layout
- HTTP range-request reader for remote COGs
- mmap for local file access (zero-copy)
- Nodata sentinel masking to NaN on read
- Metadata round-trip: CRS, transform, nodata, raster type, pixels
Read performance vs rioxarray/GDAL:
- 5-7x faster on uncompressed data
- On par for compressed COGs
- 100% pixel-exact across all tested formats
Write performance:
- On par for uncompressed
- 2-4x slower for compressed (zlib is C-native in GDAL)
Tested against Landsat 8, Copernicus DEM, USGS 1-arc-second, and USGS 1-meter DEMs. 154 tests cover codecs, header parsing, geo metadata, round-trips across 8 dtypes x 3 compressions, edge cases (corrupt files, NaN/Inf, extreme shapes, PixelIsPoint), and the public API.
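The struct-based header parse mentioned above can be sketched roughly as follows. This is a minimal, illustrative sketch of the classic 8-byte TIFF header only; the actual parser also handles the 16-byte BigTIFF variant, and the function name is hypothetical:

```python
import struct

def parse_tiff_header(buf):
    """Parse the 8-byte classic TIFF header: byte order, magic number 42,
    and the offset of the first IFD."""
    order = {b'II': '<', b'MM': '>'}[bytes(buf[:2])]  # little- vs big-endian
    magic, first_ifd_offset = struct.unpack(order + 'HI', buf[2:8])
    if magic != 42:
        raise ValueError('not a classic TIFF (magic != 42)')
    return order, first_ifd_offset
```

The returned byte-order character then drives every subsequent struct.unpack of IFD entries and tag values.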
…rite
Six features filling the main gaps for real-world use:
1. Multi-band write: 3D arrays (height, width, bands) now write as multi-band GeoTIFFs with correct BitsPerSample, SampleFormat, and PhotometricInterpretation (RGB for 3+ bands). Overviews work for multi-band too. read_geotiff returns all bands by default (band=None) with a 'band' dimension.
2. Integer nodata masking: uint8/uint16/int16 arrays with nodata values are promoted to float64 and masked with NaN on read, matching rioxarray behavior. Previously only float arrays were masked.
3. PackBits compression (tag 32773): simple RLE codec, both read and write. Common in older TIFF files.
4. JPEG decompression (tag 7): read support via Pillow for JPEG-compressed tiles/strips. Import is optional and lazy.
5. BigTIFF write: auto-detects when output exceeds ~4GB and switches to BigTIFF format (16-byte header, 20-byte IFD entries, 8-byte offsets). Prevents silent offset overflow corruption on large files.
6. Dask lazy reads: read_geotiff_dask() returns a dask-backed DataArray using windowed reads per chunk. Works for single-band and multi-band files with nodata masking per chunk.
178 tests passing.
Strip-based windowed reads now only decompress strips that overlap the requested row range. Previously, all strips were decompressed into a full image buffer and then sliced. For a 4096x512 deflate file with 256-row strips, reading a 10x10 window from the top-left goes from 31 ms to 1.9 ms (16x). On a 100,000-row file the savings scale linearly with the number of strips skipped.
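The strip-overlap computation behind this speedup is simple integer arithmetic; a hypothetical sketch (function and parameter names are illustrative, not the actual internals):

```python
def strips_for_window(row_start, row_stop, rows_per_strip, n_strips):
    """Return indices of the strips whose row ranges overlap the
    half-open window [row_start, row_stop)."""
    first = row_start // rows_per_strip            # strip containing the first row
    last = (row_stop - 1) // rows_per_strip        # strip containing the last row
    return range(first, min(last + 1, n_strips))
```

For the 4096x512 example with 256-row strips, a 10x10 top-left window touches only strip 0, so 15 of 16 strips are never decompressed.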
Read and write Zstandard-compressed GeoTIFFs using the zstandard package (lazy import, clear error if missing).
On a 2048x2048 float32 raster, ZSTD vs deflate:
- Write: 39 ms vs 420 ms (10.7x faster)
- Read: 14 ms vs 66 ms (4.7x faster)
- Size: 15.5 MB vs 15.5 MB (comparable)
9 new tests covering codec round-trips, stripped/tiled layouts, uint16, predictor, multi-band, and the public API.
TIFF PlanarConfiguration=2 stores each band as a separate set of strips or tiles (RRR...GGG...BBB) instead of interleaved (RGBRGB...). The reader now detects this from the IFD and iterates band-by-band through the strip/tile offset array, placing each single-band chunk into the correct slice of the output. Both strip and tile layouts are handled. Windowed reads and single-band selection work correctly with planar files.
6 new tests: planar strips (RGB, 2-band), planar tiles, windowed read, band selection, and public API.
Adds read support for non-byte-aligned pixel data, common in bilevel masks (1-bit), palette images (4-bit), and medical/scientific sensors (12-bit).
Changes:
- _dtypes.py: Map 1/2/4-bit to uint8 and 12-bit to uint16
- _compression.py: Add unpack_bits() for MSB-first bit unpacking at 1, 2, 4, and 12 bits per sample
- _reader.py: Add _decode_strip_or_tile() helper that handles the full decompress -> predictor -> unpack -> reshape pipeline, detecting sub-byte depths automatically. Both strip and tile readers refactored to use it.
7 new tests: 1-bit bilevel, 1-bit non-byte-aligned width, 4-bit, 4-bit odd width, 12-bit, and direct codec tests.
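MSB-first unpacking for the depths that divide a byte evenly can be sketched like this. This is an illustrative sketch only: the actual _compression.py helper also handles the 12-bit case, which straddles byte boundaries and widens to uint16:

```python
import numpy as np

def unpack_bits(raw, bits_per_sample, count):
    """MSB-first unpacking of 1/2/4-bit samples packed into bytes."""
    data = np.frombuffer(raw, dtype=np.uint8)
    if bits_per_sample == 1:
        samples = np.unpackbits(data)                    # already MSB-first
    else:
        per_byte = 8 // bits_per_sample
        # shift amounts for each sample position, highest bits first
        shifts = np.arange(per_byte - 1, -1, -1) * bits_per_sample
        mask = (1 << bits_per_sample) - 1
        samples = ((data[:, None] >> shifts) & mask).ravel()
    return samples[:count]                               # drop row-padding bits
```

Trimming to count matters because TIFF rows are padded to a byte boundary, so a 1-bit image with non-multiple-of-8 width carries junk bits at each row end.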
Reads TIFF files with Photometric=3 (Palette) and ColorMap tag (320).
The TIFF color table (uint16 R/G/B arrays) is converted to a
matplotlib ListedColormap and stored in da.attrs['cmap'].
New plot_geotiff() convenience function uses the embedded colormap
with BoundaryNorm so that integer class indices map to the correct
palette colors when plotted. Works out of the box:
da = read_geotiff('landcover.tif')
plot_geotiff(da) # colors match the TIFF's palette
Also stores raw RGBA tuples in attrs['colormap_rgba'] for custom use.
Supports both 8-bit (256-color) and 4-bit (16-color) palettes.
5 new tests: 8-bit palette read, 4-bit palette, colormap object
verification, plot_geotiff smoke test, and non-palette attr check.
The .xrs accessor (registered on all DataArrays by xrspatial) now has
a plot() method that checks for an embedded TIFF colormap in attrs.
If present, it applies BoundaryNorm with the ListedColormap so that
integer class indices map to the correct palette colors.
da = read_geotiff('landcover.tif')
da.xrs.plot() # palette colors applied automatically
For non-palette DataArrays, falls through to the standard da.plot().
The old plot_geotiff() function is kept as a thin wrapper.
Multiple threads reading the same file now share a single read-only mmap instead of each opening their own. A module-level _MmapCache protected by a threading.Lock manages reference counts per file path. The mmap is closed when the last reader releases it. Read-only mmap slicing (which is what the strip/tile readers do) is thread-safe at the OS level -- no seek or file position involved.
Tested with 16 concurrent threads reading different windows from the same deflate+tiled file, and a stress test of 400 reads across 8 threads. Zero errors, cache drains properly.
For dask lazy reads, this means all chunk-read tasks for the same file share one mmap instead of opening/closing the file per chunk.
Writes now go to a temporary file in the same directory, then os.replace() atomically swaps it over the target path. This gives:
- No interleaved output when multiple threads write the same path
- Readers never see a half-written file
- No corrupt file left behind if the process crashes mid-write
- Temp file cleaned up on any exception
os.replace is atomic on POSIX (single rename syscall) and near-atomic on Windows (ReplaceFile).
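A minimal sketch of this write-then-rename pattern (the function name is illustrative; the real writer streams IFDs and pixel data rather than taking one bytes blob). The temp file must live in the same directory as the target, because os.replace cannot atomically rename across filesystems:

```python
import os
import tempfile

def atomic_write(path, data):
    """Write bytes to a temp file beside the target, then atomically rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        os.replace(tmp, path)          # atomic swap over the target
    except BaseException:
        try:
            os.unlink(tmp)             # clean up the temp file on any failure
        except OSError:
            pass
        raise
```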
_make_overview() now accepts a method parameter instead of hardcoding 2x2 block averaging. Available methods:
- mean (default): nanmean of each 2x2 block, same as before
- nearest: top-left pixel of each block (no interpolation)
- min/max: nanmin/nanmax of each block
- median: nanmedian of each block
- mode: most frequent value per block (for classified rasters)
- cubic: scipy.ndimage.zoom with order=3 (requires scipy)
All methods work on both 2D and 3D (multi-band) arrays. Exposed via overview_resampling= parameter on write() and write_geotiff().
12 new tests covering each method, NaN handling, multi-band, COG round-trips with nearest and mode, the public API, and error on invalid method names.
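The default mean method amounts to a reshape-and-reduce over 2x2 blocks; a sketch of just that path, assuming the odd trailing row/column is trimmed (how _make_overview handles odd edges is not shown here):

```python
import numpy as np

def overview_mean(arr):
    """Downsample a 2D array by 2x using the nanmean of each 2x2 block."""
    h, w = arr.shape
    trimmed = arr[:h - h % 2, :w - w % 2]          # drop odd trailing row/col
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)  # (row, dy, col, dx)
    return np.nanmean(blocks, axis=(1, 3))
```

Swapping np.nanmean for np.nanmin, np.nanmax, or np.nanmedian gives the min/max/median variants on the same block layout.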
Adds support for TIFF resolution metadata used in print and cartographic workflows:
- Tag 282 (XResolution): pixels per unit, stored as RATIONAL
- Tag 283 (YResolution): pixels per unit, stored as RATIONAL
- Tag 296 (ResolutionUnit): 1=none, 2=inch, 3=centimeter
Read: resolution values are stored in DataArray attrs as x_resolution, y_resolution (float), and resolution_unit (string: 'none', 'inch', or 'centimeter').
Write: accepted as keyword args on write() and write_geotiff(), or extracted automatically from DataArray attrs. Written as RATIONAL tags (numerator/denominator pairs). Also adds RATIONAL type serialization to the writer's tag encoder.
5 new tests: DPI round-trip, centimeter unit, no-resolution check, DataArray attrs preservation, unit='none'.
…ical CRS
GeoInfo and DataArray attrs now include all commonly-used GeoKeys parsed from the GeoKeyDirectory:
- crs_name: full CRS name (e.g. "NAD83 / UTM zone 18N")
- geog_citation: geographic CRS name (e.g. "WGS 84", "NAD83")
- datum_code: EPSG geodetic datum code
- angular_units / angular_units_code: e.g. "degree" (9102)
- linear_units / linear_units_code: e.g. "metre" (9001)
- semi_major_axis, inv_flattening: ellipsoid parameters
- projection_code: EPSG projection method code
- vertical_crs, vertical_citation, vertical_units: for compound CRS
EPSG unit codes are resolved to human-readable names via lookup tables (ANGULAR_UNITS, LINEAR_UNITS). Raw geokeys dict is still available for anything not covered by a named field.
6 new tests covering geographic and projected CRS extraction, real-file verification, no-CRS baseline, and unit lookups.
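For context, the GeoKeyDirectory (tag 34735) is an array of SHORTs: a 4-value header followed by 4-value entries of (KeyID, TIFFTagLocation, Count, Value). A minimal sketch of extracting the inline SHORT-valued keys; values stored in the companion DOUBLE (34736) and ASCII (34737) tags are omitted here:

```python
def parse_geokey_directory(shorts):
    """Extract inline SHORT-valued GeoKeys from a GeoKeyDirectory array."""
    # header: KeyDirectoryVersion, KeyRevision, MinorRevision, NumberOfKeys
    _, _, _, n_keys = shorts[:4]
    keys = {}
    for i in range(n_keys):
        key_id, tag_location, count, value = shorts[4 + 4 * i: 8 + 4 * i]
        if tag_location == 0:            # value stored inline in the entry
            keys[key_id] = value
    return keys
```

Key 3072 (ProjectedCSTypeGeoKey), for example, carries the projected EPSG code inline, which is where crs_name and projection_code resolution starts.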
_HTTPSource now uses a module-level urllib3.PoolManager that reuses TCP connections (and TLS sessions) across range requests to the same host. For a COG with 64 tiles, this eliminates 63 TCP handshakes. On localhost: 1.7x faster for 16 range requests. Over real networks the benefit is much larger since each TLS handshake adds 50-200ms. Falls back to stdlib urllib.request if urllib3 is not installed. The pool is created lazily on first use with retry support (2 retries, 0.1s backoff).
CRS can now be specified as WKT strings, PROJ strings, or EPSG integers. pyproj (lazy import) resolves between them:
Read side:
- crs_wkt attr is populated by resolving EPSG -> WKT via pyproj
- Falls back gracefully if pyproj is not installed (EPSG still works)
Write side:
- crs= parameter on write_geotiff accepts int (EPSG), WKT string, or PROJ string. String inputs are resolved to EPSG via pyproj.CRS.from_user_input().to_epsg().
- DataArray with crs_wkt attr (no integer crs) is also handled: the WKT is resolved to EPSG for the GeoKeyDirectory.
This means files with user-defined CRS no longer lose their spatial reference when round-tripped, as long as pyproj can resolve the WKT/PROJ string to an EPSG code.
5 new tests: WKT from EPSG, write with WKT string, write with PROJ string, crs_wkt attr round-trip, and no-CRS baseline.
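The write-side resolution logic reduces to a small dispatch; a sketch assuming a hypothetical helper name, with the pyproj import deferred so integer EPSG inputs never require it:

```python
def resolve_epsg(crs):
    """Resolve an int / WKT / PROJ-string CRS input to an EPSG code.

    Integers pass through untouched; strings go through pyproj lazily.
    Returns None when pyproj cannot match the string to an EPSG code.
    """
    if isinstance(crs, int):
        return crs
    from pyproj import CRS            # lazy: only needed for string inputs
    return CRS.from_user_input(crs).to_epsg()
```

Note that to_epsg() can return None for genuinely custom CRS definitions, which is the residual case where a round-trip still loses the spatial reference.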
Band names, statistics, and other GDAL-specific metadata stored in
the GDALMetadata XML tag (42112) are now read, exposed, and written
back.
Read: the XML is parsed into a dict stored in attrs['gdal_metadata'].
Dataset-level items use string keys ('DataType'), per-band items use
(name, band_int) tuple keys (('STATISTICS_MAXIMUM', 0)). The raw XML
is also available in attrs['gdal_metadata_xml'].
Write: accepts gdal_metadata_xml on write(), or extracts from
DataArray attrs on write_geotiff(). If attrs has a gdal_metadata
dict but no raw XML, it's re-serialized automatically.
Round-trip verified on the USGS 1-meter DEM which has statistics,
layer type, and data type metadata -- all items survive intact.
7 new tests: XML parsing, dict serialization, file round-trip,
DataArray attrs preservation, no-metadata baseline, real-file read,
and real-file round-trip.
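GDAL's documented XML shape for tag 42112 is a GDALMetadata root with Item elements, where a sample attribute marks per-band items. A sketch of the parse producing the dict shape described above (the function name is illustrative):

```python
import xml.etree.ElementTree as ET

def parse_gdal_metadata(xml_text):
    """Parse GDALMetadata XML (tag 42112) into a dict.

    Dataset-level items get string keys; per-band items (those with a
    0-based 'sample' attribute) get (name, band_int) tuple keys.
    """
    out = {}
    for item in ET.fromstring(xml_text).iter('Item'):
        name = item.get('name')
        band = item.get('sample')
        key = name if band is None else (name, int(band))
        out[key] = (item.text or '').strip()
    return out
```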
Any IFD tag that the writer doesn't explicitly manage (Software, DateTime, ImageDescription, Copyright, custom private tags, etc.) is now collected on read, stored in attrs['extra_tags'], and re-emitted on write.
Read: extract_geo_info collects (tag_id, type_id, count, value) tuples for all tags not in the _MANAGED_TAGS set (structural tags that the writer builds from scratch: dimensions, compression, offsets, geo tags, etc.). Stored in attrs['extra_tags'].
Write: extra_tags are appended to the IFD, skipping any tag_id that was already written to avoid duplicates. The tag values are serialized using the same type-aware encoder as built-in tags.
Tested with a hand-crafted TIFF containing Software (305) and DateTime (306) tags. Both survive read -> write -> read intact.
3 new tests: read detection, round-trip preservation, and no-extra-tags baseline.
The auto-detection now estimates total file size (header + IFDs + overflow + pixel data) instead of only checking compressed pixel data size, and compares against UINT32_MAX (4,294,967,295) instead of a hardcoded 3.9 GB threshold.
Also adds a bigtiff= parameter to write() and write_geotiff():
- bigtiff=None (default): auto-detect based on estimated file size
- bigtiff=True: force BigTIFF even for small files
- bigtiff=False: force classic TIFF (user's responsibility if >4GB)
3 new tests: force BigTIFF via public API, small file stays classic, force classic via bigtiff=False.
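The decision logic can be sketched as below. The overhead constants here are made-up placeholders for illustration, not the writer's actual accounting; only the three-way bigtiff= dispatch and the UINT32_MAX comparison follow the description above:

```python
UINT32_MAX = 4_294_967_295

def needs_bigtiff(pixel_bytes, n_ifds=1, tags_per_ifd=20, bigtiff=None):
    """Decide classic TIFF vs BigTIFF from an estimated total file size."""
    if bigtiff is not None:            # explicit user override wins
        return bigtiff
    header = 8                                          # classic TIFF header
    ifd_bytes = n_ifds * (2 + tags_per_ifd * 12 + 4)    # count + entries + next ptr
    overflow = n_ifds * 256            # rough allowance for out-of-line tag values
    return header + ifd_bytes + overflow + pixel_bytes > UINT32_MAX
```

Estimating the whole file rather than just the pixel payload matters because IFDs and overflow values written after a near-4GB payload are what actually push offsets past the 32-bit limit.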
Big-endian TIFFs (byte order marker 'MM') now byte-swap pixel data to native order after decompression. Previously, the reader did .view(dtype) with a native-order dtype, producing garbage values for multi-byte types (uint16, int32, float32, float64).
Fix: _decode_strip_or_tile uses dtype.newbyteorder(file_byte_order) for the view, then .astype(native_dtype) if a swap is needed. Single-byte types (uint8) need no swap. The COG HTTP reader path has the same fix.
Also fixed the test conftest: make_minimal_tiff(big_endian=True) now actually writes pixel bytes in big-endian order.
7 new tests: float32, uint16, int32, float64, uint8 (no swap), windowed read, and public API -- all with big-endian TIFFs.
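The core of the fix is a two-step view-then-convert; a minimal sketch (the function name is illustrative). With astype(..., copy=False), the native-order case stays zero-copy while byte-swapped data gets converted:

```python
import numpy as np

def decode_native(raw, dtype, byte_order):
    """View raw bytes with the file's byte order, then convert to native.

    byte_order is '>' for 'MM' (big-endian) files, '<' for 'II'.
    """
    file_dtype = np.dtype(dtype).newbyteorder(byte_order)
    # copy=False leaves native-order data untouched; swapped data is converted
    return np.frombuffer(raw, dtype=file_dtype).astype(dtype, copy=False)
```

The old bug was viewing with the native dtype unconditionally, so b'\x00\x01' from an 'MM' file read as 256 instead of 1 on little-endian hosts.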
Read and write GeoTIFFs to/from cloud storage using fsspec as the filesystem abstraction layer. Any URI with a :// scheme (that isn't http/https) is routed through fsspec, which delegates to the appropriate backend:
- s3://bucket/key.tif (requires s3fs)
- gs://bucket/key.tif (requires gcsfs)
- az://container/blob.tif (requires adlfs)
- abfs://container/blob.tif (requires adlfs)
- Any other fsspec-supported scheme (memory://, ftp://, etc.)
Read: _CloudSource uses fsspec.core.url_to_fs() then fs.open() for full reads and range reads. Falls through to the same TIFF parsing pipeline as local files.
Write: _write_bytes detects fsspec URIs and writes via fs.open() instead of the local atomic-rename path (which doesn't apply to cloud storage).
If fsspec or the backend library isn't installed, a clear ImportError is raised with install instructions.
5 new tests using fsspec's memory:// filesystem for integration testing without real cloud credentials.
Reads GDAL .vrt files by parsing the XML and assembling pixel data
from the referenced source GeoTIFFs using windowed reads.
Supported VRT features:
- SimpleSource: direct pixel copy with source/destination rects
- ComplexSource: scaling (ScaleRatio) and offset (ScaleOffset)
- Source nodata masking
- Multiple bands
- GeoTransform and SRS/CRS propagation
- Relative and absolute source file paths
- Windowed reads (only fetches overlapping source regions)
Usage:
da = read_geotiff('mosaic.vrt') # auto-detected by extension
da = read_vrt('mosaic.vrt') # explicit function
da = read_vrt('mosaic.vrt', window=(0, 0, 100, 100)) # windowed
read_geotiff auto-detects .vrt files and routes them through the
VRT reader. The DataArray gets coordinates from the VRT's
GeoTransform and CRS from the SRS tag.
New module: _vrt.py with parse_vrt() and read_vrt() functions.
8 new tests: single tile, 2x1 mosaic, 2x2 mosaic, windowed read,
CRS propagation, nodata, read_vrt API, and XML parser unit test.
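The ComplexSource pixel transform mentioned above is a linear rescale plus nodata masking; a sketch under the assumption that scaled output is promoted to float so nodata can become NaN (the function name is illustrative):

```python
import numpy as np

def apply_complex_source(src, scale_ratio=1.0, scale_offset=0.0, nodata=None):
    """VRT ComplexSource transform: dst = src * ScaleRatio + ScaleOffset,
    with source-nodata pixels masked to NaN."""
    out = src.astype(np.float64) * scale_ratio + scale_offset
    if nodata is not None:
        out[src == nodata] = np.nan    # mask before downstream math sees sentinels
    return out
```

SimpleSource is the degenerate case (ratio 1, offset 0, straight pixel copy into the destination rect).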
1. Band-first DataArray (CRITICAL): write_geotiff now detects (band, y, x) dimension order and transposes to (y, x, band). Prevents silent data corruption from rasterio-style arrays.
2. HTTP COG sub-byte support (CRITICAL): the COG HTTP reader now routes through _decode_strip_or_tile like the local readers, so 1-bit/4-bit/12-bit COGs over HTTP work correctly.
3. Dask VRT support (USEFUL): read_geotiff_dask detects .vrt files and reads eagerly then chunks, since VRT windowed reads need the virtual dataset's source layout.
4. VRT writer (USEFUL): write_vrt() generates a VRT XML file from multiple source GeoTIFFs, computing the mosaic layout from their geo transforms. Supports relative paths and CRS/nodata.
5. ExtraSamples tag (USEFUL): RGBA writes now include tag 338 with value 2 (unassociated alpha). Multi-band with >3 bands also gets ExtraSamples for bands beyond RGB.
6. MinIsWhite (USEFUL): photometric=0 (MinIsWhite) single-band files are now inverted on read so 0=black, 255=white. Integer values are inverted via max-value, floats via negation.
7. Post-write validation (POLISH): after writing, the header bytes are parsed to verify the output is a valid TIFF. Emits a warning if the header is corrupt.
8. Float16/bool auto-promotion (POLISH): float16 arrays are promoted to float32, bool arrays to uint8, instead of raising ValueError.
275 tests passing. 7 new tests for the fixes plus updated edge case tests.
Removes the rioxarray dependency from all example notebooks:
- multispectral.ipynb: rioxarray.open_rasterio -> read_geotiff
- classification-methods.ipynb: same
- viewshed_gpu.ipynb: same
- 25_GLCM_Texture.ipynb: rioxarray.open_rasterio COG read -> read_geotiff with window= and band= parameters. Also removes GDAL-specific env vars (AWS_NO_SIGN_REQUEST, etc.) since our reader doesn't use GDAL.
Also updates reproject/_crs_utils.py to check attrs['crs'] and attrs['crs_wkt'] (xrspatial.geotiff convention) before falling back to .rio.crs (rioxarray). This means DataArrays from read_geotiff work directly with xrspatial.reproject without needing rioxarray installed.
The rioxarray fallback is kept in _crs_utils.py for backwards compatibility with users who pass rioxarray-decorated DataArrays.
Both are now required (not optional):
- matplotlib: needed for palette colormap (ListedColormap) and da.xrs.plot() with palette TIFFs
- zstandard: needed for ZSTD compression (tag 50000), increasingly common in modern COGs
This fixes the CI failures where these packages weren't installed.
read_geotiff_gpu() decodes tiled GeoTIFFs on the GPU and returns a CuPy-backed DataArray that stays in device memory. No CPU->GPU transfer is needed for downstream xrspatial GPU operations (slope, aspect, hillshade, etc.).
CUDA kernels implemented:
- LZW decode: one thread block per tile, LZW table in shared memory (20KB per block, fast on-chip SRAM)
- Predictor decode (pred=2): one thread per row, horizontal cumsum
- Float predictor (pred=3): one thread per row, byte-lane undiff + un-transpose
- Tile assembly: one thread per pixel, copies from decompressed tile buffer to output image
Supports LZW and uncompressed tiled TIFFs. Falls back to CPU for unsupported compression types or stripped files. 100% pixel-exact match with CPU reader on all tested files (USGS LZW+pred3 3612x3612, synthetic LZW tiled).
Performance: GPU LZW is comparable to CPU (~330ms vs 270ms for 3612x3612) because LZW is inherently sequential per-stream. The value is in keeping data on the GPU for end-to-end pipelines without CPU->GPU transfer overhead.
Future work: a CUDA inflate (deflate) kernel would unlock the parallel decompression win, since deflate tiles are much more common in COGs.
Implements RFC 1951 deflate decompression as a Numba @cuda.jit kernel for GPU-accelerated TIFF tile decoding. One thread block per tile; all tiles decompress in parallel.
Supports all three deflate block types:
- BTYPE=0: stored (no compression)
- BTYPE=1: fixed Huffman codes
- BTYPE=2: dynamic Huffman codes (most common in real files)
Uses a two-level Huffman decode:
- Fast path: 10-bit shared-memory lookup table (1024 entries)
- Slow path: overflow array scan for codes > 10 bits (up to 15)
Fixes the infinite-loop bug where 14-bit lit/len codes exceeded the original 10-bit table size.
Tested: 100% pixel-exact match on Copernicus deflate+pred3 COG (3600x3600, 16 tiles) vs CPU zlib.
Performance: GPU inflate is ~20x slower than CPU zlib for this file size (16 tiles). Deflate is inherently sequential per-stream, so each thread block runs a long serial loop while most SMs sit idle. The value is keeping data on GPU for end-to-end pipelines. For files with hundreds of tiles, the parallelism would help more.
gpu_decode_tiles() now tries kvikio.nvcomp.DeflateManager for batch deflate decompression before falling back to the Numba CUDA inflate kernel. nvCOMP is NVIDIA's optimized batched compression library that decompresses all tiles in a single GPU API call.
Fallback chain for GPU decompression:
1. nvCOMP via kvikio (if installed) -- optimized CUDA kernels
2. Numba @cuda.jit inflate kernel -- pure Python/Numba implementation
3. CPU zlib fallback -- if GPU decode raises any error
kvikio is an optional dependency (pip install kvikio-cu12 or conda install -c rapidsai kvikio). When not installed, the Numba kernels are used transparently.
Fixed the nvCOMP C API ctypes binding to pass opts structs by value using proper ctypes.Structure subclasses. The previous byte-array approach caused the struct to be misinterpreted by nvCOMP.
Working: nvCOMP ZSTD batch decompress (nvcompBatchedZstdDecompressAsync)
- 100% pixel-exact match on all tested files
- 1.5x end-to-end speedup on 8192x8192 ZSTD with 1024 tiles (GPU pipeline: 404ms vs CPU+transfer: 620ms)
Not working on Ampere: nvCOMP deflate returns nvcompErrorNotSupported (status 11). Deflate GPU decompression requires an Ada Lovelace or newer GPU with a HW decompression engine. Falls back to the Numba CUDA inflate kernel on Ampere.
nvCOMP is auto-detected by searching for libnvcomp.so in CONDA_PREFIX and sibling conda environments. When found, ZSTD tiles are batch-decompressed in a single GPU API call.
When kvikio is installed, read_geotiff_gpu() can read compressed tile bytes directly from NVMe SSD to GPU VRAM via GPUDirect Storage, bypassing CPU memory entirely:
Normal: SSD -> CPU (mmap) -> cupy.asarray (CPU->GPU copy)
With GDS: SSD -> GPU VRAM (direct DMA, no CPU involved)
The full pipeline for a ZSTD COG with GDS + nvCOMP:
SSD --(GDS)--> GPU compressed tiles --(nvCOMP)--> GPU decompressed --> GPU predictor decode --> GPU tile assembly --> CuPy DataArray
Fallback chain in read_geotiff_gpu:
1. KvikIO GDS file read + nvCOMP batch decompress (fastest)
2. CPU mmap tile extract + nvCOMP batch decompress
3. CPU mmap tile extract + Numba CUDA kernels
4. CPU read_to_array + cupy.asarray transfer (slowest)
Also adds:
- gpu_decode_tiles_from_file(): accepts file path + offsets instead of pre-extracted bytes, enabling the GDS path
- _try_nvcomp_from_device_bufs(): nvCOMP on tiles already in GPU memory (from GDS), avoiding a device-to-host round-trip
- _apply_predictor_and_assemble(): shared GPU post-processing used by both GDS and mmap paths
KvikIO is optional: conda install -c rapidsai kvikio
GDS requires: NVMe SSD + NVIDIA kernel module (nvidia-fs)
- GDS tile read: added sync + verification after each pread to catch partial reads and CUDA errors early. Catches the exception and tries to reset CUDA state before falling back.
- gpu_decode_tiles: unsupported GPU codecs (ZSTD without nvCOMP, etc.) now decompress on CPU and then transfer to GPU instead of raising ValueError. This keeps the predictor + assembly on GPU.
- Fixes cudaErrorIllegalAddress from a kvikio version mismatch (26.02 C lib vs 26.06 Python bindings) by catching the error gracefully instead of poisoning the GPU state.
Summary
xrspatial/geotiff subpackage: read and write GeoTIFF and COG files without GDAL
Test plan
- pytest xrspatial/geotiff/tests/ passes all 154 tests
- Benchmarks vs rioxarray (bench_vs_rioxarray.py)