[GH-2662] Implement a pure Java single-thread COG writer#2663
[GH-2662] Implement a pure Java single-thread COG writer#2663
Conversation
- TiffIfdParser: extracts IFD structure from TIFF byte arrays - CogAssembler: reassembles parsed IFDs into COG byte order (ported from GeoTrellis) - CogWriter: orchestrates decimation, overview generation, tiled writing, and COG assembly - RasterOutputs: adds asCloudOptimizedGeoTiff() public API - CogWriterTest: 8 unit tests covering decimation, overview, round-trip, multiband, compression
- Remove unused ReferencedEnvelope variable (Finding 5) - Remove unused isOverview parameter from writeAsTiledGeoTiff (Finding 2) - Add input validation: compressionQuality range, tileSize positive & power-of-2 (Finding 3) - Add parser bounds checks: IFD offset, entries length, overflow/image data ranges (Finding 4) - Inject NewSubfileType tag (254=1) into overview IFDs when missing (Finding 1) - Add 4 new tests: NewSubfileType presence, invalid inputs, malformed TIFF rejection, forward-pointing tile offsets
- CogOptions: immutable builder with compression, compressionQuality, tileSize, resampling (Nearest/Bilinear/Bicubic), and overviewCount - CogWriter: new write(raster, CogOptions) primary method; existing overloads delegate to it; generateOverview accepts Interpolation param - RasterOutputs: new asCloudOptimizedGeoTiff(raster, CogOptions) overload - 13 new tests (25 total): builder validation, resampling modes, overviewCount=0/1, tileSize=512, RasterOutputs CogOptions path
There was a problem hiding this comment.
Pull request overview
Adds a pure-Java Cloud Optimized GeoTIFF (COG) writer implementation to Sedona’s common raster module, exposing new RasterOutputs entrypoints and introducing a TIFF-IFD parser + COG byte-layout assembler.
Changes:
- Introduces
CogWriter,CogAssembler,TiffIfdParser, andCogOptionsto generate COGs fromGridCoverage2D. - Adds
RasterOutputs.asCloudOptimizedGeoTiff(...)public APIs that delegate to the new writer. - Adds
CogWriterTestcoverage for overview generation, assembly structure, and option validation.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| common/src/main/java/org/apache/sedona/common/raster/cog/CogWriter.java | Implements overview computation/resampling and writes tiled GeoTIFF levels for later COG assembly. |
| common/src/main/java/org/apache/sedona/common/raster/cog/CogAssembler.java | Rebuilds multi-level TIFFs into COG-compliant byte layout and patches IFD offsets/tags. |
| common/src/main/java/org/apache/sedona/common/raster/cog/TiffIfdParser.java | Parses TIFF header/IFD/overflow/image segment regions needed for assembly. |
| common/src/main/java/org/apache/sedona/common/raster/cog/CogOptions.java | Adds an immutable options/builder API for controlling compression/tiling/resampling/overviews. |
| common/src/main/java/org/apache/sedona/common/raster/RasterOutputs.java | Exposes new public asCloudOptimizedGeoTiff overloads to produce COG bytes. |
| common/src/test/java/org/apache/sedona/common/raster/cog/CogWriterTest.java | Adds unit tests validating decimations, overview creation, COG layout properties, and options validation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Verify all IFDs are before image data (forward-pointing) | ||
| // The last IFD end should be well before the end of the file | ||
| assertTrue("IFD region should be at start of file", lastIfdEnd < cogBytes.length / 2); | ||
| } |
There was a problem hiding this comment.
This test’s “IFD region should be at start of file” assertion uses lastIfdEnd < cogBytes.length / 2, which can be flaky for highly compressible inputs (image data can be smaller than metadata/IFDs). Consider computing the minimum TileOffsets/StripOffsets across all IFDs and asserting lastIfdEnd <= minOffset instead of comparing to half the file length.
| GridGeometry2D gridGeometry = | ||
| new GridGeometry2D( | ||
| new GridEnvelope2D(0, 0, newWidth, newHeight), | ||
| PixelInCell.CELL_CORNER, | ||
| newTransform, | ||
| crs, | ||
| null); |
There was a problem hiding this comment.
generateOverview hard-codes PixelInCell.CELL_CORNER when constructing the target GridGeometry2D. Sedona can create rasters with PixelInCell.CELL_CENTER (e.g., NetCDF reader / some constructors), so this can shift the georeferencing of overviews by half a pixel. Preserve the input raster’s pixel anchor when building the overview grid geometry (or derive the target transform via the existing GridGeometry APIs rather than forcing CELL_CORNER).
| ByteArrayOutputStream out = new ByteArrayOutputStream(); | ||
| GridCoverageWriter writer = new GeoTiffWriter(out); | ||
|
|
||
| ParameterValueGroup defaultParams = writer.getFormat().getWriteParameters(); | ||
| GeoTiffWriteParams params = new GeoTiffWriteParams(); | ||
|
|
||
| // Set tiling — must use the 2-arg overload from GeoToolsWriteParams | ||
| // which delegates to the inner write param. The 4-arg ImageWriteParam.setTiling() | ||
| // writes to the wrong fields (parent vs inner param). | ||
| params.setTilingMode(ImageWriteParam.MODE_EXPLICIT); | ||
| params.setTiling(tileSize, tileSize); | ||
|
|
||
| // Set compression | ||
| params.setCompressionMode(ImageWriteParam.MODE_EXPLICIT); | ||
| params.setCompressionType(compressionType); | ||
| params.setCompressionQuality((float) compressionQuality); | ||
|
|
||
| defaultParams | ||
| .parameter(AbstractGridFormat.GEOTOOLS_WRITE_PARAMS.getName().toString()) | ||
| .setValue(params); | ||
|
|
||
| GeneralParameterValue[] wps = defaultParams.values().toArray(new GeneralParameterValue[0]); | ||
|
|
||
| writer.write(raster, wps); | ||
| writer.dispose(); | ||
| out.close(); | ||
|
|
||
| return out.toByteArray(); |
There was a problem hiding this comment.
writeAsTiledGeoTiff doesn’t guarantee writer.dispose() runs if writer.write(...) throws, which can leak native resources/file handles in GeoTools. Wrap the writer lifecycle in try/finally (or try-with-resources if applicable) so dispose() is always called on failure paths.
| ByteArrayOutputStream out = new ByteArrayOutputStream(); | |
| GridCoverageWriter writer = new GeoTiffWriter(out); | |
| ParameterValueGroup defaultParams = writer.getFormat().getWriteParameters(); | |
| GeoTiffWriteParams params = new GeoTiffWriteParams(); | |
| // Set tiling — must use the 2-arg overload from GeoToolsWriteParams | |
| // which delegates to the inner write param. The 4-arg ImageWriteParam.setTiling() | |
| // writes to the wrong fields (parent vs inner param). | |
| params.setTilingMode(ImageWriteParam.MODE_EXPLICIT); | |
| params.setTiling(tileSize, tileSize); | |
| // Set compression | |
| params.setCompressionMode(ImageWriteParam.MODE_EXPLICIT); | |
| params.setCompressionType(compressionType); | |
| params.setCompressionQuality((float) compressionQuality); | |
| defaultParams | |
| .parameter(AbstractGridFormat.GEOTOOLS_WRITE_PARAMS.getName().toString()) | |
| .setValue(params); | |
| GeneralParameterValue[] wps = defaultParams.values().toArray(new GeneralParameterValue[0]); | |
| writer.write(raster, wps); | |
| writer.dispose(); | |
| out.close(); | |
| return out.toByteArray(); | |
| byte[] result; | |
| try (ByteArrayOutputStream out = new ByteArrayOutputStream()) { | |
| GridCoverageWriter writer = null; | |
| try { | |
| writer = new GeoTiffWriter(out); | |
| ParameterValueGroup defaultParams = writer.getFormat().getWriteParameters(); | |
| GeoTiffWriteParams params = new GeoTiffWriteParams(); | |
| // Set tiling — must use the 2-arg overload from GeoToolsWriteParams | |
| // which delegates to the inner write param. The 4-arg ImageWriteParam.setTiling() | |
| // writes to the wrong fields (parent vs inner param). | |
| params.setTilingMode(ImageWriteParam.MODE_EXPLICIT); | |
| params.setTiling(tileSize, tileSize); | |
| // Set compression | |
| params.setCompressionMode(ImageWriteParam.MODE_EXPLICIT); | |
| params.setCompressionType(compressionType); | |
| params.setCompressionQuality((float) compressionQuality); | |
| defaultParams | |
| .parameter(AbstractGridFormat.GEOTOOLS_WRITE_PARAMS.getName().toString()) | |
| .setValue(params); | |
| GeneralParameterValue[] wps = | |
| defaultParams.values().toArray(new GeneralParameterValue[0]); | |
| writer.write(raster, wps); | |
| } finally { | |
| if (writer != null) { | |
| writer.dispose(); | |
| } | |
| } | |
| result = out.toByteArray(); | |
| } | |
| return result; |
| int fieldType = buf.getShort(entryOffset + 2) & 0xFFFF; | ||
| int count = buf.getInt(entryOffset + 4); | ||
| int valueSize = count * getFieldTypeSize(fieldType); | ||
|
|
||
| if (tag == TAG_TILE_OFFSETS || tag == TAG_STRIP_OFFSETS) { | ||
| offsetsTag = tag; | ||
| segmentCount = count; | ||
| } else if (tag == TAG_TILE_BYTE_COUNTS || tag == TAG_STRIP_BYTE_COUNTS) { | ||
| byteCountsTag = tag; | ||
| } else if (tag == TAG_NEW_SUBFILE_TYPE) { | ||
| hasNewSubfileType = true; | ||
| } | ||
|
|
||
| // Track overflow data region (values > 4 bytes stored outside IFD entries) | ||
| if (valueSize > 4) { | ||
| int valOffset = buf.getInt(entryOffset + 8); | ||
| if (valOffset < 0 || valOffset + valueSize > tiffBytes.length) { | ||
| throw new IllegalArgumentException( | ||
| "Overflow data for tag " | ||
| + tag | ||
| + " out of range: offset=" | ||
| + valOffset | ||
| + " size=" | ||
| + valueSize | ||
| + " fileSize=" | ||
| + tiffBytes.length); | ||
| } | ||
| overflowStart = Math.min(overflowStart, valOffset); | ||
| overflowEnd = Math.max(overflowEnd, valOffset + valueSize); | ||
| } |
There was a problem hiding this comment.
TiffIfdParser.parse treats the TIFF count field as a signed int and computes valueSize as count * typeSize in an int. For large/unsigned counts this can overflow to a negative valueSize, which can bypass the range checks and lead to unexpected IndexOutOfBoundsException/DoS instead of a controlled IllegalArgumentException. Use unsigned/long arithmetic for count/valueSize and explicitly reject counts that are negative or would exceed the input length.
| // Read segment offsets and byte counts | ||
| int[] segmentOffsets = readIntArray(buf, tiffBytes, entriesStart, tagCount, offsetsTag); | ||
| int[] segmentByteCounts = readIntArray(buf, tiffBytes, entriesStart, tagCount, byteCountsTag); | ||
|
|
||
| // Extract overflow data | ||
| byte[] overflowData; | ||
| int overflowDataStart; | ||
| if (overflowStart < Integer.MAX_VALUE) { | ||
| overflowDataStart = overflowStart; | ||
| overflowData = new byte[overflowEnd - overflowStart]; | ||
| System.arraycopy(tiffBytes, overflowStart, overflowData, 0, overflowData.length); | ||
| } else { | ||
| overflowDataStart = 0; | ||
| overflowData = new byte[0]; | ||
| } | ||
|
|
||
| // Find image data bounds | ||
| int imageDataStart = Integer.MAX_VALUE; | ||
| int imageDataEnd = 0; | ||
| for (int i = 0; i < segmentCount; i++) { | ||
| imageDataStart = Math.min(imageDataStart, segmentOffsets[i]); | ||
| imageDataEnd = Math.max(imageDataEnd, segmentOffsets[i] + segmentByteCounts[i]); | ||
| } |
There was a problem hiding this comment.
After reading segmentOffsets and segmentByteCounts, the code assumes both arrays have the same length (segmentCount) and later indexes them together. If a malformed TIFF provides mismatched counts for offsets vs byte-counts, this can throw ArrayIndexOutOfBoundsException instead of a clear validation error. Add an explicit length/count equality check and throw IllegalArgumentException when they differ.
| // Handle TileOffsets/StripOffsets — rewrite to point to new image data location | ||
| if (tag == TiffIfdParser.TAG_TILE_OFFSETS || tag == TiffIfdParser.TAG_STRIP_OFFSETS) { | ||
| if (count == 1 && valueSize <= 4) { | ||
| // Single segment: offset stored inline | ||
| buf.putInt(offset + 8, newImageDataStart + pt.segmentOffsets[0]); | ||
| } else { | ||
| // Multiple segments: the entry points to an overflow array. | ||
| // We need to rewrite the overflow array with new absolute offsets. | ||
| // First, rebase the pointer to the overflow data. | ||
| int origPointer = buf.getInt(offset + 8); | ||
| int newPointer = origPointer + overflowDelta; | ||
| buf.putInt(offset + 8, newPointer); | ||
|
|
||
| // Now patch the overflow data array with new image data offsets | ||
| int overflowArrayOffset = origPointer - pt.overflowDataStart; | ||
| ByteBuffer overflowBuf = ByteBuffer.wrap(pt.overflowData).order(byteOrder); | ||
| for (int j = 0; j < count; j++) { | ||
| int newSegmentOffset = newImageDataStart + pt.segmentOffsets[j]; | ||
| overflowBuf.putInt(overflowArrayOffset + j * 4, newSegmentOffset); | ||
| } | ||
| } |
There was a problem hiding this comment.
patchIfdEntries mutates pt.overflowData in place when rewriting TileOffsets/StripOffsets arrays. Since assemble(...) takes ParsedTiff objects from the caller, this creates surprising side effects (the parsed structures are no longer a faithful representation of the original TIFF after assembly). Prefer patching a copy of the overflow buffer (and writing the patched copy) so assemble doesn’t mutate its inputs.
| /** | ||
| * Creates a Cloud Optimized GeoTIFF (COG) byte array from the given raster. The COG format | ||
| * arranges tiles and overviews in an order optimized for HTTP range-request based access, | ||
| * enabling efficient partial reads from cloud storage. | ||
| * | ||
| * @param raster The input raster | ||
| * @param compressionType Compression type: "Deflate", "LZW", "JPEG", "PackBits", or null for | ||
| * default (Deflate) | ||
| * @param compressionQuality Quality 0.0 (max compression) to 1.0 (no compression) | ||
| * @return COG file as byte array | ||
| */ | ||
| public static byte[] asCloudOptimizedGeoTiff( | ||
| GridCoverage2D raster, String compressionType, double compressionQuality) { | ||
| try { | ||
| return CogWriter.write(raster, compressionType, compressionQuality); | ||
| } catch (IOException e) { | ||
| throw new RuntimeException("Failed to write Cloud Optimized GeoTIFF", e); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Creates a Cloud Optimized GeoTIFF (COG) byte array with default settings (Deflate compression, | ||
| * 256x256 tiles). | ||
| * | ||
| * @param raster The input raster | ||
| * @return COG file as byte array | ||
| */ | ||
| public static byte[] asCloudOptimizedGeoTiff(GridCoverage2D raster) { | ||
| try { | ||
| return CogWriter.write(raster); | ||
| } catch (IOException e) { | ||
| throw new RuntimeException("Failed to write Cloud Optimized GeoTIFF", e); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Creates a Cloud Optimized GeoTIFF (COG) byte array with the given options. | ||
| * | ||
| * @param raster The input raster | ||
| * @param options COG generation options (compression, tileSize, resampling, overviewCount) | ||
| * @return COG file as byte array | ||
| */ | ||
| public static byte[] asCloudOptimizedGeoTiff(GridCoverage2D raster, CogOptions options) { | ||
| try { | ||
| return CogWriter.write(raster, options); | ||
| } catch (IOException e) { | ||
| throw new RuntimeException("Failed to write Cloud Optimized GeoTIFF", e); | ||
| } | ||
| } |
There was a problem hiding this comment.
The PR description says it “does not affect any public API”, but this change adds new public methods to RasterOutputs (asCloudOptimizedGeoTiff overloads). Either update the PR description to reflect the public API addition, or (if the intent is to keep it internal) reduce visibility / place it behind an internal API.
| // The next IFD should be before any image data (COG requirement) | ||
| assertTrue( | ||
| "Overview IFD should immediately follow first IFD region", | ||
| nextIfdOffset < cogBytes.length / 2); | ||
|
|
There was a problem hiding this comment.
This assertion uses nextIfdOffset < cogBytes.length / 2 as a proxy for “IFDs are before image data”. That threshold is arbitrary and can be flaky if compression makes the image data very small (valid COGs can have IFD/metadata > 50% of the file). Prefer asserting nextIfdOffset (or the end of the IFD region) is strictly less than the minimum TileOffsets/StripOffsets value.
| public CogOptions build() { | ||
| if (compression == null || compression.isEmpty()) { | ||
| throw new IllegalArgumentException("compression must not be null or empty"); | ||
| } | ||
| if (compressionQuality < 0 || compressionQuality > 1.0) { | ||
| throw new IllegalArgumentException( | ||
| "compressionQuality must be between 0.0 and 1.0, got: " + compressionQuality); | ||
| } | ||
| if (tileSize <= 0) { | ||
| throw new IllegalArgumentException("tileSize must be positive, got: " + tileSize); | ||
| } | ||
| if ((tileSize & (tileSize - 1)) != 0) { | ||
| throw new IllegalArgumentException("tileSize must be a power of 2, got: " + tileSize); | ||
| } | ||
| if (overviewCount < -1) { | ||
| throw new IllegalArgumentException( | ||
| "overviewCount must be -1 (auto), 0 (none), or positive, got: " + overviewCount); | ||
| } | ||
|
|
||
| // Normalize resampling to title-case for matching | ||
| String normalized = normalizeResampling(resampling); | ||
| if (!VALID_RESAMPLING.contains(normalized)) { | ||
| throw new IllegalArgumentException( | ||
| "resampling must be one of " + VALID_RESAMPLING + ", got: '" + resampling + "'"); | ||
| } |
There was a problem hiding this comment.
CogOptions.Builder validates resampling/tileSize/quality but does not validate compression against the compression types actually supported by the GeoTools GeoTIFF writer. As-is, invalid strings will pass build() and fail later at write time. Consider normalizing/validating compression (and ideally allowing the same set mentioned in RasterOutputs.asGeoTiff, e.g., "None", "Huffman", etc.) so errors surface early with a clear message.
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Closes Implement a pure Java single thread COG writer #2662What changes were proposed in this PR?
This PR adds a pure Java, single-threaded Cloud Optimized GeoTIFF (COG) writer to Apache Sedona. The implementation generates COG files by:
New files
compression(default: Deflate) - compression algorithmcompressionQuality(default: 0.2) - quality from 0.0 to 1.0tileSize(default: 256) - tile dimensions, must be power of 2resampling(default: Nearest) - overview resampling: Nearest, Bilinear, or BicubicoverviewCount(default: -1/auto) - number of overview levels, 0 for nonewrite(GridCoverage2D, CogOptions)primary methodModified files
asCloudOptimizedGeoTiff(GridCoverage2D raster)asCloudOptimizedGeoTiff(GridCoverage2D raster, String compressionType, double compressionQuality)asCloudOptimizedGeoTiff(GridCoverage2D raster, CogOptions options)How was this patch tested?
rio cogeo validateandgdalinfoDid this PR include necessary documentation updates?