Cap decompressed size for deflate/zstd/lz4/packbits to block bomb attacks #1533
Merged
brendancol merged 4 commits into xarray-contrib:main on May 10, 2026
Conversation
The four codecs in ``xrspatial/geotiff/_compression.py`` decompressed strip
and tile payloads with no output-size cap, so a small malicious TIFF could
expand to multiple gigabytes during decode and OOM-kill the reader. An
audit confirmed it: a 1 MiB deflate-compressed strip declaring a 1024x1024
image expands to 1 GiB, and a process with ``RLIMIT_AS=300MB`` is killed
before the existing post-decode size check (``_decode_strip_or_tile`` in
``_reader.py:565``) ever runs. The threat model is untrusted TIFF input
from web downloads, fsspec, third-party catalogs, or user upload.
Each codec now accepts an ``expected_size`` kwarg (the byte count the
caller already computed for the post-check) and refuses to emit more than
``expected_size * 1.05 + 1`` bytes, raising ``ValueError`` with the codec
name and the actual size versus the cap. The 1.05x margin allows for legitimate
codec metadata that some encoders emit; bomb ratios (1000:1+) are
rejected long before peak RSS spikes.
Per-codec implementation:
- deflate: ``zlib.decompressobj().decompress(data, max_length=cap+1)``
with a drain loop over ``unconsumed_tail``. ``zlib.decompress`` had no
cap.
- zstd: ``ZstdDecompressor().stream_reader(data).read(cap+1)``, then
probe one more byte to detect overflow. ``decompress(data,
max_output_size=cap)`` is not actually enforced when the frame embeds
the content size, which the bomb does.
- lz4: ``LZ4FrameDecompressor().decompress(data, max_length=cap+1)`` and
treat ``needs_input == False`` as overflow (decoder has buffered output
it could not deliver).
- packbits: pure-Python loop already, so just check the running output
length after each opcode.
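A minimal sketch of the deflate variant, assuming the cap arithmetic described above (``expected_size * 1.05 + 1``); the function name is illustrative, not the actual helper in ``_compression.py``:

```python
import zlib


def deflate_decompress_capped(data: bytes, expected_size: int) -> bytes:
    # Illustrative name; the real helper in _compression.py may differ.
    cap = int(expected_size * 1.05) + 1
    d = zlib.decompressobj()
    out = bytearray()
    # max_length bounds how many bytes each call may emit; compressed
    # input that could not be consumed is parked in d.unconsumed_tail.
    out += d.decompress(data, cap + 1)
    while d.unconsumed_tail and len(out) <= cap:
        out += d.decompress(d.unconsumed_tail, cap + 1 - len(out))
    if len(out) > cap:
        raise ValueError(
            f"deflate decode exceeded expected size: {len(out)} > cap {cap}"
        )
    return bytes(out)
```

Because the drain loop never asks for more than ``cap + 1 - len(out)`` bytes per call, a bomb is rejected after producing at most one byte past the cap, regardless of how large the declared output is.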
The dispatch ``decompress`` plumbs ``expected_size`` through to all four.
Tests in ``xrspatial/geotiff/tests/test_decompression_caps.py`` cover:
codec-direct bomb rejection, end-to-end TIFF bomb rejection (1 MiB
declared, 1 GiB decoded), legitimate high-ratio compression (all-zero
arrays at >50:1) passing without false rejection, and the 5% metadata
margin not over-rejecting.
Audit reproducer behaviour: the 1 MiB-to-1 GiB TIFF previously triggered
``MemoryError`` (or OS OOM kill) in zlib.decompress; now raises
``ValueError("deflate decode exceeded expected size: ...")`` with peak
RSS bounded by the cap.
Pull request overview
Adds decompression output-size caps to several GeoTIFF codecs to mitigate decompression-bomb (zip-bomb) style attacks during strip/tile decode, ensuring the reader can fail fast before allocating unbounded output buffers.
Changes:
- Add ``expected_size``-based output caps (with a 5% margin) to deflate, zstd, lz4, and packbits decompressors in ``xrspatial/geotiff/_compression.py``.
- Thread ``expected_size`` through the codec dispatch so ``_reader._decode_strip_or_tile()`` can enforce the cap during decompression, not after.
- Add a new test module covering direct-codec overflow rejection and end-to-end TIFF read rejection scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| ``xrspatial/geotiff/_compression.py`` | Adds ``expected_size``-capped decompression for deflate/zstd/lz4/packbits and wires caps into the decompression dispatch. |
| ``xrspatial/geotiff/tests/test_decompression_caps.py`` | New tests intended to validate bomb rejection and ensure legitimate high compression still succeeds. |
Comment on lines +211 to +214:

```python
def test_lz4_bomb_rejected(tmp_path):
    lz4_frame = pytest.importorskip("lz4.frame")
    payload = b'\x00' * _BOMB_BYTES
    strip = lz4_frame.compress(payload)
```
Comment on lines +24 to +37:

```python
def _max_output_with_margin(expected_size: int) -> int:
    """Return the cap (in bytes) for a codec given the caller's expected size.

    Adds at least one byte of slack so that callers passing 0 (meaning
    "unknown") still get a usable buffer for tests, while a single byte of
    overflow is detected.
    """
    if expected_size <= 0:
        # No cap available: fall back to a generous default to preserve
        # backward compatibility with callers that don't supply a size.
        # The reader always supplies a size, so this branch is mainly for
        # direct callers and round-trip tests.
        return 0
    return int(expected_size * _DECOMPRESS_MARGIN) + 1
```
Eight findings, all acted on:
Source:
- _max_output_with_margin docstring claimed callers passing 0 still
get a usable buffer, but the implementation returns 0 (= no cap).
Update the docstring to match the implementation: positive size
caps at 1.05x + 1; non-positive disables capping.
- Replace deflate's `out += more` bytes accumulation with a bytearray.
The bytes path was O(n^2) for large strips because every chunk
reallocated and copied the whole accumulated buffer; the bytearray
path is amortized O(n).
Tests:
- Drop the 1 GiB host allocations from test_{deflate,zstd,lz4}_bomb_rejected.
Each test was building `b'\x00' * (1 << 30)` then compressing it,
which contradicted the file's own "no test materializes the 1 GiB
payload" claim and would OOM or stall on shared CI runners. Shrink
to 8 MiB; the cap (~1.05 MiB on a 1024x1024 uint8 declared image)
is exceeded by ~7x, which still proves the codec rejects rather
than truncates without forcing the test process to allocate
gigabytes.
- Replace `pytest.importorskip(...)` calls inside `@pytest.mark.skipif`
decorators with precomputed `_HAS_ZSTD` / `_HAS_LZ4` flags via
`importlib.util.find_spec`. The previous form could skip the entire
module at import time on hosts missing one optional dep, dropping
unrelated tests with it.
- Update the module docstring to reflect the actual declared dimensions
(1024x1024 uint8 = 1 MiB) instead of the misleading "~1 KiB pixel
data" claim.
- Move the inline `pytest.importorskip(...)` calls inside the
end-to-end zstd / lz4 tests to module-level skipif so the optional-
dep skip is consistent across direct and end-to-end tests.
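The accumulation finding can be seen in isolation: both loops below produce identical output, but each ``+=`` on a ``bytes`` object may reallocate and copy the entire buffer accumulated so far (quadratic in the worst case), while ``bytearray`` appends are amortized O(1):

```python
chunks = [b"\x00" * 1024 for _ in range(64)]

# bytes accumulation: each += can copy everything accumulated so far,
# which is the O(n^2) pattern the review flagged for large strips.
out_bytes = b""
for c in chunks:
    out_bytes += c

# bytearray accumulation: grows in place, amortized O(n) overall --
# the pattern the deflate path was switched to.
out = bytearray()
for c in chunks:
    out += c

assert bytes(out) == out_bytes
assert len(out) == 64 * 1024
```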
Two follow-ups on top of the earlier review-fix commit:
- Direct-codec bomb tests in TestCodecDirect / TestZstdDirect / TestLz4Direct were still allocating 100 MiB of zeros on the host (`big = b'\x00' * (100 * 1024 * 1024)`) to compress for the bomb payload; the earlier fix only shrank the end-to-end TIFF tests. Centralise on `_DIRECT_BOMB_BYTES = 4 MiB` -- still a 4000:1 ratio over the 1 KiB cap used in those tests, plenty to prove the rejection without 100 MiB of CI memory pressure per test.
- Add `expected_size=0` (default) backward-compat regression tests for deflate, zstd, lz4, and packbits. The cap implementation has a no-cap fallback for callers that don't pass a size; without these tests, a future change that silently enabled a default cap would break unmodified callers and not be caught.
`importlib.util.find_spec("lz4.frame")` imports the `lz4` parent
package to resolve the submodule spec, so it raises ModuleNotFoundError
rather than returning None when lz4 itself is not installed. macOS CI
(Python 3.12) failed at module collection with "No module named 'lz4'"
because of this. Wrap the spec lookup in a try/except helper so a
missing optional dependency cleanly turns into False regardless of
dotted depth, and route both the zstandard and lz4 detection through it.
Summary
Four codecs in ``xrspatial/geotiff/_compression.py`` decompressed strip and tile payloads with no output-size cap. A small malicious TIFF (web download, fsspec, third-party catalog, user upload) could declare a 1024x1024 uint8 image (1 MiB expected) and supply a 1 MiB deflate-compressed strip that decodes to 1 GiB, OOM-killing the reader. The existing size check at ``_decode_strip_or_tile`` (``_reader.py:565``) runs after decompression, so it cannot bound peak RSS. The audit confirmed this by setting ``RLIMIT_AS=300MB`` and feeding a 1 MiB deflate-bombed TIFF to ``read_to_array``: the process is killed.

This PR adds an ``expected_size`` cap inside each codec, rejecting any decode that would exceed ``expected_size * 1.05 + 1`` bytes with a clean ``ValueError`` before peak allocation.

Codecs covered (one fix template, four codecs)

- deflate (``deflate_decompress``): ``zlib.decompressobj().decompress(data, max_length=cap+1)`` with a drain loop. ``zlib.decompress`` has no cap parameter.
- zstd (``zstd_decompress``): ``ZstdDecompressor().stream_reader(data).read(cap+1)`` plus a one-byte overflow probe. ``decompress(data, max_output_size=cap)`` is not enforced when the frame embeds the content size, which is exactly the case in the bomb.
- lz4 (``lz4_decompress``): ``LZ4FrameDecompressor().decompress(data, max_length=cap+1)`` and treat ``needs_input == False`` as overflow.
- packbits (``packbits_decompress``): pure-Python loop already; check the running output length after each opcode.
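For the packbits case, the per-opcode running check is simple enough to sketch end to end (illustrative function name; opcode semantics follow the standard PackBits scheme):

```python
def packbits_decompress_capped(data: bytes, cap: int) -> bytes:
    # Illustrative name; the real codec in _compression.py may differ.
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n < 128:            # literal run: copy the next n+1 bytes
            out += data[i:i + n + 1]
            i += n + 1
        elif n > 128:          # replicate run: repeat next byte 257-n times
            out += data[i:i + 1] * (257 - n)
            i += 1
        # n == 128 is a no-op per the PackBits scheme
        if len(out) > cap:     # the running check added by this PR
            raise ValueError(
                f"packbits decode exceeded expected size: {len(out)} > cap {cap}"
            )
    return bytes(out)
```

Checking after every opcode means a replicate-run bomb is caught as soon as the output crosses the cap, never after materializing the full declared payload.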
The 1.05x margin (5%) lets legitimate codec framing or trailing metadata bytes through while still rejecting the 1000:1 ratios characteristic of bomb attacks. A test (``test_cap_includes_metadata_margin``) covers the legitimate metadata case, and ``test_legitimate_high_compression_passes`` confirms an all-zero array at >50:1 ratio is accepted.
xrspatial/geotiff/tests/test_decompression_caps.py(14 tests):The end-to-end TIFFs are hand-built with
struct.packso no test ever materializes the 1 GiB payload outside the codec's own buffer.Out of scope
The project's policy for security work is one fix per PR; the user explicitly opted to bundle these four codec caps into a single PR because they share one vulnerability class and one fix template. Other audit findings (IFD count, IFD chain) ship in separate PRs.
Test plan
- ``pytest xrspatial/geotiff/tests/test_decompression_caps.py -x -q`` (14 pass)
- ``pytest xrspatial/geotiff/tests/test_compression.py xrspatial/geotiff/tests/test_compression_level.py xrspatial/geotiff/tests/test_lz4.py xrspatial/geotiff/tests/test_reader.py xrspatial/geotiff/tests/test_writer.py -x -q`` (63 pass, no regressions)