Bounded-memory LZMA encoders (xz/lzma2/.lzma): O(dict_size), not O(input) #101
Merged
The xz, raw lzma2, and .lzma encoders previously buffered the entire input and built a hash-chain match finder with a `prev[i]` slot for every input position, so peak memory was O(input): a 600 MB file needed >600 MB and aborted under a 244 MB cap.

Replace the whole-buffer match finder + buffer-then-emit drivers with a bounded sliding-window streaming encoder, keeping the same continuous dictionary (and therefore the same compression ratio):

- HashChain `prev` is now a power-of-two ring sized O(dict_size + MAX_MATCH_LEN), indexed `pos & mask`. Positions older than the dictionary are evicted naturally; the chain walk breaks on `dist > dict_size` before it can follow a wrapped (stale) link, so the finder returns exactly the same matches a whole-buffer chain would.
- A sliding input window retains only ~dict_size + slop of history plus one chunk/lookahead of pending input; the front is dropped once the droppable prefix exceeds dict_size (amortised O(1) per byte).
- All parse/price/emit code now reads the window via an absolute `base` (`win[pos - base]`) instead of indexing the whole input.

Drivers:

- lzma2_internal: new `Lzma2StreamEncoder` (push/finish) emits framed chunks incrementally; xz and raw lzma2 feed it and drain chunks as they are produced instead of staging the whole payload at finish.
- lzma: new `LzmaStreamEncoder` streams the continuous range-coded body, emitting the 13-byte header up front and the EOS marker + flush at finish.

Small inputs (<= 64 KiB) still run the greedy-vs-optimal guard pass (buffered, bounded) so the optimal parser's cold start never loses to greedy; larger inputs stream with the optimal parse.

Peak memory is now O(dict_size), independent of input length.
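The ring-indexed `prev` described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: `RingChain`, its fields, the toy 16-bit hash, and the `usize::MAX` sentinel are all hypothetical names chosen for the example.

```rust
const NONE: usize = usize::MAX; // sentinel: "no earlier position" (assumption)

/// Hypothetical ring-buffered hash chain; names are illustrative only.
struct RingChain {
    head: Vec<usize>, // hash bucket -> newest absolute position
    prev: Vec<usize>, // ring: slot `pos & mask` -> previous position with the same hash
    mask: usize,
    dict_size: usize,
}

impl RingChain {
    fn new(dict_size: usize, max_match_len: usize) -> Self {
        // Power-of-two ring covering the dictionary plus lookahead slack.
        let ring = (dict_size + max_match_len + 1).next_power_of_two();
        RingChain {
            head: vec![NONE; 1 << 16],
            prev: vec![NONE; ring],
            mask: ring - 1,
            dict_size,
        }
    }

    /// Toy 16-bit hash of the three bytes at a position.
    fn hash(b: &[u8]) -> usize {
        let v = (b[0] as u32) | ((b[1] as u32) << 8) | ((b[2] as u32) << 16);
        (v.wrapping_mul(2_654_435_761) >> 16) as usize
    }

    /// Record absolute position `pos`, whose first three bytes are `b`.
    fn insert(&mut self, pos: usize, b: &[u8]) {
        let h = Self::hash(b);
        // This overwrites the slot of the position one ring length earlier.
        self.prev[pos & self.mask] = self.head[h];
        self.head[h] = pos;
    }

    /// Earlier positions in the same bucket, newest first. The walk stops at
    /// the first candidate further back than `dict_size`, so a slot that has
    /// been overwritten (a wrapped link) is never read.
    fn candidates(&self, pos: usize, b: &[u8]) -> Vec<usize> {
        let mut out = Vec::new();
        let mut cand = self.head[Self::hash(b)];
        while cand != NONE && cand < pos && pos - cand <= self.dict_size {
            out.push(cand);
            cand = self.prev[cand & self.mask];
        }
        out
    }
}
```

A real match finder would still compare bytes at each candidate, since different byte triples can share a bucket. The sketch only shows why the distance check makes the ring equivalent to a whole-buffer chain within `dict_size`.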
Ratio on the 2.9 MB corpus at -l 9 is essentially unchanged (xz 532320 -> 532316, lzma 521918 -> 521957); reference cross-decode (`xz -d`, `xz --format=lzma -d`) is byte-exact at every level, including inputs far larger than the dictionary, inputs at the exact window boundary, incompressible input, and empty input.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
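The sliding input window with an absolute `base` from the commit message above might look roughly like this. It is a minimal sketch, assuming a plain `Vec<u8>` backing store; `Window`, `push`, `byte`, and `trim` are hypothetical names, and the actual encoder may keep extra slop or organise the buffer differently.

```rust
/// Hypothetical sliding input window addressed by absolute stream position.
struct Window {
    buf: Vec<u8>,     // retained history followed by pending input
    base: usize,      // absolute position of buf[0]
    dict_size: usize, // how far back a match may reach
}

impl Window {
    fn new(dict_size: usize) -> Self {
        Window { buf: Vec::new(), base: 0, dict_size }
    }

    /// Append newly arrived input.
    fn push(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }

    /// Read the byte at absolute position `pos` (the `win[pos - base]` access).
    fn byte(&self, pos: usize) -> u8 {
        self.buf[pos - self.base]
    }

    /// Drop history that no match starting at or after `encoded_up_to` can
    /// reach. Compaction runs only once the droppable prefix exceeds
    /// `dict_size`, so with bounded lookahead each byte is moved O(1) times
    /// on average.
    fn trim(&mut self, encoded_up_to: usize) {
        let keep_from = encoded_up_to.saturating_sub(self.dict_size);
        let droppable = keep_from.saturating_sub(self.base);
        if droppable > self.dict_size {
            self.buf.drain(..droppable);
            self.base += droppable;
        }
    }
}
```

With `dict_size = 4` and 20 bytes pushed, `trim(12)` drops the first 8 bytes and sets `base` to 8, while a following `trim(13)` leaves the buffer alone because only one more byte is droppable.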
The optimal parser set a `commit_end` once a match >= nice_len was found but kept filling the DP for every position the match spans, doing O(nice..273) price work per covered byte. On highly repetitive input (e.g. 600 MB of one byte, or a short repeated phrase) this made the parse effectively quadratic: a 20 MB all-`a` input took ~3 minutes.

The long match from the current node already records the cheapest arrival at the commit boundary (a single match decision the traceback will pick), so break out of the window loop immediately instead of grinding through the spanned positions. This mirrors the SDK's greedy `nice_len` acceptance in GetOptimum.

Effect:

- 20 MB all-`a`: ~174 s -> ~0.9 s (both encoders).
- Ratio essentially unchanged on the 2.9 MB corpus at -l 9: xz 532316 -> 532468 (+0.03%), lzma 521957 -> 522057 (+0.03%); still far from the per-chunk-reset regression and well within 1% of the pre-change baselines (xz 532320, lzma 521918).
- 600 MB all-`a` now compresses in seconds under the 244 MB memory cap; reference cross-decode (`xz -d`, `xz --format=lzma -d`) byte-exact.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
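The cost difference described in this commit can be put into rough numbers with a toy model. This is not the parser itself: `window_price_evals` is a hypothetical helper that assumes one price evaluation per candidate length (2 up to the match length) and assumes every position covered by the long match sees an equally long match again, as happens on a run of a single byte.

```rust
const MAX_MATCH_LEN: usize = 273;

/// Toy model: price evaluations spent on one DP window that starts with a
/// match of `match_len` bytes (assumed >= nice_len) on fully repetitive input.
fn window_price_evals(match_len: usize, break_on_commit: bool) -> usize {
    // One price per candidate length 2..=match_len at a single position.
    let per_position = match_len - 1;
    if break_on_commit {
        // Only the node that found the long match is priced.
        per_position
    } else {
        // Every position the match spans is priced as well.
        match_len * per_position
    }
}
```

Under these assumptions a window opened by a 273-byte match costs `window_price_evals(MAX_MATCH_LEN, false)` = 74256 evaluations without the early break and `window_price_evals(MAX_MATCH_LEN, true)` = 272 with it, i.e. roughly 272 evaluations per output byte versus about one.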
Follow-up to the round-2 ratio work: the `xz`, raw `lzma2`, and `.lzma` encoders gained their cross-chunk-dictionary ratio by buffering the whole input and building an `O(input)` hash chain. This PR makes them stream with a bounded sliding window instead: peak memory is now O(dict_size), independent of input length, while the ratio stays essentially unchanged. (zstd already streamed within a bounded window; nothing to do there.)

## The problem (measured)

`( ulimit -v 250000; compcol -t xz -c < 600MB_file )` aborted: the encoder needed memory proportional to the input.

## Why it's fixable with no ratio loss

The LZMA dictionary is already capped at `dict_size` (default 4 MiB), so matches can never reach further back than that. Buffering more than ~`dict_size` of history is pure waste: a sliding window of `dict_size` finds exactly the same matches.

## What changed

- `prev[]` is now a power-of-two ring of size `O(dict_size)` indexed `pos & mask`; the chain walk stops at `dist > dict_size` before it can follow a wrapped (evicted) link, so it returns the same matches a whole-buffer chain would.
- The input window retains only `~dict_size + slop` of history plus one chunk of lookahead, with the front dropped in amortized O(1); all parse/price/emit code reads via an absolute base offset.
- The drivers emit framed chunks (or the continuous `.lzma` body) incrementally instead of staging the whole payload. Continue-dict semantics are preserved (first chunk `0xE0` reset, rest `0xC0` continue).
- The optimal parser commits as soon as it finds a long match (>= `nice_len`); this also fixes a pre-existing quadratic slowdown on repetitive input (20 MB all-`a`: ~174 s → ~0.9 s), matching the LZMA SDK's `GetOptimum`.

## Verification

- 600 MB input under the memory cap: `xz`/`lzma2`/`lzma` all exit 0 (~45 MB RSS); previously OOM.
- Ratio on the 2.9 MB corpus (`-l 9`): essentially unchanged (xz 532320 → 532468, lzma 521918 → 522057).
- Reference cross-decode: `xz -d` / `xz --format=lzma -d` byte-exact at every level, incl. >dict_size multi-window and exact-window-boundary inputs.
- `cargo test --all-features`: 61 suites green; fmt, clippy `-D warnings`, rustdoc `-D warnings` clean.

🤖 Generated with Claude Code
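For reference, the continue-dict framing mentioned under "What changed" (first chunk `0xE0`, later chunks `0xC0`) can be sketched as below. The layout follows the LZMA2 chunk header as used by xz; `lzma2_chunk_header` and its parameters are hypothetical names for illustration, not this PR's API, and the sketch covers only compressed (LZMA) chunks that carry a properties byte.

```rust
/// Build the 6-byte header of an LZMA-compressed LZMA2 chunk that carries a
/// properties byte. Hypothetical helper, for illustration only.
fn lzma2_chunk_header(first: bool, unpacked_len: usize, packed_len: usize, props: u8) -> [u8; 6] {
    // Format limits: at most 2 MiB uncompressed and 64 KiB compressed per chunk.
    assert!((1..=1 << 21).contains(&unpacked_len));
    assert!((1..=1 << 16).contains(&packed_len));
    let u = unpacked_len - 1; // sizes are stored minus one
    let p = packed_len - 1;
    // 0xE0: reset dictionary, state, and properties (first chunk).
    // 0xC0: reset state and properties but keep the dictionary (later chunks).
    let reset: u8 = if first { 0xE0 } else { 0xC0 };
    // The low 5 bits of the control byte hold bits 16..=20 of the size.
    let control = reset | (u >> 16) as u8;
    [
        control,
        (u >> 8) as u8, // uncompressed size, remaining 16 bits, big-endian
        u as u8,
        (p >> 8) as u8, // compressed size, 16 bits, big-endian
        p as u8,
        props, // lc/lp/pb properties byte, e.g. 0x5D for lc=3, lp=0, pb=2
    ]
}
```

Because only the first chunk resets the dictionary, later chunks can reference data from earlier ones, which is what lets a streaming encoder keep the cross-chunk ratio.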