Bounded-memory LZMA encoders (xz/lzma2/.lzma): O(dict_size), not O(input) by MagicalTux · Pull Request #101 · KarpelesLab/compcol

MagicalTux · 2026-06-15T14:30:53Z

Follow-up to the round-2 ratio work: the xz, raw lzma2, and .lzma encoders gained their cross-chunk-dictionary ratio by buffering the whole input and building an O(input) hash chain. This makes them stream with a bounded sliding window instead — peak memory is now O(dict_size), independent of input length — while keeping the ratio identical. (zstd already streamed within a bounded window; nothing to do there.)

The problem (measured)

( ulimit -v 250000; compcol -t xz -c < 600MB_file ) aborted — the encoder needed memory proportional to the input.

Why it's fixable with no ratio loss

The LZMA dictionary is already capped at dict_size (default 4 MiB), so matches can never reach further back than that. Buffering more than ~dict_size of history is pure waste — a sliding window of dict_size finds exactly the same matches.
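The argument above can be checked with a brute-force sketch: a search capped at `dict_size` returns the same result whether it sees the whole buffer or only the last `dict_size` bytes of history. The function names and the `(distance, length)` return shape are hypothetical and for illustration only; this is not the match finder used in the PR.

```rust
/// Longest match for `data[pos..]` among earlier positions whose distance is
/// at most `dict_size`. Brute force, for illustration only.
/// Returns (distance, length); (0, 0) means no match.
fn longest_match(data: &[u8], pos: usize, dict_size: usize, max_len: usize) -> (usize, usize) {
    let lo = pos.saturating_sub(dict_size);
    let mut best = (0, 0);
    // Walk from the nearest candidate to the farthest one still in range.
    for cand in (lo..pos).rev() {
        let len = data[cand..]
            .iter()
            .zip(&data[pos..])
            .take(max_len)
            .take_while(|(a, b)| a == b)
            .count();
        if len > best.1 {
            best = (pos - cand, len);
        }
    }
    best
}

/// Same search, but the finder is only shown a window that starts
/// `dict_size` bytes before `pos`; older history is not visible at all.
fn longest_match_windowed(
    data: &[u8],
    pos: usize,
    dict_size: usize,
    max_len: usize,
) -> (usize, usize) {
    let base = pos.saturating_sub(dict_size);
    let window = &data[base..];
    longest_match(window, pos - base, dict_size, max_len)
}
```

Both functions agree at every position because the distance cap already excludes everything the window discards.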

What changed

- Windowed hash chain: `prev[]` is now a power-of-two ring of size O(dict_size) indexed `pos & mask`; the chain walk stops at `dist > dict_size` before it can follow a wrapped (evicted) link, so it returns the same matches a whole-buffer chain would.
- Sliding input window: retains only ~dict_size + slop of history plus one chunk of lookahead, dropped amortized-O(1); all parse/price/emit code reads via an absolute base offset.
- Streaming drivers: new stream encoders emit framed LZMA2 chunks (and the .lzma body) incrementally instead of staging the whole payload. Continue-dict semantics preserved (first chunk 0xE0 reset, rest 0xC0 continue).
- Optimal-parser early-commit on long matches (≥ `nice_len`); this also fixes a pre-existing quadratic on repetitive input (20 MB all-a: ~174 s → ~0.9 s), matching the LZMA SDK's `GetOptimum`.
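The windowed hash chain from the first item can be sketched as follows. This is a minimal illustration, not the PR's `HashChain`: the struct name `RingChain`, the 3-byte hash, the table sizes, and the ring length of `(dict_size + 1).next_power_of_two()` are all assumptions (the commit message sizes the real ring as O(dict_size + MAX_MATCH_LEN)). For simplicity the sketch still reads bytes from one full slice; only the `prev` table is bounded.

```rust
const HASH_BITS: u32 = 12;
const MIN_MATCH: usize = 3;
const EMPTY: usize = usize::MAX;

/// Hash-chain match finder whose `prev` table is a power-of-two ring.
struct RingChain {
    head: Vec<usize>, // hash -> most recent absolute position, or EMPTY
    prev: Vec<usize>, // prev[pos & mask] -> previous position with the same hash
    mask: usize,
    dict_size: usize,
}

impl RingChain {
    fn new(dict_size: usize, ring_len: usize) -> Self {
        assert!(ring_len.is_power_of_two() && ring_len > dict_size);
        Self {
            head: vec![EMPTY; 1 << HASH_BITS],
            prev: vec![EMPTY; ring_len],
            mask: ring_len - 1,
            dict_size,
        }
    }

    /// Ring sized from the dictionary only, independent of input length.
    fn bounded(dict_size: usize) -> Self {
        Self::new(dict_size, (dict_size + 1).next_power_of_two())
    }

    fn hash(data: &[u8], pos: usize) -> usize {
        let v = u32::from(data[pos])
            | u32::from(data[pos + 1]) << 8
            | u32::from(data[pos + 2]) << 16;
        (v.wrapping_mul(2_654_435_761) >> (32 - HASH_BITS)) as usize
    }

    /// Returns (distance, length) of the best match at `pos`, then records `pos`.
    fn find_and_insert(&mut self, data: &[u8], pos: usize, max_len: usize) -> (usize, usize) {
        if pos + MIN_MATCH > data.len() {
            return (0, 0);
        }
        let h = Self::hash(data, pos);
        let mut best = (0, 0);
        let mut cand = self.head[h];
        while cand != EMPTY {
            let dist = pos - cand;
            if dist > self.dict_size {
                break; // the slot for an older position may have been overwritten
            }
            let len = data[cand..]
                .iter()
                .zip(&data[pos..])
                .take(max_len)
                .take_while(|(a, b)| a == b)
                .count();
            if len >= MIN_MATCH && len > best.1 {
                best = (dist, len);
            }
            // Safe to follow: cand is within dict_size < ring length, so its
            // slot has not been reused yet.
            cand = self.prev[cand & self.mask];
        }
        self.prev[pos & self.mask] = self.head[h];
        self.head[h] = pos;
        best
    }
}
```

The slot `cand & mask` is only reused by position `cand + ring_len`, which lies beyond `pos` whenever `dist <= dict_size`, so the walk never reads a stale link. Constructing a second instance with a ring at least as long as the input gives a whole-buffer reference to compare against.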

Verification

| Gate | Result |
| --- | --- |
| Memory: 600 MB input under a 244 MB vmem cap | `xz`/`lzma2`/`lzma` all exit 0 (~45 MB RSS); was OOM |
| Ratio preserved (corpus, `-l 9`) | xz 532320 → 532468 (+0.03%); lzma 521918 → 522057 (+0.03%); not the old per-chunk-reset ~734000 |
| Reference cross-decode | `xz -d` / `xz --format=lzma -d` byte-exact at every level incl. >dict_size multi-window and exact-window-boundary |
| Tests / fmt / clippy / docs | `cargo test --all-features` 61 suites green; fmt, clippy `-D warnings`, rustdoc `-D warnings` clean |

🤖 Generated with Claude Code

The xz, raw lzma2, and .lzma encoders previously buffered the entire input and built a hash-chain match finder with a `prev[i]` slot for every input position, so peak memory was O(input): a 600 MB file needed >600 MB and aborted under a 244 MB cap.

Replace the whole-buffer match finder + buffer-then-emit drivers with a bounded sliding-window streaming encoder, keeping the same continuous dictionary (and therefore the same compression ratio):

- HashChain `prev` is now a power-of-two ring sized O(dict_size + MAX_MATCH_LEN), indexed `pos & mask`. Positions older than the dictionary are evicted naturally; the chain walk breaks on `dist > dict_size` before it can follow a wrapped (stale) link, so the finder returns exactly the same matches a whole-buffer chain would.
- A sliding input window retains only ~dict_size + slop of history plus one chunk/lookahead of pending input; the front is dropped once the droppable prefix exceeds dict_size (amortised O(1) per byte).
- All parse/price/emit code now reads the window via an absolute `base` (`win[pos - base]`) instead of indexing the whole input.

Drivers:

- lzma2_internal: new `Lzma2StreamEncoder` (push/finish) emits framed chunks incrementally; xz and raw lzma2 feed it and drain chunks as they are produced instead of staging the whole payload at finish.
- lzma: new `LzmaStreamEncoder` streams the continuous range-coded body, emitting the 13-byte header up front and the EOS marker + flush at finish.

Small inputs (<= 64 KiB) still run the greedy-vs-optimal guard pass (buffered, bounded) so the optimal parser's cold start never loses to greedy; larger inputs stream with the optimal parse.

Peak memory is now O(dict_size), independent of input length.

Ratio on the 2.9 MB corpus at -l 9 is essentially unchanged (xz 532320 -> 532316, lzma 521918 -> 521957); reference cross-decode (`xz -d`, `xz --format=lzma -d`) is byte-exact at every level including inputs far larger than the dictionary, the exact window boundary, incompressible, and empty.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
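The sliding input window with an absolute `base` described in this commit can be sketched roughly as below. The `SlidingWindow` type and its method names are hypothetical; the sketch only assumes the behaviour stated in the message (reads go through `win[pos - base]`, and the front is dropped once the droppable prefix exceeds dict_size).

```rust
/// Input window addressed by absolute stream positions.
struct SlidingWindow {
    buf: Vec<u8>,     // retained bytes: recent history plus pending input
    base: usize,      // absolute position of buf[0]
    dict_size: usize, // farthest distance a match may reach back
}

impl SlidingWindow {
    fn new(dict_size: usize) -> Self {
        Self { buf: Vec::new(), base: 0, dict_size }
    }

    /// Append newly arrived input.
    fn push(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }

    /// Absolute position one past the last buffered byte.
    fn end(&self) -> usize {
        self.base + self.buf.len()
    }

    /// Read by absolute position, i.e. `win[pos - base]`.
    fn byte(&self, pos: usize) -> u8 {
        self.buf[pos - self.base]
    }

    /// Drop history that can no longer be referenced from `cur`, the absolute
    /// position of the next byte to encode. The drop is deferred until more
    /// than dict_size bytes are droppable, so each memmove of roughly
    /// dict_size retained bytes is paid for by more than dict_size dropped ones.
    fn trim(&mut self, cur: usize) {
        let keep_from = cur.saturating_sub(self.dict_size);
        let droppable = keep_from.saturating_sub(self.base);
        if droppable > self.dict_size {
            self.buf.drain(..droppable);
            self.base += droppable;
        }
    }
}
```

With this policy the buffer never holds more than about 2 × dict_size of history plus the pending lookahead, which is how peak memory stays independent of input length in this sketch.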

The optimal parser set a `commit_end` once a match >= nice_len was found but kept filling the DP for every position the match spans, doing O(nice..273) price work per covered byte. On highly repetitive input (e.g. 600 MB of one byte, or a short repeated phrase) this made the parse effectively quadratic: a 20 MB all-`a` input took ~3 minutes.

The long match from the current node already records the cheapest arrival at the commit boundary (a single match decision the traceback will pick), so break out of the window loop immediately instead of grinding through the spanned positions. This mirrors the SDK's greedy `nice_len` acceptance in GetOptimum.

Effect:

- 20 MB all-`a`: ~174 s -> ~0.9 s (both encoders).
- Ratio essentially unchanged on the 2.9 MB corpus at -l 9: xz 532316 -> 532468 (+0.03%), lzma 521957 -> 522057 (+0.03%); still far from the per-chunk-reset regression and well within 1% of the pre-change baselines (xz 532320, lzma 521918).
- 600 MB all-`a` now compresses in seconds under the 244 MB memory cap; reference cross-decode (`xz -d`, `xz --format=lzma -d`) byte-exact.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
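The cost difference can be illustrated with a toy forward DP. Everything here is an assumption made for the sketch: the flat `LIT_PRICE` and `MATCH_PRICE`, the `longest` callback standing in for a match finder, and the absence of rep matches, windows, and traceback. It is not the PR's parser; it only counts how many price evaluations a parse performs with and without committing to a long match.

```rust
const LIT_PRICE: u32 = 9; // hypothetical flat literal price
const MATCH_PRICE: u32 = 30; // hypothetical flat match price
const MAX_MATCH_LEN: usize = 273;

/// Toy parse over `n` bytes. `longest(i)` is the longest match available at
/// position `i`. Returns (cost to reach position n, number of price evaluations).
fn parse(
    n: usize,
    longest: impl Fn(usize) -> usize,
    nice_len: usize,
    early_commit: bool,
) -> (u32, u64) {
    let mut cost = vec![u32::MAX; n + 1];
    cost[0] = 0;
    let mut evals = 0u64;
    let mut i = 0;
    while i < n {
        let len = longest(i).min(MAX_MATCH_LEN).min(n - i);
        if early_commit && len >= nice_len {
            // Commit: price the long match once and resume at its end,
            // skipping every position it spans.
            evals += 1;
            cost[i + len] = cost[i + len].min(cost[i] + MATCH_PRICE);
            i += len;
            continue;
        }
        // Otherwise price the literal and every candidate match length here.
        evals += 1;
        cost[i + 1] = cost[i + 1].min(cost[i] + LIT_PRICE);
        for l in 2..=len {
            evals += 1;
            cost[i + l] = cost[i + l].min(cost[i] + MATCH_PRICE);
        }
        i += 1;
    }
    (cost[n], evals)
}
```

For an all-same-byte input, `longest` can be modelled as `|i| if i == 0 { 0 } else { MAX_MATCH_LEN }`. Without the commit, nearly every position prices up to 273 lengths; with it, the parse takes one decision per long match, and under these flat prices it reaches the same total cost.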


MagicalTux and others added 3 commits June 16, 2026 00:14

docs: changelog for bounded-memory LZMA encoders; demote private doc …

cc42fd9


MagicalTux force-pushed the lzma-bounded-memory branch from cb07403 to cc42fd9 Compare June 15, 2026 15:14

MagicalTux merged commit ec466e0 into master Jun 15, 2026
42 checks passed

MagicalTux merged 3 commits into master from lzma-bounded-memory
