
Bounded-memory LZMA encoders (xz/lzma2/.lzma): O(dict_size), not O(input)#101

Merged: MagicalTux merged 3 commits into master from lzma-bounded-memory on Jun 15, 2026

@MagicalTux

Member

Follow-up to the round-2 ratio work: the xz, raw lzma2, and .lzma encoders gained their cross-chunk-dictionary ratio by buffering the whole input and building an O(input) hash chain. This PR makes them stream with a bounded sliding window instead, so peak memory is now O(dict_size), independent of input length, while the ratio stays within 0.03% of the whole-buffer encoders on the test corpus. (zstd already streamed within a bounded window; nothing to do there.)

The problem (measured)

Under a virtual-memory cap of about 244 MB (ulimit -v 250000, in KiB), ( ulimit -v 250000; compcol -t xz -c < 600MB_file ) aborted: the encoder needed memory proportional to the input.

Why it's fixable with no ratio loss

The LZMA dictionary is already capped at dict_size (default 4 MiB), so matches can never reach further back than that. Buffering more than ~dict_size of history is pure waste — a sliding window of dict_size finds exactly the same matches.

What changed

  • Windowed hash chain: prev[] is now a power-of-two ring of size O(dict_size) indexed pos & mask; the chain walk stops at dist > dict_size before it can follow a wrapped (evicted) link, so it returns the same matches a whole-buffer chain would.
  • Sliding input window: retains only ~dict_size + slop of history plus one chunk of lookahead, dropped amortized-O(1); all parse/price/emit code reads via an absolute base offset.
  • Streaming drivers: new stream encoders emit framed LZMA2 chunks (and the .lzma body) incrementally instead of staging the whole payload. Continue-dict semantics preserved (first chunk 0xE0 reset, rest 0xC0 continue).
  • Optimal-parser early commit on long matches (≥ nice_len): this also fixes a pre-existing quadratic slowdown on repetitive input (20 MB all-a: ~174 s → ~0.9 s) and mirrors the LZMA SDK's GetOptimum.
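
The windowed hash chain from the first bullet can be illustrated with a minimal sketch. This is not the PR's code: `RingChain`, the hash function, and the constants are hypothetical, and for brevity the sketch indexes a plain in-memory slice rather than a sliding window. It only shows why a `prev` ring indexed by `pos & mask` is safe as long as the walk stops at `dist > dict_size` before reading a slot that may have been overwritten.

```rust
/// Sentinel meaning "no earlier position in this chain".
const NIL: usize = usize::MAX;
const MIN_MATCH: usize = 3;
const HASH_BITS: u32 = 16;

/// Hypothetical ring-buffer hash chain (names are illustrative only).
pub struct RingChain {
    head: Vec<usize>, // most recent position per hash bucket
    prev: Vec<usize>, // ring: prev[pos & mask] = previous position with the same hash
    mask: usize,
    dict_size: usize,
}

impl RingChain {
    pub fn new(dict_size: usize) -> Self {
        // The ring must hold more than dict_size slots so that a slot is never
        // overwritten while its position is still within match distance.
        let ring = (dict_size + 1).next_power_of_two();
        RingChain {
            head: vec![NIL; 1 << HASH_BITS],
            prev: vec![NIL; ring],
            mask: ring - 1,
            dict_size,
        }
    }

    fn hash(data: &[u8], pos: usize) -> usize {
        let v = (data[pos] as u32)
            | ((data[pos + 1] as u32) << 8)
            | ((data[pos + 2] as u32) << 16);
        (v.wrapping_mul(2654435761) >> (32 - HASH_BITS)) as usize
    }

    /// Registers `pos`; positions must be inserted in increasing order.
    pub fn insert(&mut self, data: &[u8], pos: usize) {
        if pos + MIN_MATCH > data.len() {
            return;
        }
        let h = Self::hash(data, pos);
        self.prev[pos & self.mask] = self.head[h];
        self.head[h] = pos;
    }

    /// Longest match for `pos` as (len, dist). Call before `insert(pos)`.
    pub fn find(&self, data: &[u8], pos: usize, max_len: usize) -> Option<(usize, usize)> {
        if pos + MIN_MATCH > data.len() {
            return None;
        }
        let limit = max_len.min(data.len() - pos);
        let mut best = None;
        let mut best_len = MIN_MATCH - 1;
        let mut cand = self.head[Self::hash(data, pos)];
        while cand != NIL {
            let dist = pos - cand;
            // Stop before following a link whose ring slot may be stale.
            if dist > self.dict_size {
                break;
            }
            let len = data[cand..]
                .iter()
                .zip(&data[pos..pos + limit])
                .take_while(|(a, b)| a == b)
                .count();
            if len > best_len {
                best_len = len;
                best = Some((len, dist));
            }
            cand = self.prev[cand & self.mask];
        }
        best
    }
}
```

In this sketch, a candidate within `dict_size` of `pos` always has an intact `prev` slot, because the next position that maps to the same slot lies at least one full ring length ahead and has not been inserted yet.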

Verification

| Gate | Result |
| --- | --- |
| Memory: 600 MB input under a 244 MB vmem cap | xz/lzma2/lzma all exit 0 (~45 MB RSS); was OOM |
| Ratio preserved (corpus, -l 9) | xz 532320 → 532468 (+0.03%); lzma 521918 → 522057 (+0.03%), versus ~734000 with the old per-chunk reset |
| Reference cross-decode | xz -d / xz --format=lzma -d byte-exact at every level, including >dict_size multi-window and exact-window-boundary inputs |
| Tests / fmt / clippy / docs | cargo test --all-features: 61 suites green; fmt, clippy -D warnings, rustdoc -D warnings clean |

🤖 Generated with Claude Code

MagicalTux and others added 3 commits June 16, 2026 00:14
The xz, raw lzma2, and .lzma encoders previously buffered the entire
input and built a hash-chain match finder with a `prev[i]` slot for every
input position, so peak memory was O(input): a 600 MB file needed >600 MB
and aborted under a 244 MB cap.

Replace the whole-buffer match finder + buffer-then-emit drivers with a
bounded sliding-window streaming encoder, keeping the same continuous
dictionary (and therefore the same compression ratio):

- HashChain `prev` is now a power-of-two ring sized O(dict_size +
  MAX_MATCH_LEN), indexed `pos & mask`. Positions older than the
  dictionary are evicted naturally; the chain walk breaks on
  `dist > dict_size` before it can follow a wrapped (stale) link, so the
  finder returns exactly the same matches a whole-buffer chain would.
- A sliding input window retains only ~dict_size + slop of history plus
  one chunk/lookahead of pending input; the front is dropped once the
  droppable prefix exceeds dict_size (amortised O(1) per byte).
- All parse/price/emit code now reads the window via an absolute `base`
  (`win[pos - base]`) instead of indexing the whole input.
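
The sliding window and absolute `base` described in the last two bullets might look roughly like the sketch below. `SlidingWindow` and its methods are hypothetical names, not the crate's API; the drop rule follows the description above (drop the front only once the droppable prefix exceeds `dict_size`), which is what keeps the cost amortised O(1) per byte.

```rust
/// Hypothetical sliding input window addressed by absolute stream positions.
pub struct SlidingWindow {
    buf: Vec<u8>,
    base: usize,      // absolute position of buf[0]
    dict_size: usize, // maximum match distance
    slop: usize,      // extra history kept behind the dictionary
}

impl SlidingWindow {
    pub fn new(dict_size: usize, slop: usize) -> Self {
        SlidingWindow { buf: Vec::new(), base: 0, dict_size, slop }
    }

    /// Appends pending input (lookahead) to the window.
    pub fn push(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }

    pub fn base(&self) -> usize {
        self.base
    }

    /// Absolute position one past the last buffered byte.
    pub fn end(&self) -> usize {
        self.base + self.buf.len()
    }

    /// Number of bytes currently held in memory.
    pub fn retained(&self) -> usize {
        self.buf.len()
    }

    /// Reads the byte at absolute position `pos` (the `win[pos - base]` access).
    pub fn get(&self, pos: usize) -> u8 {
        self.buf[pos - self.base]
    }

    /// Called after the encoder has consumed everything before `cursor`.
    /// History older than dict_size + slop is droppable, but it is only
    /// dropped in batches larger than dict_size, so each byte is moved a
    /// bounded number of times on average.
    pub fn trim(&mut self, cursor: usize) {
        let keep_from = cursor.saturating_sub(self.dict_size + self.slop);
        let droppable = keep_from.saturating_sub(self.base);
        if droppable > self.dict_size {
            self.buf.drain(..droppable);
            self.base += droppable;
        }
    }
}
```

Because callers only ever use absolute positions, parse and pricing code does not need to know when the front of the buffer was dropped.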

Drivers:
- lzma2_internal: new `Lzma2StreamEncoder` (push/finish) emits framed
  chunks incrementally; xz and raw lzma2 feed it and drain chunks as they
  are produced instead of staging the whole payload at finish.
- lzma: new `LzmaStreamEncoder` streams the continuous range-coded body,
  emitting the 13-byte header up front and the EOS marker + flush at
  finish. Small inputs (<= 64 KiB) still run the greedy-vs-optimal guard
  pass (buffered, bounded) so the optimal parser's cold start never loses
  to greedy; larger inputs stream with the optimal parse.
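
For readers unfamiliar with the framing the stream encoder emits, here is a sketch of an LZMA2 compressed-chunk header as laid out in the LZMA2 format: a control byte, the 16-bit big-endian low part of (uncompressed size - 1), 16-bit big-endian (compressed size - 1), and a properties byte whenever the control byte is 0xC0 or higher. The function name `lzma2_chunk_header` and its signature are hypothetical and are not taken from `Lzma2StreamEncoder`.

```rust
/// Builds the 6-byte header of an LZMA-compressed LZMA2 chunk.
///
/// `first` selects 0xE0 (reset state, new props, reset dictionary) for the
/// first chunk and 0xC0 (reset state, new props, keep dictionary) afterwards,
/// matching the continue-dict scheme described in this PR.
/// Returns None if a size is outside what one chunk can carry.
pub fn lzma2_chunk_header(
    first: bool,
    unpacked: usize,
    packed: usize,
    props: u8,
) -> Option<[u8; 6]> {
    // One chunk holds at most 2 MiB uncompressed and 64 KiB compressed.
    if unpacked == 0 || unpacked > (1 << 21) || packed == 0 || packed > (1 << 16) {
        return None;
    }
    let u = (unpacked - 1) as u32;
    let p = (packed - 1) as u16;
    let reset: u8 = if first { 0xE0 } else { 0xC0 };
    // The low 5 bits of the control byte carry bits 16..=20 of (unpacked - 1).
    let control = reset | ((u >> 16) as u8);
    Some([
        control,
        (u >> 8) as u8,
        u as u8,
        (p >> 8) as u8,
        p as u8,
        props,
    ])
}
```

With the default lc=3, lp=0, pb=2 the properties byte is 0x5D, so a first chunk of 65536 uncompressed and 1000 compressed bytes would start with `E0 FF FF 03 E7 5D`. A streaming driver can emit each header plus payload as soon as a chunk is complete, then finish with the single end-of-stream byte 0x00.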

Peak memory is now O(dict_size), independent of input length. Ratio on
the 2.9 MB corpus at -l 9 is essentially unchanged (xz 532320 -> 532316,
lzma 521918 -> 521957); reference cross-decode (`xz -d`,
`xz --format=lzma -d`) is byte-exact at every level including inputs far
larger than the dictionary, the exact window boundary, incompressible,
and empty.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The optimal parser set a `commit_end` once a match >= nice_len was found
but kept filling the DP for every position the match spans, doing
O(nice..273) price work per covered byte. On highly repetitive input
(e.g. 600 MB of one byte, or a short repeated phrase) this made the parse
effectively quadratic — a 20 MB all-`a` input took ~3 minutes.

The long match from the current node already records the cheapest arrival
at the commit boundary (a single match decision the traceback will pick),
so break out of the window loop immediately instead of grinding through
the spanned positions. This mirrors the SDK's greedy `nice_len`
acceptance in GetOptimum.
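
The effect of breaking out of the window loop can be seen in a toy model. The sketch below is not the real parser: `parse_window` is a hypothetical function, prices are flat constants instead of range-coder estimates, and `longest[i]` stands in for the match finder's longest match at position `i`. It counts price evaluations with and without the early break.

```rust
const LIT_PRICE: u32 = 9; // toy flat price of a literal
const MATCH_PRICE: u32 = 20; // toy flat price of any match

/// Toy optimal-parse window. Returns (commit_end, cost at commit_end,
/// number of price evaluations performed).
pub fn parse_window(
    longest: &[usize],
    nice_len: usize,
    early_commit: bool,
) -> (usize, u32, u64) {
    let n = longest.len();
    let mut cost = vec![u32::MAX; n + 1];
    cost[0] = 0;
    let mut evals = 0u64;
    let mut commit_end = n;
    let mut i = 0;
    while i < commit_end {
        if cost[i] != u32::MAX {
            // Literal step to i + 1.
            evals += 1;
            cost[i + 1] = cost[i + 1].min(cost[i] + LIT_PRICE);
            // Price every usable match length from this node.
            let max_len = longest[i].min(n - i);
            for len in 2..=max_len {
                evals += 1;
                cost[i + len] = cost[i + len].min(cost[i] + MATCH_PRICE);
            }
            if max_len >= nice_len {
                // The long match already records an arrival at the boundary.
                commit_end = commit_end.min(i + max_len);
                if early_commit {
                    break; // new behaviour: stop instead of pricing the span
                }
            }
        }
        i += 1;
    }
    (commit_end, cost[commit_end], evals)
}
```

On an all-`a`-style input where every position after the first offers a 273-byte match, both variants in this toy commit at the same boundary with the same cost, but the variant without the break prices every length again for each of the roughly 270 spanned positions.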

Effect:
- 20 MB all-`a`: ~174 s -> ~0.9 s (both encoders).
- Ratio essentially unchanged on the 2.9 MB corpus at -l 9:
  xz 532316 -> 532468 (+0.03%), lzma 521957 -> 522057 (+0.03%) — still
  far from the per-chunk-reset regression and well within 1% of the
  pre-change baselines (xz 532320, lzma 521918).
- 600 MB all-`a` now compresses in seconds under the 244 MB memory cap;
  reference cross-decode (`xz -d`, `xz --format=lzma -d`) byte-exact.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@MagicalTux MagicalTux force-pushed the lzma-bounded-memory branch from cb07403 to cc42fd9 Compare June 15, 2026 15:14
@MagicalTux MagicalTux merged commit ec466e0 into master Jun 15, 2026
42 checks passed