Add configurable on-disk value compression (zstd + lz4) #29
Open
christianparpart wants to merge 1 commit into
Compress cache values before they hit the persistent CoW-tree, shrinking disk footprint and improving throughput for I/O-bound workloads. The codec is chosen per store and defaults to zstd; admins opt out with `--compression none` or pick `lz4` for a CPU-minimal codec.

Design (value level, not page level):
- Compression lives in CowTreeStorage's encode/decode boundary (StoreEntry compresses; LoadEntry decompresses), so keys stay cleartext for B+tree navigation, the fixed page-offset layout is untouched, and the L1 cache / wire protocols only ever see plaintext.
- Threshold + shrink-check: values below --compression-min-bytes, and any value that does not actually get smaller (incompressible / already compressed), are stored verbatim under an Identity tag, so a zstd default never enlarges a value or burns CPU pointlessly.
- Per-entry codec tag + original length in the leaf record (format v3->v4): a store may freely mix codecs, changing the config only affects new writes, and no migration is ever required. Old-format stores are detected via a version sentinel and refused rather than mis-parsed.

Data-driven, DI-respecting:
- New Core/Compression codec registry (stateless, modelled on Core/Crc32c) with a descriptor table driving name<->id, availability, and the compress/decompress dispatch; adding a codec is one table row.
- Config threads through the existing StorageDurability enum pattern: --compression / --compression-level / --compression-min-bytes across CLI, YAML, ConfigMerge, and the startup banner.

Build: gated behind FASTCACHED_ENABLE_COMPRESSION (ON by default; the standard build fetches/links zstd + lz4 via CPM, preferring a system package). Building OFF degrades the codec table to Identity-only and turns selecting zstd/lz4 into a clean startup error, with no compression symbols linked.

Performance: writes compress once and are fsync-bound in the default Batched durability, so lz4/zstd-3 add negligible wall-clock; reads decompress only on an L1 miss, off the hot path. Append/Prepend are read-modify-write and therefore decompress+recompress. End-to-end: a 200 KiB compressible value shrinks the store from 278 KiB to 64 KiB (~4.25x) with an identical client-visible round-trip, verified across a restart.

Risk: on-disk record format bumped v3->v4 (pre-release; no external stores). A v3 store is detected and refused (detect-and-bail; migrator out of scope). New third-party deps (zstd, lz4) are contained behind the build flag.

Tests: 13 new codec + storage cases (per-codec round-trip inline/overflow across reopen, footprint shrink, incompressible->Identity fallback, corrupt-input / wrong-length rejection, Append/Prepend on compressed entries, Touch preserves value+codec, mixed-codec store) plus CLI/YAML/merge config cases; all green in both ON and OFF builds (829 tests pass).

Coverage: the clang-coverage preset does not instrument the FastCache library target (pre-existing: --coverage is link-only there, no .gcno emitted), so no numeric delta is available; the new paths are exercised by the dedicated cases above in both build configurations.

Signed-off-by: Christian Parpart <christian@parpart.family>
Claude-Session: https://claude.ai/code/session_01H8xUNRPoP6JfAif4742zJv
Cache values written to the persistent CoW-tree are now compressed, shrinking the on-disk footprint and improving throughput for I/O-bound workloads. The codec is chosen per store and defaults to zstd; administrators opt out with `--compression none`, or pick `lz4` for a CPU-minimal codec. Reads always return plaintext: decompression happens only on an L1 cache miss, off the hot path. Each record is tagged with its own codec, so a store may freely mix codecs and changing the setting only affects new writes (no migration).

Compression lives at the value level inside `CowTreeStorage` (compress in `StoreEntry`, decompress in `LoadEntry`), so keys stay cleartext for B+tree navigation, the fixed page-offset layout is untouched, and the L1 cache and wire protocols only ever see the original bytes. A threshold plus a shrink-check keep the zstd default safe: values below `--compression-min-bytes`, and any value that does not actually get smaller (incompressible or already-compressed data), are stored verbatim under an Identity tag, so compression never enlarges a value or burns CPU pointlessly.

Changes
- `Core/Compression` codec registry (stateless, modelled on `Core/Crc32c`): a descriptor table drives name↔id, availability, and the compress/decompress dispatch, so adding a codec is a single table row.
- `CowTreeStorage` leaf-record format bumped v3→v4 with a per-entry codec id + original length; `StoreEntry` compresses once (threshold + shrink-check) and `LoadEntry` decompresses to plaintext. Metadata-only `Touch`/`MarkStale` preserve the codec without recompressing.
- Config threads through the existing `StorageDurability` pattern: `--compression`, `--compression-level`, and `--compression-min-bytes` across CLI, YAML, `ConfigMerge`, and the startup banner.
- Build gated behind `FASTCACHED_ENABLE_COMPRESSION` (ON by default; the standard build fetches/links zstd + lz4 via CPM, preferring a system package). Building with it OFF degrades the codec table to Identity-only and makes selecting zstd/lz4 a clean startup error, linking no compression symbols.

Performance: writes compress once and are fsync-bound in the default Batched durability, so lz4 and zstd level 3 add negligible wall-clock; `Append`/`Prepend` are read-modify-write and so decompress+recompress. End-to-end, a 200 KiB compressible value shrinks the store from 278 KiB to 64 KiB (~4.25×) with an identical client-visible round-trip, verified across a restart.
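To illustrate the descriptor-table idea from the Changes list, here is a minimal sketch of a codec registry. Every name in it (`CodecId`, `CodecDescriptor`, `kCodecs`, `findByName`, `validateSelection`, `compressWith`) is hypothetical and not taken from `Core/Compression`. Mapping the `none` setting to the Identity row is also an assumption. To stay self-contained, zstd and lz4 are marked unavailable, which mimics a build with `FASTCACHED_ENABLE_COMPRESSION` OFF.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical ids; the real on-disk codec ids are not given in this PR text.
enum class CodecId : std::uint8_t { Identity = 0, Zstd = 1, Lz4 = 2 };

using CodecFn = std::optional<std::string> (*)(std::string_view input);

struct CodecDescriptor {
    CodecId id;
    std::string_view name;  // value accepted by --compression (assumed mapping)
    bool available;         // false when the codec is compiled out
    CodecFn compress;
    CodecFn decompress;
};

// Identity copies its input; it is the only codec implemented in this sketch.
inline std::optional<std::string> identityCopy(std::string_view input) {
    return std::string(input);
}

// Stand-in for a compiled-out codec: every call reports failure.
inline std::optional<std::string> compiledOut(std::string_view) {
    return std::nullopt;
}

// One row per codec: adding a codec means adding one row here.
inline constexpr std::array<CodecDescriptor, 3> kCodecs{{
    {CodecId::Identity, "none", true, identityCopy, identityCopy},
    {CodecId::Zstd, "zstd", false, compiledOut, compiledOut},
    {CodecId::Lz4, "lz4", false, compiledOut, compiledOut},
}};

inline const CodecDescriptor* findByName(std::string_view name) {
    for (const auto& codec : kCodecs)
        if (codec.name == name) return &codec;
    return nullptr;
}

inline const CodecDescriptor* findById(CodecId id) {
    for (const auto& codec : kCodecs)
        if (codec.id == id) return &codec;
    return nullptr;
}

// Startup validation: returns an error message, or nullopt if the choice is usable.
inline std::optional<std::string> validateSelection(std::string_view name) {
    const CodecDescriptor* codec = findByName(name);
    if (codec == nullptr) return "unknown codec: " + std::string(name);
    if (!codec->available) return "codec not available in this build: " + std::string(name);
    return std::nullopt;
}

// Dispatch through the table instead of switching on the codec in callers.
inline std::optional<std::string> compressWith(CodecId id, std::string_view input) {
    const CodecDescriptor* codec = findById(id);
    assert(codec != nullptr && "every CodecId needs a table row");
    if (!codec->available) return std::nullopt;
    return codec->compress(input);
}
```

With this shape, `validateSelection("zstd")` yields an error string in the Identity-only table above, while `validateSelection("none")` yields no error.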
Performance comparison: master vs compression modes vs value size
Setup. Storage-layer micro-benchmark driving `CowTreeStorage` `Set`/`Get` directly (isolates the codec from network/protocol noise), plus an end-to-end sanity pass over the memcached text protocol against the running daemon. Release builds (`clang-release`, `-O2 -DNDEBUG`, no sanitizers), pinned to one core, `durability=batched` (default), single shard. master = `origin/master` (no compression code); the other three are this branch with the named `--compression`. Backing store on tmpfs, so throughput reflects CPU + fsync cost rather than disk-seek latency; on real disk the compression throughput win is larger (fewer bytes through the actual I/O path). On-disk footprint is filesystem-independent. Machine: i9-14900HX.
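As a rough illustration of how a Set/Get sweep like the one in the setup can be timed, the sketch below measures throughput in MiB/s with `std::chrono::steady_clock`. It is not the benchmark used for this PR: `InMemoryStore`, `runSweep`, `SweepResult`, and `mibPerSecond` are hypothetical names, and the store is a plain hash map standing in for `CowTreeStorage`, whose real API is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the storage layer (no persistence, no compression).
struct InMemoryStore {
    std::unordered_map<std::string, std::string> data;

    void Set(const std::string& key, const std::string& value) { data[key] = value; }

    const std::string* Get(const std::string& key) const {
        const auto it = data.find(key);
        return it == data.end() ? nullptr : &it->second;
    }
};

constexpr double kMiB = 1024.0 * 1024.0;

// Convert a byte count moved in elapsedSeconds into MiB/s.
inline double mibPerSecond(std::size_t bytes, double elapsedSeconds) {
    assert(elapsedSeconds > 0.0);
    return static_cast<double>(bytes) / kMiB / elapsedSeconds;
}

struct SweepResult {
    double setMiBps;
    double getMiBps;
    std::size_t hits;  // number of Gets that found their key
};

// Write `count` values of `valueSize` bytes, read them back, and time both phases.
template <typename Store>
SweepResult runSweep(Store& store, std::size_t valueSize, std::size_t count) {
    using Clock = std::chrono::steady_clock;
    const std::string value(valueSize, 'x');  // trivially compressible payload
    const std::size_t totalBytes = valueSize * count;

    const auto t0 = Clock::now();
    for (std::size_t i = 0; i < count; ++i) store.Set("key" + std::to_string(i), value);
    const auto t1 = Clock::now();

    std::size_t hits = 0;
    for (std::size_t i = 0; i < count; ++i)
        if (store.Get("key" + std::to_string(i)) != nullptr) ++hits;
    const auto t2 = Clock::now();

    // Clamp to a tiny positive duration so a coarse clock cannot divide by zero.
    const auto seconds = [](Clock::duration d) {
        return std::max(std::chrono::duration<double>(d).count(), 1e-9);
    };
    return {mibPerSecond(totalBytes, seconds(t1 - t0)),
            mibPerSecond(totalBytes, seconds(t2 - t1)), hits};
}
```

A real run would repeat each size several times and pin the process to one core, as the setup describes; this sketch only shows the timing and unit conversion.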
Storage micro-benchmark — compressible data
Storage micro-benchmark — random (incompressible) data
(Random rows for `none`/`lz4` omitted; they are identical to `master`/`zstd`.)
End-to-end (daemon, memcached text, single connection): corroboration
* The single-connection Python driver adds noise at large sizes; the storage micro-benchmark above is the clean signal (there `none` ≈ `master` exactly).
Findings
- Small values: throughput is identical across all four configs; the fsync/commit dominates, and the codec is in the noise. Footprint is dominated by the fixed 16 KiB page + per-entry overhead, so the ratio is < 1 regardless (compression can't beat the page granularity here). This is exactly the case where the 256-byte `--compression-min-bytes` default and the shrink-check avoid wasting CPU.
- Compressible data: at 256 KiB, zstd/lz4 cut the on-disk footprint ~100× and run Set/Get ~4–6× faster than master; having fewer bytes to serialize, write, and fsync more than repays the compression CPU. At 16 KiB, zstd is 47× smaller and ~1.3–1.5× faster.
- zstd vs lz4: zstd gives the better ratio at comparable speed (level 3). lz4 has a slight edge in raw compress throughput at 256 KiB (2,054 vs 1,883 MiB/s set) but a worse ratio at 16 KiB (18.9× vs 47.4×). zstd is the right general-purpose default; lz4 is the pick when CPU is the scarce resource.
- Incompressible data: the shrink-check stores random values verbatim under Identity, so footprint and read throughput match master exactly. The only cost is one wasted compression attempt on the write path, visible as ~7% lower Set MiB/s for zstd at 256 KiB random (426 vs 457), and negligible for lz4.
- `--compression none` tracks `master` within measurement noise across every size and data type; the new code sits entirely behind the codec check.
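The threshold and shrink-check that these findings lean on can be sketched as follows. This is a minimal illustration, not the `StoreEntry`/`LoadEntry` code: `Codec`, `StoredValue`, `encode`, and `decode` are hypothetical names, and a toy run-length encoder stands in for zstd/lz4 so the example has no external dependencies. The per-record codec tag and original length follow the description of the v4 leaf record, but the actual field layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

enum class Codec : std::uint8_t { Identity = 0, ToyRle = 1 };  // ToyRle stands in for zstd/lz4

struct StoredValue {
    Codec codec;                 // per-entry codec tag
    std::uint32_t originalSize;  // plaintext length, verified on load
    std::string payload;         // bytes as they would sit in the record
};

// Toy run-length encoder: emits (count, byte) pairs with count in 1..255.
inline std::string rleCompress(std::string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

// Rejects corrupt input and any result that does not match the recorded length.
inline std::optional<std::string> rleDecompress(std::string_view in, std::size_t expectedSize) {
    if (in.size() % 2 != 0) return std::nullopt;
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 2) {
        const auto run = static_cast<std::size_t>(static_cast<unsigned char>(in[i]));
        if (run == 0 || out.size() + run > expectedSize) return std::nullopt;
        out.append(run, in[i + 1]);
    }
    if (out.size() != expectedSize) return std::nullopt;
    return out;
}

// Threshold first, then shrink-check: compressed bytes are kept only if strictly smaller.
inline StoredValue encode(std::string_view plain, std::size_t minBytes) {
    assert(plain.size() <= std::numeric_limits<std::uint32_t>::max());
    const auto size = static_cast<std::uint32_t>(plain.size());
    if (plain.size() >= minBytes) {
        std::string packed = rleCompress(plain);
        if (packed.size() < plain.size()) return {Codec::ToyRle, size, std::move(packed)};
    }
    return {Codec::Identity, size, std::string(plain)};  // stored verbatim
}

// Dispatch on the record's own tag, so a store can mix codecs freely.
inline std::optional<std::string> decode(const StoredValue& rec) {
    switch (rec.codec) {
        case Codec::Identity:
            if (rec.payload.size() != rec.originalSize) return std::nullopt;
            return rec.payload;
        case Codec::ToyRle:
            return rleDecompress(rec.payload, rec.originalSize);
    }
    return std::nullopt;  // unknown codec tag
}
```

Because `decode` looks only at the tag stored with the record, changing the configured codec affects new writes only, which is the property that makes a migration unnecessary.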
sccache-sized values (compile-cache workload)
sccache is the project's stated primary workload, so this is a dedicated sweep over compile-cache-representative sizes and shapes. Two shapes per size:
- objfile: semi-compressible raw `.o` content (symbol/debug strings + zero padding interleaved with incompressible machine code), zstd ratio ~1.3–1.7×.
- precompressed: already sccache-compressed blobs (zstd/lz4), effectively incompressible; exercises the shrink-check.
Same setup as above (release, one core, batched durability, single shard, store
on tmpfs — so on real disk the compression throughput picture is more favourable
than shown, since fewer bytes reach the actual I/O path).
objfile (semi-compressible raw `.o`)
precompressed (already sccache-compressed)
Findings (sccache)
- Small values: all configs are equal; the 16 KiB page granularity dominates footprint (ratio < 1). The min-bytes default + shrink-check keep compression from wasting CPU here.
- objfile: object files compress modestly (zstd 2 MiB: 1.68×, ~40% smaller on disk). On tmpfs this comes with a write CPU cost (zstd 2 MiB set: 236 vs 540 MiB/s for master) because there is no disk-write time to reclaim; GET is faster (660 vs 589, fewer bytes read). On a real disk the write cost is partly-to-fully offset by writing ~40% fewer bytes.
- lz4: a lower ratio than zstd's 1.68× at 2 MiB, but at markedly higher write throughput (356 vs 236 MiB/s), so it reclaims disk with a smaller CPU hit. zstd wins when space is the priority.
- precompressed: the shrink-check stores them verbatim, so footprint and reads equal master; the only cost is one wasted compress attempt on write (zstd 2 MiB: 501 vs 540 MiB/s, ~7%; lz4 ~3%).
- `--compression none` and master are identical across every sccache size and shape; there is no regression from the feature when it is off.
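The page-granularity effect mentioned in the findings can be made concrete with a little arithmetic. The sketch below assumes that on-disk space is allocated in whole 16 KiB pages and that an entry always occupies at least one page; it ignores per-entry overhead and any sharing of pages between entries, so it is a simplification and not the store's actual layout. `pagesFor`, `footprint`, and `footprintRatio` are hypothetical names, and the ratio is taken here as plaintext bytes divided by on-disk bytes.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPageSize = 16 * 1024;  // 16 KiB page, as stated in the findings

// Whole pages needed for storedBytes; assumes at least one page per entry.
constexpr std::size_t pagesFor(std::size_t storedBytes) {
    return storedBytes == 0 ? 1 : (storedBytes + kPageSize - 1) / kPageSize;
}

// On-disk bytes under whole-page allocation.
constexpr std::size_t footprint(std::size_t storedBytes) {
    return pagesFor(storedBytes) * kPageSize;
}

// Assumed definition: plaintext bytes per on-disk byte (higher is better).
inline double footprintRatio(std::size_t plainBytes, std::size_t storedBytes) {
    assert(plainBytes > 0);
    return static_cast<double>(plainBytes) / static_cast<double>(footprint(storedBytes));
}

// A 1 KiB value costs one page whether or not it is compressed to 100 bytes.
static_assert(footprint(1024) == footprint(100), "sub-page values share one footprint");
```

Under these assumptions a 1 KiB value has a ratio of 1/16 with or without compression, while a 2 MiB value stored in 1.25 MiB drops from 128 pages to 80, which is where compression starts to pay off.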
Recommendation for sccache deployments: object files compress modestly, so compression trades write CPU for ~30–40% less disk. Use `lz4` when the box is CPU-constrained or write-heavy (best throughput-per-byte-saved), `zstd` when disk is the constraint, and `none` if the cache stores predominantly sccache-precompressed blobs (compression can't help those, and the shrink-check already makes the default safe if left on).