feat: Migrate block compression infrastructure with LZ4, ZSTD, and none backends#52

Merged: leaves12138 merged 3 commits into apache:main from lxy-9602:add-compression on Jun 8, 2026

Conversation

lxy-9602 commented Jun 8, 2026

Purpose

No linked issue.

Migrate the block compression framework used for SST file I/O:

Block compression core (src/paimon/common/compression/):

  • BlockCompressionType — compression type enum (LZ4, ZSTD, NONE) (block_compression_type.h)
  • BlockCompressor — abstract compressor interface (block_compressor.h/cpp)
  • BlockDecompressor — abstract decompressor interface (block_decompressor.h/cpp)
  • BlockCompressionFactory — factory for creating compressor/decompressor by type (block_compression_factory.h/cpp)
  • NoneBlockCompressionFactory — passthrough (no-op) compression factory (none_block_compression_factory.h)
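To make the relationship between these pieces concrete, here is a minimal sketch of how an enum, two abstract interfaces, a factory, and a passthrough backend could fit together. It is not the code from this PR: every method signature, the `-1` error convention, and the `NoneBlockCompressor`/`NoneBlockDecompressor` class names are assumptions made for illustration, and the real headers return a status type rather than a raw integer. The LZ4 and ZSTD branches are left unimplemented so the sketch has no library dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

namespace sketch {

enum class BlockCompressionType { NONE, LZ4, ZSTD };

// Hypothetical interface: returns bytes written to dst, or -1 on failure.
class BlockCompressor {
 public:
  virtual ~BlockCompressor() = default;
  virtual std::size_t GetMaxCompressedSize(std::size_t src_length) const = 0;
  virtual std::int64_t Compress(const char* src, std::size_t src_length,
                                char* dst, std::size_t dst_length) = 0;
};

class BlockDecompressor {
 public:
  virtual ~BlockDecompressor() = default;
  virtual std::int64_t Decompress(const char* src, std::size_t src_length,
                                  char* dst, std::size_t dst_length) = 0;
};

class BlockCompressionFactory {
 public:
  virtual ~BlockCompressionFactory() = default;
  virtual BlockCompressionType GetCompressionType() const = 0;
  virtual std::unique_ptr<BlockCompressor> GetCompressor() = 0;
  virtual std::unique_ptr<BlockDecompressor> GetDecompressor() = 0;
  // Returns nullptr for backends this sketch does not implement.
  static std::unique_ptr<BlockCompressionFactory> Create(BlockCompressionType type);
};

// Shared passthrough copy: fails if dst cannot hold the whole input.
inline std::int64_t CopyBlock(const char* src, std::size_t src_length,
                              char* dst, std::size_t dst_length) {
  assert(src != nullptr || src_length == 0);
  if (dst_length < src_length) return -1;
  if (src_length > 0) std::memcpy(dst, src, src_length);
  return static_cast<std::int64_t>(src_length);
}

class NoneBlockCompressor : public BlockCompressor {
 public:
  std::size_t GetMaxCompressedSize(std::size_t src_length) const override {
    return src_length;  // passthrough never grows the data
  }
  std::int64_t Compress(const char* src, std::size_t src_length,
                        char* dst, std::size_t dst_length) override {
    return CopyBlock(src, src_length, dst, dst_length);
  }
};

class NoneBlockDecompressor : public BlockDecompressor {
 public:
  std::int64_t Decompress(const char* src, std::size_t src_length,
                          char* dst, std::size_t dst_length) override {
    return CopyBlock(src, src_length, dst, dst_length);
  }
};

class NoneBlockCompressionFactory : public BlockCompressionFactory {
 public:
  BlockCompressionType GetCompressionType() const override {
    return BlockCompressionType::NONE;
  }
  std::unique_ptr<BlockCompressor> GetCompressor() override {
    return std::unique_ptr<BlockCompressor>(new NoneBlockCompressor());
  }
  std::unique_ptr<BlockDecompressor> GetDecompressor() override {
    return std::unique_ptr<BlockDecompressor>(new NoneBlockDecompressor());
  }
};

inline std::unique_ptr<BlockCompressionFactory> BlockCompressionFactory::Create(
    BlockCompressionType type) {
  switch (type) {
    case BlockCompressionType::NONE:
      return std::unique_ptr<BlockCompressionFactory>(new NoneBlockCompressionFactory());
    case BlockCompressionType::LZ4:
    case BlockCompressionType::ZSTD:
      break;  // real backends omitted from this sketch
  }
  return nullptr;
}

}  // namespace sketch
```

A caller would ask `Create` for a factory by type, obtain a compressor and decompressor from it, and round-trip a buffer through both. With this design, a reader that knows only the `BlockCompressionType` stored alongside a block can pick the matching decompressor without depending on any backend directly.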

LZ4 backend (src/paimon/common/compression/lz4/):

  • Lz4BlockCompressionFactory — LZ4 compression factory (lz4_block_compression_factory.h)
  • Lz4BlockCompressor — LZ4 compressor implementation (lz4_block_compressor.h)
  • Lz4BlockDecompressor — LZ4 decompressor implementation (lz4_block_decompressor.h)

ZSTD backend (src/paimon/common/compression/zstd/):

  • ZstdBlockCompressionFactory — ZSTD compression factory (zstd_block_compression_factory.h)
  • ZstdBlockCompressor — ZSTD compressor implementation (zstd_block_compressor.h)
  • ZstdBlockDecompressor — ZSTD decompressor implementation (zstd_block_decompressor.h)

Tests

  • block_compression_factory_test.cpp — factory creation, round-trip compress/decompress for LZ4, ZSTD, and NONE

API and Format

Documentation

Generative AI tooling

Migrate-by: Aone Copilot (Claude)

lxy-9602 commented Jun 8, 2026

Thank you @ChaomingZhangCN for the contributions to the block compression infrastructure — including the core framework, LZ4/ZSTD/NONE backends, and tests — migrated as part of this batch. 🎉

leaves12138 left a comment

I found a blocking issue in the LZ4 block decompressor: it reads the 8-byte header before validating that src_length is at least HEADER_LENGTH. If a truncated/corrupted block shorter than 8 bytes is passed in, Decompress will read past the supplied buffer instead of returning an Invalid status. Please add a src_length >= HEADER_LENGTH guard before ReadIntLE and cover it with a corruption test. No other blocking issue found in this pass.
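The kind of guard being requested can be sketched as follows. This is an illustration rather than the actual fix: the header layout (two little-endian 32-bit integers, compressed length then original length), the `BlockCheck` enum, the `BlockHeader` struct, and the `CheckBlock` helper are all assumptions, and the real decompressor returns an Invalid status instead of an enum. The point is the ordering: the length check on `src_length` happens before `ReadIntLE` touches the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace guard_sketch {

constexpr std::size_t HEADER_LENGTH = 8;  // assumed: 4 bytes compressed + 4 bytes original

enum class BlockCheck { OK, TRUNCATED_HEADER, BAD_LENGTHS, TRUNCATED_PAYLOAD, OUTPUT_TOO_SMALL };

struct BlockHeader {
  std::int32_t compressed_length;
  std::int32_t original_length;
};

// Reads a little-endian int32. The caller must guarantee 4 readable bytes.
inline std::int32_t ReadIntLE(const char* p) {
  const unsigned char* u = reinterpret_cast<const unsigned char*>(p);
  std::uint32_t v = static_cast<std::uint32_t>(u[0]) |
                    (static_cast<std::uint32_t>(u[1]) << 8) |
                    (static_cast<std::uint32_t>(u[2]) << 16) |
                    (static_cast<std::uint32_t>(u[3]) << 24);
  std::int32_t result;
  std::memcpy(&result, &v, sizeof(result));
  return result;
}

// Validates a block before any decompression is attempted.
inline BlockCheck CheckBlock(const char* src, std::size_t src_length,
                             std::size_t dst_length, BlockHeader* header) {
  assert(header != nullptr);
  // Guard first: without this, the reads below run past a short buffer.
  if (src_length < HEADER_LENGTH) return BlockCheck::TRUNCATED_HEADER;

  const std::int32_t compressed = ReadIntLE(src);
  const std::int32_t original = ReadIntLE(src + 4);
  if (compressed < 0 || original < 0) return BlockCheck::BAD_LENGTHS;

  // The payload claimed by the header must fit in what was supplied.
  if (static_cast<std::size_t>(compressed) > src_length - HEADER_LENGTH) {
    return BlockCheck::TRUNCATED_PAYLOAD;
  }
  // The caller's output buffer must hold the decompressed data.
  if (static_cast<std::size_t>(original) > dst_length) {
    return BlockCheck::OUTPUT_TOO_SMALL;
  }

  header->compressed_length = compressed;
  header->original_length = original;
  return BlockCheck::OK;
}

}  // namespace guard_sketch
```

A corruption test along these lines would pass a buffer shorter than 8 bytes and expect the truncated-header result, then pass a well-formed header with an undersized output buffer and expect the output-too-small result.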

leaves12138 left a comment

Re-reviewed the latest revision. The previous blocking issue is fixed: the LZ4 decompressor now validates that the input is at least the header length before reading the header, and the added regression coverage exercises the truncated-header and insufficient-output-buffer cases. I did not find any remaining blocking issue within the staged migration scope. Approving.

leaves12138 merged commit 1bac51f into apache:main on Jun 8, 2026