GH-3522: Eliminate unnecessary page-size copies in compressed page assembly and CRC checksums #3536

Closed. iemejia wants to merge 1 commit into apache:master from iemejia:perf-page-copy-elimination

iemejia commented May 1, 2026

Summary

  • Eliminate full-page toByteArray() copy in BAOSBytesInput.writeInto(ByteBuffer) by streaming through a ByteBufferBackedOutputStream adapter
  • Eliminate full-page toByteArray() copy in CRC32 checksum computation by streaming through a CRC32OutputStream adapter

Details

Two sources of unnecessary full-page-size byte[] allocations:

  1. BAOSBytesInput.writeInto(ByteBuffer) called toByteArray(), which copies the entire internal buffer. Replace with writeTo() using a thin ByteBufferBackedOutputStream adapter that writes directly from the internal buf[] without allocation.

  2. CRC32 checksum computation in ColumnChunkPageWriteStore called toByteArray() on each BytesInput to pass the result to crc.update(byte[]). Replace with writeAllTo(CRC32OutputStream), which streams bytes directly to crc.update() without intermediate copies.
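
A minimal sketch of the adapter idea in item 1, not the code from this PR. It assumes a plain java.io.ByteArrayOutputStream as a stand-in for the internal buffer behind BAOSBytesInput; the class body and the writeInto helper below are hypothetical and only illustrate how writeTo() can hand the internal array to a ByteBuffer without toByteArray().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class ByteBufferAdapterSketch {

    /** Hypothetical adapter: forwards every write into the target ByteBuffer. */
    static final class ByteBufferBackedOutputStream extends OutputStream {
        private final ByteBuffer target;

        ByteBufferBackedOutputStream(ByteBuffer target) {
            this.target = target;
        }

        @Override
        public void write(int b) {
            target.put((byte) b);
        }

        @Override
        public void write(byte[] src, int off, int len) {
            // Bulk put: copies straight from the caller's array into the buffer.
            target.put(src, off, len);
        }
    }

    /**
     * Stand-in for writeInto(ByteBuffer): ByteArrayOutputStream.writeTo() passes
     * its internal array to the adapter, so no intermediate byte[] is allocated.
     */
    static void writeInto(ByteArrayOutputStream source, ByteBuffer target) throws IOException {
        source.writeTo(new ByteBufferBackedOutputStream(target));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream page = new ByteArrayOutputStream();
        page.write(new byte[] {1, 2, 3, 4});

        ByteBuffer out = ByteBuffer.allocate(8);
        writeInto(page, out);
        System.out.println(out.position()); // prints 4
    }
}
```

The only copy left is the one into the destination buffer, which is required anyway; the page-sized temporary array from toByteArray() is gone.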

For a typical 1 MB page, this eliminates one to three full-page allocations per page during write (one per CRC computation plus one for the ByteBuffer assembly). The benefit is reduced GC pressure and lower peak memory rather than higher steady-state throughput.
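
The CRC path in item 2 can be sketched the same way. This is an illustration under assumptions, not the PR's implementation: ByteArrayOutputStream.writeTo() stands in for BytesInput.writeAllTo(), and the CRC32OutputStream body and both checksum helpers are hypothetical. The copying variant is included only to show that the two approaches yield the same checksum.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;

public class Crc32StreamSketch {

    /** Hypothetical adapter: every byte written is fed to the CRC32, nothing is stored. */
    static final class CRC32OutputStream extends OutputStream {
        private final CRC32 crc;

        CRC32OutputStream(CRC32 crc) {
            this.crc = crc;
        }

        @Override
        public void write(int b) {
            crc.update(b);
        }

        @Override
        public void write(byte[] src, int off, int len) {
            crc.update(src, off, len);
        }
    }

    /** Streaming variant: no page-sized temporary array per part. */
    static long checksumStreaming(ByteArrayOutputStream... parts) throws IOException {
        CRC32 crc = new CRC32();
        OutputStream sink = new CRC32OutputStream(crc);
        for (ByteArrayOutputStream part : parts) {
            part.writeTo(sink);
        }
        return crc.getValue();
    }

    /** Copying variant, shown for comparison: one toByteArray() copy per part. */
    static long checksumCopying(ByteArrayOutputStream... parts) {
        CRC32 crc = new CRC32();
        for (ByteArrayOutputStream part : parts) {
            crc.update(part.toByteArray());
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream levels = new ByteArrayOutputStream();
        levels.write(new byte[] {0, 1, 1, 0});
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        data.write(new byte[] {10, 20, 30});

        // Both variants see the same byte sequence, so the checksums match.
        System.out.println(checksumStreaming(levels, data) == checksumCopying(levels, data)); // prints true
    }
}
```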

All TestBytesInput tests pass. Compiles clean.

iemejia force-pushed the perf-page-copy-elimination branch from 006519e to 1032712 May 12, 2026 06:32
iemejia marked this pull request as draft May 15, 2026 09:35
iemejia commented May 17, 2026

Closing — this will be superseded by a forthcoming PR for column I/O and page assembly optimizations (v2-par7-column-io), which is not yet opened because it depends on #3565 (PLAIN) and #3568 (RLE) being merged first.

I initially submitted a series of small, focused PRs thinking they'd be easier to review. In practice the sheer number (~16 PRs, with more pending) made things harder to follow — even for me. I've regrouped the changes by encoding type / performance area so that each PR is self-contained with its own benchmarks and test coverage, which should make review and performance analysis much more straightforward.

Apologies for the churn. The replacement PR will be opened once its dependencies land. Thank you.

iemejia closed this May 17, 2026
iemejia deleted the perf-page-copy-elimination branch May 17, 2026 22:56