GH-1116: [Java] Fix compressed buffer prefix write and ZSTD dstCapacity#1119
Conversation
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
While I think this change is fine, nothing should be modifying the buffer during this method. I don't believe any vectors/etc. are safe to use concurrently, so the comment is misleading about the guarantees we can provide.
There was a problem hiding this comment.
Good point — the original framing implied a concurrency guarantee we don't actually provide, and there's no in-tree doCompress implementation that mutates the source buffer's writerIndex. Tightened the comment to state only the engineering invariant (three in-method consumers must observe the same value) and dropped the "shared reference" / doCompress speculation:
// GH-1116: capture writerIndex() once so the empty-buffer check, size
// comparison, and uncompressed-length prefix all see the same value.
long uncompressedLength = uncompressedBuffer.writerIndex();Pushed in d7b6a8f.
The previous comment said the captured writerIndex() guarded against the buffer being a 'shared reference to a vector's internal buffer'. As @lidavidm pointed out, Arrow vectors are not safe for concurrent use, so that framing implied a guarantee we do not provide. Tighten the comment to state only the engineering invariant: the three in-method consumers (empty-buffer check, size comparison, and the uncompressed-length prefix) must all see the same value. Reference apacheGH-1116 for context.
|
@LuciferYang were you able to replicate the original issue, by the way? |
|
Honest answer: no. I built the two new tests around the shape of data the reporter described (all-null The fix came from reading the code: the old If you'd rather hold until we have a failing-then-green test, that's fair. Otherwise happy to dig further — the reporter's env was Arrow 18.3.0/19.0.0 with zstd-jni 1.5.7-6/-7, but the allocator setup wasn't in the report. |
|
I think we should document that the new tests don't actually reflect the original bug report, at least (or consider if they're necessary even). But the fixes themselves are worth having. |
…H-1116 The two tests added in the previous commit didn't fail on the unfixed code, so they aren't a regression test for the reported symptom. The shapes they exercise (all-null fixed-width round-trip, multi-batch ZSTD stream) are already covered by testEmptyBuffer and testReadWriteStream. Drop them per review feedback.
|
Fair. Dropped both tests in b8e6b46 — |
|
Thanks for your review @lidavidm |
What
Two fixes in the compression codec:
AbstractCompressionCodec.compress(): capturewriterIndex()onceThe previous code read
uncompressedBuffer.writerIndex()at multiple sites — for the size comparison and again afterdoCompress()to populate the 8-byte uncompressed-length prefix. Capture the value once at the top ofcompress()and reuse it for the empty-buffer check, the size comparison, and the prefix, so all three consumers see the same value.ZstdCompressionCodec.doCompress():dstCapacityoverstated by 8 bytesZstd.compressUnsafe(dst, dstSize, ...)expectsdstSizeto be the available space fromdst. The code offsetsdstby 8 bytes past the prefix but passed8 + maxSizeinstead ofmaxSize. ThecompressBound()headroom hides this in practice, but the parameter was semantically wrong. PassmaxSize.Tests
Covered by the existing round-trip tests (
testEmptyBuffer,testReadWriteStream,testReadWriteFile, etc.). I was not able to construct a minimal reproducer for the originaldeclaredUncompressed=0symptom on the unfixed code, so both fixes are conservative correctness improvements derived from code inspection rather than failing-then-green regression tests.