feat(table): write deletion vectors (Puffin) for v3 #22
Open
abnobdoss wants to merge 2 commits into
added 2 commits
June 23, 2026 08:38
Revives the deletion-vector writer from PR apache#3474 (which itself revived apache#2822 after the stale bot closed it) and hardens it with adversarial round-trip tests showing that the writer and the existing `PuffinFile` reader are exact inverses.

- `PuffinWriter` writes a spec-compliant `deletion-vector-v1` Puffin blob: `length (4 B, BE) | DV magic D1D33964 | roaring portable vector | CRC-32 (4 B, BE)`, where both the length and the CRC cover the magic bytes plus the vector.
- Vector body: the roaring portable 64-bit format, i.e. an 8-byte LE bitmap count, then for each high key a 4-byte LE key followed by a standard 32-bit roaring bitmap serialized by pyroaring.
- Reader fix: `_bitmaps_to_chunked_array` pins `type=pa.int64()`, so DVs containing only high-key positions (leading empty chunks) round-trip instead of failing pyarrow type inference.
- Tests cover: an empty file, an empty high key, large/sparse vectors (~50k), positions at the 32-bit boundary, deduplication of repeated positions, independent byte-level verification of the CRC and length framing, and CRoaring portability of the vector body (parsed without pyiceberg's own deserializer, with the cookie pinned).

Integration into the commit path depends on the v3 write gate (track T1); this prototype covers only the writer and the round trip, decoupled from manifests and the commit path.
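The blob framing and vector layout listed above can be sketched in plain Python for readers who want to see the bytes. This is a minimal, stdlib-only illustration, not the PR's `PuffinWriter`: the function names (`serialize_roaring32`, `serialize_roaring64_portable`, `frame_dv_blob`) are hypothetical, and instead of calling pyroaring it hand-rolls the 32-bit portable format using only array and bitmap containers (no run containers, so the cookie is always 12346). It assumes positions are non-negative and fit in a signed 64-bit long.

```python
import struct
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical helper names; stdlib-only sketch, not pyiceberg's PuffinWriter.
DV_MAGIC = bytes.fromhex("D1D33964")
SERIAL_COOKIE_NO_RUNCONTAINER = 12346  # 32-bit portable cookie when no run containers are used
ARRAY_CONTAINER_MAX = 4096  # containers holding more values are written as bitmaps


def serialize_roaring32(values: Iterable[int]) -> bytes:
    """32-bit roaring portable format using only array and bitmap containers."""
    containers: Dict[int, List[int]] = defaultdict(list)
    for v in sorted(set(values)):
        containers[v >> 16].append(v & 0xFFFF)
    keys = sorted(containers)

    # Cookie and container count, then (key, cardinality - 1) per container.
    header = struct.pack("<II", SERIAL_COOKIE_NO_RUNCONTAINER, len(keys))
    for k in keys:
        header += struct.pack("<HH", k, len(containers[k]) - 1)

    bodies = []
    for k in keys:
        lows = containers[k]
        if len(lows) <= ARRAY_CONTAINER_MAX:
            # Array container: sorted 16-bit little-endian values.
            bodies.append(struct.pack(f"<{len(lows)}H", *lows))
        else:
            # Bitmap container: 1024 little-endian 64-bit words (8192 bytes);
            # value v lands in byte v // 8, bit v % 8.
            bits = bytearray(8192)
            for low in lows:
                bits[low >> 3] |= 1 << (low & 7)
            bodies.append(bytes(bits))

    # Cookie 12346 always carries an offset header: one 4-byte LE offset per
    # container, counted from the first byte of the cookie.
    offset = len(header) + 4 * len(keys)
    offsets = b""
    for body in bodies:
        offsets += struct.pack("<I", offset)
        offset += len(body)
    return header + offsets + b"".join(bodies)


def serialize_roaring64_portable(positions: Iterable[int]) -> bytes:
    """8-byte LE bitmap count, then per high key: 4-byte LE key + 32-bit bitmap."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for pos in positions:
        if not 0 <= pos < 1 << 63:  # assumption: non-negative signed 64-bit positions
            raise ValueError(f"position out of range: {pos}")
        groups[pos >> 32].append(pos & 0xFFFFFFFF)
    out = struct.pack("<Q", len(groups))
    for key in sorted(groups):
        out += struct.pack("<I", key) + serialize_roaring32(groups[key])
    return out


def frame_dv_blob(positions: Iterable[int]) -> bytes:
    """length (4B BE) | magic | vector | CRC-32 (4B BE); length and CRC cover magic + vector."""
    payload = DV_MAGIC + serialize_roaring64_portable(positions)
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + payload + struct.pack(">I", crc)
```

For example, `frame_dv_blob([5, 1, 5, (1 << 32) + 7])` yields two 32-bit bitmaps (high keys 0 and 1), with the duplicate `5` collapsed. Because run containers are optional in the portable format, this output should still be readable by CRoaring-compatible readers; a writer that delegates to pyroaring may produce different, equally valid bytes.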
… fails

Build and validate the new blob into local variables before replacing the writer's stored blob, so that a failed `set_blob` call no longer silently discards an already-set blob and produces an empty Puffin file on `finish()`. Add a regression test and a byte-exact assertion against the RoaringBitmap portable serialization format.
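The build-then-swap ordering described in this commit can be illustrated with a toy class. `BlobWriterSketch`, its fields, and its validation rules (non-negative positions, a non-empty referenced data file path) are hypothetical and do not mirror the real `PuffinWriter`; the only point is the ordering: everything is computed and checked in locals, and `self` is modified only after every check has passed.

```python
from typing import Iterable, Optional, Tuple


class BlobWriterSketch:
    """Hypothetical stand-in for PuffinWriter; models only set_blob() and finish()."""

    def __init__(self) -> None:
        self._positions: Optional[Tuple[int, ...]] = None
        self._referenced_data_file: Optional[str] = None

    def set_blob(self, positions: Iterable[int], referenced_data_file: str) -> None:
        # 1. Build the new blob state into locals; self is untouched so far.
        new_positions = tuple(sorted(set(positions)))
        # 2. Validate (hypothetical rules). A failure here leaves the old blob intact.
        if any(p < 0 for p in new_positions):
            raise ValueError("deletion positions must be non-negative")
        if not referenced_data_file:
            raise ValueError("referenced_data_file is required")
        # 3. Replace the stored blob only after every check has passed.
        self._positions = new_positions
        self._referenced_data_file = referenced_data_file

    def finish(self) -> Tuple[str, Tuple[int, ...]]:
        # In this sketch, finish() raises rather than emitting an empty file.
        if self._positions is None or self._referenced_data_file is None:
            raise RuntimeError("set_blob() was never called successfully")
        return self._referenced_data_file, self._positions
```

With this ordering, calling `set_blob([3, 1, 3], "data/a.parquet")` and then a rejected `set_blob([-1], "data/b.parquet")` still leaves `finish()` returning the first blob, which is the behavior the regression test is meant to lock in.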