feat(table): write deletion vectors (Puffin) for v3 #22

Open
abnobdoss wants to merge 2 commits into main from v3/t2-dv-write
@abnobdoss

Owner

No description provided.

Abanoub Doss added 2 commits June 23, 2026 08:38
Revives the deletion-vector writer from PR apache#3474 (which itself revives the
stale-bot-closed apache#2822) and hardens it with adversarial round-trip tests
proving the writer and the existing PuffinFile reader are true inverses.

- PuffinWriter writes a spec-compliant deletion-vector-v1 Puffin blob:
  length(4B BE) | DV magic D1D33964 | roaring portable vector | CRC-32(4B BE).
- Vector body = roaring portable 64-bit (8B LE count + per-key 4B LE key +
  standard 32-bit roaring bitmap from pyroaring).
- Reader fix: _bitmaps_to_chunked_array pins type=pa.int64() so high-key-only
  DVs (leading empty chunks) round-trip instead of failing pyarrow inference.
- Tests cover: empty file, empty-high-key, large/sparse (~50k), 32-bit
  boundary positions, duplicate dedup, independent byte-level CRC/length
  framing verification, and independent croaring-portability of the vector
  body (parsed without pyiceberg's own deserializer; cookie pinned).
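
The framing listed above can be sketched with the standard library alone. This is a minimal illustration, not the PR's `PuffinWriter`: the helper names `frame_dv_blob`, `unframe_dv_blob`, and `split_position` are hypothetical, and it assumes that both the length field and the CRC-32 cover the magic bytes plus the vector body, which is how the layout in the commit message is read here.

```python
import struct
import zlib

# Magic bytes as given in the commit message.
DV_MAGIC = bytes.fromhex("D1D33964")


def split_position(pos):
    """Split a 64-bit row position into (high 32-bit key, low 32-bit value)."""
    return pos >> 32, pos & 0xFFFFFFFF


def frame_dv_blob(vector):
    """Wrap an already-serialized vector body in the DV framing.

    Layout: length (4B BE) | magic | vector | CRC-32 (4B BE).
    Assumption: length and CRC both cover magic + vector.
    """
    payload = DV_MAGIC + vector
    length = struct.pack(">I", len(payload))
    crc = struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)
    return length + payload + crc


def unframe_dv_blob(blob):
    """Check the framing and return the raw vector body."""
    if len(blob) < 12:
        raise ValueError("blob too short for DV framing")
    (length,) = struct.unpack(">I", blob[:4])
    if len(blob) != 4 + length + 4:
        raise ValueError("length field does not match blob size")
    payload = blob[4:4 + length]
    if payload[:4] != DV_MAGIC:
        raise ValueError("bad magic")
    (crc,) = struct.unpack(">I", blob[4 + length:])
    if crc != (zlib.crc32(payload) & 0xFFFFFFFF):
        raise ValueError("CRC mismatch")
    return payload[4:]


# An empty vector body is just the 8-byte little-endian key count of zero.
empty_vector = struct.pack("<Q", 0)
blob = frame_dv_blob(empty_vector)
```

Checking the framing independently of the bitmap deserializer, as the byte-level tests in this commit do, keeps a framing bug from being masked by a matching bug in the reader.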

Integration into the commit path depends on the v3 write gate (track T1);
this prototype is the writer + round-trip, decoupled from manifest/commit.
… fails

Build and validate the new blob into local variables before replacing the
writer's stored blob, so a failed set_blob call no longer silently discards
an already-set blob and produces an empty Puffin file on finish(). Add a
regression test and a byte-exact assertion against the RoaringBitmap portable
serialization format.
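
The build-then-swap pattern described in this commit can be sketched as follows. `SketchWriter` is a hypothetical stand-in, not the PR's writer: it stores a sorted list of positions instead of a serialized blob, and its validation rule (non-negative integers only) is an assumption made for illustration.

```python
class SketchWriter:
    """Hypothetical writer that holds at most one blob."""

    def __init__(self):
        self._blob = None

    def set_blob(self, positions):
        # Build and validate into a local first; self._blob is untouched
        # until everything below has succeeded.
        candidate = list(positions)
        for p in candidate:
            if isinstance(p, bool) or not isinstance(p, int) or p < 0:
                raise ValueError(f"invalid position: {p!r}")
        candidate = sorted(set(candidate))
        # Only now replace the stored blob.
        self._blob = candidate

    def finish(self):
        # Return whatever was last set successfully (empty if nothing was).
        return list(self._blob) if self._blob is not None else []


writer = SketchWriter()
writer.set_blob([3, 1, 3])
try:
    writer.set_blob([-1])  # fails validation
except ValueError:
    pass
result = writer.finish()  # still reflects the first, successful call
```

Because the assignment to `self._blob` is the last statement in `set_blob`, a failed call leaves the earlier blob in place instead of clearing it.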
@abnobdoss abnobdoss closed this Jun 25, 2026
@abnobdoss abnobdoss reopened this Jun 25, 2026