feat(table): merge-on-read position-delete writer #26
Open
abnobdoss wants to merge 3 commits into
added 3 commits
June 23, 2026 18:33
Wires Transaction.delete() to write Iceberg v2 position-delete files instead of copy-on-write (CoW) rewrites when write.delete.mode=merge-on-read on a v2 table. CoW remains the default; v3 keeps the existing fallback.

- pyarrow: _write_position_delete_file writes a spec-correct positional-delete Parquet file (file_path: field-id 2147483546, string, required; pos: field-id 2147483545, long, required), sorted by (file_path, pos), with real Parquet metrics. Positions are physical row offsets numbered against the full data file, not against the post-delete view.
- manifest: parameterize ManifestWriterV2/write_manifest with content so that a DELETES-content manifest can be written; v1 is rejected. Fix POSITIONAL_DELETE_SCHEMA so that pos is LongType, per the spec.
- snapshot: _SnapshotProducer gains append_delete_file and _write_added_delete_manifest; _DeleteFiles flips to OVERWRITE and commits position-delete files in their own DELETES manifest with an inherited sequence number. Data files are never rewritten.
- delete(): the MoR path computes matched row positions per data file, skips already-deleted positions, groups by (spec_id, partition), writes one delete file per partition, and commits. The result round-trips through the existing positional-delete reader.

Tests (SQLite/in-memory catalog, x3 backends): single/multi-row delete, partitioned per-partition delete files, sequence-number scoping (later appends survive), successive deletes, plain to_arrow, empty-match warn, delete-all (no data drop), multi-data-file-one-partition, CoW regression guard. 30 passed.
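The delete() planning step above can be modeled in plain Python. This is a simplified sketch, not the PR's code: DataFileInfo, plan_position_deletes and position_delete_applies are hypothetical names, and the matched positions are assumed to arrive already computed as 0-based physical row offsets per data file. The file-level applicability check follows the Iceberg v2 rule that a position delete applies only to data files with the same spec and partition whose data sequence number is less than or equal to the delete file's, which is why later appends survive.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

PartitionKey = Tuple[int, Tuple]  # (spec_id, partition values)


@dataclass(frozen=True)
class DataFileInfo:
    """Hypothetical stand-in for a data file entry in a manifest."""
    file_path: str
    spec_id: int
    partition: Tuple  # partition values as a hashable tuple
    data_sequence_number: int


def plan_position_deletes(
    matches: Dict[DataFileInfo, Iterable[int]],
    already_deleted: Dict[str, Set[int]],
) -> Dict[PartitionKey, List[Tuple[str, int]]]:
    """Return the (file_path, pos) rows to write, one group per delete file.

    `matches` holds 0-based physical row offsets into each full data file,
    not offsets into the post-delete view.
    """
    groups: Dict[PartitionKey, Set[Tuple[str, int]]] = defaultdict(set)
    for data_file, positions in matches.items():
        existing = already_deleted.get(data_file.file_path, set())
        key = (data_file.spec_id, data_file.partition)
        for pos in positions:
            if pos not in existing:  # skip rows an earlier delete already removed
                groups[key].add((data_file.file_path, pos))
    # Rows in a position-delete file must be sorted by file_path, then pos.
    return {key: sorted(rows) for key, rows in groups.items()}


def position_delete_applies(
    data_file: DataFileInfo, delete_key: PartitionKey, delete_sequence_number: int
) -> bool:
    """File-level check: same spec and partition, and the data file is not newer."""
    same_partition = (data_file.spec_id, data_file.partition) == delete_key
    return same_partition and data_file.data_sequence_number <= delete_sequence_number


# Usage: two partitions, and one position in a.parquet was already deleted.
a = DataFileInfo("data/a.parquet", 0, ("2026-06-23",), 1)
b = DataFileInfo("data/b.parquet", 0, ("2026-06-23",), 1)
c = DataFileInfo("data/c.parquet", 0, ("2026-06-24",), 2)
plan = plan_position_deletes(
    {b: [5, 2], a: [7, 3], c: [0]},
    already_deleted={"data/a.parquet": {3}},
)
```

Grouping by (spec_id, partition) yields one delete file per partition here, and using a set before sorting also drops any duplicate matches for the same row.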
…ete index applies

Override the file_path column statistics to FULL so that its lower/upper bound is the untruncated data-file path. Without it, the default truncate(16) string metric makes lower != upper, so the reader's exact-path delete index is never used and every delete file falls back to the partition-level metrics evaluator. Adds a test asserting the bound equals the full path, and drops a redundant specs lookup.
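To see why the commit forces FULL metrics on file_path, here is a simplified, hypothetical model of Iceberg's truncate(L) string bounds: the lower bound keeps the first L code points, and the upper bound keeps the first L code points with the last one incremented. The helper names and the example path are made up for illustration, and the "bounds must be equal" rule for the exact-path index is taken from the commit message, not from the reader's code.

```python
from typing import Optional

MAX_CODE_POINT = 0x10FFFF


def truncate_lower(value: str, width: int) -> str:
    """Lower bound under truncate(width): keep the first `width` code points."""
    return value[:width]


def truncate_upper(value: str, width: int) -> Optional[str]:
    """Upper bound under truncate(width): truncate, then bump the last code point."""
    if len(value) <= width:
        return value
    chars = list(value[:width])
    while chars:
        cp = ord(chars[-1]) + 1
        if 0xD800 <= cp <= 0xDFFF:  # skip the surrogate range
            cp = 0xE000
        if cp <= MAX_CODE_POINT:
            chars[-1] = chr(cp)
            return "".join(chars)
        chars.pop()  # last code point cannot be incremented; carry leftward
    return None  # no valid upper bound exists


def exact_referenced_path(lower: Optional[str], upper: Optional[str]) -> Optional[str]:
    """A delete file pins a single data file only when both bounds are equal."""
    return lower if lower is not None and lower == upper else None


path = "s3://bucket/warehouse/db/tbl/data/00000-0-abc.parquet"

# Default truncate(16): the bounds differ, so no exact-path lookup is possible.
lo16, hi16 = truncate_lower(path, 16), truncate_upper(path, 16)

# FULL metrics: both bounds are the untruncated path.
lo_full, hi_full = path, path
```

With truncate(16) this path yields the bounds "s3://bucket/ware" and "s3://bucket/warf", which only describe a range of possible paths; with FULL metrics both bounds collapse to the one data file the delete file targets.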
No description provided.