feat(table): position-delete file writer and deletes manifest#28
Open
abnobdoss wants to merge 5 commits into
added 5 commits
June 23, 2026 22:25
Introduces the merge-on-read write primitives so a v2 table can commit position deletes without rewriting data files:

- manifest.py: ManifestWriterV2 / write_manifest accept a ManifestContent so deletes manifests can be written (v1 rejects delete manifests).
- io/pyarrow.py: write_position_delete_file writes one spec-compliant v2 position-delete Parquet for a single referenced data file (sorted, deduped long positions, full untruncated file_path bound, content=POSITION_DELETES, equality_ids=None, partition/spec_id copied from the data file).
- table/update/snapshot.py: _SnapshotProducer gains append_delete_file and writes a deletes manifest per spec; a new _RowDeltaFiles producer (exposed via UpdateSnapshot.row_delta(), aliased RowDeltaSnapshotProducer) commits appended data files AND delete files as one OVERWRITE row-delta snapshot that the existing positional-delete reader applies.

Delete-file sequence numbers stay unassigned and are stamped at commit, so delete seq == data seq == snapshot seq. Data files are never rewritten.
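The "sorted, deduped long positions" requirement for a single referenced data file can be illustrated with a minimal, standard-library-only sketch. This is not the code from this PR: `build_position_delete_rows` and the example path are hypothetical names chosen for illustration, and the real writer emits Parquet rather than Python tuples.

```python
from typing import Iterable, List, Tuple


def build_position_delete_rows(
    file_path: str, positions: Iterable[int]
) -> List[Tuple[str, int]]:
    """Return (file_path, pos) rows sorted by pos with duplicates removed.

    Hypothetical helper: it only mirrors the ordering and dedup rules
    described in the commit message, not the PR's actual writer.
    """
    unique_sorted = sorted(set(positions))
    return [(file_path, pos) for pos in unique_sorted]


# Example path is an assumption made up for this sketch.
rows = build_position_delete_rows("s3://bucket/data/file-a.parquet", [7, 2, 7, 0])

# Only one data file is referenced, so the lower and upper bound of the
# file_path column are the same full, untruncated path.
file_path_bounds = (rows[0][0], rows[-1][0])
```

Because every row carries the same path, the file_path bounds collapse to a single value, which is presumably why the commit keeps that bound untruncated.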
Validate referenced file is DATA and positions are non-negative, reject non-delete files in append_delete_file, fall back to default_spec_id when the referenced data file has no spec_id, and reuse POSITIONAL_DELETE_SCHEMA (now with required fields) instead of a duplicate write schema.
Guard write_position_delete_file against non-v2 tables (a v1 DataFile has no content field and would be corrupt), write file_path and pos as non-nullable Arrow fields to match the required positional-delete schema, and restrict append_delete_file to position deletes since scan planning cannot yet read equality deletes.
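The input checks described in the two commits above can be sketched roughly as follows. Everything here is a stand-in: `Content`, `FileRef`, and `check_position_delete_inputs` are hypothetical names invented for this example, and the error messages are assumptions rather than the PR's actual wording.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Content(Enum):
    """Hypothetical stand-in for the data-file content type."""
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


@dataclass(frozen=True)
class FileRef:
    """Hypothetical, minimal view of a referenced file."""
    file_path: str
    content: Content
    spec_id: Optional[int] = None


def check_position_delete_inputs(
    format_version: int,
    data_file: FileRef,
    positions: Sequence[int],
    default_spec_id: int,
) -> int:
    """Validate the inputs and return the spec id the delete file should use."""
    if format_version != 2:
        raise ValueError("position deletes require a v2 table")
    if data_file.content is not Content.DATA:
        raise ValueError("referenced file must be a DATA file")
    if any(pos < 0 for pos in positions):
        raise ValueError("positions must be non-negative")
    # Fall back to the table default when the data file carries no spec_id.
    return data_file.spec_id if data_file.spec_id is not None else default_spec_id


data_file = FileRef("s3://bucket/data/file-a.parquet", Content.DATA)
spec_id = check_position_delete_inputs(2, data_file, [0, 3], default_spec_id=5)
```

Failing fast before any Parquet bytes are written keeps a rejected call from leaving an orphaned delete file behind.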
Move append_delete_file off the shared _SnapshotProducer base so it is no longer inherited by fast_append/merge_append, where committing position deletes would produce an append snapshot that logically deletes rows.
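The design choice in this last commit can be shown with a toy class hierarchy. The classes below are simplified, hypothetical analogues of the producers named in the commit messages, not the PR's code; only the placement of `append_delete_file` is the point.

```python
class SnapshotProducer:
    """Toy base class: holds only what every producer shares."""

    def __init__(self) -> None:
        self.data_files = []

    def append_data_file(self, data_file: str) -> "SnapshotProducer":
        self.data_files.append(data_file)
        return self


class FastAppend(SnapshotProducer):
    """Append-only producer: deliberately has no way to add delete files."""

    operation = "append"


class RowDelta(SnapshotProducer):
    """Row-delta producer: the only place delete files can be attached."""

    operation = "overwrite"

    def __init__(self) -> None:
        super().__init__()
        self.delete_files = []

    def append_delete_file(self, delete_file: str) -> "RowDelta":
        self.delete_files.append(delete_file)
        return self


delta = RowDelta().append_data_file("data-1.parquet")
delta.append_delete_file("pos-delete-1.parquet")
```

With the method defined only on the row-delta producer, an append snapshot that silently removes rows becomes an `AttributeError` at call time instead of a committed inconsistency.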
No description provided.