feat(table): position-delete file writer and deletes manifest#28

Open
abnobdoss wants to merge 5 commits into feat/mor-pr1-pos-long from feat/mor-pr2-posdelete-writer

@abnobdoss

Owner

No description provided.

Abanoub Doss added 5 commits June 23, 2026 22:25
Introduces the merge-on-read write primitives so a v2 table can commit
position deletes without rewriting data files:

- manifest.py: ManifestWriterV2 / write_manifest accept a ManifestContent
  so deletes manifests can be written (v1 rejects delete manifests).
- io/pyarrow.py: write_position_delete_file writes one spec-compliant v2
  position-delete Parquet for a single referenced data file (sorted, deduped
  long positions, full untruncated file_path bound, content=POSITION_DELETES,
  equality_ids=None, partition/spec_id copied from the data file).
- table/update/snapshot.py: _SnapshotProducer gains append_delete_file and
  writes a deletes manifest per spec; a new _RowDeltaFiles producer
  (exposed via UpdateSnapshot.row_delta(), aliased RowDeltaSnapshotProducer)
  commits appended data files AND delete files as one OVERWRITE row-delta
  snapshot that the existing positional-delete reader applies.
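
The row preparation described for write_position_delete_file (sorted, deduplicated positions and a full, untruncated file_path bound) can be sketched in plain Python. This is an illustration only, not the PR's code: `PositionDeleteBatch` and `prepare_position_deletes` are hypothetical names, and the real writer emits Parquet through PyArrow.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PositionDeleteBatch:
    """Hypothetical container for the rows of one position-delete file."""

    file_path: str
    positions: List[int]
    lower_bound_path: str
    upper_bound_path: str


def prepare_position_deletes(file_path: str, positions: Iterable[int]) -> PositionDeleteBatch:
    # Deduplicate, then sort ascending, so each (file_path, pos) row is
    # unique and ordered by position.
    unique_sorted = sorted(set(positions))
    # Every row references the same data file, so the lower and upper bound
    # of file_path are both the full path (assumption: no truncation).
    return PositionDeleteBatch(file_path, unique_sorted, file_path, file_path)


batch = prepare_position_deletes("s3://bucket/data/00000-0.parquet", [7, 3, 7, 0])
print(batch.positions)  # [0, 3, 7]
```

Because the file covers a single data file, equal lower and upper bounds let a reader match the delete file to its data file from metadata alone.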

Delete-file sequence numbers stay unassigned and are stamped at commit, so
delete seq == data seq == snapshot seq. Data files are never rewritten.
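
A minimal sketch of stamping sequence numbers at commit, assuming a simplified manifest entry. `ManifestEntry` and `stamp_sequence_numbers` are hypothetical names for illustration and do not mirror the PyIceberg classes.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical, simplified manifest entry."""

    path: str
    is_delete: bool
    sequence_number: Optional[int] = None  # None means "assign at commit"


def stamp_sequence_numbers(entries: List[ManifestEntry], snapshot_seq: int) -> List[ManifestEntry]:
    # Entries written without a sequence number take the snapshot's number;
    # entries that already carry one are left unchanged.
    return [
        e if e.sequence_number is not None else replace(e, sequence_number=snapshot_seq)
        for e in entries
    ]


pending = [
    ManifestEntry("data/00000-0.parquet", is_delete=False),
    ManifestEntry("data/00000-0-deletes.parquet", is_delete=True),
]
committed = stamp_sequence_numbers(pending, snapshot_seq=5)
print([e.sequence_number for e in committed])  # [5, 5]
```

Under the Iceberg spec, a position delete applies to data files whose data sequence number is less than or equal to the delete file's, so equal numbers mean the delete still covers data added in the same commit.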

Validate that the referenced file is DATA and that positions are non-negative,
reject non-delete files in append_delete_file, fall back to default_spec_id when
the referenced data file has no spec_id, and reuse POSITIONAL_DELETE_SCHEMA
(now with required fields) instead of a duplicate write schema.
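
The checks in this commit can be sketched as a single pre-write validation step. `FileRef` and `validate_delete_request` are hypothetical names; only the content codes (0 data, 1 position deletes, 2 equality deletes) come from the Iceberg spec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class FileContent(Enum):
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


@dataclass(frozen=True)
class FileRef:
    """Hypothetical stand-in for the referenced data file."""

    path: str
    content: FileContent
    spec_id: Optional[int] = None


def validate_delete_request(data_file: FileRef, positions: Iterable[int], default_spec_id: int) -> int:
    """Validate the inputs and return the spec id the delete file should use."""
    if data_file.content is not FileContent.DATA:
        raise ValueError(f"{data_file.path} is not a DATA file: {data_file.content.name}")
    negatives = [p for p in positions if p < 0]
    if negatives:
        raise ValueError(f"positions must be non-negative, got {negatives}")
    # Compare against None explicitly: spec id 0 is valid and must not
    # trigger the fallback.
    return data_file.spec_id if data_file.spec_id is not None else default_spec_id


spec_id = validate_delete_request(
    FileRef("data/00000-0.parquet", FileContent.DATA), [0, 4], default_spec_id=3
)
print(spec_id)  # 3
```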

Guard write_position_delete_file against non-v2 tables (a v1 DataFile has
no content field, so the result would be corrupt), write file_path and pos as
non-nullable Arrow fields to match the required positional-delete schema,
and restrict append_delete_file to position deletes, since scan planning
cannot yet read equality deletes.

Move append_delete_file off the shared _SnapshotProducer base so it is no
longer inherited by fast_append/merge_append, where committing position
deletes would produce an append snapshot that logically deletes rows.
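
The effect of moving the method can be sketched with a toy class hierarchy. The class names below are simplified placeholders, not the PyIceberg classes: the point is only that a method defined on the row-delta producer is not visible on append producers.

```python
from typing import List


class SnapshotProducer:
    """Simplified shared base: only data-file handling lives here."""

    operation = "unset"

    def __init__(self) -> None:
        self.added_data_files: List[str] = []

    def append_data_file(self, path: str) -> "SnapshotProducer":
        self.added_data_files.append(path)
        return self


class FastAppend(SnapshotProducer):
    # Append snapshots must not carry deletes, so no delete API is exposed.
    operation = "append"


class RowDelta(SnapshotProducer):
    # Row-delta commits are recorded as OVERWRITE and may carry delete files.
    operation = "overwrite"

    def __init__(self) -> None:
        super().__init__()
        self.added_delete_files: List[str] = []

    def append_delete_file(self, path: str) -> "RowDelta":
        self.added_delete_files.append(path)
        return self


print(hasattr(FastAppend(), "append_delete_file"))  # False
print(hasattr(RowDelta(), "append_delete_file"))    # True
```

Keeping the method off the base turns a misuse into an AttributeError at call time rather than a snapshot whose operation contradicts its contents.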
@abnobdoss abnobdoss closed this Jun 25, 2026
@abnobdoss abnobdoss reopened this Jun 25, 2026