feat(table): merge-on-read position-delete writer#26

Open
abnobdoss wants to merge 3 commits into main from v3/w5-posdelete
Conversation

@abnobdoss

Owner

No description provided.

Abanoub Doss added 3 commits June 23, 2026 18:33
Wires Transaction.delete() to write Iceberg v2 position-delete files
instead of copy-on-write rewrites when write.delete.mode=merge-on-read
on a v2 table. CoW remains the default; v3 keeps the existing fallback.

- pyarrow: _write_position_delete_file writes a spec-correct positional
  delete Parquet (file_path field-id 2147483546 string required, pos
  field-id 2147483545 long required), sorted by (file_path, pos), with
  real parquet metrics; positions are physical row offsets numbered
  against the full data file (not the post-delete view).
- manifest: parameterize ManifestWriterV2/write_manifest with content so
  a DELETES-content manifest can be written; v1 rejected. Fix
  POSITIONAL_DELETE_SCHEMA pos to LongType per spec.
- snapshot: _SnapshotProducer gains append_delete_file +
  _write_added_delete_manifest; _DeleteFiles flips to OVERWRITE and
  commits position-delete files in their own deletes manifest with
  inherited sequence number. Data files are never rewritten.
- delete(): MoR path computes matched row positions per data file,
  skips already-deleted positions, groups by (spec_id, partition),
  writes one delete file per partition, commits. Round-trips through the
  existing positional-delete reader.
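
The planning step in the delete() bullet above can be sketched in plain Python. This is an illustrative sketch only, not the code in this PR: `plan_position_deletes`, its argument shapes, and the example paths are hypothetical, and partitions are modeled as hashable tuples.

```python
from collections import defaultdict


def plan_position_deletes(matches, already_deleted):
    """Group matched rows into one sorted batch per (spec_id, partition).

    matches: iterable of (spec_id, partition, file_path, pos), where pos is the
        physical row offset in the full data file (hypothetical input shape).
    already_deleted: dict mapping file_path -> set of positions that earlier
        delete files already cover.
    """
    groups = defaultdict(set)
    for spec_id, partition, file_path, pos in matches:
        # Skip positions an earlier position-delete file already removed.
        if pos in already_deleted.get(file_path, ()):
            continue
        groups[(spec_id, partition)].add((file_path, pos))
    # Each batch becomes one delete file, sorted by (file_path, pos).
    return {key: sorted(rows) for key, rows in groups.items()}
```

Each value in the returned dict corresponds to the rows of one delete file, already in the (file_path, pos) order the commit message describes.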

Tests (sqlite/in-memory catalog, x3 backends): single/multi-row delete,
partitioned per-partition delete files, sequence-number scoping (later
appends survive), successive deletes, plain to_arrow, empty-match warn,
delete-all (no data drop), multi-data-file-one-partition, CoW regression
guard. 30 passed.
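
The sequence-number scoping case ("later appends survive") follows from the Iceberg rule that a position delete only applies to data files whose data sequence number is less than or equal to the delete file's. A minimal sketch of that rule, with hypothetical function names and tuple shapes that are not taken from this PR:

```python
def delete_applies(data_seq, delete_seq):
    """A position delete covers a data file only if the data file is not newer."""
    return data_seq <= delete_seq


def live_positions(file_path, num_rows, data_seq, deletes):
    """Return the row positions of one data file that survive the deletes.

    deletes: iterable of (delete_seq, file_path, pos) entries (hypothetical shape).
    """
    dead = {
        pos
        for delete_seq, path, pos in deletes
        if path == file_path and delete_applies(data_seq, delete_seq)
    }
    return [pos for pos in range(num_rows) if pos not in dead]
```

A data file appended after the delete commit has a higher sequence number, so none of the earlier position deletes touch it, even if a path and position happen to match.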
…ete index applies

Override the file_path column statistics to FULL so its lower/upper bound is the
untruncated data-file path. Without it, the default truncate(16) string metric makes
lower != upper for any path longer than 16 characters, so the reader's exact-path
delete index is never used and every delete file falls back to the partition-level
metrics evaluator. Adds a test asserting the bound equals the full path and drops a
redundant specs lookup.
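
Why truncation defeats the exact-path check can be shown with a simplified model of the two metric modes. This is a sketch under stated assumptions, not pyiceberg's metrics code: `truncate_bounds` and `full_bounds` are hypothetical helpers, the upper bound is approximated by incrementing the last kept character, and the example path is made up.

```python
def truncate_bounds(value, width=16):
    """Approximate truncate(width) string bounds for a single-value column."""
    if len(value) <= width:
        return value, value
    prefix = value[:width]
    # The upper bound must sort at or above the full value, so bump the last character.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


def full_bounds(value):
    """FULL metrics keep the value untruncated, so both bounds are identical."""
    return value, value


path = "s3://bucket/table/data/00000-0.parquet"  # hypothetical data-file path
lower, upper = truncate_bounds(path)
exact_with_truncate = lower == upper      # False: bounds differ, no exact-path match
exact_with_full = full_bounds(path)[0] == full_bounds(path)[1] == path  # True
```

With truncation, a reader comparing lower and upper cannot tell that the delete file references exactly one data file; with FULL metrics both bounds equal the path, so the comparison succeeds.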
@abnobdoss abnobdoss closed this Jun 25, 2026
@abnobdoss abnobdoss reopened this Jun 25, 2026