Skip to content

feat(table): enable v3 writes with row-lineage (first_row_id) support#21

Open
abnobdoss wants to merge 8 commits into
mainfrom
v3/t1-foundation-v2
Open

feat(table): enable v3 writes with row-lineage (first_row_id) support#21
abnobdoss wants to merge 8 commits into
mainfrom
v3/t1-foundation-v2

Conversation

@abnobdoss

Copy link
Copy Markdown
Owner

No description provided.

Abanoub Doss added 8 commits June 23, 2026 09:45
Enables writing v3 table metadata and implements spec-conformant row
lineage (next-row-id / first-row-id / added-rows) end to end.

- metadata.py: bump SUPPORTED_TABLE_FORMAT_VERSION to 3; remove the
  TableMetadataV3.model_dump_json NotImplementedError so v3 serializes;
  initialize next_row_id=0 for new v3 tables.
- manifest.py: DEFAULT_READ_VERSION=3 so manifest-list first_row_id (field
  520) round-trips; add ManifestWriterV3, ManifestListWriterV3 (assigns
  first_row_id only to DATA manifests lacking one, advancing by
  existing+added rows), and the ManifestFile.first_row_id accessor.
- update/__init__.py: upgrade-to-v3 seeds next_row_id=0; AddSnapshotUpdate
  computes next_row_id = next_row_id + added_rows with NO silent fallback
  to None (raises instead), fixing the bug where v3 commits collapsed
  next_row_id to None.
- update/snapshot.py: _commit derives added_rows from the manifest-list
  writer's row-id advance and sets first_row_id; merge manager inherits
  min(first_row_id) for merged data manifests and refuses to merge v3
  manifests whose row-id ranges are non-contiguous/out-of-order
  (correctness over compaction).
- tests: acceptance suite (create v3, append twice, merge-append,
  delete+merge gap, json round-trip) plus negative guards.
…erent upgrade gate

Remediate four defects found by the t1-foundation skeptic audit:

1. [HIGH] Copy-on-write delete no longer re-numbers surviving rows.
   - Whole-file deletes (incl. shared-manifest case) materialize each survivor's
     absolute _row_id into DataFile field 142 and inherit the source manifest's
     first_row_id, so next_row_id and survivors' row ids are preserved.
   - Partial deletes that need a physical data-file rewrite now FAIL LOUDLY on v3
     (NotImplementedError) instead of silently re-numbering survivors, because
     PyIceberg has no read-side _row_id materialization yet.

2. [HIGH] v3 manifest merge was silently disabled by a descending-vs-ascending
   ordering check in _v3_row_ids_are_contiguous. The check now sorts ranges and
   verifies gapless/non-overlapping regardless of input order; _create_manifest
   writes merged entries in ascending row-id order so min(first_row_id) inheritance
   is correct. The apache#3070 double-count fix is now actually exercised.

3. [MED] upgrade_table_version gate hardcoded {1,2}; now uses
   SUPPORTED_TABLE_FORMAT_VERSION so v2->v3 upgrade works end to end (seeds
   next_row_id=0), consistent with create-allows-v3.

4. [LOW] The two mislabeled "merge" tests now instrument _create_manifest and
   assert a merge actually runs; added whole-file delete-lineage tests asserting
   field-142 _row_id values, a loud-fail test, a v2 regression test, and an
   upgrade test.

DataFile gains a first_row_id (field 142) accessor and a __copy__ that isolates
its _data list so lineage materialization does not leak into source DataFiles.
Address skeptic re-audit LOW nit: from_args always allocates the full v3-width
record, so the length-based "non-v3" guard never fires for from_args-built files.
Clarify the comment; the guard is retained only for directly-constructed short
records.
…py ManifestFile; advance next-row-id from snapshot first-row-id
…mbering carried-forward files after v2 to v3 upgrade
@abnobdoss abnobdoss closed this Jun 25, 2026
@abnobdoss abnobdoss reopened this Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant