Skip to content

[WIP] Replace transactions rebase onto refreshed metadata #15904

Draft
smaheshwar-pltr wants to merge 6 commits into apache:main from smaheshwar-pltr:sm/replace-rebase-v2

Contributor

@smaheshwar-pltr smaheshwar-pltr commented Apr 7, 2026

Supersedes #15092.

Motivation

There are a few issues related to table replaces. BaseTransaction.commitReplaceTransaction() does not re-apply replacement and transaction updates onto refreshed metadata. When concurrent changes occur, the transaction therefore commits stale metadata.

When a REPLACE transaction commits after concurrent changes (appends, snapshot expiration, other replaces), it overwrites those changes with stale metadata. This can lead to snapshot history loss, and concurrent snapshot expiration can even cause table corruption. (#15090)

V3 tables require that snapshot.first-row-id >= table.next-row-id when adding a snapshot. The snapshot's first-row-id is set from base.nextRowId() when the snapshot is produced.

With REST catalogs, updates are sent to the server, which generally applies them to its own current metadata. If a concurrent commit has advanced the server's next-row-id, the snapshot's first-row-id (derived from stale metadata) will be behind it:

Cannot add a snapshot, first-row-id is behind table next-row-id: 100 < 150

This is returned as CommitFailedException so that the client can retry, but commitReplaceTransaction retries with the same stale current metadata: the snapshot still has the old first-row-id, so it fails on every attempt. I therefore believe that in V3, any concurrent snapshot change (append, compaction, other replace) causes the replace to fail entirely. (#15905)
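To make the retry problem concrete, here is a minimal sketch of the failure mode, not the actual Iceberg code. The class and method names (RowIdRebaseSketch, validateFirstRowId, commitWithoutRebase, commitWithRebase) are hypothetical, and the server is reduced to a single next-row-id value. It only illustrates that retrying with a first-row-id computed once from stale metadata can never pass the check, whereas re-deriving it from refreshed metadata can.

```java
final class RowIdRebaseSketch {
  // Hypothetical stand-in for the V3 check: first-row-id must be >= next-row-id.
  static void validateFirstRowId(long snapshotFirstRowId, long tableNextRowId) {
    if (snapshotFirstRowId < tableNextRowId) {
      throw new IllegalStateException(
          String.format(
              "Cannot add a snapshot, first-row-id is behind table next-row-id: %d < %d",
              snapshotFirstRowId, tableNextRowId));
    }
  }

  // Retries with a first-row-id that was fixed when the snapshot was produced.
  static boolean commitWithoutRebase(long staleNextRowId, long serverNextRowId, int attempts) {
    long firstRowId = staleNextRowId;
    for (int i = 0; i < attempts; i++) {
      try {
        validateFirstRowId(firstRowId, serverNextRowId);
        return true;
      } catch (IllegalStateException e) {
        // Nothing is refreshed, so the next attempt is identical to this one.
      }
    }
    return false;
  }

  // Retries after re-deriving first-row-id from the refreshed next-row-id.
  static boolean commitWithRebase(long staleNextRowId, long serverNextRowId, int attempts) {
    long firstRowId = staleNextRowId;
    for (int i = 0; i < attempts; i++) {
      try {
        validateFirstRowId(firstRowId, serverNextRowId);
        return true;
      } catch (IllegalStateException e) {
        // Assumption: refreshing metadata exposes the server's next-row-id.
        firstRowId = serverNextRowId;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println("without rebase: " + commitWithoutRebase(100, 150, 4));
    System.out.println("with rebase: " + commitWithRebase(100, 150, 4));
  }
}
```

With the values from the error message above (100 and 150), the first variant exhausts its attempts while the second succeeds on its second attempt.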

Less severe, but this also causes behaviour differences in concurrent replaces between REST and non-REST catalogs. For REST catalogs, properties are sent as a SetProperties delta and the server generally merges them via putAll, so concurrent property additions that have succeeded survive a concurrent table replace. For non-REST catalogs they do not: the full TableMetadata object is committed directly, so the stale current overwrites all concurrent property changes.
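The difference between the two commit styles can be sketched with plain maps. This is an illustration under simplifying assumptions, not catalog code: PropertyMergeSketch, applyDelta and overwrite are hypothetical names, and table properties are modelled as a Map<String, String>.

```java
import java.util.HashMap;
import java.util.Map;

final class PropertyMergeSketch {
  // REST-style: only the delta is sent and merged onto the server's current properties.
  static Map<String, String> applyDelta(
      Map<String, String> serverCurrent, Map<String, String> delta) {
    Map<String, String> merged = new HashMap<>(serverCurrent);
    merged.putAll(delta);
    return merged;
  }

  // Non-REST-style: the client's full (possibly stale) view replaces the current state.
  static Map<String, String> overwrite(
      Map<String, String> serverCurrent, Map<String, String> staleFullView) {
    return new HashMap<>(staleFullView);
  }

  public static void main(String[] args) {
    // A concurrent writer added "owner" after the replace transaction started.
    Map<String, String> serverCurrent = Map.of("format", "parquet", "owner", "team-a");
    Map<String, String> delta = Map.of("comment", "replaced");
    Map<String, String> staleFullView = Map.of("format", "parquet", "comment", "replaced");

    System.out.println(applyDelta(serverCurrent, delta).containsKey("owner"));
    System.out.println(overwrite(serverCurrent, staleFullView).containsKey("owner"));
  }
}
```

In the delta case the concurrently added "owner" key survives; in the overwrite case it is lost because the stale view never contained it.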

This PR

This PR unifies commitReplaceTransaction and commitSimpleTransaction into a single commitWithRetry() method that refreshes metadata and re-applies pending updates before committing.
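The general shape of a refresh-and-reapply loop can be sketched as follows. This is not the PR's commitWithRetry() implementation: the class name CommitWithRetrySketch, the use of an AtomicReference as a stand-in for the catalog, and the modelling of pending updates as functions over an immutable metadata value are all assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class CommitWithRetrySketch {
  // Each attempt re-reads the current metadata and re-applies every pending update to it,
  // so a successful commit is always based on the latest state rather than a stale copy.
  static <M> M commitWithRetry(
      AtomicReference<M> catalog, List<UnaryOperator<M>> pendingUpdates, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      M base = catalog.get(); // refresh
      M updated = base;
      for (UnaryOperator<M> update : pendingUpdates) {
        updated = update.apply(updated); // re-apply onto refreshed metadata
      }
      // Stand-in for an atomic catalog swap: succeeds only if nobody committed in between.
      if (catalog.compareAndSet(base, updated)) {
        return updated;
      }
    }
    throw new IllegalStateException("Commit failed after " + maxAttempts + " attempts");
  }
}
```

Because the updates are re-applied on each attempt, a concurrent commit that lands between the refresh and the swap costs one retry instead of being overwritten.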

The start metadata (the initial buildReplacement result) is stored on BaseTransaction so that the replacement can be rebuilt against refreshed metadata via startingMetadataFor(). A validateFieldIds check ensures that concurrent schema changes have not caused field IDs to be reassigned, which would make data files written during the transaction unreadable.
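One way to picture such a field ID check is below. It is a hypothetical sketch, not the validateFieldIds added by this PR: schemas are reduced to a name-to-ID map, and the class name FieldIdCheckSketch and the exception type are assumptions.

```java
import java.util.Map;

final class FieldIdCheckSketch {
  // Fails if any field that the transaction wrote data for has a different ID
  // (or is missing) in the schema rebuilt against refreshed metadata.
  static void validateFieldIds(Map<String, Integer> original, Map<String, Integer> rebuilt) {
    for (Map.Entry<String, Integer> field : original.entrySet()) {
      Integer rebuiltId = rebuilt.get(field.getKey());
      if (!field.getValue().equals(rebuiltId)) {
        throw new IllegalStateException(
            String.format(
                "Field ID changed for %s: was %d, now %s",
                field.getKey(), field.getValue(), rebuiltId));
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> original = Map.of("id", 1, "data", 2);
    validateFieldIds(original, Map.of("id", 1, "data", 2)); // passes
    try {
      validateFieldIds(original, Map.of("id", 1, "data", 3));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The point of failing early is that data files record field IDs, so a file written with "data" as ID 2 would no longer resolve if the committed schema assigned it ID 3.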

Also: in RESTTableOperations, the replaceBase field previously used to generate requirements is removed; requirements are now generated from base and kept in sync via applyUpdates.

@github-actions github-actions Bot added the core label Apr 7, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@smaheshwar-pltr smaheshwar-pltr force-pushed the sm/replace-rebase-v2 branch from 1eac286 to 009f2a7 Compare May 5, 2026 23:17
