
[flink] Add restore_as_latest procedure#8139

Open
zhuxiangyi wants to merge 2 commits into apache:master from zhuxiangyi:feat/restore-as-latest-procedure

@zhuxiangyi

Purpose

This PR adds a non-destructive restore procedure for Flink:

CALL sys.restore_as_latest(`table` => 'default.T', snapshot_id => 3);
CALL sys.restore_as_latest(`table` => 'default.T', tag => 'tag-1');

Unlike rollback_to, which discards the history after the target, this procedure restores the table to the state of a target snapshot or tag by committing a new latest snapshot. Later snapshots and tags are preserved.
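To make the difference from rollback_to concrete, here is a minimal sketch over a hypothetical in-memory model. The SnapshotHistory class is invented for illustration and is not part of Paimon: each snapshot is reduced to the set of data file names it exposes, and snapshot ids are assumed to be consecutive starting from 1. Tags are not modelled; since a restore removes nothing, anything pointing at an existing snapshot stays valid.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, NOT Paimon's API. Snapshot ids are 1-based list positions,
// and each snapshot is represented only by the set of data file names it exposes.
class SnapshotHistory {
    private final List<Set<String>> snapshots = new ArrayList<>();

    // Appends a new snapshot and returns its id.
    long commit(Set<String> files) {
        snapshots.add(new LinkedHashSet<>(files));
        return snapshots.size();
    }

    Set<String> files(long snapshotId) {
        return snapshots.get((int) snapshotId - 1);
    }

    long latestId() {
        return snapshots.size();
    }

    // rollback_to-style semantics: every snapshot after the target is discarded.
    void rollbackTo(long snapshotId) {
        while (snapshots.size() > snapshotId) {
            snapshots.remove(snapshots.size() - 1);
        }
    }

    // restore_as_latest-style semantics: the target's state is committed again as a
    // new latest snapshot, so every existing snapshot survives.
    long restoreAsLatest(long snapshotId) {
        return commit(files(snapshotId));
    }
}
```

With snapshots 1 to 3 committed, restoreAsLatest(2) appends snapshot 4 whose files match snapshot 2 while snapshot 3 stays readable; rollbackTo(2) instead makes snapshot 2 the latest and drops snapshot 3.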

What is changed

  • Add RestoreAsLatestProcedure and register it in the Flink procedure factory list.
  • Add commit support to create a new latest snapshot from the complete data manifests of the target snapshot.
  • Add integration tests (ITCase) covering restore from a snapshot and from a tag, preservation of later snapshots, and writes after a restore.
  • Document the procedure in the Flink procedures page and the snapshot/tag maintenance docs.

Tests

mvn -pl paimon-flink/paimon-flink-common -am -Pfast-build -DfailIfNoTests=false -Dtest=RestoreAsLatestProcedureITCase test
git diff --check

Notes

This PR is not associated with an issue yet. If the community prefers following the discussion-first flow strictly, I can open or join an issue/discussion and adjust the design accordingly.

@zhuxiangyi force-pushed the feat/restore-as-latest-procedure branch 2 times, most recently from ecd0935 to 0f821ed (June 5, 2026 23:37)
@zhuxiangyi force-pushed the feat/restore-as-latest-procedure branch from 0f821ed to 1492902 (June 6, 2026 01:03)
@JingsongLi
Contributor

I think restoreAsLatest can be invisible to streaming readers that handle overwrite snapshots.

The new snapshot writes the target snapshot's files into the base manifest list, but writes an empty delta manifest list and marks the commit as CommitKind.OVERWRITE (FileStoreCommitImpl.java:1174-1200). DataTableStreamScan first handles overwrite snapshots via the overwrite-change path, and if the returned plan is empty it advances past the snapshot. Since the restore snapshot has no delta, a streaming reader with streaming-read-overwrite=true can skip the restore entirely, missing both files/rows that should be removed from the current latest snapshot and files/rows that should be restored from the target snapshot.
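The failure mode can be reproduced in a toy model. Everything below is hypothetical: ToySnapshot, the "+file"/"-file" encoding of manifest entries, and the boolean flag standing in for CommitKind.OVERWRITE are simplifications, not DataTableStreamScan's actual logic. The point is that a full scan applies the delta on top of the base files, while the streaming overwrite path, as described above, only looks at the delta.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the reported gap. ToySnapshot and its methods are hypothetical
// and only mimic the behaviour described in the review comment.
class ToySnapshot {
    final boolean overwrite;         // stands in for CommitKind.OVERWRITE
    final Set<String> baseFiles;     // files reachable from baseManifestList
    final List<String> deltaEntries; // "+file" (ADD) or "-file" (DELETE) from deltaManifestList

    ToySnapshot(boolean overwrite, Set<String> baseFiles, List<String> deltaEntries) {
        this.overwrite = overwrite;
        this.baseFiles = baseFiles;
        this.deltaEntries = deltaEntries;
    }

    // Batch/full-scan view: base files with the delta entries applied in order.
    Set<String> fullScanFiles() {
        Set<String> files = new LinkedHashSet<>(baseFiles);
        for (String entry : deltaEntries) {
            String file = entry.substring(1);
            if (entry.startsWith("+")) {
                files.add(file);
            } else {
                files.remove(file);
            }
        }
        return files;
    }

    // Streaming view with streaming-read-overwrite=true: an overwrite snapshot is
    // planned from its delta only, and an empty plan means the reader skips it.
    boolean skippedByStreamingReader() {
        return overwrite && deltaEntries.isEmpty();
    }
}
```

For a previous latest snapshot exposing f1, f2 and f3, and a restore snapshot whose base holds only f1 and f2 with an empty delta, a full scan sees f3 disappear, but skippedByStreamingReader() returns true, so a streaming reader never observes a DELETE for f3.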

Could restoreAsLatest produce an overwrite delta relative to the previous latest snapshot (DELETE previous-only files and ADD target-only files), or introduce a dedicated commit kind/streaming-scan handling for restore snapshots?

@zhuxiangyi
Author

@JingsongLi
Thanks for pointing this out. I agree that the current restore snapshot is not sufficient for streaming readers with streaming-read-overwrite=true.

The new snapshot currently has the target snapshot's complete data manifests in baseManifestList, but its deltaManifestList is empty. This makes the final table state correct for batch/full scans, but the restore can be invisible to streaming overwrite readers.

I will update restoreAsLatest to generate an overwrite delta relative to the previous latest snapshot: DELETE files that exist only in the previous latest snapshot, and ADD files that exist only in the target snapshot. The baseManifestList will contain the previous latest snapshot's merged effective ADD files, while deltaManifestList will describe the previous-latest-to-target transition.
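The planned layout can be sketched as a set difference. This is a hypothetical illustration (RestoreCommitPlan is not a Paimon class) that identifies files by bare name, whereas real manifest entries carry more identity, such as partition and bucket. The base holds the previous latest snapshot's effective files, the delta deletes previous-only files and adds target-only files, and applying the delta to the base should reproduce the target snapshot.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the planned restore snapshot layout; RestoreCommitPlan is
// not a Paimon class, and files are plain names rather than manifest entries.
class RestoreCommitPlan {
    final Set<String> baseFiles = new LinkedHashSet<>(); // previous latest's effective ADD files
    final List<String> deleted = new ArrayList<>();      // delta DELETE entries
    final List<String> added = new ArrayList<>();        // delta ADD entries

    RestoreCommitPlan(Set<String> previousLatest, Set<String> target) {
        baseFiles.addAll(previousLatest);
        for (String f : previousLatest) {
            if (!target.contains(f)) {
                deleted.add(f); // exists only in the previous latest snapshot
            }
        }
        for (String f : target) {
            if (!previousLatest.contains(f)) {
                added.add(f); // exists only in the target snapshot
            }
        }
    }

    // What a full scan of the new snapshot sees: the base with the delta applied.
    Set<String> effectiveFiles() {
        Set<String> files = new LinkedHashSet<>(baseFiles);
        files.removeAll(deleted);
        files.addAll(added);
        return files;
    }

    // A streaming overwrite reader consumes only the delta, so the delta must be
    // non-empty whenever the restore actually changes the table.
    boolean deltaIsEmpty() {
        return deleted.isEmpty() && added.isEmpty();
    }
}
```

Files present in both snapshots stay in the base and produce no delta entry, so the delta carries exactly the previous-latest-to-target transition. Restoring to an identical state yields an empty delta, which is harmless because there is nothing for a streaming reader to observe.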

Ensure restore_as_latest writes an overwrite delta so streaming overwrite readers can observe restored file changes.
@zhuxiangyi force-pushed the feat/restore-as-latest-procedure branch from 439db01 to 1c523ac (June 8, 2026 02:17)