[spark] Support compact_chain_table procedure by juntaozhang · Pull Request #7313 · apache/paimon

juntaozhang · 2026-02-27T07:24:17Z

Purpose

Linked issue: close #7312

Tests

CompactChainTableProcedureTest.scala

API and Format

Documentation

docs/content/primary-key-table/chain-table.md
docs/content/spark/procedures.md

Generative AI tooling

No

JingsongLi

Thanks for adding the Spark procedure for chain table compaction.

Comments:

PR description is minimal — the "Purpose" section just links issue #7312 without explaining the approach. Please describe:
- How does compact_chain_table differ from regular compact?
- Does it compact both snapshot and delta branches?
- What's the merge strategy for chain compaction?
File changes: The PR touches ChainGroupReadTable, FallbackReadFileStoreTable, and ChainSplit. What changes were needed to support compaction vs. just read? Are these refactoring prerequisites or functional changes?
613 additions is significant. A test file (CompactChainTableProcedureTest.scala) + procedure + supporting core changes — consider whether the core changes could be a separate prerequisite PR.
Documentation: Good that both chain-table.md and procedures.md are updated.
ChainSplitTest: What cases does it cover? Chain tables have complex split logic due to cross-branch data dependencies.

Please fill in the PR description with the design approach before requesting final review.

juntaozhang · 2026-05-25T04:40:33Z

Thank you for the thorough review and valuable feedback.

- PR description: already updated in [Feature] [spark] Support compact_chain_table procedure #7312.
- Core file changes: extracted to prerequisite PR [core] Minor refactor partition predicates in FallbackReadScan #7950.
- ~~ChainSplitTest / Split support~~: ScanPlanHelper.createNewScanPlan already supports the Split interface, no additional changes needed.

This PR is a prerequisite for [`[spark] Support compact_chain_table procedure`](#7313).

`FallbackReadScan` currently uses a single `partitionPredicate` for both main and fallback scans. However, in chain table compaction scenarios, we need to apply different partition filters to the main (snapshot) branch and the fallback (delta) branch. For example, when overwriting an existing partition, we need to:
- Exclude the target partition from the main scan
- Include only the target partition in the fallback scan

This PR refactors `FallbackReadScan` to support separate partition predicates for main and fallback scans.
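The idea of separate predicates per branch can be sketched in a few lines. This is a minimal model, not Paimon's `FallbackReadScan`: the class `TwoBranchScan`, its `withMainPartitionFilter` / `withFallbackPartitionFilter` methods, and the string partition values are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model for illustration only; this is not Paimon's FallbackReadScan.
final class SeparatePredicateSketch {

    /** A scan over two branches, each holding its own partition filter. */
    static final class TwoBranchScan {
        private Predicate<String> mainFilter = p -> true;
        private Predicate<String> fallbackFilter = p -> true;

        TwoBranchScan withMainPartitionFilter(Predicate<String> filter) {
            this.mainFilter = filter;
            return this;
        }

        TwoBranchScan withFallbackPartitionFilter(Predicate<String> filter) {
            this.fallbackFilter = filter;
            return this;
        }

        /** Partitions the main (snapshot) branch would read. */
        List<String> planMain(List<String> partitions) {
            return partitions.stream().filter(mainFilter).collect(Collectors.toList());
        }

        /** Partitions the fallback (delta) branch would read. */
        List<String> planFallback(List<String> partitions) {
            return partitions.stream().filter(fallbackFilter).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("20250810", "20250811", "20250812");
        String target = "20250811";

        // Overwrite case from the description: the main scan excludes the
        // target partition, the fallback scan reads only the target partition.
        TwoBranchScan scan = new TwoBranchScan()
                .withMainPartitionFilter(p -> !p.equals(target))
                .withFallbackPartitionFilter(p -> p.equals(target));

        System.out.println("main:     " + scan.planMain(partitions));
        System.out.println("fallback: " + scan.planFallback(partitions));
    }
}
```

With a single shared predicate, the two filters above could not differ, which is the limitation the refactor is described as removing.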

JingsongLi · 2026-06-09T09:36:31Z

Can you rebase master?

juntaozhang · 2026-06-09T13:45:34Z

Hi @JingsongLi, thanks for the reminder, done.

JingsongLi

I found a couple of issues while reviewing this change.

JingsongLi · 2026-06-09T15:30:09Z

+        boolean partitionExists = checkPartitionExists(snapshotTable, partition, relation);
+        if (partitionExists) {
+            if (overwrite) {
+                scan.withPartitionFilter(


This overwrite path appears to read too much data. When the target partition already exists, the snapshot-side predicate is changed to NOT (target partition), while the delta side still uses the target partition. However, ChainGroupReadTable.plan() adds all mainScan.plan() splits directly before planning the delta/anchor splits, and this procedure later rewrites every output row's partition columns to the target partition. That means unrelated snapshot partitions can be copied into the compacted target partition during overwrite. The overwrite path should only read the chain-merge result needed for the target partition, not every snapshot partition except the target.

juntaozhang · Jun 10, 2026

Thanks a lot for the review. I added a `preloadTargetSnapshot` flag so that `mainScan.plan()` is skipped when overwriting the target partition.
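The over-read described in this thread can be reproduced with a simplified model. The sketch below is not the PR's code: `plan`, `unrelatedSplits`, and the `skipMainScan` parameter are hypothetical, splits are reduced to `branch/partition` strings, and the anchor-snapshot logic of a real chain merge is left out. How the PR's actual `preloadTargetSnapshot` flag maps onto `skipMainScan` is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of the overwrite planning path; not Paimon code.
final class OverwritePlanSketch {

    /**
     * Plans splits for overwriting {@code target}.
     * The snapshot side uses NOT(target); the delta side uses target only.
     * When skipMainScan is true, the snapshot-side splits are not added at all.
     */
    static List<String> plan(
            List<String> snapshotPartitions,
            List<String> deltaPartitions,
            String target,
            boolean skipMainScan) {
        List<String> splits = new ArrayList<>();
        if (!skipMainScan) {
            for (String p : snapshotPartitions) {
                if (!p.equals(target)) {
                    splits.add("snapshot/" + p);
                }
            }
        }
        for (String p : deltaPartitions) {
            if (p.equals(target)) {
                splits.add("delta/" + p);
            }
        }
        return splits;
    }

    /** Counts splits whose partition is not the target, i.e. rows that would be mislabeled. */
    static long unrelatedSplits(List<String> splits, String target) {
        return splits.stream().filter(s -> !s.endsWith("/" + target)).count();
    }

    public static void main(String[] args) {
        List<String> snapshot = Arrays.asList("20250810", "20250811");
        List<String> delta = Arrays.asList("20250811");
        String target = "20250811";

        List<String> before = plan(snapshot, delta, target, false);
        List<String> after = plan(snapshot, delta, target, true);

        System.out.println("without skip: " + before + ", unrelated = " + unrelatedSplits(before, target));
        System.out.println("with skip:    " + after + ", unrelated = " + unrelatedSplits(after, target));
    }
}
```

Because the procedure rewrites every output row's partition columns to the target, any split counted by `unrelatedSplits` would end up inside the compacted target partition, which is the problem the review points out.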

JingsongLi · 2026-06-09T15:30:09Z

+
+you will get the following result:
+```text
+---+----+-----+ 


Please remove the trailing whitespace in this added result block. `git diff --check origin/master...HEAD` reports trailing whitespace on this block, which will fail whitespace/style checks.

juntaozhang · Jun 10, 2026

Done, thanks for the reminder.
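For reference, the kind of line that triggers this complaint can be detected with a very small check. The sketch below only looks for a trailing space or tab at the end of a line; `git diff --check` covers more cases than this, and the class and method names are hypothetical.

```java
// Simplified illustration of a trailing-whitespace check; not a replacement for git diff --check.
final class TrailingWhitespaceSketch {

    /** Returns true when the line ends with a space or a tab. */
    static boolean hasTrailingWhitespace(String line) {
        if (line.isEmpty()) {
            return false;
        }
        char last = line.charAt(line.length() - 1);
        return last == ' ' || last == '\t';
    }

    public static void main(String[] args) {
        // The first string mimics the flagged result-block border with one trailing space.
        System.out.println(hasTrailingWhitespace("+---+----+-----+ "));
        System.out.println(hasTrailingWhitespace("+---+----+-----+"));
    }
}
```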

juntaozhang force-pushed the pr-chain-table-compaction branch from 58cfdb0 to d381de3 Compare February 27, 2026 07:47

JingsongLi reviewed May 24, 2026

juntaozhang mentioned this pull request May 25, 2026

[core] Minor refactor partition predicates in FallbackReadScan #7950

Merged

juntaozhang force-pushed the pr-chain-table-compaction branch from d381de3 to 7ac2113 Compare May 25, 2026 04:37

juntaozhang mentioned this pull request May 26, 2026

[core] Support compact for chain table #7888

Closed

juntaozhang force-pushed the pr-chain-table-compaction branch from 7ac2113 to 6da3d39 Compare May 27, 2026 06:01

juntaozhang requested a review from JingsongLi May 27, 2026 09:10

juntaozhang added 2 commits June 9, 2026 17:55

[spark] Support compact_chain_table procedure

5aa039e

[spark] Optimize compact_chain_table procedure

35672ea

juntaozhang force-pushed the pr-chain-table-compaction branch from c6e5450 to 35672ea Compare June 9, 2026 09:57

JingsongLi reviewed Jun 9, 2026

Fix bug

835c5f4

juntaozhang requested a review from JingsongLi June 10, 2026 11:12

juntaozhang wants to merge 3 commits into apache:master from juntaozhang:pr-chain-table-compaction