[FLINK-XXXXX][table] Refactor changelog inference program #28308

Draft

bvarghese1 wants to merge 6 commits into apache:master from bvarghese1:refactor_changelog_inference_program

Contributor

bvarghese1 commented Jun 3, 2026

What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)

Brief change log

(for example:)

The TaskInfo is stored in the blob store on job creation time as a persistent artifact
Deployments RPC transmits only the blob storage reference
TaskManagers retrieve the TaskInfo from the blob cache

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

Added integration tests for end-to-end deployment with large payloads (100MB)
Extended integration test for recovery after master (JobManager) failure
Added test that validates that TaskInfo is transferred only once across recoveries
Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
The serializers: (yes / no / don't know)
The runtime per-record code paths (performance sensitive): (yes / no / don't know)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
The S3 file system connector: (yes / no / don't know)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

Was generative AI tooling used to co-author this PR?

Yes (please specify the tool below)

bvarghese1 added 6 commits

June 3, 2026 11:42


[FLINK-XXXXX][table] Extract changelog mode inference helpers to Java util

7c49fe5

- The following helper methods are pulled out into a new ChangelogModeInferenceUtils.java
    - getModifyKindSet
    - getDeleteKind
    - isNonUpsertKeyCondition
- The Scala program now delegates these three methods to the Java util
- This commit has no functional change and the ChangelogModeInferenceTest remains unchanged


[FLINK-XXXXX][table] Move changelog inference PTF helpers to Java util

0f2b119

- Move the following PTF changelog helper methods to ChangelogModeInferenceUtils.java
    - toChangelogMode
    - ptfRequiresUpdateBefore
    - extractPtfTableArgComponents
    - toPtfChangelogContext
    - queryPtfChangelogMode
    - verifyPtfTableArgsForUpdates
- The Scala program keeps thin forwarders until the migration is complete


[FLINK-XXXXX][table] Move SatisfyDeleteKindTraitVisitor into a top level Java class

917790f


[FLINK-XXXXX][table] Move SatisfyUpdateKindTraitVisitor into a top level Java class

ba59ea2


[FLINK-XXXXX][table] Move SatisfyModifyKindSetTraitVisitor to top level Java class


[FLINK-XXXXX][table] Port FlinkChangelogModeInferenceProgram to Java

4aeb20c

Collaborator

flinkbot commented Jun 3, 2026

CI report:

4aeb20c Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure: re-run the last Azure build

gustavodemorais reviewed

Contributor

gustavodemorais left a comment

I agree the file's readability isn't great. We've now split the logic for update, modify, and delete across three files. When working in FlinkChangelogModeInferenceProgram I usually jump to one node and want all of its logic in one place, so I'd suggest other paths:

- Convert to Java - definitely +1.
- Move each node's logic into a registry (Map<Class, NodeHandler> or visitor-per-node). This keeps a node's logic together across traits and drops the instanceof ladder entirely, so "what does node X do" becomes a single lookup.
- Give repeated branches a name - many nodes just forward or do full-delete-if-updates.
- Name specific branches for readability, e.g. visitWithFallback(child, preferred, fallback) instead of Optional.or(() -> visit(child, fallback)).
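To make the last suggestion concrete, here is a minimal sketch of a named fallback helper. Everything in it is hypothetical: the `Trait` enum, the `toyVisit` function, and the helper's signature are stand-ins for illustration, not the types or methods used in this PR.

```java
import java.util.Optional;
import java.util.function.BiFunction;

/** Hypothetical sketch; none of these names are taken from the PR's code. */
final class FallbackVisitSketch {

    /** Stand-in for a required trait, e.g. an update kind. */
    enum Trait { ONLY_UPDATE_AFTER, BEFORE_AND_AFTER }

    /**
     * Tries the preferred trait first and retries with the fallback only when the
     * first attempt yields nothing. Optional.or keeps the fallback visit lazy.
     */
    static <N, R> Optional<R> visitWithFallback(
            BiFunction<N, Trait, Optional<R>> visit, N child, Trait preferred, Trait fallback) {
        return visit.apply(child, preferred).or(() -> visit.apply(child, fallback));
    }

    /** Toy visit: a child "satisfies" a trait when its name mentions the trait. */
    static Optional<String> toyVisit(String child, Trait required) {
        return child.contains(required.name())
                ? Optional.of(child + " satisfies " + required)
                : Optional.empty();
    }
}
```

A call site then reads as `visitWithFallback(FallbackVisitSketch::toyVisit, child, preferred, fallback)`, which states the intent once instead of repeating the `Optional.or` lambda in every branch.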

Another option is splitting the instanceof ladder into one small private method per node (visitSink, visitJoin, visitCalc), so the top-level visit() becomes a short dispatch and each handler reads on its own. We already use this pattern in StreamNonDeterministicUpdatePlanVisitor. I lean toward the registry first, but I'm glad to hear other opinions.
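The two options can be combined, as the rough sketch below tries to show: a class-keyed registry whose entries are method references to small per-node methods. All names here are assumptions made for illustration (`Node`, `Sink`, `Join`, `Calc`, the generic shape of `NodeHandler`, and the string results); the real handlers would work on Flink's physical nodes and traits instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a class-keyed handler registry; not code from this PR. */
final class NodeHandlerRegistrySketch {

    // Stand-ins for physical plan nodes (assumed names).
    interface Node {}
    static final class Sink implements Node {}
    static final class Join implements Node {}
    static final class Calc implements Node {}
    static final class Unknown implements Node {}

    /** One handler per node type; the String result is a placeholder for a real result. */
    interface NodeHandler<T extends Node> {
        String handle(T node);
    }

    private final Map<Class<? extends Node>, NodeHandler<? extends Node>> handlers =
            new LinkedHashMap<>();

    NodeHandlerRegistrySketch() {
        // Each node's logic lives in one small method and is registered once.
        register(Sink.class, this::visitSink);
        register(Join.class, this::visitJoin);
        register(Calc.class, this::visitCalc);
    }

    private <T extends Node> void register(Class<T> type, NodeHandler<T> handler) {
        handlers.put(type, handler);
    }

    /** Replaces the instanceof ladder with a single lookup on the exact runtime class. */
    String visit(Node node) {
        NodeHandler<? extends Node> handler = handlers.get(node.getClass());
        if (handler == null) {
            throw new UnsupportedOperationException(
                    "No handler registered for " + node.getClass().getSimpleName());
        }
        return dispatch(handler, node);
    }

    // Safe by construction: register() pairs each class only with a handler of that type.
    @SuppressWarnings("unchecked")
    private static <T extends Node> String dispatch(NodeHandler<T> handler, Node node) {
        return handler.handle((T) node);
    }

    private String visitSink(Sink sink) { return "sink"; }

    private String visitJoin(Join join) { return "join"; }

    private String visitCalc(Calc calc) { return "calc"; }
}
```

Note that this sketch looks up the exact runtime class, so subclasses of a registered node would need their own entry or a walk up the class hierarchy. That is one trade-off to weigh against a plain dispatch method.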

Contributor Author

bvarghese1 commented Jun 5, 2026

> I agree the file's readability isn't great. We've now split the logic for update, modify, and delete across three files. When working in FlinkChangelogModeInferenceProgram I usually jump to one node and want all of its logic in one place, so I'd suggest other paths:
>
> - Convert to Java - definitely +1.
> - Move each node's logic into a registry (Map<Class, NodeHandler> or visitor-per-node). This keeps a node's logic together across traits and drops the instanceof ladder entirely, so "what does node X do" becomes a single lookup.
> - Give repeated branches a name - many nodes just forward or do full-delete-if-updates.
> - Name specific branches for readability, e.g. visitWithFallback(child, preferred, fallback) instead of Optional.or(() -> visit(child, fallback)).
>
> Another option is splitting the instanceof ladder into one small private method per node (visitSink, visitJoin, visitCalc), so the top-level visit() becomes a short dispatch and each handler reads on its own. We already use this pattern in StreamNonDeterministicUpdatePlanVisitor. I lean toward the registry first, but glad to hear other opinions

Hi @gustavodemorais, yes, that's definitely the plan :-). But I want to do this refactoring in two phases. The first PR simply converts the program to Java and introduces a few new classes, as in this PR. The second PR will move each node's logic into a registry; that is more involved, and I did not want to complicate this initial PR.
