[FLINK-XXXXX][table] Refactor changelog inference program#28308
[FLINK-XXXXX][table] Refactor changelog inference program#28308bvarghese1 wants to merge 6 commits into
Conversation
… util
- The following helper methods are pulled out into a new ChangelogModeInferenceUtils.java
- getModifyKindSet
- getDeleteKind
- isNonUpsertKeyCondition
- The Scala program now delegates these three methods to the Java util
- This commit has no functional change and the ChangelogModeInferenceTest remains unchanged
- Move the following ptf changelog helpers methods to ChangelogModeInferenceUtils.java
- toChangelogMode
- ptfRequiresUpdateBefore
- extractPtfTableArgComponents
- toPtfChangelogContext
- queryPtfChangelogMode
- verifyPtfTableArgsForUpdates
- The Scala program keeps thin forwarders until the migration is complete
gustavodemorais
left a comment
There was a problem hiding this comment.
I agree the file's readability isn't great. We've now split the logic for update, modify, and delete across three files. When working in FlinkChangelogModeInferenceProgram I usually jump to one node and want all of its logic in one place, so I'd suggest other paths:
- Convert to Java - definitely +1.
- Move each node's logic into a registry (
Map<Class, NodeHandler>or visitor-per-node). This keeps a node's logic together across traits and drops theinstanceofladder entirely, so "what does node X do" becomes a single lookup. - Give repeated branches a name - many nodes just forward or do full-delete-if-updates.
- Name specific branches for readability, e.g.
visitWithFallback(child, preferred, fallback)instead ofOptional.or(() -> visit(child, fallback)).
Another option is splitting the instanceof ladder into one small private method per node (visitSink, visitJoin, visitCalc), so the top-level visit() becomes a short dispatch and each handler reads on its own. We already use this pattern in StreamNonDeterministicUpdatePlanVisitor. I lean toward the registry first, but glad to hear other opinions
Hi @gustavodemorais , yes that's definitely the plan :-) . But I want to do this refactoring in 2 phases. The first PR simply converts to Java with a few newly introduced classes like in this PR. The second PR will move each node into a registry (this is more involved and I did not want to complicate this initial PR). |
What is the purpose of the change
(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)
Brief change log
(for example:)
Verifying this change
Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)Documentation
Was generative AI tooling used to co-author this PR?