Skip to content

IGNITE-28711 Introduced IgniteFeature related data structures#13195

Open
petrov-mg wants to merge 11 commits into
apache:masterfrom
petrov-mg:IGNITE-28711
Open

IGNITE-28711 Introduced IgniteFeature related data structures#13195
petrov-mg wants to merge 11 commits into
apache:masterfrom
petrov-mg:IGNITE-28711

Conversation

@petrov-mg

Copy link
Copy Markdown
Contributor

Thank you for submitting the pull request to the Apache Ignite.

In order to streamline the review of the contribution
we ask you to ensure the following steps have been taken:

The Contribution Checklist

  • There is a single JIRA ticket related to the pull request.
  • The web-link to the pull request is attached to the JIRA ticket.
  • The JIRA ticket has the Patch Available state.
  • The pull request body describes changes that have been made.
    The description explains WHAT and WHY was made instead of HOW.
  • The pull request title is treated as the final commit message.
    The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
  • A reviewer has been mentioned through the JIRA comments
    (see the Maintainers list)
  • The pull request has been checked by the Teamcity Bot and
    the green visa attached to the JIRA ticket (see tab PR Check at TC.Bot - Instance 1 or TC.Bot - Instance 2)

Notes

If you need any help, please email dev@ignite.apache.org or ask anу advice on http://asf.slack.com #ignite channel.

@petrov-mg petrov-mg force-pushed the IGNITE-28711 branch 5 times, most recently from 7446109 to 3abdb8e Compare June 2, 2026 11:18
Comment on lines +80 to +83
private volatile boolean isNodeFenceActive;

/** Pair with current and target versions. {@code null} when rolling upgrade is disabled. */
@Nullable private volatile IgnitePair<IgniteProductVersion> rollUpVers;
/** */
private volatile boolean isVersionUpgradeEnabled;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep these two fields in metastorage to restore them in persistence mode after full restart of cluster?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing RU state in metastorage has a drawback - any data stored in metastorage requires full backward compatibility because it is written to PDS. Furthermore, there is currently no strong reasons to support RU state recovery after a full cluster restart. This has also been confirmed by potential users of the RU mechanism. In any case, if such a need arises, it can be easily implemented.

@Override public @Nullable IgniteNodeValidationResult validateNode(ClusterNode node) {
synchronized (lock) {
lastJoiningNode = node;
joiningNodes.add(joiningNode);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that a joining node may leave the cluster without producing any discovery event.

See org.apache.ignite.spi.discovery.tcp.ServerImpl.RingMessageWorker#processJoinRequestMessage, around lines 4522-4595:

err = spi.getSpiContext().validateNode(node);
...
if (!Objects.equals(locMarsh, rmtMarsh)) {
    ...
    // Send message "Local node's marshaller differs from remote node's marshaller"
    trySendMessageDirectly(node, new TcpDiscoveryCheckFailedMessage(locNodeId, sndMsg));
    return;
}

In this path the node has already passed component validation, but it is rejected later by TCP discovery checks and never joins the topology. As a result, the coordinator does not receive NODE_JOINED/NODE_LEFT/NODE_FAILED for this node, so it cannot remove it from joiningNodes by listening to discovery events only.

This is why the previous implementation tracked lastJoiningNode and lastJoiningNodeTimestamp. They were needed to eventually forget such abandoned join attempts

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant after IGNITE-28751

/** Pair with current and target versions. {@code null} when rolling upgrade is disabled. */
@Nullable private volatile IgnitePair<IgniteProductVersion> rollUpVers;
/** */
private volatile boolean isVersionUpgradeEnabled;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private volatile boolean isVersionUpgradeEnabled;
private volatile boolean isVerUpgradeEnabled;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@petrov-mg petrov-mg force-pushed the IGNITE-28711 branch 6 times, most recently from 3790ced to 24e6cec Compare June 10, 2026 22:27
Comment on lines +397 to +417
private IgniteInternalFuture<Message> executePreparePhase(Message req) {
synchronized (topGuard) {
if (isNodeFenceActive) {
return new GridFinishedFuture<>(new IgniteCheckedException(
"Cluster version finalization procedure is already in progress"));
}

synchronized (lock) {
minMaxVerPair = resolveMinMaxNodeVersions();
Set<IgniteProductVersion> distinctNodeVersions = distinctClusterProductVersions();

if (!minMaxVerPair.get1().equals(minMaxVerPair.get2()))
throw new IgniteCheckedException("Can't disable rolling upgrade with different versions in cluster: "
+ minMaxVerPair.get1() + ", " + minMaxVerPair.get2());
if (distinctNodeVersions.size() > 1) {
return new GridFinishedFuture<>(new IgniteCheckedException(
"Cluster version finalization failed. The topology contains nodes running multiple different" +
" versions [distinctNodeVersions=" + distinctNodeVersions + "]"
));
}

if (lastJoiningNode != null) {
IgniteProductVersion lastJoiningNodeVer = IgniteProductVersion.fromString(lastJoiningNode.attribute(ATTR_BUILD_VER));
isNodeFenceActive = true;

if (!minMaxVerPair.get1().equals(lastJoiningNodeVer))
throw new IgniteCheckedException("Can't disable rolling upgrade with different versions in cluster: "
+ minMaxVerPair.get1() + ", " + lastJoiningNodeVer);
return new GridFinishedFuture<>();
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to execute the prepare phase on every node?

Consider a 3-node cluster A, B, C where A is the coordinator. A starts finalization and sends the prepare phase to all nodes. If prepare succeeds on B but fails on C for some reasons, B has already set isNodeFenceActive=true, while the coordinator aborts finalization because one node failed. Since the complete phase is not started, B never resets the fence. A later finalization attempt will fail because B now reports "Cluster version finalization procedure is already in progress", and joins routed through B will also be rejected.

Could you clarify why prepare must validate cluster and set the fence independently on every node? If this is intended to protect coordinator changes, we probably still need a rollback mechanism for isNodeFenceActive when prepare fails on any node

@petrov-mg petrov-mg Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If prepare succeeds on B but fails on C for some reasons

For what reasons? This is crucial, as we rely on message delivery guarantee provided by Discovery.

coordinator aborts finalization because one node failed

Currently the distributed process is not aborted if nodes leave the cluster.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is not about discovery messaging.

The prepare phase can finish with an exception in GridFinishedFuture (the version check). So isNodeFenceActive becomes false on the node that failed, but true on the nodes that passed.

In my scenario, coordinator A may fail its own prepare phase, if its local joiningNodes still has a node with a different version. Only the coordinator handles joiningNodes, so the discovery delivery guarantee does not help here.

Because A returns an error, finishPreparePhase does not start completePhase. And completePhase is the only place that sets isNodeFenceActive back to false. So B and C keep the fence forever, while A is not even fenced

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


if (log.isInfoEnabled())
log.info("Cluster version was successfully finalized [activeLogicalVer=" + clusterLogicalVersion() + ']');
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow finalizeProc.start() from non-coordinator nodes?

finalizeClusterVersion can currently be called from different nodes at the same time. In that case we start two different distributed processes, for example finProc1 and finProc2.

finProc1 may execute prepare first and set isInProgress=true and isNodeFenceActive=true. Then finProc2 executes prepare and fails with finalization procedure is already in progress. However, when finProc2 finishes, finishProcess resets isInProgress=false for any reqId. Also, the prepare error path resets isNodeFenceActive=false.

As a result, the failed concurrent finalization can clear the fence of the still running finalization process.

Maybe we should allow finalize only from the coordinator, like the old RU implementation did, or keep isInProgress and isNodeFenceActive to a specific reqId and reset them only for the process that set them

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

spi(grid(1)).blockMessages((node, msg) -> msg instanceof SingleNodeMessage);

try {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +473 to +474
else if (U.isLocalNodeCoordinator(ctx.discovery()))
completePhase.start(reqId, null);

@chesnokoff chesnokoff Jun 19, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubts about these lines.
Let's assume we start finalization and executePreparePhase completed on all nodes. After that we start finishPreparePhase on all nodes. If coordinator disconnects at this moment, other nodes may not execute start of completePhase. As a result, none of grids will start the completePhase. Moreover, if we try to call finalization again, it will be canceled because of enabled fences. And we also won't be able to connect new nodes to cluster. And probably the only way to fix it will be full restart of cluster during rolling upgrade

Of course, looks like the scenario is rare and we have the same pattern in other processes. But here it will break concept of rolling upgrade

WDYT?

@petrov-mg petrov-mg Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue you describe can indeed occur. A known example is when the coordinator enqueues an InitMessage corresponding to the finalization completion phase to the RingMessageWorker and then fails. There may be other such cases.

Even so, I propose keeping the current logic as is. There's no point in further complicating the RU logic to fix this issue. As you noted, the chances of such a scenario are low.

I think it worth it to implement @sergey-chugunov-1985 suggestion and introduce a mechanism for manually aborting the cluster version finalization process. This will give us a backup plan for such corner cases.

The PR contains an implementation of the mentioned proposal.

if (isNodeFenceActive) {
return new IgniteNodeValidationResult(
joiningNode.id(),
"Node joins are not allowed during cluster version finalization [joiningNode=" + joiningNode + ']');

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have received user feedback that some error messages do not explain what to do next. In this case, the finalization process is an internal operation, so it is unclear whether the user should retry or take another action. Could we add an actionable hint, such as Wait for the current finalization process to complete and try again later?

We can also add similar tips to other RU errors, but it is up to you

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants