RATIS-2433. Cancel transaction in case of failure to append #1382

Open
spacemonkd wants to merge 2 commits into apache:master from spacemonkd:RATIS-2433

@spacemonkd
Contributor

What changes were proposed in this pull request?

Currently, in RaftServerImpl#writeAsyncImpl(), the client request is added to the pending requests asynchronously.

If appendTransactions() fails or throws an exception in the meantime, the transaction is not cancelled. The failure is returned to the client and the retry cache, but the state machine is not notified.
This can leave partial state in the state machine.

We should handle this so that, when such an exception occurs, the state machine is notified via cancelTransaction().
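
The idea described above can be illustrated with a minimal sketch. It does not use the real Ratis classes: `SketchStateMachine`, `append`, and `write` are hypothetical stand-ins, and the transaction is reduced to a plain string. The only point it tries to show is that a failed append both fails the returned future and notifies the state machine.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-ins for illustration only; these are not the Ratis classes. */
public class CancelOnAppendFailureSketch {
  /** Simplified state machine that counts started and cancelled transactions. */
  static class SketchStateMachine {
    final AtomicInteger started = new AtomicInteger();
    final AtomicInteger cancelled = new AtomicInteger();

    String startTransaction(String request) {
      started.incrementAndGet();
      return "trx:" + request;
    }

    void cancelTransaction(String trx, Throwable cause) {
      // A real state machine would roll back any state prepared in startTransaction.
      cancelled.incrementAndGet();
    }
  }

  /** Simulated append step that can be made to fail (assumption: failure surfaces as an exception). */
  static long append(String trx, boolean fail) {
    if (fail) {
      throw new IllegalStateException("append failed for " + trx);
    }
    return 1L;
  }

  /** Starts a transaction, appends it, and cancels the transaction if the append fails. */
  static CompletableFuture<String> write(SketchStateMachine sm, String request, boolean failAppend) {
    final String trx = sm.startTransaction(request);
    try {
      final long index = append(trx, failAppend);
      return CompletableFuture.completedFuture("OK@" + index);
    } catch (RuntimeException e) {
      // Notify the state machine before reporting the failure to the caller.
      sm.cancelTransaction(trx, e);
      final CompletableFuture<String> failed = new CompletableFuture<>();
      failed.completeExceptionally(e);
      return failed;
    }
  }
}
```

In this sketch every started transaction ends in exactly one of two ways, a successful append or a cancellation, which is the invariant the proposed change aims for.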

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/RATIS-2433

How was this patch tested?

The patch was tested via unit tests.

@spacemonkd spacemonkd marked this pull request as ready for review March 21, 2026 07:20

@szetszwo left a comment

@spacemonkd, thanks for working on this!

Please see the comments inlined and also https://issues.apache.org/jira/secure/attachment/13081527/1382_review.patch

Comment on lines 844 to +864
   final PendingRequests.Permit unsyncedPermit = unsyncedLeaderState.tryAcquirePendingRequest(request.getMessage());
   if (unsyncedPermit == null) {
-    return getResourceUnavailableReply(request, cacheEntry);
+    final ResourceUnavailableException e = new ResourceUnavailableException(
+        getMemberId() + ": Failed to acquire a pending write request for " + request);
+    cancelTransaction(context, e);
+    return cacheEntry.failWithException(e);
   }

-  final LeaderStateImpl leaderState;
-  final PendingRequest pending;
+  LeaderStateImpl leaderState = null;
+  PendingRequest pending = null;
+  CompletableFuture<RaftClientReply> failure = null;
+  Exception cancelException = null;

We should keep returning immediately in case of a failure. This change makes the code harder to read.

Please keep this method and call cancelTransaction inside. We should use it for the other ResourceUnavailableException cases.
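
As a rough illustration of the two review points above (return immediately on failure, and keep one helper that performs the cancellation), here is a small self-contained sketch. All names in it (`EarlyReturnSketch`, `resourceUnavailable`, the boolean flags) are hypothetical and simplified; it is not the Ratis implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; the names do not come from Ratis. */
public class EarlyReturnSketch {
  /** Counts cancellations so that the behaviour can be observed. */
  static final AtomicInteger CANCELLED = new AtomicInteger();

  /** Stand-in for notifying the state machine. */
  static void cancelTransaction(String trx, Exception cause) {
    CANCELLED.incrementAndGet();
  }

  /** Single helper that cancels and builds the failed reply, so each failure site is one return. */
  static CompletableFuture<String> resourceUnavailable(String trx, String reason) {
    final IllegalStateException e = new IllegalStateException(reason);
    cancelTransaction(trx, e);
    final CompletableFuture<String> failed = new CompletableFuture<>();
    failed.completeExceptionally(e);
    return failed;
  }

  /** Each failure returns immediately; no nullable locals need to be tracked across branches. */
  static CompletableFuture<String> write(String trx, boolean hasPermit, boolean hasCapacity) {
    if (!hasPermit) {
      return resourceUnavailable(trx, "failed to acquire a pending-request permit");
    }
    if (!hasCapacity) {
      return resourceUnavailable(trx, "pending requests are full");
    }
    return CompletableFuture.completedFuture("OK");
  }
}
```

Because the cancellation lives inside the helper, a new failure case cannot forget to cancel, and the local variables in `write` can stay `final`.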

Comment on lines +850 to 851
+    cancelTransaction(context, nle);
     return RetryCacheImpl.failWithReply(reply, cacheEntry);

Let's move the failWithReply method to RaftServerImpl:

  private CompletableFuture<RaftClientReply> failWithReply(RaftClientReply reply, CacheEntry entry,
      TransactionContextImpl context) {
    cancelTransaction(context, reply.getException());
    if (entry == null) {
      return CompletableFuture.completedFuture(reply);
    }
    entry.failWithReply(reply);
    return entry.getReplyFuture();
  }

}

JavaUtils.attemptRepeatedly(() -> {
Assertions.assertTrue(numCancelTransaction.get() > 0,

Could we check the exact number instead of "> 0"?
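
One way to read this suggestion is sketched below in plain Java, without JUnit or the Ratis test utilities. `awaitCount` and `assertExactly` are hypothetical helpers written for this sketch; the counter stands in for `numCancelTransaction`, and the expected value would have to come from the actual test scenario.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

/** Hypothetical helpers for illustration; not part of Ratis or JUnit. */
public class ExactCountSketch {
  /** Polls until the observed value equals the expected one or attempts run out; returns the last value seen. */
  static int awaitCount(IntSupplier actual, int expected, int attempts, long sleepMs) {
    int last = actual.getAsInt();
    for (int i = 1; i < attempts && last != expected; i++) {
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop polling early.
        Thread.currentThread().interrupt();
        break;
      }
      last = actual.getAsInt();
    }
    return last;
  }

  /** Fails unless the count matches exactly, unlike a "greater than zero" check. */
  static void assertExactly(int expected, int actual) {
    if (expected != actual) {
      throw new AssertionError("expected " + expected + " cancellations but got " + actual);
    }
  }

  public static void main(String[] args) {
    final AtomicInteger numCancelTransaction = new AtomicInteger();
    numCancelTransaction.incrementAndGet(); // simulate one cancelled transaction
    assertExactly(1, awaitCount(numCancelTransaction::get, 1, 10, 10L));
  }
}
```

An exact comparison would also fail if a transaction were cancelled twice, which a `> 0` check silently accepts.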
