[server] Fix table deletion stuck when StopReplicaRequest send fails#3359

Open
lilei1128 wants to merge 1 commit into apache:main from lilei1128:fix

@lilei1128

Purpose

Linked issue: close #xxx

When a StopReplicaRequest send fails with a network-level throwable,
the coordinator has already transitioned the replicas to ReplicaDeletionStarted,
but it silently drops the error. Table deletion then stays stuck permanently because:

  • isEligibleForDeletion() blocks tables with any replica in ReplicaDeletionStarted
  • resumeDeletions() is only triggered from processDeleteReplicaResponseReceived,
    which runs only when a DeleteReplicaResponseReceivedEvent is emitted
    (and none is emitted when the send itself fails)
  • processDeadTabletServer() skips replicas that are toBeDeleted
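
To illustrate the first condition above, here is a minimal sketch of the eligibility check. It is a simplified assumption, not the coordinator's actual code: the class name, the ReplicaState enum, and the list-based signature are hypothetical; only the state names and isEligibleForDeletion() come from this description.

```java
import java.util.Arrays;
import java.util.List;

class DeletionEligibilitySketch {

    /** Hypothetical enum; only the two state names are taken from the PR text. */
    enum ReplicaState { ReplicaDeletionStarted, ReplicaDeletionSuccessful }

    /**
     * Assumed simplification: a table may proceed with deletion only if
     * no replica is still in ReplicaDeletionStarted.
     */
    static boolean isEligibleForDeletion(List<ReplicaState> replicas) {
        return !replicas.contains(ReplicaState.ReplicaDeletionStarted);
    }

    public static void main(String[] args) {
        // One replica never received a response, so it stays in ReplicaDeletionStarted.
        List<ReplicaState> stuck = Arrays.asList(
                ReplicaState.ReplicaDeletionSuccessful,
                ReplicaState.ReplicaDeletionStarted);
        System.out.println(isEligibleForDeletion(stuck)); // prints false

        List<ReplicaState> done = Arrays.asList(
                ReplicaState.ReplicaDeletionSuccessful,
                ReplicaState.ReplicaDeletionSuccessful);
        System.out.println(isEligibleForDeletion(done)); // prints true
    }
}
```

If no event ever moves the replica out of ReplicaDeletionStarted, the check keeps returning false, which is the stuck state described above.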

Brief change log

Fix: when sendStopRequest() fails with a throwable, emit a
DeleteReplicaResponseReceivedEvent with error results for the
delete-flagged buckets (delete=true && deleteRemote=true). This feeds
into the existing retry/give-up logic in processDeleteReplicaResponseReceived,
which retries up to DELETE_TRY_TIMES (5) and then force-marks replicas as
ReplicaDeletionSuccessful, allowing deletion to complete.

Non-deletion stop-replica requests (delete=false) continue to silently ignore
send failures as before, since they are best-effort.
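
The change described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: BucketStop, BucketError, errorResultsFor, and shouldForceMarkSuccessful are hypothetical names, and the exact boundary of the retry check (>= versus >) is assumed. Only the delete/deleteRemote flags and DELETE_TRY_TIMES (5) come from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SendFailureToErrorResultsSketch {

    /** Hypothetical per-bucket entry of a stop request; flags follow the PR text. */
    static final class BucketStop {
        final int bucketId;
        final boolean delete;
        final boolean deleteRemote;

        BucketStop(int bucketId, boolean delete, boolean deleteRemote) {
            this.bucketId = bucketId;
            this.delete = delete;
            this.deleteRemote = deleteRemote;
        }
    }

    /** Hypothetical per-bucket error result that the emitted event would carry. */
    static final class BucketError {
        final int bucketId;
        final String message;

        BucketError(int bucketId, String message) {
            this.bucketId = bucketId;
            this.message = message;
        }
    }

    /** Value taken from the PR description. */
    static final int DELETE_TRY_TIMES = 5;

    /** Builds error results only for delete-flagged buckets; other buckets stay best-effort. */
    static List<BucketError> errorResultsFor(List<BucketStop> buckets, Throwable sendFailure) {
        List<BucketError> errors = new ArrayList<>();
        for (BucketStop b : buckets) {
            if (b.delete && b.deleteRemote) {
                errors.add(new BucketError(b.bucketId, String.valueOf(sendFailure.getMessage())));
            }
        }
        return errors;
    }

    /** Assumed give-up rule: once failures reach DELETE_TRY_TIMES, stop retrying. */
    static boolean shouldForceMarkSuccessful(int failuresSoFar) {
        return failuresSoFar >= DELETE_TRY_TIMES;
    }

    public static void main(String[] args) {
        List<BucketStop> buckets = Arrays.asList(
                new BucketStop(0, true, true),    // delete-flagged: gets an error result
                new BucketStop(1, true, false),   // not fully delete-flagged: ignored
                new BucketStop(2, false, false)); // plain stop: ignored
        List<BucketError> errors = errorResultsFor(buckets, new RuntimeException("connection refused"));
        System.out.println(errors.size());                // prints 1
        System.out.println(shouldForceMarkSuccessful(4)); // prints false
        System.out.println(shouldForceMarkSuccessful(5)); // prints true
    }
}
```

An empty result list would mean no event needs to be emitted, which matches the best-effort handling of non-deletion requests.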

Tests

Add testDeleteTableWithSendFailure() to verify that deletion completes
even when all StopReplicaRequest sends fail at the network level (i.e.,
CompletableFuture.failedFuture, not a response-level error code). The
existing testDeleteReplicaStateChange() only covered response-level errors.
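
The two failure kinds the tests distinguish can be shown with a small standalone sketch. The StopReplicaResponse class, its errorCode field (with 0 assumed to mean "no error"), and the classify helper are hypothetical; CompletableFuture.failedFuture and CompletableFuture.completedFuture are standard JDK methods (failedFuture requires Java 9 or later).

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

class FailureKindsSketch {

    /** Hypothetical response type; errorCode 0 is assumed to mean success. */
    static final class StopReplicaResponse {
        final int errorCode;

        StopReplicaResponse(int errorCode) {
            this.errorCode = errorCode;
        }
    }

    /** Distinguishes a failed send (exceptional future) from an error carried in a response. */
    static String classify(CompletableFuture<StopReplicaResponse> future) {
        return future
                .handle((response, throwable) -> {
                    if (throwable != null) {
                        return "send-failure"; // the request never produced a response
                    }
                    return response.errorCode != 0 ? "response-error" : "ok";
                })
                .join();
    }

    public static void main(String[] args) {
        // Network-level failure: the future itself completes exceptionally.
        System.out.println(classify(
                CompletableFuture.failedFuture(new IOException("connection refused")))); // prints send-failure

        // Response-level failure: the future completes normally with an error code.
        System.out.println(classify(
                CompletableFuture.completedFuture(new StopReplicaResponse(1)))); // prints response-error

        System.out.println(classify(
                CompletableFuture.completedFuture(new StopReplicaResponse(0)))); // prints ok
    }
}
```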

API and Format

Documentation

@lilei1128
Author

close #3357
