wardlican changed the title: [Subtask]: Optimizations and troubleshooting for the Master-Slave mode. #4171 (Apr 10, 2026)
Offline nodes with missing last_update_time may never be reclaimed
In AmsAssignService.detectNodeChanges (around lines 528-545), a node is marked offline only when lastUpdateTime > 0 && (currentTime - lastUpdateTime) > nodeOfflineTimeoutMs.
However, both DBBucketAssignStore#getLastUpdateTime and ZkBucketAssignStore#getLastUpdateTime return 0 when the timestamp is missing. In that case, a node that is already absent from the alive-node list will never enter the offline branch, so its buckets are never redistributed. This can leave bucket ownership stranded and prevent load recovery.
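A minimal sketch of the condition described above, isolated into a hypothetical helper (`OfflineCheckDemo.isOfflineOriginal` is not an Amoro name, and the timeout value is illustrative). It shows why a stored timestamp of 0 can never satisfy the check, however long the node has been gone:

```java
final class OfflineCheckDemo {
  // Hypothetical helper reproducing the condition quoted from detectNodeChanges.
  static boolean isOfflineOriginal(long lastUpdateTime, long currentTime, long nodeOfflineTimeoutMs) {
    return lastUpdateTime > 0 && (currentTime - lastUpdateTime) > nodeOfflineTimeoutMs;
  }

  public static void main(String[] args) {
    long timeoutMs = 30_000L; // illustrative value, not taken from the Amoro config
    long now = System.currentTimeMillis();
    // A node whose last update is 60 s old is detected as offline.
    System.out.println(isOfflineOriginal(now - 60_000L, now, timeoutMs)); // true
    // A node whose timestamp is missing (store returns 0) is never detected,
    // even if it has already disappeared from the alive-node list.
    System.out.println(isOfflineOriginal(0L, now, timeoutMs)); // false
  }
}
```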
Suggested fix: either treat lastUpdateTime <= 0 as offline-eligible when the node is not in aliveNodeKeys and reclaim its buckets directly, or keep a short grace period and, once it expires, force the node offline even if the timestamp is still missing.
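One way both suggestions could fit together is sketched below. The class `OfflineDetector` and the names `missingTimestampGraceMs`, `assignedNodeKeys`, and `lastUpdateTimes` are hypothetical and do not come from AmsAssignService; the sketch only assumes, as the comment states, that a missing timestamp is reported as 0. A grace period of 0 gives the first option (reclaim immediately); a positive value gives the second.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical detector; not the actual AmsAssignService implementation.
final class OfflineDetector {
  private final long nodeOfflineTimeoutMs;
  // Grace period for absent nodes with no timestamp; 0 reclaims them immediately.
  private final long missingTimestampGraceMs;
  // When each absent, timestamp-less node was first observed by this detector.
  private final Map<String, Long> missingSince = new HashMap<>();

  OfflineDetector(long nodeOfflineTimeoutMs, long missingTimestampGraceMs) {
    this.nodeOfflineTimeoutMs = nodeOfflineTimeoutMs;
    this.missingTimestampGraceMs = missingTimestampGraceMs;
  }

  /** Returns the nodes whose buckets should be reclaimed in this detection round. */
  Set<String> detectOffline(
      Set<String> assignedNodeKeys,
      Set<String> aliveNodeKeys,
      Map<String, Long> lastUpdateTimes,
      long currentTime) {
    Set<String> offline = new LinkedHashSet<>();
    Set<String> absent = new LinkedHashSet<>();
    for (String node : assignedNodeKeys) {
      if (aliveNodeKeys.contains(node)) {
        continue; // nodes in the alive list are never reclaimed
      }
      absent.add(node);
      long lastUpdateTime = lastUpdateTimes.getOrDefault(node, 0L);
      if (lastUpdateTime > 0) {
        // Original rule: timestamp present and older than the timeout.
        if (currentTime - lastUpdateTime > nodeOfflineTimeoutMs) {
          offline.add(node);
        }
      } else {
        // Missing timestamp: start (or continue) the grace period, then force offline.
        long since = missingSince.computeIfAbsent(node, k -> currentTime);
        if (currentTime - since >= missingTimestampGraceMs) {
          offline.add(node);
        }
      }
    }
    // Drop grace-period state for nodes that came back or are no longer assigned.
    missingSince.keySet().retainAll(absent);
    return offline;
  }
}
```

For example, with a 10 s grace period, an absent node with no timestamp is not reclaimed on the round that first sees it, but is returned on a round 10 s later. Because the store has no timestamp to measure from, the grace period here is counted from when this detector first observed the node missing; that state is kept in memory only, so under this assumption it would restart after a master failover.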
Why are the changes needed?
Close #4171.
Brief change log
bucket_id must be assigned to them.
bucket_id, leading to bucket imbalance.
-msm flag was not passed during the optimizer's startup.
How was this patch tested?
Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run test locally before making a pull request
Documentation