fix(ha): Fix flaky TestWatchPrefixNilPanicWithMemberlist #7493
Merged
friedrichg merged 1 commit into master on May 8, 2026
The test was flaky due to a race between the WatchPrefix watcher registration in loop() and the CheckReplica call. StartAndAwaitRunning returns before the WatchPrefix goroutine registers its watcher channel in the memberlist KV. If CheckReplica's CAS + notifyWatchers fires before the watcher is registered, the notification is lost and the key never appears in the elected cache.

Fix by adding a 100ms sleep before CheckReplica to allow the WatchPrefix goroutine to register its watcher channel (same pattern used in memberlist_client_test.go), and increasing the poll timeout from 3s to 5s for CI robustness.

Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Friedrich Gonzalez <1517449+friedrichg@users.noreply.github.com>
d17174d to 82b5283
friedrichg approved these changes May 8, 2026
Summary
Fix flaky `TestWatchPrefixNilPanicWithMemberlist` test in `pkg/ha`.
Root Cause
The test has a race condition between the `WatchPrefix` watcher registration in `loop()` and the `CheckReplica` call. When the HATracker starts, `StartAndAwaitRunning` returns as soon as the service transitions to the Running state, but the `loop()` goroutine may not yet have registered its watcher channel in the memberlist KV. If `CheckReplica`'s CAS + `notifyWatchers` fires before the watcher is registered, the notification is lost and the key never appears in the `elected` cache, causing the 3-second poll to time out.
Fix
- Add a 100ms sleep before `CheckReplica` to allow the `WatchPrefix` goroutine to register its watcher channel. This is the same pattern used in `pkg/ring/kv/memberlist/memberlist_client_test.go` (line 1650).
Testing
- Ran with `-count=30`: all pass.
- `pkg/ha` test suite passes.