[test] fix flaky client-evict.tcl tests on slow CI machines#402
Closed
warrick1016 wants to merge 3 commits intomainfrom
Closed
[test] fix flaky client-evict.tcl tests on slow CI machines#402warrick1016 wants to merge 3 commits intomainfrom
warrick1016 wants to merge 3 commits intomainfrom
Conversation
- 'evict clients only until below limit': remove assert_range check that required all clients to use memory within 1% of each other. Different allocators (especially libc) can return significantly different allocation sizes for the same malloc argument, causing a ~1.9x ratio on some CI machines. The test logic already tracks max_client_mem correctly so the assertion is not needed. - 'evict clients in right order (large to small)': increase the per-iteration threshold buffer from 1000 bytes to 64kb, and add wait_for_condition to ensure evictions complete before verifying client counts. The 1000-byte buffer was too small to absorb natural memory growth between initial measurement and eviction time, causing one extra (smaller) client to be incorrectly evicted on slow machines. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace strict assert on total client memory with wait_for_condition. On slow CI machines, remaining clients can accumulate a small amount of memory between the config-set-triggered eviction and the subsequent clients_sum measurement, causing the strict assert to fail spuriously (e.g. 13291642 > 13271146 -- over by only ~20KB). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
'evict clients only until below limit': - Compute the eviction threshold from the actual current client memory sum rather than (max_client_mem * count / 2). The old formula assumed all clients used the same memory, which is violated on CI machines where the allocator returns different sizes for the same request. - Replace the exact count check (== client_count/2) with a broader assertion: total memory dropped within the limit, some clients were evicted, and not all clients were evicted. 'evict clients in right order (large to small)': - Replace the aggregate clients_sum wait with a targeted wait_for_condition that polls the specific client-index count == 0. The aggregate sum can transiently pass while a target client is still being disconnected, causing the subsequent assert to see count == 1 instead of 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
'evict clients only until below limit': remove assert_range check that required all clients to use memory within 1% of each other. Different allocators (especially libc) can return significantly different allocation sizes for the same malloc argument, causing a ~1.9x ratio on some CI machines. The test logic already tracks max_client_mem correctly so the assertion is not needed.
'evict clients in right order (large to small)': increase the per-iteration threshold buffer from 1000 bytes to 64kb, and add wait_for_condition to ensure evictions complete before verifying client counts. The 1000-byte buffer was too small to absorb natural memory growth between initial measurement and eviction time, causing one extra (smaller) client to be incorrectly evicted on slow machines.