Make immediate-cancel task termination test deterministic#36967
Open
junhyeong9812 wants to merge 1 commit into
Open
Make immediate-cancel task termination test deterministic#36967junhyeong9812 wants to merge 1 commit into
junhyeong9812 wants to merge 1 commit into
Conversation
The taskTerminationTimeoutWithImmediateCancel test submitted a task and immediately closed the executor, then asserted that the future was cancelled. The cancellation flag is set by close() on the calling thread, while it is checked at the start of the task on a separate worker thread. With no ordering guarantee between the two, a quickly scheduled worker could pass the cancellation check before close() set the flag, complete the trivial task normally, and leave the future uncancelled, making the test fail intermittently under load. Override doExecute to capture the task-tracking wrapper instead of running it on a background thread, then run it on the test thread after close() has set the cancellation flag. This exercises the same cancellation path deterministically, with no reliance on thread scheduling. Signed-off-by: junhyeong9812 <pickjog@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
As discussed in #36965, this is a separate, test-only fix for the flaky
SimpleAsyncTaskExecutorTests.taskTerminationTimeoutWithImmediateCanceltest (the unrelated failure that turned CI red on that pull request). Production behavior is unchanged.Problem
The test submits a task and immediately closes the executor, then asserts the future was cancelled. The cancellation flag is set by
close()on the calling thread, while it is read at the start of the task inTaskTrackingRunnable.run()on a separate worker thread. There is no ordering guarantee between the two: if the worker thread reaches the cancellation check beforeclose()sets the flag, it passes the check, the trivial task completes normally, and the future is never cancelled, sofuture.get()returns instead of throwing. Under CI load the worker is more likely to win that race, which is why it fails intermittently.I reproduced this with a tight loop of the same scenario: roughly 2 to 4 failures per 50,000 iterations locally, which becomes more frequent on a loaded runner.
Fix
Override
doExecuteto capture the task-tracking wrapper instead of starting a background thread, then run the wrapper on the test thread afterclose()has set the cancellation flag. The same production cancellation path (checkCancelledtofuture.cancel(false)toCancellationException) is exercised, but the read and write now happen in program order on a single thread, so the race is gone. With the wrapper never started,activeThreadsis empty and the termination wait is skipped, so the test is also faster. The same capture technique is already used bycancelledThrottledTaskReleasesPermit. The deterministic version passes 50,000 of 50,000 iterations.