fix: decouple background command execution from caller cancellation token (#7224) #7283
Greptile Overview
Greptile Summary
This PR fixes a critical
This change aligns with the fire-and-forget semantics of the
Confidence Score: 5/5
| Filename | Overview |
|---|---|
| src/common/Elsa.Mediator/HostedServices/BackgroundCommandSenderHostedService.cs | Added pre-execution cancellation check and decoupled background command execution from caller's cancellation token to prevent TaskCanceledException under high load |
Sequence Diagram
sequenceDiagram
participant Client as HTTP Request/Caller
participant BG as BackgroundStrategy
participant Channel as CommandsChannel
participant Worker as BackgroundCommandSenderHostedService
participant Scope as Service Scope
participant Sender as ICommandSender
participant DB as Database/EF Core
Client->>BG: SendAsync(command, Background, callerToken)
BG->>Channel: WriteAsync(CommandContext with callerToken)
Note over BG,Client: Returns immediately (fire-and-forget)
Client->>Client: May timeout/cancel
Channel->>Worker: Dequeue CommandContext
Worker->>Worker: Check if callerToken.IsCancellationRequested
alt Caller already cancelled
Worker->>Worker: Log & skip command
else Caller still active
Worker->>Scope: CreateScope()
Scope->>Sender: GetService<ICommandSender>()
Worker->>Sender: SendAsync(command, hostToken)
Note over Worker,Sender: Uses hostToken instead of callerToken
Sender->>DB: Execute command
DB-->>Sender: Success
Sender-->>Worker: Complete
Worker->>Scope: Dispose()
end
@dotnet-policy-service agree

Hi @sfmskywalker, I noticed that version 3.6.0 has been released and, from my initial testing, the issue addressed in this PR (related to #7224) seems to be resolved in the new release. Could you please confirm whether a fix for this was included in the 3.6.0 release through a different commit? If the issue is officially covered, feel free to close this PR. Thanks!

Hi @sfmskywalker, I have further evidence that the TaskCanceledException issue persists in version 3.6.0. When testing under high load (dispatching 200+ workflows concurrently), I consistently encounter the following error, which leads to silent dispatch failures.

Logs & Stack Trace Analysis

`2026-04-01 16:56:24.746 [INF] Invoking DispatchWorkflowDefinitionCommand`
`System.Threading.Tasks.TaskCanceledException: A task was canceled.`

Key Observations

This confirms that background commands must be decoupled from the caller's cancellation token and should only observe the application's lifetime token. My PR #7283 addresses this decoupling to ensure background reliability. Could we please reconsider this fix for the next patch?
Pull request overview
This PR updates the mediator’s background command worker to avoid TaskCanceledException under load by preventing background command execution from being tied to an originating caller’s CancellationToken (e.g., an HTTP request timing out).
Changes:
- Added a pre-execution cancellation check for queued commands in `ReadOutputAsync`.
- Changed background command execution to use only the hosted service lifetime `CancellationToken` (instead of a linked token that includes the caller token).
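
To illustrate why the linked token matters, here is a minimal standalone sketch. It is not the code from this PR, and `LinkedTokenDemo` and its member names are hypothetical. It only shows that a token created with `CancellationTokenSource.CreateLinkedTokenSource` is cancelled as soon as the caller's source is cancelled, while the host's own token is unaffected.

```csharp
using System;
using System.Threading;

public static class LinkedTokenDemo
{
    // Cancels the (simulated) caller and reports which tokens observed it.
    public static (bool Linked, bool HostOnly) CancelCaller()
    {
        // Stands in for the hosted service lifetime token (assumption).
        using var hostCts = new CancellationTokenSource();
        // Stands in for an HTTP request token (assumption).
        using var callerCts = new CancellationTokenSource();

        // A token linked to both sources, as in the behavior described above.
        using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
            hostCts.Token, callerCts.Token);

        // The caller times out while the command is still queued.
        callerCts.Cancel();

        return (linkedCts.Token.IsCancellationRequested,
                hostCts.Token.IsCancellationRequested);
    }
}
```

Calling `LinkedTokenDemo.CancelCaller()` returns `(true, false)`: work running under the linked token would be cancelled, work running under the host token alone would not.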
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Description
This PR addresses a `TaskCanceledException` occurring in `BackgroundCommandSenderHostedService` under high load.

The issue was caused by linking the background command execution to the original caller's `CancellationToken`. If the caller (e.g., an HTTP request) timed out or was cancelled while the command was still queued in the mediator's internal channel, the execution would fail immediately upon reaching the database layer (EF Core).

Changes
- Added a pre-execution check in `ReadOutputAsync` to skip commands that are already cancelled, preventing unnecessary processing and noisy logs.
- Decoupled `ICommandSender.SendAsync` from the caller's token, using only the host's lifetime token to ensure background tasks (like workflow dispatching) complete even if the triggering context is gone.

Related Issue
Fixes #7224
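
For readers who want the shape of the two changes without opening the diff, here is a rough sketch under stated assumptions. It is not the actual `BackgroundCommandSenderHostedService` code: `QueuedCommand`, `BackgroundWorkerSketch`, and `DrainAsync` are hypothetical names, and the `execute` delegate stands in for the call to `ICommandSender.SendAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the queued item; not Elsa's actual type.
public sealed record QueuedCommand(string Name, CancellationToken CallerToken);

public static class BackgroundWorkerSketch
{
    // Drains the channel, skipping commands whose caller already cancelled
    // and running the rest under the host token only.
    public static async Task<List<string>> DrainAsync(
        ChannelReader<QueuedCommand> reader,
        Func<QueuedCommand, CancellationToken, Task> execute,
        CancellationToken hostToken)
    {
        var executed = new List<string>();

        await foreach (var item in reader.ReadAllAsync(hostToken))
        {
            // Pre-execution check: the caller gave up before we dequeued.
            if (item.CallerToken.IsCancellationRequested)
                continue;

            // Only the host lifetime token is passed on, so a caller that
            // times out after this point cannot cancel the in-flight command.
            await execute(item, hostToken);
            executed.Add(item.Name);
        }

        return executed;
    }
}
```

A quick way to exercise the sketch: write one command with an already-cancelled caller token and one with `CancellationToken.None` to an unbounded channel, complete the writer, and call `DrainAsync`. Only the second command is executed.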