fix: decouple background command execution from caller cancellation token (#7224) #7283

Merged
sfmskywalker merged 2 commits into elsa-workflows:release/3.6.1 from cristiandolf:fix/7224-background-cancellation on Apr 9, 2026

@cristiandolf
Contributor

Description

This PR addresses a TaskCanceledException occurring in BackgroundCommandSenderHostedService under high load.

The issue was caused by linking the background command execution to the original caller's CancellationToken. If the caller (e.g., an HTTP request) timed out or was cancelled while the command was still queued in the mediator's internal channel, the execution would fail immediately upon reaching the database layer (EF Core).

Changes

  • Added a pre-execution check in ReadOutputAsync to skip commands that are already cancelled, preventing unnecessary processing and noisy logs.
  • Decoupled ICommandSender.SendAsync from the caller's token, using only the host's lifetime token to ensure background tasks (like workflow dispatching) complete even if the triggering context is gone.
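
The two changes above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual code in BackgroundCommandSenderHostedService: `QueuedCommand`, `BackgroundWorkerSketch`, and `DrainAsync` are hypothetical names, and the `execute` delegate stands in for the scoped ICommandSender.SendAsync call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for a queued command plus the token captured from its caller.
public sealed record QueuedCommand(string Name, CancellationToken CallerToken);

public static class BackgroundWorkerSketch
{
    // Drains the channel and returns the names of the commands that were executed.
    public static async Task<List<string>> DrainAsync(
        ChannelReader<QueuedCommand> reader,
        Func<QueuedCommand, CancellationToken, Task> execute,
        CancellationToken hostToken)
    {
        var executed = new List<string>();

        await foreach (var command in reader.ReadAllAsync(hostToken))
        {
            // Pre-execution check: the caller gave up while the command was still queued.
            if (command.CallerToken.IsCancellationRequested)
                continue;

            // Decoupling: only the host lifetime token flows into the handler,
            // so a caller cancelling after this point cannot abort the work.
            await execute(command, hostToken);
            executed.Add(command.Name);
        }

        return executed;
    }
}
```

With this shape, a command whose caller cancelled before dequeue is skipped, while a command that has started keeps running until it finishes or the host shuts down.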

Related Issue

Fixes #7224

@greptile-apps
Contributor

greptile-apps Bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

This PR fixes a critical TaskCanceledException that occurred when background commands were executed after their caller (e.g., HTTP request) had already timed out or been cancelled. The fix implements two key changes:

  • Pre-execution cancellation check: Added a check before processing each command to skip already-cancelled commands, preventing downstream failures in the database layer and reducing noisy error logs
  • Decoupled cancellation token: Replaced the linked cancellation token (caller + host) with only the host's lifetime token, ensuring background workflows complete even if the original HTTP request times out

This change aligns with the fire-and-forget semantics of the BackgroundStrategy - once a command is queued, it should complete regardless of the caller's state. The caller's token is still preserved in the CommandContext for the pre-execution check, allowing graceful skipping of stale commands while ensuring valid commands run to completion.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The fix correctly addresses the root cause by decoupling background command execution from the caller's cancellation token, which is the appropriate solution for fire-and-forget background tasks. The pre-execution check adds defensive handling for already-cancelled commands. The changes are minimal, focused, well-commented, and align with the expected behavior of background command processing.
  • No files require special attention

Important Files Changed

Filename: src/common/Elsa.Mediator/HostedServices/BackgroundCommandSenderHostedService.cs
Overview: Added pre-execution cancellation check and decoupled background command execution from caller's cancellation token to prevent TaskCanceledException under high load

Sequence Diagram

sequenceDiagram
    participant Client as HTTP Request/Caller
    participant BG as BackgroundStrategy
    participant Channel as CommandsChannel
    participant Worker as BackgroundCommandSenderHostedService
    participant Scope as Service Scope
    participant Sender as ICommandSender
    participant DB as Database/EF Core
    
    Client->>BG: SendAsync(command, Background, callerToken)
    BG->>Channel: WriteAsync(CommandContext with callerToken)
    Note over BG,Client: Returns immediately (fire-and-forget)
    Client->>Client: May timeout/cancel
    
    Channel->>Worker: Dequeue CommandContext
    Worker->>Worker: Check if callerToken.IsCancellationRequested
    alt Caller already cancelled
        Worker->>Worker: Log & skip command
    else Caller still active
        Worker->>Scope: CreateScope()
        Scope->>Sender: GetService<ICommandSender>()
        Worker->>Sender: SendAsync(command, hostToken)
        Note over Worker,Sender: Uses hostToken instead of callerToken
        Sender->>DB: Execute command
        DB-->>Sender: Success
        Sender-->>Worker: Complete
        Worker->>Scope: Dispose()
    end

@cristiandolf
Contributor Author

@dotnet-policy-service agree

@cristiandolf
Contributor Author

Hi @sfmskywalker,

I noticed that version 3.6.0 has been released and, from my initial testing, the issue addressed in this PR (related to #7224) seems to be resolved in the new release.

Could you please confirm if a fix for this was included in the 3.6.0 release through a different commit? If the issue is officially covered, feel free to close this PR.

Thanks!

@cristiandolf
Contributor Author

Hi @sfmskywalker,

I have further evidence that the TaskCanceledException issue persists in version 3.6.0. After testing under high load (dispatching 200+ workflows concurrently), I consistently encounter the following error, which leads to silent dispatch failures.

Logs & Stack Trace Analysis
The logs show that the BackgroundCommandSenderHostedService fails to open a database connection because it is using a CancellationToken that has already been signaled for cancellation (likely inherited from the original timed-out request).

2026-04-01 16:56:24.746 [INF] Invoking DispatchWorkflowDefinitionCommand
2026-04-01 16:56:24.752 [ERR] An error occurred using the connection to database 'EGLUE_ELSA3_DEV'
2026-04-01 16:56:24.747 [ERR] An unhandled exception occurred while processing the queue

System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync(...)
   at Elsa.Persistence.EFCore.Store`2.QueryAsync(..., CancellationToken cancellationToken)
   at Elsa.Persistence.EFCore.Modules.Management.EFCoreWorkflowDefinitionStore.FindAsync(WorkflowDefinitionFilter filter, CancellationToken cancellationToken)
   at Elsa.Workflows.Runtime.Handlers.DispatchWorkflowCommandHandler.HandleAsync(DispatchWorkflowDefinitionCommand command, CancellationToken cancellationToken)
   at Elsa.Mediator.Middleware.Command.Components.CommandHandlerInvokerMiddleware.InvokeAsync(CommandContext context)
   at Elsa.Mediator.HostedServices.BackgroundCommandSenderHostedService.ReadOutputAsync(Channel`1 output, CancellationToken cancellationToken)

Key Observations

  1. Context: The error occurs strictly within the BackgroundCommandSenderHostedService worker.
  2. Root Cause: The TaskCanceledException proves that the CancellationToken passed down to EF Core is already cancelled when the background worker tries to fetch the workflow definition.
  3. Impact: Under load, some dispatches are lost entirely. For instance, in a test of 100 concurrent dispatches, only 99 records reached the database, confirming that the cancellation kills the operation before the transaction can even begin or complete.

This confirms that background commands must be decoupled from the caller's cancellation token and should only observe the application's lifetime token. My PR #7283 addresses this decoupling to ensure background reliability.
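
To illustrate the mechanism in isolation, here is a small sketch that uses only standard .NET cancellation APIs. It is not Elsa code: `LinkedTokenSketch`, `TryRunAsync`, and `CompareAsync` are hypothetical names, and `Task.Delay` merely stands in for the cancellable database call. A token linked to an already-cancelled caller token is itself cancelled, so the operation fails before doing any work, whereas the host-only token lets it complete.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LinkedTokenSketch
{
    // Simulates a handler doing cancellable I/O; Task.Delay stands in for the database call.
    public static async Task<bool> TryRunAsync(CancellationToken token)
    {
        try
        {
            await Task.Delay(10, token);
            return true;
        }
        catch (OperationCanceledException)
        {
            // TaskCanceledException derives from OperationCanceledException.
            return false;
        }
    }

    // Runs the same operation with a linked (caller + host) token and with the host token alone.
    public static async Task<(bool Linked, bool HostOnly)> CompareAsync()
    {
        using var caller = new CancellationTokenSource();
        using var host = new CancellationTokenSource();

        // The caller (for example, an HTTP request) is cancelled while the command is queued.
        caller.Cancel();

        using var linked = CancellationTokenSource.CreateLinkedTokenSource(caller.Token, host.Token);

        var linkedResult = await TryRunAsync(linked.Token);
        var hostOnlyResult = await TryRunAsync(host.Token);
        return (linkedResult, hostOnlyResult);
    }
}
```

In this sketch the linked-token run is cancelled while the host-only run succeeds, which mirrors the difference between the old and the proposed behaviour.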

Could we please reconsider this fix for the next patch?

Contributor

Copilot AI left a comment


Pull request overview

This PR updates the mediator’s background command worker to avoid TaskCanceledException under load by preventing background command execution from being tied to an originating caller’s CancellationToken (e.g., an HTTP request timing out).

Changes:

  • Added a pre-execution cancellation check for queued commands in ReadOutputAsync.
  • Changed background command execution to use only the hosted service lifetime CancellationToken (instead of a linked token that includes the caller token).

Comment thread src/common/Elsa.Mediator/HostedServices/BackgroundCommandSenderHostedService.cs Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@sfmskywalker changed the base branch from release/3.6.0 to release/3.6.1 on April 9, 2026 09:44
@sfmskywalker merged commit 6bc0cf2 into elsa-workflows:release/3.6.1 on Apr 9, 2026
2 checks passed