fix: allow flow runs to transition when deployment is deleted #19845

Closed
zzstoatzz wants to merge 1 commit into main from fix-orphan-awaiting-concurrency-slot-runs

@zzstoatzz zzstoatzz commented Dec 17, 2025

Summary

This PR fixes two deployment concurrency bugs that cause flow runs to get stuck in the AwaitingConcurrencySlot state.

Bug 1: Cleanup incorrectly decremented active_slots when runs were rejected

Root cause: When a flow run was rejected to AwaitingConcurrencySlot because the deployment concurrency limit had been reached, the cleanup method in SecureFlowConcurrencySlots unconditionally decremented active_slots even though no slot had been acquired. As a result:

  • active_slots could go negative
  • Concurrency tracking broke completely
  • Runs in AwaitingConcurrencySlot could never acquire a slot, even when one became available

Fix: Only clean up (decrement slots and revoke the lease) if a slot was actually acquired, which is indicated by a lease ID in the validated state.
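
To illustrate the guard described above, here is a minimal, self-contained sketch. It is a toy model, not the actual SecureFlowConcurrencySlots code: `SlotCounter`, `cleanup_unconditional`, `cleanup_guarded`, and `simulate` are hypothetical names, and the lease ID is modeled as a plain optional string.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SlotCounter:
    """Toy stand-in for a deployment concurrency limit (hypothetical)."""

    limit: int
    active_slots: int = 0

    def try_acquire(self) -> Optional[str]:
        """Return a lease ID if a slot is free, otherwise None (rejected)."""
        if self.active_slots >= self.limit:
            return None
        self.active_slots += 1
        return str(uuid.uuid4())

    def release(self) -> None:
        self.active_slots -= 1


def cleanup_unconditional(counter: SlotCounter, lease_id: Optional[str]) -> None:
    # Buggy shape: decrements even when the run never held a slot.
    counter.release()


def cleanup_guarded(counter: SlotCounter, lease_id: Optional[str]) -> None:
    # Fixed shape: only undo an acquisition that actually happened.
    if lease_id is not None:
        counter.release()


def simulate(cleanup: Callable[[SlotCounter, Optional[str]], None]) -> int:
    counter = SlotCounter(limit=1)
    counter.try_acquire()             # first run takes the only slot
    rejected = counter.try_acquire()  # second run is rejected, so no lease
    cleanup(counter, rejected)        # cleanup runs for the rejected run
    counter.release()                 # first run finishes and frees its slot
    return counter.active_slots
```

In this model, `simulate(cleanup_unconditional)` returns -1, while `simulate(cleanup_guarded)` returns 0.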

Bug 2: Orphaned runs when deployment is deleted

Root cause: When a deployment was deleted while flow runs were in AwaitingConcurrencySlot, the orchestration rule would ABORT with "Deployment not found" when the worker tried to transition them, leaving runs stuck forever.

Fix: Cancel runs gracefully when their deployment is deleted instead of aborting.
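
The difference between aborting and cancelling can be sketched as follows. This is a simplified model under stated assumptions, not the orchestration rule itself: `Run`, `Outcome`, `transition_with_abort`, and `transition_with_cancel` are hypothetical names, states are plain strings, and the cancellation message is invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ABORT = "abort"    # transition refused; the run keeps its current state
    ACCEPT = "accept"  # the run moves to a new state


@dataclass
class Run:
    state: str
    message: str = ""


def transition_with_abort(run: Run, deployment: Optional[dict], proposed: str) -> Outcome:
    # Old behavior (modeled): a missing deployment aborts every transition,
    # so the run can never leave AwaitingConcurrencySlot.
    if deployment is None:
        return Outcome.ABORT
    run.state = proposed
    return Outcome.ACCEPT


def transition_with_cancel(run: Run, deployment: Optional[dict], proposed: str) -> Outcome:
    # New behavior (modeled): a missing deployment moves the run to a
    # terminal Cancelled state with an explanatory message.
    if deployment is None:
        run.state = "Cancelled"
        run.message = "Deployment was deleted"  # hypothetical wording
        return Outcome.ACCEPT
    run.state = proposed
    return Outcome.ACCEPT
```

With the abort variant, a run whose deployment is gone stays in AwaitingConcurrencySlot no matter how often the worker retries. With the cancel variant, the same run reaches a terminal state on the first attempt.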

Test plan

  • Added regression test test_rejected_to_awaiting_concurrency_slot_does_not_decrement_slots that verifies active_slots doesn't go negative when runs are rejected
  • Existing test test_deleted_deployment_allows_transition_instead_of_abort verifies orphaned runs are cancelled
  • All 67 TestFlowConcurrencyLimits tests pass

Impact

This fixes the deployment concurrency integration test that was failing in the OSS testbed with "Expected 4 completed runs, got 2": 2 runs would complete while the other 2 stayed stuck in AwaitingConcurrencySlot forever.

🤖 Generated with Claude Code

codspeed-hq Bot commented Dec 17, 2025

CodSpeed Performance Report

Merging #19845 will not alter performance

Comparing fix-orphan-awaiting-concurrency-slot-runs (a7691f2) with main (ed83c5a)

Summary

✅ 2 untouched

@zzstoatzz zzstoatzz force-pushed the fix-orphan-awaiting-concurrency-slot-runs branch from 0278348 to fd62f0c on December 17, 2025 18:04

When a deployment is deleted while flow runs are in AwaitingConcurrencySlot
state, those runs would get stuck forever. The SecureFlowConcurrencySlots
orchestration rule was aborting with "Deployment not found" instead of
allowing the transition to proceed.

Since the run can't execute without its deployment anyway, the fix now
cancels the run with a clear message instead of leaving it stuck in
AwaitingConcurrencySlot permanently.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@zzstoatzz zzstoatzz force-pushed the fix-orphan-awaiting-concurrency-slot-runs branch from f94389b to a7691f2 on December 17, 2025 19:36
@github-actions

This pull request is stale because it has been open 14 days with no activity. To keep this pull request open remove stale label or comment.

@github-actions

This pull request was closed because it has been stale for 14 days with no activity. If this pull request is important or you have more to add feel free to re-open it.

@github-actions github-actions Bot closed this Jan 14, 2026