Summary
A long, synchronous ExecuteMenuItem (longer than FrameIOTimeoutMs = 30s) gets executed N times back-to-back instead of once, because StdioBridgeHost.HandleClientAsync times out the awaiting TCS but does not cancel/remove the queued command, and the broker reconnects and enqueues duplicates that all fire when the original finally completes.
In our repro: a Unity menu item that runs BuildPipeline.BuildPlayer for ~6 minutes ran 6 consecutive Android builds before we manually cancelled. Each was a full ~6-min build.
Repro
- Define an
EditorApplication.ExecuteMenuItem-callable menu that does any synchronous work taking longer than FrameIOTimeoutMs (currently 30000 ms). Easy stub: a menu that does Thread.Sleep(35_000). Realistic case: any BuildPipeline.BuildPlayer invocation.
- From an MCP client (e.g. Claude Code), invoke the menu via the
execute_menu_item tool.
- Observe in
Editor.log that the menu item runs N times sequentially, where N equals the number of broker reconnect attempts before it gives up.
The Editor.log shows the same stack repeated for each iteration:
<your menu method>
MCPForUnity.Editor.Tools.ExecuteMenuItem:HandleCommand
MCPForUnity.Editor.Tools.CommandRegistry:ExecuteCommand
TransportCommandDispatcher:ProcessCommand → ProcessQueue
StdioBridgeHost:ProcessCommands
UnityEditor.EditorApplication:Internal_CallUpdateFunctions
Verified against branch beta.
Root cause
There are two independent timeout paths, both set to the same 30s value (FrameIOTimeoutMs = 30_000 in StdioBridgeHost.cs), and one of them creates the retry-storm.
Path 1 — outer per-frame TCS timeout in StdioBridgeHost.HandleClientAsync (~lines 582-637)
HandleClientAsync reads a framed command, enqueues it into commandQueue with a TaskCompletionSource<string>, then awaits that TCS with a Task.Delay(FrameIOTimeoutMs) race.
If the queued command takes longer than 30s to complete (e.g. because ExecuteMenuItem is synchronously running a 6-minute build on the Unity main thread, blocking EditorApplication.update where ProcessCommands is hooked), the outer await sees a timeout, returns a "Command processing timed out after 30000 ms" error frame to the broker, and the loop continues trying to read the next frame.
Crucially: the queued command is NOT cancelled. ExecuteQueuedCommand fires its own Runner() (async void) which only removes the entry from commandQueue in its finally (~lines 989-993) — i.e. when the build eventually finishes.
Path 2 — broker reconnects, enqueueing duplicate commands
When the broker (Python side) sees the timeout response and the original operation never completed semantically, it reconnects. HandleClientAsync detects the new connection and explicitly closes stale clients (~lines 525-537, "Closing N stale client(s) after new connection"). The new connection produces a new commandId for the retried request, which is enqueued behind the still-running first one in commandQueue.
When the first 6-minute build finally returns control to Unity, processingCommands clears, the next EditorApplication.update tick picks up duplicate #2, marks it IsExecuting = true, and runs another full build. And so on, once per stalled retry that the broker queued up. Six retries = six builds.
Why the inner dispatcher does NOT itself loop
TransportCommandDispatcher.ProcessQueue is well-behaved on this axis:
- Sets
pending.IsExecuting = true before dispatch (~lines 298-299).
- Calls
RemovePending immediately for sync results (~line 498).
- Schedules
RemovePending via delayCall from a ContinueWith for async results (~lines 442-475).
- Guards with
_processingFlag (~lines 273-316) so an in-flight ProcessQueue cannot re-enter from another EditorApplication.update tick.
The user-supplied stack repeats not because the dispatcher re-fires the same PendingCommand id, but because ProcessCommands → ExecuteQueuedCommand → ExecuteCommandJsonAsync is invoked once per duplicate commandId that the broker enqueued at the StdioBridgeHost layer.
Suggested fixes (ordered smallest → most correct)
C — Idempotency / dedup on the bridge side (smallest patch, fully fixes symptom)
When HandleClientAsync enqueues a command, hash the payload + tool name and deduplicate against the in-flight set. If an identical command is already executing, return a "duplicate suppressed" response instead of enqueueing a second copy.
A — Make outer timeout cancel the queued work (most correct)
In HandleClientAsync, when the outer TCS race times out:
- Look up the
QueuedCommand by commandId and call Tcs.TrySetCanceled (or set a "client gone" flag).
- Remove the entry from
commandQueue immediately so a future ProcessCommands tick won't even start it. If it has already started (IsExecuting == true), pass cancellation down through the CancellationTokenSource already created in ExecuteQueuedCommand (~line 961) so TransportCommandDispatcher.ExecuteCommandJsonAsync honours it. The dispatcher already supports cancellation via its CancellationToken parameter.
- Note that
EditorApplication.ExecuteMenuItem itself is synchronous and not cancellable, so cancellation only prevents future duplicates, not the in-flight build. That is acceptable — the goal is "don't run it 6 times", not "stop the running build".
B — Decouple the timeouts (architectural)
The 30s FrameIOTimeoutMs is a reasonable I/O frame timeout (don't sit on a half-read header forever), but it is the wrong ceiling for "how long an MCP command may run". Possible refactors:
- Send the broker a periodic in-band keepalive frame while a command is executing, so the broker doesn't think the channel is stalled.
- Configurable per-tool execution timeout (with a much higher default, e.g. 10 min) separate from the I/O frame timeout.
- The broker side should not auto-retry
execute_menu_item-class tools at all; they are not idempotent. Idempotency hint should be set on the tool metadata (McpForUnityTool attribute) and respected by the broker.
Code pointers
| Concern |
File |
Approx. lines |
| Outer per-command TCS timeout (Path 1) |
MCPForUnity/Editor/Services/Transport/Transports/StdioBridgeHost.cs |
582-637 |
commandQueue dequeue / IsExecuting set |
same |
856-897 |
ExecuteQueuedCommand.Runner finally → only place stuck commands are reaped |
same |
955-997 |
| New-connection stale-client close |
same |
525-537 |
Inner dispatcher reentrancy guard _processingFlag |
MCPForUnity/Editor/Services/Transport/TransportCommandDispatcher.cs |
273-316 |
| Inner dispatcher pending bookkeeping |
same |
320-506 |
| The synchronous, non-cancellable menu invocation |
MCPForUnity/Editor/Tools/ExecuteMenuItem.cs |
38 |
The dead-letter / dedup / cancellation logic should live in StdioBridgeHost.HandleClientAsync (the only layer that knows the broker disconnected), not in TransportCommandDispatcher (which has no concept of a broker connection).
Suggested regression test
A stub tool that Thread.Sleeps 35s, invoked via the MCP client; assert it executes exactly once on the Editor side regardless of broker timeout.
Environment
mcp-unity branch beta
- Unity 2021.3.45f2
- Windows host, stdio transport
- MCP client: Claude Code (Anthropic)
Summary
A long, synchronous
ExecuteMenuItem(longer thanFrameIOTimeoutMs = 30s) gets executed N times back-to-back instead of once, becauseStdioBridgeHost.HandleClientAsynctimes out the awaiting TCS but does not cancel/remove the queued command, and the broker reconnects and enqueues duplicates that all fire when the original finally completes.In our repro: a Unity menu item that runs
BuildPipeline.BuildPlayerfor ~6 minutes ran 6 consecutive Android builds before we manually cancelled. Each was a full ~6-min build.Repro
EditorApplication.ExecuteMenuItem-callable menu that does any synchronous work taking longer thanFrameIOTimeoutMs(currently 30000 ms). Easy stub: a menu that doesThread.Sleep(35_000). Realistic case: anyBuildPipeline.BuildPlayerinvocation.execute_menu_itemtool.Editor.logthat the menu item runs N times sequentially, where N equals the number of broker reconnect attempts before it gives up.The Editor.log shows the same stack repeated for each iteration:
Verified against branch
beta.Root cause
There are two independent timeout paths, both set to the same 30s value (
FrameIOTimeoutMs = 30_000inStdioBridgeHost.cs), and one of them creates the retry-storm.Path 1 — outer per-frame TCS timeout in
StdioBridgeHost.HandleClientAsync(~lines 582-637)HandleClientAsyncreads a framed command, enqueues it intocommandQueuewith aTaskCompletionSource<string>, then awaits that TCS with aTask.Delay(FrameIOTimeoutMs)race.If the queued command takes longer than 30s to complete (e.g. because
ExecuteMenuItemis synchronously running a 6-minute build on the Unity main thread, blockingEditorApplication.updatewhereProcessCommandsis hooked), the outer await sees a timeout, returns a"Command processing timed out after 30000 ms"error frame to the broker, and the loop continues trying to read the next frame.Crucially: the queued command is NOT cancelled.
ExecuteQueuedCommandfires its ownRunner()(async void) which only removes the entry fromcommandQueuein itsfinally(~lines 989-993) — i.e. when the build eventually finishes.Path 2 — broker reconnects, enqueueing duplicate commands
When the broker (Python side) sees the timeout response and the original operation never completed semantically, it reconnects.
HandleClientAsyncdetects the new connection and explicitly closes stale clients (~lines 525-537, "Closing N stale client(s) after new connection"). The new connection produces a newcommandIdfor the retried request, which is enqueued behind the still-running first one incommandQueue.When the first 6-minute build finally returns control to Unity,
processingCommandsclears, the nextEditorApplication.updatetick picks up duplicate #2, marks itIsExecuting = true, and runs another full build. And so on, once per stalled retry that the broker queued up. Six retries = six builds.Why the inner dispatcher does NOT itself loop
TransportCommandDispatcher.ProcessQueueis well-behaved on this axis:pending.IsExecuting = truebefore dispatch (~lines 298-299).RemovePendingimmediately for sync results (~line 498).RemovePendingviadelayCallfrom aContinueWithfor async results (~lines 442-475)._processingFlag(~lines 273-316) so an in-flightProcessQueuecannot re-enter from anotherEditorApplication.updatetick.The user-supplied stack repeats not because the dispatcher re-fires the same
PendingCommandid, but becauseProcessCommands→ExecuteQueuedCommand→ExecuteCommandJsonAsyncis invoked once per duplicatecommandIdthat the broker enqueued at the StdioBridgeHost layer.Suggested fixes (ordered smallest → most correct)
C — Idempotency / dedup on the bridge side (smallest patch, fully fixes symptom)
When
HandleClientAsyncenqueues a command, hash the payload + tool name and deduplicate against the in-flight set. If an identical command is already executing, return a "duplicate suppressed" response instead of enqueueing a second copy.A — Make outer timeout cancel the queued work (most correct)
In
HandleClientAsync, when the outer TCS race times out:QueuedCommandbycommandIdand callTcs.TrySetCanceled(or set a "client gone" flag).commandQueueimmediately so a futureProcessCommandstick won't even start it. If it has already started (IsExecuting == true), pass cancellation down through theCancellationTokenSourcealready created inExecuteQueuedCommand(~line 961) soTransportCommandDispatcher.ExecuteCommandJsonAsynchonours it. The dispatcher already supports cancellation via itsCancellationTokenparameter.EditorApplication.ExecuteMenuItemitself is synchronous and not cancellable, so cancellation only prevents future duplicates, not the in-flight build. That is acceptable — the goal is "don't run it 6 times", not "stop the running build".B — Decouple the timeouts (architectural)
The 30s
FrameIOTimeoutMsis a reasonable I/O frame timeout (don't sit on a half-read header forever), but it is the wrong ceiling for "how long an MCP command may run". Possible refactors:execute_menu_item-class tools at all; they are not idempotent. Idempotency hint should be set on the tool metadata (McpForUnityToolattribute) and respected by the broker.Code pointers
MCPForUnity/Editor/Services/Transport/Transports/StdioBridgeHost.cscommandQueuedequeue /IsExecutingsetExecuteQueuedCommand.Runnerfinally → only place stuck commands are reaped_processingFlagMCPForUnity/Editor/Services/Transport/TransportCommandDispatcher.csMCPForUnity/Editor/Tools/ExecuteMenuItem.csThe dead-letter / dedup / cancellation logic should live in
StdioBridgeHost.HandleClientAsync(the only layer that knows the broker disconnected), not inTransportCommandDispatcher(which has no concept of a broker connection).Suggested regression test
A stub tool that
Thread.Sleeps 35s, invoked via the MCP client; assert it executes exactly once on the Editor side regardless of broker timeout.Environment
mcp-unitybranchbeta