When an approval-gated tool (e.g. microsoft_docs_search, approval_mode="always_require") is surfaced as an
AG-UI confirm_changes human-in-the-loop card and the user clicks Approve, the backend rejects the
approval with:
```
WARNING:agent_framework_ag_ui._agent_run:Rejected approval response id=call_MPgkkd1maUPTs4ToqH4Pj7ja: no matching pending approval request
```
The gated tool then never executes. Two symptoms follow from that single failure:
- UI: the tool chip is stuck on "Running" forever (no result ever arrives).
- Backend crash on the next model call:

```
agent_framework.exceptions.ChatClientException: FoundryChatClient service failed to complete the prompt:
Error code: 400 - {'error': {'message': 'No tool output found for function call call_MPgkkd1maUPTs4ToqH4Pj7ja.', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}
```
The root cause is that the server-side pending-approval registry key is built from two different thread_id values: the registration (during the pausing run) uses the provider's server-side
conversation id (conv_id), while the resolution (during the resume run) uses the client-supplied thread_id. With a server-side-stateful backend (Azure AI Foundry / OpenAI Responses API) plus a client that
pins its own thread id (CopilotKit), these two values diverge and the lookup always misses.
The incoming request logged Thread ID: 95dd2877…, but RUN_STARTED advertised thread_id: resp_02a1… — direct evidence of the mid-stream reassignment.
Preconditions that arm the trap
- AgentFrameworkAgent(require_confirmation=True) turns each gated tool into a confirm_changes HITL card.
- Several tools are gated (approval_mode="always_require").
- The Foundry Responses API is server-side stateful: run_agent_stream sets run_kwargs["options"] = {"metadata": …, "store": True} (_agent_run.py, run_agent_stream), and each response carries its own conversation_id.
- A single in-memory registry, pending_approvals (an OrderedDict on the AgentFrameworkAgent instance), is keyed "{thread_id}:{request_id}" and persists across runs (_agent.py, self._pending_approvals).
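To make the keying concrete, here is a minimal sketch of the registry shape described in the last bullet, assuming a plain OrderedDict keyed "{thread_id}:{request_id}" with oldest-first trimming. The helper names (register_pending, evict_oldest), the entry fields and the tiny cap are illustrative assumptions, not the framework's code.

```python
from collections import OrderedDict
from typing import TypedDict


class PendingApproval(TypedDict):
    # Hypothetical entry shape: tool name plus canonicalised arguments.
    name: str
    arguments: str


def evict_oldest(registry: OrderedDict[str, PendingApproval], max_size: int) -> None:
    # Oldest-first trim (what the text calls LRU eviction): popitem(last=False)
    # removes the earliest inserted key.
    while len(registry) > max_size:
        registry.popitem(last=False)


def register_pending(
    registry: OrderedDict[str, PendingApproval],
    thread_id: str,
    request_id: str,
    name: str,
    arguments: str,
    max_size: int = 3,  # illustrative cap, kept tiny so eviction is visible
) -> str:
    # The key embeds whatever thread_id the caller holds at registration time.
    key = f"{thread_id}:{request_id}"
    registry[key] = {"name": name, "arguments": arguments}
    evict_oldest(registry, max_size)
    return key


# One registry per agent instance, shared by every run on that instance.
pending: OrderedDict[str, PendingApproval] = OrderedDict()
for i in range(4):
    register_pending(pending, "thread-a", f"call_{i}", "microsoft_docs_search", "{}")
print(list(pending))  # ['thread-a:call_1', 'thread-a:call_2', 'thread-a:call_3']
```

Because the thread id is baked into the key string, any later change to the value of thread_id silently changes which key a lookup will compute.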
Step-by-step build-up
Run 1 — pause (request → approval card)
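1. UI sends the first request with stable thread_id = 95dd2877….
2. run_agent_stream sets thread_id = input_data["thread_id"] → 95dd2877….
3. _resolve_approval_responses(…, thread_id=95dd2877…) runs; there are no approvals yet, so it is a no-op.
4. On the first streamed update, thread_id is overwritten with the provider's conversation id: thread_id = conv_id → resp_02a1….
5. The model streams a phase='commentary' preamble, then parallel tool calls including the gated microsoft_docs_search(call_MPgk…).
6. The gated call yields a function_approval_request; the registry entry is written using the now-reassigned thread_id, i.e. under "resp_02a1…:call_MPgk…".
7. The confirm_changes card (carrying function_call_id = call_MPgk…) is emitted; the run finishes, pausing for input.

Trap armed: the pending approval is stored under the service id resp_02a1…:call_MPgk….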
User clicks Approve
CopilotKit resumes with a new request, using its stable thread id 95dd2877…, and includes the
approval response for call_MPgk… (a role="user" turn carrying function_approval_response).
Run 2 — resume (where it breaks)
run_agent_stream again sets thread_id = 95dd2877… from the incoming request.
Before the stream starts, _resolve_approval_responses(…, thread_id=95dd2877…) builds the lookup key:
```python
registry_key = f"{thread_id}:{resp_id}"  # "95dd2877…:call_MPgk…"
if registry_key not in pending_approvals:  # registered key was "resp_02a1…:call_MPgk…"
    logger.warning("Rejected approval response id=%s: no matching pending approval request", resp_id)
```
Key mismatch → the approval is rejected and stripped from messages. The tool is never executed; no function_call_output is produced for call_MPgk….
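A similarly simplified model of the resume-side lookup follows. Only the key format and the warning text come from the excerpt above; resolve is a hypothetical stand-in for _resolve_approval_responses, and the registry is a plain dict holding the state left behind by the pause run.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("sketch")


def resolve(pending: dict[str, str], thread_id: str, resp_id: str) -> bool:
    # Same key construction as the excerpt: "{incoming thread_id}:{response id}".
    registry_key = f"{thread_id}:{resp_id}"
    if registry_key not in pending:
        logger.warning("Rejected approval response id=%s: no matching pending approval request", resp_id)
        return False  # approval is stripped; the tool never runs
    del pending[registry_key]  # consumed, so it cannot be replayed
    return True


# State left behind by the pause run (registered under the conv_id).
pending = {"resp_02a1:call_MPgk": "microsoft_docs_search"}

# Case A client (CopilotKit) resumes with its own pinned id: miss.
print(resolve(pending, "95dd2877", "call_MPgk"))  # False
# A client that echoed the advertised conv_id would have hit the entry.
print(resolve(pending, "resp_02a1", "call_MPgk"))  # True
```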
The compounding defect → hard 400
Separately, the history sanitizer _sanitize_tool_history sees the function_approval_response for call_MPgk…, removes it from the pending set assuming "the framework will execute it," and therefore does not inject a synthetic skip result for it
(_message_adapters.py).
The other, ungated calls that are still pending do receive the synthetic result "Tool execution skipped - user provided follow-up message."
The two subsystems disagree:
- sanitizer: "don't add a result, it will be executed";
- approval validator: "rejected, I will not execute it".

Neither produces an output for call_MPgk….
The outbound transcript now has 7 function_call items but only 6 function_call_output items — the
missing one is exactly call_MPgk….
The Foundry Responses API validates the request input, finds a function_call with no matching output,
and returns 400 - No tool output found for function call call_MPgkkd1maUPTs4ToqH4Pj7ja, surfaced as ChatClientException / BadRequestError.
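The disagreement can be reproduced in a few lines, assuming all six ungated calls were still pending at resume time. The three functions are hypothetical simplifications of the sanitizer, the approval validator and the provider-side input check; none of them is framework API.

```python
SKIP_TEXT = "Tool execution skipped - user provided follow-up message."


def sanitize_tool_history(calls: list[str], outputs: dict[str, str], approved: set[str]) -> dict[str, str]:
    # Sanitizer view: an approved call "will be executed", so it gets no skip result.
    patched = dict(outputs)
    for call_id in calls:
        if call_id not in patched and call_id not in approved:
            patched[call_id] = SKIP_TEXT
    return patched


def validate_approvals(approved: set[str], pending_keys: set[str], thread_id: str) -> set[str]:
    # Validator view: keep only approvals whose "{thread_id}:{call_id}" key is registered.
    return {c for c in approved if f"{thread_id}:{c}" in pending_keys}


def calls_without_output(calls: list[str], outputs: dict[str, str]) -> list[str]:
    # The request-level check behind "No tool output found for function call ...".
    return [c for c in calls if c not in outputs]


calls = ["call_MPgk"] + [f"call_ungated_{i}" for i in range(6)]
approved = {"call_MPgk"}
outputs = sanitize_tool_history(calls, {}, approved)
executed = validate_approvals(approved, {"resp_02a1:call_MPgk"}, thread_id="95dd2877")
for call_id in executed:  # nothing survives validation, so nothing runs
    outputs[call_id] = "tool result"
print(len(calls), len(outputs), calls_without_output(calls, outputs))  # 7 6 ['call_MPgk']
```

Each function is internally consistent; the 400 only appears because they make opposite assumptions about the same call id.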
Diagram
```mermaid
sequenceDiagram
  autonumber
  participant UI as CopilotKit UI
  participant RUN as run_agent_stream
  participant REG as pending_approvals
  participant LLM as Foundry Responses API
  rect rgb(245,245,255)
  Note over UI,LLM: RUN 1 - pause
  UI->>RUN: POST / (thread_id=95dd2877…, "landing zone…")
  Note over RUN: thread_id = 95dd2877…
  RUN->>RUN: _resolve_approval_responses (nothing pending)
  RUN->>LLM: agent.run(stream=True, store=True)
  LLM-->>RUN: first update → conv_id = resp_02a1…
  Note over RUN: thread_id = conv_id ⇒ resp_02a1…
  LLM-->>RUN: commentary preamble + gated microsoft_docs_search(call_MPgk…)
  RUN->>REG: register "resp_02a1…:call_MPgk…"
  RUN-->>UI: confirm_changes(function_call_id=call_MPgk…) + RUN_FINISHED
  end
  rect rgb(255,245,245)
  Note over UI,LLM: RUN 2 - resume (Approve)
  UI->>RUN: POST / (thread_id=95dd2877…, approval for call_MPgk…)
  Note over RUN: thread_id = 95dd2877… (pre-stream)
  RUN->>REG: lookup "95dd2877…:call_MPgk…"
  REG-->>RUN: MISS (registered under resp_02a1…)
  Note over RUN: reject + strip approval ⇒ tool NOT executed
  RUN->>RUN: sanitizer skipped injecting a result (assumed execution)
  Note over RUN: transcript has function_call call_MPgk… with NO output
  RUN->>LLM: next request (7 calls, 6 outputs)
  LLM-->>RUN: 400 No tool output found for call_MPgk…
  RUN-->>UI: ChatClientException (chip stuck "Running")
  end
```
Why the reassignment exists (it is intentional)
The thread_id = conv_id rewrite is a deliberate conversation-continuity mechanism for stateful backends,
not a mistake:
- The AG-UI client contract is "the server must handle history via thread_id" (_client.py docstring). For a Responses-API backend, the only handle that can resume the stored conversation is the provider's conversation_id, so the server rewrites thread_id to conv_id and advertises it in RUN_STARTED.
- The provider only reveals its conversation_id in the first streamed update, so the rewrite must happen mid-stream (hence "emit RunStarted after first update to get service IDs").
- After the rewrite, the same thread_id also keys the snapshot store and the RUN_FINISHED event, aligning one durable id across client thread, provider conversation, and server snapshot.
The defect is that the pending-approval registry reused this lifecycle-sensitive variable without
accounting for the fact that it is mutated mid-run and is sourced differently across runs. The stable,
run-invariant id already exists: the original client thread id, captured before the rewrite into base_metadata["ag_ui_thread_id"] (and listed in AG_UI_INTERNAL_METADATA_KEYS).
Registry usage surface (blast radius)
| # | Site | Operation | Keyed by |
|---|---|---|---|
| 1 | _agent.py self._pending_approvals | allocation (one per agent, persists across runs) | n/a |
| 2 | _agent.py run_agent_stream(pending_approvals=…) | pass by reference | n/a |
| 3 | _agent_run.py registration | write (only write site) | reassigned thread_id (=conv_id) |
| 4 | _agent_run.py _evict_oldest_approvals | LRU eviction (popitem(last=False)) | oldest key |
| 5 | _agent_run.py _resolve_approval_responses | membership check (in) | incoming thread_id |
| 6 | _agent_run.py _resolve_approval_responses | read + validate (name + canonical args) | incoming thread_id |
| 7 | _agent_run.py _resolve_approval_responses | consume (del) | incoming thread_id |
_pending_approvals is private; there are no external readers or serialization. The dual-key change touches
only sites 3–7 plus the entry TypedDict.
Solution analysis
The registry only works when the key it is registered under (pause run) equals the key the client sends back
(resume run). Clients fall into two families:
- Case A: pins its own thread id across turns (CopilotKit sends 95dd2877…).
- Case B: echoes the server-advertised conv_id (the AG-UI reference client's documented direct pattern: thread_id = response.additional_properties.get("thread_id") → resend).
Resolution already keys off the incoming id; only registration uses the reassigned conv_id.
| Backend | Client | Register key (today) | Resolve key | Today | Option 1: register under client id | Option 2: request-id only | Option 3: dual-key |
|---|---|---|---|---|---|---|---|
| Stateless | any | client id (no reassignment) | client id | ✅ | ✅ | ✅ | ✅ |
| Stateful | A: pins own id | conv_id | client id | ❌ | ✅ | ✅ | ✅ |
| Stateful | B: echoes conv_id | conv_id | conv_id | ✅ | ❌ breaks | ✅ | ✅ |
Option 1 — register under the stable client id (ag_ui_thread_id). Fixes Case A but breaks Case B
(including the framework's own reference client) on stateful backends. Rejected.
Option 2 — drop the thread prefix; key by request_id (call id) only, relying on the existing
name + canonical-argument validation. Works everywhere, minimal, but discards the per-thread scoping the
maintainers added as defense-in-depth. Viable fallback, but weakens isolation.
Option 3 — dual-key: one entry registered under BOTH the client id and the conv_id. Works across all
combinations, preserves thread scoping and name/args validation, and collapses to a single key in the
stateless case. Requires the entry to carry its own key list so consume/eviction can purge all aliases.
Stateless LLMs are unaffected by the bug (no reassignment → both sites use the client id) and remain correct
under Options 2 and 3.
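The table can be checked mechanically with a toy model, assuming each strategy is fully described by the set of keys it registers at pause time and the key it looks up on resume. Names are illustrative; the model only tests lookup hits and says nothing about the per-thread isolation that Option 2 gives up.

```python
CALL = "call_MPgk"

# (client id, conv id or None when stateless, id the client sends on resume)
SCENARIOS = {
    "Stateless / any": ("95dd2877", None, "95dd2877"),
    "Stateful / A: pins own id": ("95dd2877", "resp_02a1", "95dd2877"),
    "Stateful / B: echoes conv_id": ("95dd2877", "resp_02a1", "resp_02a1"),
}

# Each strategy: (keys registered at pause time, key looked up at resume time).
STRATEGIES = {
    "Today": (lambda client, conv: {f"{conv or client}:{CALL}"}, lambda sent: f"{sent}:{CALL}"),
    "Option 1": (lambda client, conv: {f"{client}:{CALL}"}, lambda sent: f"{sent}:{CALL}"),
    "Option 2": (lambda client, conv: {CALL}, lambda sent: CALL),
    "Option 3": (
        lambda client, conv: {f"{client}:{CALL}", f"{conv or client}:{CALL}"},
        lambda sent: f"{sent}:{CALL}",
    ),
}

results = {}
for scenario, (client, conv, sent) in SCENARIOS.items():
    # A strategy works when the resume-time key is among the registered keys.
    results[scenario] = {
        name: resolve(sent) in register(client, conv)
        for name, (register, resolve) in STRATEGIES.items()
    }
    print(f"{scenario:30}", {n: "ok" if hit else "MISS" for n, hit in results[scenario].items()})
```

The printed rows reproduce the ✅/❌ pattern of the table above, with Option 3 the only keyed strategy that hits in every row.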
Proposed solution — thread-safe dual-key registry
Replace the bare shared OrderedDict with a small PendingApprovalRegistry container that (a) registers each
pending approval as one entry referenced by up to two keys, (b) makes the entry self-describing so
consumption and eviction remove all of its aliases, and (c) guards every compound operation with a threading.Lock so it is safe under concurrent runs on the same agent instance.
Extend the entry TypedDict: _PendingApproval = {name, arguments, keys: list[str]}.
PendingApprovalRegistry wraps an OrderedDict[str, _PendingApproval] plus a threading.Lock. It
exposes three atomic operations plus read-only dict-like dunders (__iter__, __contains__, __len__, __getitem__) for introspection, and a compatibility __setitem__ that wraps a legacy bare-str value
into a single-key entry:
- register(keys, name, arguments): dedupe keys, store the same entry object under each, then evict.
- consume(key, name, arguments) -> (status, entry): membership + name + canonical-argument validation and removal of all sibling keys, performed atomically under the lock. status is one of ok | missing | name_mismatch | arguments_mismatch.
- Entry-aware LRU eviction: when trimming to max_size, pop the oldest entry and all its alias keys.
Register (site 3): compute both candidate keys and hand them to the registry:
```python
client_key = f"{client_thread_id}:{content.id}"  # stable client id (captured pre-reassignment)
conv_key = f"{thread_id}:{content.id}"  # reassigned conv_id (post-reassignment)
pending_approvals.register([client_key, conv_key], name, canonical_args)  # dedupes when stateless
```
Resolve/consume (sites 5–7): one atomic call replaces the previous membership-check → validate → del
sequence:
```python
status, entry = pending_approvals.consume(f"{thread_id}:{resp_id}", resp_name, response_arguments)
# log + strip on missing / name_mismatch / arguments_mismatch; accept on ok
```
Validation semantics unchanged (name + canonical arguments), preserving anti-spoof/replay guarantees;
they now execute inside the lock so validate-then-consume cannot race another run.
Properties:
- Case A: resolve under client id → hits client_key. ✅
- Case B: resolve under conv_id → hits conv_key. ✅
- Stateless: client_key == conv_key → single key, identical to today. ✅
- Replay protection: consume removes all sibling keys atomically → single-use even under concurrency.
- Thread safety: every read/insert/delete/evict runs under one threading.Lock; no await is held inside the lock, so it is safe for both concurrent asyncio tasks and true OS threads sharing the agent instance.
- Bounded size: entry-aware eviction still trims the registry to max_size (max_size = 10_000).
- str legacy variant: the compatibility __setitem__ wraps a bare-str value into a {name, keys} entry, so existing direct-assignment call sites keep working.
Related: snapshot-store keying (same root, separate structure)
The AG-UI thread snapshot store exhibits the same thread_id divergence, but in a different structure:
- Save uses the post-reassignment thread_id (conv_id): _save_thread_snapshot(…, thread_id=thread_id).
- Hydrate/get uses the incoming thread_id: config.snapshot_store.get(…, thread_id=thread_id) and _hydrate_thread_snapshot(…), both before the stream reassigns it.

Consequences:
- A Case A client (pins its own id) that relies on the server snapshot store would miss its own snapshot across turns, for the same reason approvals miss.
- It does not currently bite this deployment because CopilotKit replays history client-side, so the server snapshot store is effectively unused here. It remains a latent inconsistency.
Recommended handling: fix consistently by keying the snapshot store on the stable client id
(ag_ui_thread_id) as well, or by explicitly documenting that the snapshot store requires clients to echo the
advertised conv_id.
Package Versions
agent-framework-ag-ui: 1.0.0rc6, agent-framework-core: 1.10.0, agent-framework-foundry: 1.10.0
Python Version
Python 3.12
Additional Context
Reproduction
- … aspire), provider = Foundry, model gpt-5.4-mini.
- … microsoft_docs_search.
- … confirm_changes card.
- … aspire logs: Rejected approval response id=… : no matching pending approval request
- … 400 … No tool output found for function call ….

Key identifiers from the observed run:
- Client (CopilotKit) thread_id: 95dd2877-ea85-4e63-9eee-878c6f162759
- Provider conversation id (conv_id): resp_02a1c1deeef7d5b2006a464e5927888197b1714203c4e1da6f
- Gated function call id: call_MPgkkd1maUPTs4ToqH4Pj7ja (microsoft_docs_search)