Python: [Bug]: AG-UI: 'No tool output found' on Foundry provider #6894

### Description

**With CopilotKit and Foundry provider ...**

When an approval-gated tool (e.g. `microsoft_docs_search`, `approval_mode="always_require"`) is surfaced as an
AG-UI `confirm_changes` human-in-the-loop card and the user clicks **Approve**, the backend **rejects the
approval** with:

```
WARNING:agent_framework_ag_ui._agent_run:Rejected approval response id=call_MPgkkd1maUPTs4ToqH4Pj7ja:
    no matching pending approval request
```

The gated tool then never executes. Two symptoms follow from that single failure:

1. **UI:** the tool chip is stuck on **"Running"** forever (no result ever arrives).
2. **Backend crash on the next model call:**

   ```
   agent_framework.exceptions.ChatClientException: FoundryChatClient service failed to complete the prompt:
   Error code: 400 - {'error': {'message': 'No tool output found for function call
   call_MPgkkd1maUPTs4ToqH4Pj7ja.', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}
   ```

The root cause is that the **server-side pending-approval registry key is built from two different
`thread_id` values**: the registration (during the pausing run) uses the provider's *server-side
conversation id* (`conv_id`), while the resolution (during the resume run) uses the *client-supplied*
`thread_id`. With a server-side-stateful backend (Azure AI Foundry / OpenAI Responses API) plus a client that
pins its own thread id (CopilotKit), these two values diverge and the lookup always misses.



### Code Sample

```markdown

```

### Error Messages / Stack Traces

```markdown

```

### Package Versions

agent-framework-ag-ui: 1.0.0rc6, agent-framework-core: 1.10.0, agent-framework-foundry: 1.10.0

### Python Version

Python 3.12

### Additional Context

## Reproduction

1. Run the full stack (`aspire`), provider = Foundry, model `gpt-5.4-mini`.
2. In the web UI send: *"I want a landing zone for my AI app like Microsoft Learn recommends"*.
3. The model emits a preamble + parallel tool calls, including approval-gated `microsoft_docs_search`.
4. Click **Approve** on the `confirm_changes` card.
5. Observe `aspire logs`:
   - `Rejected approval response id=… : no matching pending approval request`
   - the chip stays "Running"
   - a subsequent request fails with `400 … No tool output found for function call …`.

## Key identifiers from the observed run

| Role | Value |
|---|---|
| Client (CopilotKit) thread id | `95dd2877-ea85-4e63-9eee-878c6f162759` |
| Foundry service conversation id (run 1) | `resp_02a1c1deeef7d5b2006a464e5927888197b1714203c4e1da6f` |
| Gated tool call id | `call_MPgkkd1maUPTs4ToqH4Pj7ja` (`microsoft_docs_search`) |

The incoming request logged `Thread ID: 95dd2877…`, but `RUN_STARTED` advertised
`thread_id: resp_02a1…` — direct evidence of the mid-stream reassignment.

## Preconditions that arm the trap

1. `AgentFrameworkAgent(require_confirmation=True)` turns each gated tool into a `confirm_changes` HITL card.
2. Several tools are gated (`approval_mode="always_require"`).
3. The Foundry Responses API is **server-side stateful**: `run_agent_stream` sets
   `run_kwargs["options"] = {"metadata": …, "store": True}`
   ([_agent_run.py](../../.venv/Lib/site-packages/agent_framework_ag_ui/_agent_run.py) `run_agent_stream`),
   and each response carries its own `conversation_id`.
4. A single in-memory registry `pending_approvals` (an `OrderedDict` on the `AgentFrameworkAgent` instance)
   is keyed `"{thread_id}:{request_id}"` and persists across runs
   ([_agent.py](../../.venv/Lib/site-packages/agent_framework_ag_ui/_agent.py) `self._pending_approvals`).

## Step-by-step build-up

### Run 1 — pause (request → approval card)

1. UI sends the first request with stable `thread_id = 95dd2877…`.
2. `run_agent_stream` sets `thread_id = input_data["thread_id"]` → `95dd2877…`.
3. `_resolve_approval_responses(…, thread_id=95dd2877…)` runs — no approvals yet, no-op.
4. The stream starts. On the **first update** the provider's conversation id is read and the variable is
   **overwritten**:
   ```python
   conv_id = get_conversation_id_from_update(update)   # resp_02a1…
   if conv_id:
       thread_id = conv_id                              # 95dd2877… → resp_02a1…
   # NOW emit RunStarted with proper IDs
   yield RunStartedEvent(run_id=run_id, thread_id=thread_id)
   ```
5. The model streams a `phase='commentary'` preamble, then parallel tool calls, including the gated
   `microsoft_docs_search` (`call_MPgk…`).
6. The gated call surfaces as a `function_approval_request`; the registry entry is written **using the
   now-reassigned `thread_id`**:
   ```python
   pending_approvals[f"{thread_id}:{content.id}"] = …   # key = "resp_02a1…:call_MPgk…"
   ```
7. The `confirm_changes` card (carrying `function_call_id = call_MPgk…`) is emitted; the run finishes,
   pausing for input.

**Trap armed:** the pending approval is stored under the **service** id `resp_02a1…:call_MPgk…`.

### User clicks Approve

8. CopilotKit resumes with a **new request**, using its **stable** thread id `95dd2877…`, and includes the
   approval response for `call_MPgk…` (a `role="user"` turn carrying `function_approval_response`).

### Run 2 — resume (where it breaks)

9. `run_agent_stream` again sets `thread_id = 95dd2877…` from the incoming request.
10. **Before** the stream starts, `_resolve_approval_responses(…, thread_id=95dd2877…)` builds the lookup key:
    ```python
    registry_key = f"{thread_id}:{resp_id}"     # "95dd2877…:call_MPgk…"
    if registry_key not in pending_approvals:    # registered key was "resp_02a1…:call_MPgk…"
        logger.warning("Rejected approval response id=%s: no matching pending approval request", resp_id)
    ```
11. **Key mismatch** → the approval is rejected and **stripped** from messages. The tool is **never executed**;
    **no `function_call_output`** is produced for `call_MPgk…`.
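The mismatch between runs 1 and 2 can be reproduced in isolation. The following is a minimal sketch, not the framework's code: `register_pending` and `resolve_approval` are hypothetical stand-ins for the registration site and `_resolve_approval_responses`, and the ids are shortened placeholders for the observed values.

```python
from collections import OrderedDict

# Shortened placeholder ids standing in for the observed values.
CLIENT_THREAD_ID = "95dd2877"
CONV_ID = "resp_02a1"
CALL_ID = "call_MPgk"

# Same shape as the registry described above: "{thread_id}:{request_id}" -> tool name.
pending_approvals: OrderedDict[str, str] = OrderedDict()


def register_pending(thread_id: str, request_id: str, tool_name: str) -> str:
    """Hypothetical stand-in for the registration site in the pausing run."""
    key = f"{thread_id}:{request_id}"
    pending_approvals[key] = tool_name
    return key


def resolve_approval(thread_id: str, request_id: str) -> bool:
    """Hypothetical stand-in for the lookup done before the resume run streams."""
    key = f"{thread_id}:{request_id}"
    if key not in pending_approvals:
        return False  # corresponds to "no matching pending approval request"
    del pending_approvals[key]
    return True


# Run 1: thread_id has already been overwritten with the provider's conversation id.
registered_key = register_pending(CONV_ID, CALL_ID, "microsoft_docs_search")

# Run 2: the client resends its own pinned thread id, so the lookup misses.
approved = resolve_approval(CLIENT_THREAD_ID, CALL_ID)
print(registered_key, approved)
```

Resolving with `CONV_ID` instead of `CLIENT_THREAD_ID` succeeds in this sketch, which is why a client that echoes the advertised id does not see the failure.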

### The compounding defect → hard 400

12. Separately, the history sanitizer `_sanitize_tool_history` sees the `function_approval_response` for
    `call_MPgk…`, **removes it from the pending set** assuming "the framework will execute it," and therefore
    **does not inject a synthetic skip result** for it
    ([_message_adapters.py](../../.venv/Lib/site-packages/agent_framework_ag_ui/_message_adapters.py)).
    The other, ungated calls that are still pending *do* receive
    `Tool execution skipped - user provided follow-up message`.
13. The two subsystems disagree:
    - sanitizer: "don't add a result — it will be executed";
    - approval validator: "rejected — I will not execute it".

    Neither produces an output for `call_MPgk…`.
14. The outbound transcript now has **7 `function_call` items but only 6 `function_call_output` items** — the
    missing one is exactly `call_MPgk…`.
15. The Foundry Responses API validates the request `input`, finds a `function_call` with no matching output,
    and returns `400 - No tool output found for function call call_MPgkkd1maUPTs4ToqH4Pj7ja`, surfaced as
    `ChatClientException` / `BadRequestError`.
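To confirm which call is missing its output before the request is sent, the outbound transcript can be checked locally. This is a diagnostic sketch only: `find_unanswered_calls` is a hypothetical helper, and the item shape (`type` plus `call_id`) is an assumption modelled on Responses API input items, not taken from the framework.

```python
def find_unanswered_calls(items: list[dict]) -> list[str]:
    """Return call ids of function_call items that have no function_call_output."""
    answered = {
        item["call_id"] for item in items if item.get("type") == "function_call_output"
    }
    return [
        item["call_id"]
        for item in items
        if item.get("type") == "function_call" and item["call_id"] not in answered
    ]


# Seven calls but only six outputs, mirroring the observed transcript.
call_ids = [f"call_{n}" for n in range(6)] + ["call_MPgk"]
transcript = [
    {"type": "function_call", "call_id": cid, "name": "tool", "arguments": "{}"}
    for cid in call_ids
]
transcript += [
    {"type": "function_call_output", "call_id": cid, "output": "skipped"}
    for cid in call_ids
    if cid != "call_MPgk"  # the rejected approval never produced an output
]

print(find_unanswered_calls(transcript))
```

A check like this could be logged just before the model call to turn the opaque 400 into a named call id.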

### Diagram

```mermaid
sequenceDiagram
  autonumber
  participant UI as CopilotKit UI
  participant RUN as run_agent_stream
  participant REG as pending_approvals
  participant LLM as Foundry Responses API

  rect rgb(245,245,255)
  Note over UI,LLM: RUN 1 — pause
  UI->>RUN: POST / (thread_id=95dd2877…, "landing zone…")
  Note over RUN: thread_id = 95dd2877…
  RUN->>RUN: _resolve_approval_responses (nothing pending)
  RUN->>LLM: agent.run(stream=True, store=True)
  LLM-->>RUN: first update → conv_id = resp_02a1…
  Note over RUN: thread_id = conv_id ⇒ resp_02a1…
  LLM-->>RUN: commentary preamble + gated microsoft_docs_search(call_MPgk…)
  RUN->>REG: register "resp_02a1…:call_MPgk…"
  RUN-->>UI: confirm_changes(function_call_id=call_MPgk…) + RUN_FINISHED
  end

  rect rgb(255,245,245)
  Note over UI,LLM: RUN 2 — resume (Approve)
  UI->>RUN: POST / (thread_id=95dd2877…, approval for call_MPgk…)
  Note over RUN: thread_id = 95dd2877… (pre-stream)
  RUN->>REG: lookup "95dd2877…:call_MPgk…"
  REG-->>RUN: MISS (registered under resp_02a1…)
  Note over RUN: reject + strip approval ⇒ tool NOT executed
  RUN->>RUN: sanitizer skipped injecting a result (assumed execution)
  Note over RUN: transcript has function_call call_MPgk… with NO output
  RUN->>LLM: next request (7 calls, 6 outputs)
  LLM-->>RUN: 400 No tool output found for call_MPgk…
  RUN-->>UI: ChatClientException (chip stuck "Running")
  end
```

## Why the reassignment exists (it is intentional)

The `thread_id = conv_id` rewrite is a deliberate **conversation-continuity** mechanism for stateful backends,
not a mistake:

- The AG-UI client contract is *"the server must handle history via `thread_id`"*
  ([_client.py](../../.venv/Lib/site-packages/agent_framework_ag_ui/_client.py) docstring). For a Responses-API
  backend, the only handle that can resume the stored conversation is the provider's `conversation_id`, so the
  server rewrites `thread_id` to `conv_id` and advertises it in `RUN_STARTED`.
- The provider only reveals its `conversation_id` in the **first streamed update**, so the rewrite must happen
  mid-stream (hence "emit RunStarted after first update to get service IDs").
- After the rewrite, the same `thread_id` also keys the **snapshot store** and the `RUN_FINISHED` event,
  aligning one durable id across client thread, provider conversation, and server snapshot.

The defect is that the **pending-approval registry reused this lifecycle-sensitive variable** without
accounting for the fact that it is mutated mid-run and is sourced differently across runs. The stable,
run-invariant id already exists: the original client thread id, captured **before** the rewrite into
`base_metadata["ag_ui_thread_id"]` (and listed in `AG_UI_INTERNAL_METADATA_KEYS`).
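The deferred `RUN_STARTED` pattern, with the stable client id kept alongside the reassigned one, can be sketched as follows. This is an illustration under assumptions: `Update`, `stream_run`, and the `client_thread_id` payload field are hypothetical names, and real AG-UI events are not plain tuples.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Update:
    """Hypothetical stand-in for a streamed provider update."""

    conversation_id: Optional[str]
    text: str


def stream_run(client_thread_id: str, updates: Iterable[Update]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs, deferring RUN_STARTED to the first update."""
    thread_id = client_thread_id  # mutable: may become the provider's conversation id
    started = False
    for update in updates:
        if not started:
            if update.conversation_id:
                thread_id = update.conversation_id  # mid-stream reassignment
            # client_thread_id is included here for illustration only; it is the
            # run-invariant id that never changes after the request arrives.
            yield "RUN_STARTED", {"thread_id": thread_id, "client_thread_id": client_thread_id}
            started = True
        yield "TEXT", {"text": update.text}
    yield "RUN_FINISHED", {"thread_id": thread_id}


events = list(stream_run("95dd2877", [Update("resp_02a1", "preamble"), Update("resp_02a1", "more")]))
print(events[0])
```

The point of the sketch is that two ids coexist after the first update, and any structure that must survive across runs has to pick the one the client will actually send back.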

## Registry usage surface (blast radius)

| # | Site | Operation | Keyed by |
|---|---|---|---|
| 1 | `_agent.py` `self._pending_approvals` | allocation (one per agent, persists across runs) | — |
| 2 | `_agent.py` `run_agent_stream(pending_approvals=…)` | pass by reference | — |
| 3 | `_agent_run.py` registration | **write** (only write site) | reassigned `thread_id` (=`conv_id`) |
| 4 | `_agent_run.py` `_evict_oldest_approvals` | LRU eviction (`popitem(last=False)`) | oldest key |
| 5 | `_agent_run.py` `_resolve_approval_responses` | membership check `in` | incoming `thread_id` |
| 6 | `_agent_run.py` `_resolve_approval_responses` | read + validate (name + canonical args) | incoming `thread_id` |
| 7 | `_agent_run.py` `_resolve_approval_responses` | consume `del` | incoming `thread_id` |

`_pending_approvals` is private; it has no external readers and is never serialized. The dual-key change
proposed below therefore touches only sites 3–7 plus the entry TypedDict.

## Solution analysis

The registry only works when *the key it is registered under (pause run)* equals *the key the client sends back
(resume run)*. Clients fall into two families:

- **Case A — pins its own thread id** across turns (CopilotKit sends `95dd2877…`).
- **Case B — echoes the server-advertised `conv_id`** (the AG-UI reference client's documented direct pattern:
  `thread_id = response.additional_properties.get("thread_id")` → resend).

Resolution already keys off the **incoming** id; only registration uses the reassigned `conv_id`.

| Backend | Client | register key | resolve key | Today | Option 1: register under client id | Option 2: request-id only | Option 3: dual-key |
|---|---|---|---|---|---|---|---|
| Stateless | any | client id (no reassignment) | client id | ✅ | ✅ | ✅ | ✅ |
| Stateful | A: pins own id | `conv_id` | client id | ❌ | ✅ | ✅ | ✅ |
| Stateful | B: echoes `conv_id` | `conv_id` | `conv_id` | ✅ | ❌ **breaks** | ✅ | ✅ |

- **Option 1 — register under the stable client id (`ag_ui_thread_id`).** Fixes Case A but **breaks Case B**
  (including the framework's own reference client) on stateful backends. Rejected.
- **Option 2 — drop the thread prefix; key by `request_id` (call id) only**, relying on the existing
  name + canonical-argument validation. Works everywhere, minimal, but **discards the per-thread scoping** the
  maintainers added as defense-in-depth. Viable fallback, but weakens isolation.
- **Option 3 — dual-key: one entry registered under BOTH the client id and the `conv_id`.** Works across all
  combinations, preserves thread scoping and name/args validation, and collapses to a single key in the
  stateless case. Requires the entry to carry its own key list so consume/eviction can purge all aliases.

Stateless LLMs are unaffected by the bug (no reassignment → both sites use the client id) and remain correct
under Options 2 and 3.
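The comparison table can be checked mechanically. The sketch below models each option as a pair of key-building rules and tests whether the resume-run lookup hits; all function names and the placeholder ids are hypothetical, and option `0` stands for today's behaviour.

```python
from typing import Optional


def register_keys(option: int, client_id: str, conv_id: Optional[str], call_id: str) -> set[str]:
    """Keys written during the pausing run under each option (0 = today)."""
    reassigned = conv_id or client_id  # thread_id after the mid-stream rewrite
    if option == 0:
        return {f"{reassigned}:{call_id}"}
    if option == 1:
        return {f"{client_id}:{call_id}"}
    if option == 2:
        return {call_id}
    return {f"{client_id}:{call_id}", f"{reassigned}:{call_id}"}  # option 3: dual-key


def resolve_key(option: int, incoming_thread_id: str, call_id: str) -> str:
    """Key looked up during the resume run; always built from the incoming id."""
    return call_id if option == 2 else f"{incoming_thread_id}:{call_id}"


def approval_resolves(option: int, stateful: bool, client_echoes_conv_id: bool) -> bool:
    client_id, call_id = "client", "call"
    conv_id = "conv" if stateful else None
    incoming = conv_id if (conv_id and client_echoes_conv_id) else client_id
    return resolve_key(option, incoming, call_id) in register_keys(option, client_id, conv_id, call_id)


rows = {
    "stateless / any": (False, False),
    "stateful / A pins own id": (True, False),
    "stateful / B echoes conv_id": (True, True),
}
for label, (stateful, echoes) in rows.items():
    print(label, [approval_resolves(opt, stateful, echoes) for opt in range(4)])
```

Each printed list is ordered as today, Option 1, Option 2, Option 3.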

## Proposed solution — thread-safe dual-key registry

Replace the bare shared `OrderedDict` with a small `PendingApprovalRegistry` container that (a) registers each
pending approval as **one entry referenced by up to two keys**, (b) makes the entry self-describing so
consumption and eviction remove all of its aliases, and (c) guards every compound operation with a
`threading.Lock` so it is safe under concurrent runs on the same agent instance.

1. Extend the entry TypedDict: `_PendingApproval = {name, arguments, keys: list[str]}`.
2. **`PendingApprovalRegistry`** wraps an `OrderedDict[str, _PendingApproval]` plus a `threading.Lock`. It
   exposes three atomic operations plus read-only dict-like dunders (`__iter__`, `__contains__`, `__len__`,
   `__getitem__`) for introspection, and a compatibility `__setitem__` that wraps a legacy bare-`str` value
   into a single-key entry:
   - `register(keys, name, arguments)` — dedupe keys, store the **same** entry object under each, then evict.
   - `consume(key, name, arguments) -> (status, entry)` — membership + name + canonical-argument validation
     **and** removal of all sibling keys, performed atomically under the lock. `status` is one of
     `ok | missing | name_mismatch | arguments_mismatch`.
   - entry-aware LRU eviction — when trimming to `max_size`, pop the oldest entry **and all its alias keys**.
3. **Register (site 3):** compute both candidate keys and hand them to the registry:
   ```python
   client_key = f"{client_thread_id}:{content.id}"   # stable client id (captured pre-reassignment)
   conv_key   = f"{thread_id}:{content.id}"           # reassigned conv_id (post-reassignment)
   pending_approvals.register([client_key, conv_key], name, canonical_args)  # dedupes when stateless
   ```
4. **Resolve/consume (sites 5–7):** one atomic call replaces the previous membership-check → validate → `del`
   sequence:
   ```python
   status, entry = pending_approvals.consume(f"{thread_id}:{resp_id}", resp_name, response_arguments)
   # log + strip on missing/name_mismatch/arguments_mismatch; accept on ok
   ```
5. **Validation semantics unchanged** (name + canonical arguments), preserving anti-spoof/replay guarantees;
   they now execute *inside* the lock so validate-then-consume cannot race another run.

Properties:

- **Case A:** resolve under client id → hits `client_key`. ✅
- **Case B:** resolve under `conv_id` → hits `conv_key`. ✅
- **Stateless:** `client_key == conv_key` → single key, identical to today. ✅
- **Replay protection:** `consume` removes all sibling keys atomically → single-use even under concurrency.
- **Thread safety:** every read/insert/delete/evict runs under one `threading.Lock`, and the lock is never
  held across an `await`, so it is safe for both concurrent asyncio tasks and true OS threads sharing the agent instance.
- **Memory:** ~2 keys/entry, bounded by entry-aware LRU eviction (`max_size = 10_000`).
- **`str` legacy variant:** the compatibility `__setitem__` wraps a bare-`str` value into a `{name, keys}`
  entry, so existing direct-assignment call sites keep working.
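A minimal sketch of such a registry is shown below. It is not the framework's implementation: the class and method bodies are assumptions, arguments are treated as an already-canonicalized string, the compatibility `__setitem__` is omitted, and two behaviours are design guesses that the proposal does not specify (a mismatched `consume` leaves the entry in place, and re-registering an existing key first purges the stale entry with all its aliases).

```python
import threading
from collections import OrderedDict
from typing import Optional, TypedDict


class PendingApproval(TypedDict):
    name: str
    arguments: str   # assumed to be canonicalized by the caller
    keys: list[str]  # every alias this entry is stored under


class PendingApprovalRegistry:
    """Dual-key registry sketch: one entry object, up to two keys, one lock."""

    def __init__(self, max_size: int = 10_000) -> None:
        self._entries: OrderedDict[str, PendingApproval] = OrderedDict()
        self._lock = threading.Lock()
        self._max_size = max_size  # counted in keys, not entries

    def __len__(self) -> int:
        with self._lock:
            return len(self._entries)

    def __contains__(self, key: object) -> bool:
        with self._lock:
            return key in self._entries

    def _remove(self, entry: PendingApproval) -> None:
        # Caller must hold the lock. Drops the entry under all of its aliases.
        for alias in entry["keys"]:
            self._entries.pop(alias, None)

    def register(self, keys: list[str], name: str, arguments: str) -> None:
        unique = list(dict.fromkeys(keys))  # dedupe; collapses to one key when stateless
        entry: PendingApproval = {"name": name, "arguments": arguments, "keys": unique}
        with self._lock:
            for key in unique:
                stale = self._entries.get(key)
                if stale is not None:
                    self._remove(stale)  # assumption: a re-registration replaces the old entry
            for key in unique:
                self._entries[key] = entry
            while len(self._entries) > self._max_size:
                oldest = next(iter(self._entries.values()))
                self._remove(oldest)  # entry-aware eviction: purge all aliases together

    def consume(self, key: str, name: str, arguments: str) -> tuple[str, Optional[PendingApproval]]:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return "missing", None
            if entry["name"] != name:
                return "name_mismatch", entry
            if entry["arguments"] != arguments:
                return "arguments_mismatch", entry
            self._remove(entry)  # single-use: sibling keys disappear atomically
            return "ok", entry


registry = PendingApprovalRegistry()
registry.register(["95dd2877:call_MPgk", "resp_02a1:call_MPgk"], "microsoft_docs_search", "{}")
status, _ = registry.consume("95dd2877:call_MPgk", "microsoft_docs_search", "{}")
replay_status, _ = registry.consume("resp_02a1:call_MPgk", "microsoft_docs_search", "{}")
print(status, replay_status)
```

In this sketch the Case A lookup succeeds under the client key, and the follow-up lookup under the `conv_id` alias is reported as missing because the first `consume` already removed both aliases.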

## Related: snapshot-store keying (same root, separate structure)

The AG-UI thread **snapshot store** exhibits the *same* `thread_id` divergence but in a different structure:

- **Save** uses the post-reassignment `thread_id` (`conv_id`): `_save_thread_snapshot(…, thread_id=thread_id)`.
- **Hydrate/get** uses the incoming `thread_id`: `config.snapshot_store.get(…, thread_id=thread_id)` and
  `_hydrate_thread_snapshot(…)` — both before the stream reassigns it.

Consequences:

- A **Case A** client (pins its own id) that relies on the server snapshot store would **miss its own
  snapshot** across turns, for the same reason approvals miss.
- It does **not** currently bite this deployment because CopilotKit replays history **client-side**, so the
  server snapshot store is effectively unused here. It remains a latent inconsistency.

Recommended handling: fix consistently by keying the snapshot store on the stable client id
(`ag_ui_thread_id`) as well, or by explicitly documenting that the snapshot store requires clients to echo the
advertised `conv_id`.
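The save/hydrate divergence and the first recommendation can be illustrated with a toy store. Everything here is hypothetical: `SnapshotStore`, `run_turn`, and the `key_on_client_id` switch are stand-ins for illustration and do not mirror the real `snapshot_store` interface.

```python
from typing import Optional


class SnapshotStore:
    """Hypothetical in-memory stand-in for the AG-UI thread snapshot store."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def save(self, thread_id: str, snapshot: dict) -> None:
        self._data[thread_id] = snapshot

    def get(self, thread_id: str) -> Optional[dict]:
        return self._data.get(thread_id)


def run_turn(
    store: SnapshotStore,
    incoming_thread_id: str,
    conv_id: Optional[str],
    key_on_client_id: bool,
) -> Optional[dict]:
    """Hydrate before the stream, save after it; return what was hydrated."""
    hydrated = store.get(incoming_thread_id)   # pre-stream: uses the incoming id
    thread_id = conv_id or incoming_thread_id  # mid-stream reassignment
    save_key = incoming_thread_id if key_on_client_id else thread_id
    turns = (hydrated or {"turns": 0})["turns"] + 1
    store.save(save_key, {"turns": turns})
    return hydrated


# Current behaviour: a Case A client never finds the snapshot saved under conv_id.
current = SnapshotStore()
run_turn(current, "95dd2877", "resp_02a1", key_on_client_id=False)
second_current = run_turn(current, "95dd2877", "resp_02a1", key_on_client_id=False)

# Keyed on the stable client id: the second turn hydrates the first turn's snapshot.
fixed = SnapshotStore()
run_turn(fixed, "95dd2877", "resp_02a1", key_on_client_id=True)
second_fixed = run_turn(fixed, "95dd2877", "resp_02a1", key_on_client_id=True)
print(second_current, second_fixed)
```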

