Skip to content

.NET: Python: [Bug]: AG-UI appends approval-resolved tool results out of order — invalid history for strict chat providers #6909

Description

@antsok

Description

1. Summary

When a run resumes after a tool-approval response, agent_framework_ag_ui executes the approved call and appends its result message to the end of the reconstructed history instead of seating it directly after the assistant message that contains the matching tool_calls entry. Because the thread snapshot stores the assistant's streamed text as a separate message after the tool-calls message (the deliberate format from issue #3619), the appended result lands after that text:

assistant: tool_calls [...]
assistant: (streamed text)          <- snapshot's split-off text message
tool:      approved-call result     <- appended at the end by the resume path

OpenAI-style chat APIs validate that every role: "tool" message immediately follows the assistant message carrying its tool_call_id. Strict providers (OpenAI, Azure OpenAI / Foundry) reject such a payload with 400 invalid_request_error; lenient providers (Ollama) accept it silently, which is why the defect can go unnoticed until the provider is swapped.

2. Symptom — observed model payload

Live capture (2026-07-04, harness agent + Microsoft Learn MCP over AG-UI/CopilotKit, ENABLE_SENSITIVE_DATA logging, Ollama provider). The model-bound request assembled by the resume run after approving microsoft_docs_fetch:

user:      I want a landing zone for my AI app like Microsoft Learn recommends
assistant: tool_calls [load_skill, microsoft_docs_search, caf_methodologies]
tool:      caf_methodologies       -> {real result}
tool:      load_skill              -> {real result}
tool:      microsoft_docs_search   -> {real result}
assistant: "I'll help you design an Azure landing zone ..."
assistant: tool_calls [microsoft_docs_fetch, waf_pillars]
tool:      waf_pillars             -> {real result}
assistant: "Great — I've got the foundational Microsoft guidance ..."  <- split text
tool:      microsoft_docs_fetch    -> {real result}                    <- out of order

The microsoft_docs_fetch result — the call the user just approved — is separated from its tool_calls message by an assistant text message. On OpenAI/Azure OpenAI the next model call fails with the "messages with role 'tool' must be a response to a preceding message with 'tool_calls'" validation class; on Ollama the request succeeds with degraded message ordering.

Code Sample

Error Messages / Stack Traces

## 3. Reproduction

Parallel calls are **not** required; a single gated call plus streamed text suffices.

1. Host any agent over AG-UI (`add_agent_framework_fastapi_endpoint`, `require_confirmation=True`) with a thread snapshot store configured, against a strict provider (OpenAI or Azure OpenAI).
2. Send a prompt that makes the model stream some text **and** issue one approval-gated tool call in the same turn.
3. Approve the call. The resume run executes it and appends the result to the end of the history; the persisted snapshot now reads `assistant(tool_calls) → assistant(text) → tool(result)`.
4. The resumed run's next model call (or the next user turn) sends that history → `400 invalid_request_error` on strict providers.

Package Versions

agent-framework-ag-ui: 1.0.0rc7, agent-framework-core: 1.10.0

Python Version

Python 3.12

Additional Context

4. Root cause

Two individually-reasonable behaviors compose into an invalid message sequence:

  1. Snapshot text splitting. _build_messages_snapshot (_agent_run.py) stores the assistant's tool calls and its streamed text as two separate messages, text after calls — intentional, per issue #3619, so the AG-UI client renders them distinctly.
  2. Append-at-end result placement. The approval-resume path (_resolve_approval_responses and the message assembly around it in run_agent_stream) executes the approved call and appends the resulting tool message to the end of the message list, with no attempt to locate the originating tool_calls message.

Either behavior alone is harmless: without the split, appending lands the result directly after the calls message; with correct seating, the split text is legal. Together they produce a tool message that does not follow its tool_calls message.

sequenceDiagram
    participant C as Client (approval resume request)
    participant T as AG-UI transport
    participant M as Model payload (next call)

    C->>T: history from snapshot - assistant(tool_calls fetch+waf), tool(waf result), assistant(text)
    Note over C,T: the snapshot stores the assistant text as a SEPARATE message after the tool-calls message (issue 3619 format)
    T->>T: _resolve_approval_responses executes the approved fetch call
    T->>T: appends tool(fetch result) at the END of the history
    T->>M: assistant(tool_calls), tool(waf), assistant(text), tool(fetch)
    Note over M: strict providers reject with 400 - a tool message must immediately follow its tool_calls message
Loading

5. Proposed fix

In the approval-resume message assembly, insert each resolved result message immediately after the assistant message containing the matching function_call (and after any sibling tool messages already seated there), instead of appending to the end. Results whose originating call message is not present in the history keep the current append behavior.

Equivalent alternative: a single normalization pass over the outbound message list (mirroring where _sanitize_tool_history already runs) that re-seats any tool-result message separated from its call message. This also heals histories that arrive mis-ordered from older persisted snapshots.

Both variants are local to agent-framework-ag-ui message assembly, change no events on the wire, and compose with (but do not depend on) the state-loss fix branch.

6. Impact on non-AG-UI hosts

None. The defective assembly is internal to the AG-UI package's approval-resume path; console, DevUI, and headless hosting build model payloads through the normal function-invocation loop, which seats results correctly.

7. Related upstream issues (not duplicates)

Ref State Relation
#5941 open Multi-turn tool calls 400 on Foundry via AG-UI — results not paired correctly on replay. Same failure class (strict-provider validation), different mechanism (missing/unpaired vs. present-but-mis-ordered). Plausibly the same fix locus; cross-link
#5855 closed Replay produced tool_calls without matching tool messages; fixed via synthetic-result injection (_sanitize_tool_history). The sanitizer checks presence, not position, so it does not catch this defect
#2699 open .NET: AG-UI multi-turn replay produces invalid OpenAI tool_call history — .NET twin of the family; the .NET port should seat results positionally, not just pair them
#3619 closed Origin of the snapshot's split text-message format — the enabling layout (not itself a bug)
#6266 open MessagesSnapshotEvent splits/reassigns streamed text on mixed turns — adjacent snapshot-layout concern
#5600 open .NET GroupChat approval flow → HTTP 400 "tool message without a preceding 'tool_calls'" — the same validation class triggered by a different approval path

Metadata

Metadata

Assignees

No one assigned

    Labels

    .NETUsage: [Issues, PRs], Target: .NetpythonUsage: [Issues, PRs], Target: PythonreproducedUsage: [Issues], Target: all issues that can be reproduced by the triage workflowtriageUsage: [Issues], Target: All issues that still need to be triaged

    Type

    Fields

    No fields configured for Bug.

    Projects

    Status
    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions