
Fix(chat): resolve pre-tool text accumulation in stream_response#439

Open
Jean-Regis-M wants to merge 1 commit into GenAI-Security-Project:main from Jean-Regis-M:patch-40

Conversation

@Jean-Regis-M
Contributor

Summary

Fixes #412

Pre-tool partial text streamed by the LLM in round 1 leaked into full_response and
appeared in the final saved response alongside the post-tool reply from round 2.


Problem

In ChatAssistantBase.stream_response, full_response was used as an unconditional
cross-round accumulator:

async for event in stream:
    if event.type == "response.output_text.delta":
        full_response += event.delta  # appended regardless of round outcome

When the LLM streamed partial text (e.g. "Checking now... ") before deciding to call
a tool, that text was permanently fused into full_response before the tool round began.
The terminal round then appended its reply (e.g. "Done."), producing a corrupted final
response: "Checking now... Done." instead of "Done.".
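The leak can be reproduced outside the codebase with a small simulation. This is only a sketch: FakeEvent, fake_stream, and the "tool_call" event type are hypothetical stand-ins for the real stream objects, and only the "response.output_text.delta" event type comes from the snippet above.

```python
import asyncio
from dataclasses import dataclass

TEXT_DELTA = "response.output_text.delta"


@dataclass
class FakeEvent:
    # Hypothetical stand-in for a streamed LLM event.
    type: str
    delta: str = ""


async def fake_stream(events):
    # Hypothetical stand-in for the LLM stream of a single round.
    for event in events:
        yield event


# Round 1 emits partial text and then requests a tool; round 2 is terminal.
ROUNDS = [
    [FakeEvent(TEXT_DELTA, "Checking now... "), FakeEvent("tool_call")],
    [FakeEvent(TEXT_DELTA, "Done.")],
]


async def buggy_accumulate(rounds):
    full_response = ""
    for events in rounds:
        pending_tool_calls = []
        async for event in fake_stream(events):
            if event.type == TEXT_DELTA:
                # Appended before the round outcome is known.
                full_response += event.delta
            elif event.type == "tool_call":
                pending_tool_calls.append(event)
        if not pending_tool_calls:
            break
    return full_response


buggy_result = asyncio.run(buggy_accumulate(ROUNDS))
print(buggy_result)  # Checking now... Done.
```

The pre-tool text from round 1 survives into the final string because nothing separates it from the terminal round's text.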

In a financial context, this is a meaningful defect; it can:

  • Expose internal workflow logic to vendors ("Let me look that up...")
  • Reveal that the agent consulted a database or external tool
  • Produce confusing, unprofessional output in vendor-facing chat

Root Cause

full_response served a dual purpose as both the per-round buffer and the cross-round final
accumulator. The round outcome (tool call vs. terminal) was only known after the
stream completed, by which point the intermediate text had already been appended and could
not be selectively removed.


Solution

Introduce round_text as a per-round accumulator that is only committed to full_response
when the round terminates without a tool call:

round_text = ""

async for event in stream:
    if event.type == "response.output_text.delta":
        round_text += event.delta          # buffer current round only
        yield f"data: ..."                 # SSE stream to client unchanged

if not pending_tool_calls:
    full_response += round_text            # commit only on terminal round
    break

Text from intermediate (tool-calling) rounds is discarded from the saved response. The SSE
token stream yielded to the client is not affected: tokens still flow in real time.
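A self-contained sketch of the buffering approach, showing that the client still receives every token while only the terminal round is saved. FakeEvent, fake_stream, the "tool_call" event type, the SSE payload format, and the saved dict are assumptions made for illustration; the real stream_response is not reproduced here.

```python
import asyncio
from dataclasses import dataclass

TEXT_DELTA = "response.output_text.delta"


@dataclass
class FakeEvent:
    # Hypothetical stand-in for a streamed LLM event.
    type: str
    delta: str = ""


async def fake_stream(events):
    # Hypothetical stand-in for the LLM stream of a single round.
    for event in events:
        yield event


async def stream_response_sketch(rounds, saved):
    """Yield SSE chunks for every delta; save only terminal-round text."""
    full_response = ""
    for events in rounds:
        round_text = ""  # reset at the start of each round
        pending_tool_calls = []
        async for event in fake_stream(events):
            if event.type == TEXT_DELTA:
                round_text += event.delta         # buffer current round only
                yield f"data: {event.delta}\n\n"  # assumed SSE payload shape
            elif event.type == "tool_call":
                pending_tool_calls.append(event)
        if not pending_tool_calls:
            full_response += round_text           # commit only on terminal round
            break
    # Async generators cannot return a value, so the result goes in a holder.
    saved["full_response"] = full_response


async def run_demo():
    rounds = [
        [FakeEvent(TEXT_DELTA, "Checking now... "), FakeEvent("tool_call")],
        [FakeEvent(TEXT_DELTA, "Done.")],
    ]
    saved = {}
    chunks = [chunk async for chunk in stream_response_sketch(rounds, saved)]
    return chunks, saved["full_response"]


sse_chunks, final_text = asyncio.run(run_demo())
print(sse_chunks)
print(final_text)
```

In this sketch the client-facing chunks still include the round 1 text, while the saved string contains only the round 2 reply.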


Impact

  • No breaking changes
  • Minimal diff: 2 lines added, 1 assignment repositioned
  • SSE output shape to the client is unchanged
  • Deterministic across all round counts (single round, multi-tool, max-round exhaustion)
  • Zero regression risk to existing tool execution or history persistence logic
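The three round-count cases listed above can be checked against a simplified, synchronous model of the commit rule. This is a sketch under stated assumptions: saved_response, the (deltas, has_tool_call) tuple shape, and the max_rounds cap are hypothetical, and how the real method behaves on max-round exhaustion is not shown in this PR.

```python
def saved_response(rounds, max_rounds=3):
    """Model of the commit rule: keep text only from a round with no tool call.

    Each round is a hypothetical (deltas, has_tool_call) pair.
    """
    full_response = ""
    for deltas, has_tool_call in rounds[:max_rounds]:
        round_text = "".join(deltas)  # per-round buffer
        if not has_tool_call:
            full_response += round_text  # terminal round: commit and stop
            break
        # Tool-calling round: round_text is dropped.
    return full_response


single_round = saved_response([(["Done."], False)])

multi_tool = saved_response([
    (["Checking now... "], True),
    (["One more lookup... "], True),
    (["Done."], False),
])

# Every round requests a tool, so the cap is hit without a terminal round.
exhausted = saved_response([(["Still working... "], True)] * 3)

print(repr(single_round), repr(multi_tool), repr(exhausted))
```

Under this model, max-round exhaustion leaves the saved response empty because no round ever commits its text. Whether that is the desired behavior in the real method is a design question for the reviewers.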

Testing

# Acceptance test (must pass)
pytest tests/integration/agents/test_chat_layer3.py::TestL3QAFindings::test_chat_l3_qa_001_text_before_tool_call_leaks_into_final_response -v

# Full layer regression
pytest tests/integration/agents/test_chat_layer3.py -v

Before: full_response == "Checking now... Done."
After: full_response == "Done."

Root cause:
full_response accumulated text tokens unconditionally across all rounds,
including round 1 text emitted before a tool call was detected.

Solution:
Introduce round_text as a per-round accumulator. Commit to full_response
only when the round terminates without a tool call (terminal round).

Impact:
Pre-tool streamed text is discarded from the saved response. Post-tool
terminal text is preserved. SSE token stream to client is unchanged.
Deterministic, no side effects.

Signed-off-by: JEAN REGIS <240509606@firat.edu.tr>

Development

Successfully merging this pull request may close these issues.

Bug_191_EVALUATE: CHAT-L3-QA-001 — Pre-tool partial text leaks into final response
