server: fix implicit </think> close — suppress DSML from streaming output#332

Open: TipKnuckle wants to merge 2 commits into antirez:main from TipKnuckle:fix/implicit-think-close-on-dsml

@TipKnuckle

Fixes #318.

When the model emits a DSML tool call without first closing </think>, the original fix (#500c7df) correctly parses the tool call in parse_generated_message_ex, but the three live streaming paths (OpenAI, Anthropic, Responses) still had a problem: because thinking.inside stays true for the whole generation, the raw DSML markers are streamed to the client as reasoning content. Clients such as pi and opencode then display the tool-call XML inside the thinking block.

This commit applies the same implicit-close logic to each streaming state machine. On the final update, if there is no </think> but a DSML block is present, the reasoning is clipped just before the DSML start and the state transitions to TEXT mode instead of SUPPRESS, which is the same path a normal </think> takes. For the Responses path, reasoning_closed_naturally is also set so the reasoning item comes back as "completed" rather than "incomplete".
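To make the transition concrete, here is a minimal sketch of the final-update decision. It is not the code from ds4_server.c: the `stream_mode` enum, the `stream_state` struct, and `finish_thinking` are hypothetical names, and the DSML start marker is passed in as a parameter because its exact spelling is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream modes; the real state machine may use other names. */
typedef enum { MODE_THINKING, MODE_TEXT, MODE_SUPPRESS } stream_mode;

typedef struct {
    stream_mode mode;
    size_t reasoning_limit;  /* bytes of the buffer that may be sent as reasoning */
} stream_state;

/* Final-update handling while still inside the thinking block.
 * think_close and dsml_start are the marker strings to search for. */
static void finish_thinking(stream_state *st, const char *buf,
                            const char *think_close, const char *dsml_start)
{
    assert(st != NULL && buf != NULL);
    assert(think_close != NULL && dsml_start != NULL);
    if (st->mode != MODE_THINKING) return;

    const char *closer = strstr(buf, think_close);
    const char *dsml = strstr(buf, dsml_start);

    if (closer != NULL) {
        /* Explicit close: reasoning ends at the closer. */
        st->reasoning_limit = (size_t)(closer - buf);
        st->mode = MODE_TEXT;
    } else if (dsml != NULL) {
        /* Implicit close: clip reasoning before the DSML block and let
         * the TEXT state handle the tool call. */
        st->reasoning_limit = (size_t)(dsml - buf);
        st->mode = MODE_TEXT;
    } else {
        /* Pure unclosed thinking: everything stays reasoning. */
        st->reasoning_limit = strlen(buf);
        st->mode = MODE_SUPPRESS;
    }
}
```

The point of the sketch is that the implicit-close branch ends in the same mode as the explicit one, so the rest of the state machine needs no special casing.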

The two commits together cover both streaming and non-streaming clients. Tested with pi and opencode (Anthropic endpoint) against the q2-imatrix on M4 Max 128GB: the implicit-close case now produces a clean thinking block with no leaked DSML, and the tool call executes normally. fry69 has been running the first commit for 2+ hours without regressions (see issue).

Changes are limited to the three STREAM_THINKING branches in ds4_server.c, 23 lines total.

When the model emits a DSML tool-call block without closing </think> first,
the previous behavior silently discarded the tool call and returned finish=stop,
leaving agent clients with no actionable output and a stalled session.

The model almost certainly intends a real tool call in this case. It just
dropped the </think> closer, which happens more often at longer context
depths. Treat the DSML start position as an implicit thinking boundary: the
pre-DSML text becomes reasoning content and the tool call is parsed normally.

When no DSML is present either (pure unclosed thinking), preserve the existing
behavior of returning the output as reasoning only.
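
As an illustration of the boundary rule for the non-streaming parse, the sketch below splits a finished generation into a reasoning prefix and the remainder. It is only an assumption about the shape of the logic, not the body of parse_generated_message_ex: `think_split` and `split_thinking` are hypothetical names, and both marker strings are supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t reasoning_len;  /* length of the reasoning prefix in bytes */
    const char *rest;      /* text after the boundary, or NULL if none */
    int implicit_close;    /* 1 when the DSML start served as the boundary */
} think_split;

/* Split a completed generation that began inside a thinking block. */
static think_split split_thinking(const char *out, const char *think_close,
                                  const char *dsml_start)
{
    assert(out != NULL && think_close != NULL && dsml_start != NULL);
    think_split s = { 0, NULL, 0 };

    const char *closer = strstr(out, think_close);
    if (closer != NULL) {
        /* Normal case: content starts after the closer. */
        s.reasoning_len = (size_t)(closer - out);
        s.rest = closer + strlen(think_close);
        return s;
    }

    const char *dsml = strstr(out, dsml_start);
    if (dsml != NULL) {
        /* Missing closer: the DSML block itself begins the remainder,
         * so the tool-call parser still sees its start marker. */
        s.reasoning_len = (size_t)(dsml - out);
        s.rest = dsml;
        s.implicit_close = 1;
        return s;
    }

    /* No closer and no DSML: the whole output is reasoning. */
    s.reasoning_len = strlen(out);
    return s;
}
```

A caller could check `implicit_close` to decide when to emit the warning, and hand `rest` to the usual tool-call parser whenever it is non-NULL.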

Both cases now log a DS4_LOG_WARNING so the condition is visible in server
output and --trace sessions.

Refs antirez#167

When </think> is absent but a DSML tool block is present, the three
live streaming paths (OpenAI, Anthropic, Responses) were emitting the
raw DSML markers as reasoning content because thinking.inside remained
true for the entire generation.

Apply the same implicit-close logic already in parse_generated_message_ex
to each streaming state machine: on final update, if no </think> but DSML
is present, clip the reasoning limit before the DSML start and transition
to TEXT mode instead of SUPPRESS. The TEXT state then handles the tool
block normally. For the Responses path, set reasoning_closed_naturally so
the reasoning item is marked completed rather than incomplete.
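
For the Responses path, the effect of the flag can be sketched as follows. Only the name reasoning_closed_naturally and the two status strings come from the description above; the `responses_stream` struct and both helper functions are hypothetical and exist purely to show the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the one flag discussed here. */
typedef struct {
    bool reasoning_closed_naturally;
} responses_stream;

/* Record that reasoning ended, either at an explicit closer or at an
 * implicit boundary where the DSML block starts. */
static void note_reasoning_closed(responses_stream *rs, bool saw_think_close,
                                  bool saw_dsml_start)
{
    assert(rs != NULL);
    if (saw_think_close || saw_dsml_start)
        rs->reasoning_closed_naturally = true;
}

/* Status string reported for the reasoning item. */
static const char *reasoning_item_status(const responses_stream *rs)
{
    assert(rs != NULL);
    return rs->reasoning_closed_naturally ? "completed" : "incomplete";
}
```

Under this sketch, a generation that drops the closer but emits a DSML block reports "completed", while pure unclosed thinking still reports "incomplete".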
Development

Successfully merging this pull request may close these issues.

Unclosed </think> before tool call stalls agent session (finish=stop, no tool call delivered)