server: fix implicit </think> close — suppress DSML from streaming output by TipKnuckle · Pull Request #332 · antirez/ds4

TipKnuckle · 2026-06-03T02:36:47Z

Fixes #318.

When the model emits a DSML tool call without closing </think> first, the original fix (#500c7df) correctly parses the tool call in parse_generated_message_ex but the three live streaming paths (OpenAI, Anthropic, Responses) still had a problem: since thinking.inside stays true for the whole generation, the raw DSML markers get streamed to the client as reasoning content. Clients like pi and opencode end up displaying the tool call XML in the thinking block.

This commit applies the same implicit-close logic to each streaming state machine. On the final update, if there's no </think> but a DSML block is present, the reasoning is clipped before the DSML start and the state transitions to TEXT mode instead of SUPPRESS — same path as a normal </think>. For the Responses path, reasoning_closed_naturally is also set so the reasoning item comes back as "completed" rather than "incomplete".
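The final-update rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from ds4_server.c: the struct, the function name `stream_finalize_thinking`, and the `reasoning_limit` field are hypothetical, and the DSML start marker is passed in as a parameter because its exact text is not given here. The SUPPRESS fallback for the no-DSML case is inferred from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types: modeled on the PR description, not copied from ds4_server.c. */
typedef enum { STREAM_THINKING, STREAM_TEXT, STREAM_SUPPRESS } stream_mode;

typedef struct {
    stream_mode mode;
    size_t reasoning_limit;          /* bytes of the buffer that may be sent as reasoning */
    bool reasoning_closed_naturally; /* Responses path: report the item as "completed" */
} stream_state;

/* Called once on the final update. `buf` is the full generated text so far;
 * `dsml_marker` is whatever string opens a DSML tool block (an assumption,
 * supplied by the caller). */
static void stream_finalize_thinking(stream_state *st, const char *buf,
                                     const char *dsml_marker) {
    assert(st && buf && dsml_marker);
    if (st->mode != STREAM_THINKING) return;   /* only the thinking branch is affected */
    if (strstr(buf, "</think>")) return;       /* explicit close: normal path handles it */

    const char *dsml = strstr(buf, dsml_marker);
    if (dsml) {
        /* Implicit close: clip reasoning before the DSML start and continue
         * in TEXT mode, the same path a real </think> would take. */
        st->reasoning_limit = (size_t)(dsml - buf);
        st->mode = STREAM_TEXT;
        st->reasoning_closed_naturally = true;
    } else {
        /* Pure unclosed thinking: keep everything as reasoning (assumed fallback). */
        st->reasoning_limit = strlen(buf);
        st->mode = STREAM_SUPPRESS;
    }
}
```

With a placeholder marker such as `"<TOOL>"`, a buffer of `"plan the call<TOOL>x"` would leave `reasoning_limit` at the offset of the marker, so none of the tool-call markup is emitted as reasoning.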

The two commits together cover both streaming and non-streaming clients. Tested on pi and opencode (Anthropic endpoint) against the q2-imatrix on M4 Max 128GB — the implicit close case now produces a clean thinking block with no leaking DSML, and the tool call executes normally. fry69 has been running the first commit for 2+ hours without regressions (see issue).

Changes are limited to the three STREAM_THINKING branches in ds4_server.c, 23 lines total.

When the model emits a DSML tool-call block without closing </think> first, the previous behavior silently discarded the tool call and returned finish=stop, leaving agent clients with no actionable output and a stalled session. The model is almost certainly intending a real tool call in this case — it just dropped the </think> closer, which happens more frequently at longer context depths. Treat the DSML start position as an implicit thinking boundary: the pre-DSML text becomes reasoning content and the tool call is parsed normally. When no DSML is present either (pure unclosed thinking), preserve the existing behavior of returning the output as reasoning only. Both cases now log a DS4_LOG_WARNING so the condition is visible in server output and --trace sessions. Refs antirez#167
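As a rough sketch of the non-streaming split this commit message describes: the function below treats the DSML start as the thinking boundary when `</think>` is missing. The names `split_result` and `split_unclosed_thinking` are hypothetical, the DSML marker is a caller-supplied parameter because its exact text is not stated here, and `fprintf` stands in for the DS4_LOG_WARNING call, whose signature is not shown in this PR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type; not the structure used by parse_generated_message_ex. */
typedef struct {
    size_t reasoning_len;   /* leading bytes of `out` treated as reasoning */
    const char *tool_start; /* start of the DSML block, or NULL if none */
    int implicit_close;     /* 1 when </think> was missing */
} split_result;

static split_result split_unclosed_thinking(const char *out, const char *dsml_marker) {
    assert(out && dsml_marker);
    split_result r = {0, NULL, 0};

    const char *close = strstr(out, "</think>");
    if (close) {
        /* Normal case: reasoning ends at the explicit closer. */
        r.reasoning_len = (size_t)(close - out);
        r.tool_start = strstr(close + strlen("</think>"), dsml_marker);
        return r;
    }

    r.implicit_close = 1;
    const char *dsml = strstr(out, dsml_marker);
    if (dsml) {
        /* Implicit boundary: text before the DSML start is reasoning,
         * and the tool call is handed to the normal parser. */
        r.reasoning_len = (size_t)(dsml - out);
        r.tool_start = dsml;
        fprintf(stderr, "warning: missing </think>, using DSML start as boundary\n");
    } else {
        /* Pure unclosed thinking: everything stays reasoning. */
        r.reasoning_len = strlen(out);
        fprintf(stderr, "warning: missing </think>, output kept as reasoning only\n");
    }
    return r;
}
```

A caller would emit the first `reasoning_len` bytes as reasoning content and, when `tool_start` is non-NULL, pass the remainder to the tool-call parser instead of returning finish=stop.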

When </think> is absent but a DSML tool block is present, the three live streaming paths (OpenAI, Anthropic, Responses) were emitting the raw DSML markers as reasoning content because thinking.inside remained true for the entire generation. Apply the same implicit-close logic already in parse_generated_message_ex to each streaming state machine: on final update, if no </think> but DSML is present, clip the reasoning limit before the DSML start and transition to TEXT mode instead of SUPPRESS. The TEXT state then handles the tool block normally. For the Responses path, set reasoning_closed_naturally so the reasoning item is marked completed rather than incomplete.

TipKnuckle added 2 commits May 31, 2026 21:39

TipKnuckle wants to merge 2 commits into antirez:main from TipKnuckle:fix/implicit-think-close-on-dsml
