server: fix implicit </think> close — suppress DSML from streaming output#332
Open
TipKnuckle wants to merge 2 commits into
When the model emits a DSML tool-call block without first closing </think>, the previous behavior silently discarded the tool call and returned finish=stop, leaving agent clients with no actionable output and a stalled session. In this case the model almost certainly intends a real tool call and has simply dropped the </think> closer, which happens more often at longer context depths.

Treat the DSML start position as an implicit thinking boundary: the pre-DSML text becomes reasoning content and the tool call is parsed normally. When no DSML is present either (pure unclosed thinking), preserve the existing behavior of returning the output as reasoning only. Both cases now log a DS4_LOG_WARNING so the condition is visible in server output and --trace sessions.

Refs antirez#167
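The implicit boundary rule described in this commit message can be sketched roughly as below. This is a minimal illustration, not the code in parse_generated_message_ex: the `split_result` type, the `split_thinking` helper, and the `tool_marker` parameter are hypothetical names, the sketch assumes the output starts inside a thinking block, and the real DSML start marker is passed in by the caller rather than hard-coded here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not the struct used by the server. */
typedef struct {
    size_t reasoning_len;  /* leading bytes of the output that count as reasoning */
    const char *rest;      /* text after the thinking boundary, NULL if none */
    int implicit_close;    /* 1 when the boundary was inferred from the tool marker */
} split_result;

/* Split model output that is assumed to begin inside a thinking block.
 * tool_marker stands in for the DSML start marker (an assumption of the sketch). */
static split_result split_thinking(const char *out, const char *tool_marker)
{
    static const char closer[] = "</think>";
    split_result r = {0, NULL, 0};

    /* Normal case: an explicit closer ends the reasoning. */
    const char *close = strstr(out, closer);
    if (close != NULL) {
        r.reasoning_len = (size_t)(close - out);
        r.rest = close + strlen(closer);
        return r;
    }

    /* Implicit close: no closer, but a tool block starts here.
     * Everything before it is reasoning; the block itself is left for the tool parser. */
    const char *tool = strstr(out, tool_marker);
    if (tool != NULL) {
        r.reasoning_len = (size_t)(tool - out);
        r.rest = tool;
        r.implicit_close = 1;
        return r;
    }

    /* Pure unclosed thinking: the whole output is reasoning. */
    r.reasoning_len = strlen(out);
    return r;
}
```

A caller could log a warning whenever `implicit_close` is set or `rest` is NULL, which mirrors the two warning cases mentioned above.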
When </think> is absent but a DSML tool block is present, the three live streaming paths (OpenAI, Anthropic, Responses) were emitting the raw DSML markers as reasoning content because thinking.inside remained true for the entire generation.

Apply the same implicit-close logic already in parse_generated_message_ex to each streaming state machine: on the final update, if there is no </think> but DSML is present, clip the reasoning limit before the DSML start and transition to TEXT mode instead of SUPPRESS. The TEXT state then handles the tool block normally. For the Responses path, set reasoning_closed_naturally so the reasoning item is marked completed rather than incomplete.
Fixes #318.
When the model emits a DSML tool call without closing `</think>` first, the original fix (#500c7df) correctly parses the tool call in `parse_generated_message_ex`, but the three live streaming paths (OpenAI, Anthropic, Responses) still had a problem: since `thinking.inside` stays true for the whole generation, the raw DSML markers get streamed to the client as reasoning content. Clients like pi and opencode end up displaying the tool call XML in the thinking block.

This commit applies the same implicit-close logic to each streaming state machine. On the final update, if there's no `</think>` but a DSML block is present, the reasoning is clipped before the DSML start and the state transitions to TEXT mode instead of SUPPRESS, the same path as a normal `</think>`. For the Responses path, `reasoning_closed_naturally` is also set so the reasoning item comes back as `"completed"` rather than `"incomplete"`.

The two commits together cover both streaming and non-streaming clients. Tested on pi and opencode (Anthropic endpoint) against the q2-imatrix on M4 Max 128GB: the implicit close case now produces a clean thinking block with no leaking DSML, and the tool call executes normally. fry69 has been running the first commit for 2+ hours without regressions (see issue).

Changes are limited to the three `STREAM_THINKING` branches in `ds4_server.c`, 23 lines total.
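For reviewers who want the shape of the change without opening the diff, the final-update handling can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `ds4_server.c`: the `stream_state` struct, the `finish_thinking` helper, and the `tool_marker` parameter are hypothetical, the `closed_naturally` field only stands in for `reasoning_closed_naturally`, and flushing the remaining reasoning before entering SUPPRESS is a guess at the pre-existing behavior.

```c
#include <stddef.h>
#include <string.h>

typedef enum { STREAM_THINKING, STREAM_TEXT, STREAM_SUPPRESS } stream_mode;

/* Hypothetical per-stream state for this sketch. */
typedef struct {
    stream_mode mode;
    size_t emitted;        /* bytes of the buffer already sent as reasoning */
    int closed_naturally;  /* stand-in for reasoning_closed_naturally */
} stream_state;

/* Final update while still in STREAM_THINKING (no closer was ever seen).
 * Returns how many more reasoning bytes may be flushed, starting at buf[old emitted].
 * tool_marker stands in for the DSML start marker (an assumption of the sketch). */
static size_t finish_thinking(stream_state *st, const char *buf,
                              const char *tool_marker)
{
    if (st->mode != STREAM_THINKING)
        return 0;

    size_t limit = strlen(buf);
    const char *tool = strstr(buf, tool_marker);
    if (tool != NULL) {
        /* Implicit close: clip reasoning before the tool block and let
         * the TEXT state handle the block. */
        limit = (size_t)(tool - buf);
        st->mode = STREAM_TEXT;
        st->closed_naturally = 1;
    } else {
        /* Pure unclosed thinking: everything is reasoning, nothing else follows. */
        st->mode = STREAM_SUPPRESS;
    }

    /* Never re-send bytes that were already streamed. */
    size_t n = limit > st->emitted ? limit - st->emitted : 0;
    st->emitted += n;
    return n;
}
```

After the call, a state left in `STREAM_TEXT` would continue at the clipped offset, so the tool block goes through the same parsing as it would after an explicit closer.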