fix(parsing): strip trailing EOS token from body_text after tool/reasoning parsing for code agent rl by none0663 · Pull Request #2049 · THUDM/slime

none0663 · 2026-06-10T10:06:37Z

Problem

parse_model_output leaks <|im_end|> into the text field in two cases:

Tool turns with tool_parser=None (XML regex fallback): the tail after
</tool_call> is preserved as-is, so <|im_end|> ends up in text
Text-only turns (no tool_call): has_tool_call is false so the body is
returned unchanged regardless of which tool_parser is set — <|im_end|>
always leaks here.
which causes

[trajectory] merge prefix drift; truncating 55 unstitched response tokens
[trajectory] merge prefix drift; truncating 56 unstitched response tokens
[trajectory] merge prefix drift; truncating 84 unstitched response tokens
[trajectory] merge prefix drift; truncating 90 unstitched response tokens

many times

tool_parser=qwen3_coder already strips <|im_end|> on tool turns by
discarding everything after the tool_call block, but that does not help for
text-only turns. reasoning_parser has no effect on EOS stripping in either
case.

Fix

Add a _strip_trailing_eos step that runs on body_text after both tool and
reasoning parsing are complete, covering all turn types uniformly.

_resolve_eos_token(tokenizer) reads tokenizer.eos_token when available,
falling back to the hardcoded <|im_end|> default.
_strip_trailing_eos(text, eos_token) removes exactly one trailing EOS token.
The tokenizer parameter is threaded through parse_model_output
(default None for backward compatibility) and passed in from both call
sites: slime/agent/adapters/openai.py and
slime/agent/adapters/anthropic.py.

Test

sandbox/test_parsing.py covers all five parser combinations and can be run
with or without sglang (cases that need it are skipped gracefully).

"""Compare parse_model_output (from ``slime.agent.parsing``) results across
combinations of reasoning_parser / tool_parser — focusing on the
reasoning/text/tool_uses fields and whether ``<|im_end|>`` leaks into text.

Findings verified on real hardware (sglang):

* ``tool_parser=None`` (XML regex fallback): preserves the tail after
  ``</tool_call>`` → ``<|im_end|>`` **still leaks** into text.
* ``tool_parser=qwen3_coder`` (native sglang): discards everything after the
  tool_call block, including ``<|im_end|>`` → text on tool turns **does not**
  contain ``<|im_end|>``.
* **Text-only turns (no tool_call)**: ``has_tool_call`` is false, body is
  returned as-is → even with tool_parser set, ``<|im_end|>`` **still leaks**.
  EOS-stripping therefore remains valuable as a fallback.
* reasoning_parser only splits ``<think>...</think>`` into the ``reasoning``
  field; it does not affect whether ``<|im_end|>`` is stripped.

"""

from __future__ import annotations

import sys
from pathlib import Path

from slime.agent.parsing import parse_model_output  # noqa: E402

IM_END = "<|im_end|>"

# Tool turn: reasoning text + </think> + tool_call + trailing stop token
# (the opening <think> tag lives in the previous prompt turn).
RAW = (
    "Let me look that up.\n</think>\n\n"
    "<tool_call>\n<function=web_search>\n<parameter=query>\n"
    "Tencent stock price\n</parameter>\n</function>\n</tool_call>"
    + IM_END
)

# Text-only turn: no tool_call, just content + trailing stop token.
RAW_TEXT = "Tencent's stock price is approximately HKD 453." + IM_END

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "search the web",
            "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
        },
    }
]


def _show(title: str, parsed) -> None:
    print(f"\n=== {title} ===")
    print(f"reasoning = {parsed.reasoning!r}")
    print(f"text      = {parsed.text!r}")
    print(f"tool_uses = {parsed.tool_uses}")
    print(f"text contains <|im_end|>? {IM_END in parsed.text}")


def _check(
    title: str,
    reasoning_parser: str | None,
    tool_parser: str | None,
    *,
    expect_im_end: bool,
    expect_reasoning: bool,
    expect_tool: bool,
    raw: str = RAW,
):
    """Run one case; skip silently if sglang is unavailable."""
    try:
        p = parse_model_output(
            raw, tools_schema=TOOLS, tool_parser_name=tool_parser, reasoning_parser_name=reasoning_parser
        )
    except Exception as e:  # sglang missing, etc.
        print(f"\n=== {title} ===  skipped: {type(e).__name__}: {e}")
        return None
    _show(title, p)
    if expect_tool:
        assert p.tool_uses and p.tool_uses[0]["name"] == "web_search", p.tool_uses
    else:
        assert not p.tool_uses, p.tool_uses
    assert bool(p.reasoning) == expect_reasoning, f"reasoning={p.reasoning!r}"
    assert (IM_END in p.text) == expect_im_end, f"text={p.text!r}"
    return p


# ---------------------------------------------------------------------------
# pytest test cases
# ---------------------------------------------------------------------------
def test_no_parsers():
    """reasoning=None, tool=None: XML fallback parses tool; reasoning stays in
    text; <|im_end|> still leaks."""
    _check("reasoning=None, tool=None", None, None, expect_im_end=True, expect_reasoning=False, expect_tool=True)


def test_tool_only():
    """reasoning=None, tool=qwen3_coder: native parser discards everything after
    tool_call → <|im_end|> does not leak."""
    _check(
        "reasoning=None, tool=qwen3_coder", None, "qwen3_coder",
        expect_im_end=False, expect_reasoning=False, expect_tool=True,
    )


def test_reasoning_only():
    """reasoning=qwen3, tool=None: reasoning is extracted; XML fallback preserves
    the tail → <|im_end|> still leaks."""
    _check(
        "reasoning=qwen3, tool=None", "qwen3", None,
        expect_im_end=True, expect_reasoning=True, expect_tool=True,
    )


def test_both():
    """reasoning=qwen3, tool=qwen3_coder: reasoning extracted + native parser
    discards tail → <|im_end|> does not leak."""
    _check(
        "reasoning=qwen3, tool=qwen3_coder", "qwen3", "qwen3_coder",
        expect_im_end=False, expect_reasoning=True, expect_tool=True,
    )


def test_text_only_turn_still_leaks():
    """Text-only turn (no tool_call): tool_parser is not triggered even when set
    → <|im_end|> still leaks.

    This confirms that EOS-stripping as a fallback is still worthwhile even
    when qwen3_coder is in use.
    """
    _check(
        "text-only turn, tool=qwen3_coder", None, "qwen3_coder",
        expect_im_end=True, expect_reasoning=False, expect_tool=False, raw=RAW_TEXT,
    )


if __name__ == "__main__":
    test_no_parsers()
    test_tool_only()
    test_reasoning_only()
    test_both()
    test_text_only_turn_still_leaks()
    print(
        "\nSummary: tool_parser=None (XML fallback) leaks <|im_end|> on tool turns; "
        "qwen3_coder (native) does not leak on tool turns, but text-only turns still leak "
        "→ EOS-stripping as a fallback should be kept."
    )

Output before this fix:

=== reasoning=None, tool=None ===
reasoning = ''
text      = 'Let me look that up.\n</think>\n\n<|im_end|>'
tool_uses = [{'name': 'web_search', 'input': {'query': 'Tencent stock price'}}]
text contains <|im_end|>? True
[debug] requested func='web_search', available tools=['web_search']

=== reasoning=None, tool=qwen3_coder ===
reasoning = ''
text      = 'Let me look that up.\n</think>'
tool_uses = [{'name': 'web_search', 'input': {'query': 'Tencent stock price'}}]
text contains <|im_end|>? False

=== reasoning=qwen3, tool=None ===
reasoning = 'Let me look that up.\n'
text      = '<|im_end|>'
tool_uses = [{'name': 'web_search', 'input': {'query': 'Tencent stock price'}}]
text contains <|im_end|>? True
[debug] requested func='web_search', available tools=['web_search']

=== reasoning=qwen3, tool=qwen3_coder ===
reasoning = 'Let me look that up.\n'
text      = ''
tool_uses = [{'name': 'web_search', 'input': {'query': 'Tencent stock price'}}]
text contains <|im_end|>? False

=== text-only turn, tool=qwen3_coder ===
reasoning = ''
text      = "Tencent's stock price is approximately HKD 453.<|im_end|>"
tool_uses = []
text contains <|im_end|>? True

Summary: tool_parser=None (XML fallback) leaks <|im_end|> on tool turns; qwen3_coder (native) does not leak on tool turns, but text-only turns still leak → EOS-stripping as a fallback should be kept.

Result

Case	Before	After
tool turn, `tool_parser=None`	leaks `<\|im_end\|>` ❌	clean ✅
tool turn, `tool_parser=qwen3_coder`	clean ✅	clean ✅
text-only turn, any `tool_parser`	leaks `<\|im_end\|>` ❌	clean ✅

Notes

The strip runs after tool_parser, so qwen3_coder's existing behaviour is
unchanged — it becomes a no-op when the token is already gone.
Only one trailing EOS token is removed to avoid over-stripping edge cases
where the token appears in actual content.

…S token from assistant text

fix: model output parsing more consistent by removing one trailing EO…

0fd1c6e

…S token from assistant text

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(parsing): strip trailing EOS token from body_text after tool/reasoning parsing for code agent rl#2049

fix(parsing): strip trailing EOS token from body_text after tool/reasoning parsing for code agent rl#2049
none0663 wants to merge 1 commit into
THUDM:mainfrom
none0663:fix-paring-code-agent

none0663 commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

none0663 commented Jun 10, 2026

Problem

Fix

Test

Result

Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant