Skip to content

fix(parsing): strip trailing EOS token from body_text after tool/reasoning parsing for code agent rl#2049

Open
none0663 wants to merge 1 commit into
THUDM:mainfrom
none0663:fix-paring-code-agent
Open

fix(parsing): strip trailing EOS token from body_text after tool/reasoning parsing for code agent rl#2049
none0663 wants to merge 1 commit into
THUDM:mainfrom
none0663:fix-paring-code-agent

Conversation

@none0663

Copy link
Copy Markdown
Contributor

Problem

parse_model_output leaks <|im_end|> into the text field in two cases:

  • Tool turns with tool_parser=None (XML regex fallback): the tail after
    </tool_call> is preserved as-is, so <|im_end|> ends up in text
  • Text-only turns (no tool_call): has_tool_call is false so the body is
    returned unchanged regardless of which tool_parser is set — <|im_end|>
    always leaks here.
    which causes
[trajectory] merge prefix drift; truncating 55 unstitched response tokens
[trajectory] merge prefix drift; truncating 56 unstitched response tokens
[trajectory] merge prefix drift; truncating 84 unstitched response tokens
[trajectory] merge prefix drift; truncating 90 unstitched response tokens

many times

tool_parser=qwen3_coder already strips <|im_end|> on tool turns by
discarding everything after the tool_call block, but that does not help for
text-only turns. reasoning_parser has no effect on EOS stripping in either
case.

Fix

Add a _strip_trailing_eos step that runs on body_text after both tool and
reasoning parsing are complete, covering all turn types uniformly.

  • _resolve_eos_token(tokenizer) reads tokenizer.eos_token when available,
    falling back to the hardcoded <|im_end|> default.
  • _strip_trailing_eos(text, eos_token) removes exactly one trailing EOS token.
  • The tokenizer parameter is threaded through parse_model_output
    (default None for backward compatibility) and passed in from both call
    sites: slime/agent/adapters/openai.py and
    slime/agent/adapters/anthropic.py.

Test

sandbox/test_parsing.py covers all five parser combinations and can be run
with or without sglang (cases that need it are skipped gracefully).

"""Compare parse_model_output (from ``slime.agent.parsing``) results across
combinations of reasoning_parser / tool_parser — focusing on the
reasoning/text/tool_uses fields and whether ``<|im_end|>`` leaks into text.

Findings verified on real hardware (sglang):

* ``tool_parser=None`` (XML regex fallback): preserves the tail after
  ``</tool_call>`` → ``<|im_end|>`` **still leaks** into text.
* ``tool_parser=qwen3_coder`` (native sglang): discards everything after the
  tool_call block, including ``<|im_end|>`` → text on tool turns **does not**
  contain ``<|im_end|>``.
* **Text-only turns (no tool_call)**: ``has_tool_call`` is false, body is
  returned as-is → even with tool_parser set, ``<|im_end|>`` **still leaks**.
  EOS-stripping therefore remains valuable as a fallback.
* reasoning_parser only splits ``<think>...</think>`` into the ``reasoning``
  field; it does not affect whether ``<|im_end|>`` is stripped.

"""

from __future__ import annotations

import sys
from pathlib import Path

from slime.agent.parsing import parse_model_output  # noqa: E402

IM_END = "<|im_end|>"

# Tool turn: reasoning text + </think> + tool_call + trailing stop token
# (the opening <think> tag lives in the previous prompt turn).
RAW = (
    "Let me look that up.\n</think>\n\n"
    "<tool_call>\n<function=web_search>\n<parameter=query>\n"
    "Tencent stock price\n</parameter>\n</function>\n</tool_call>"
    + IM_END
)

# Text-only turn: no tool_call, just content + trailing stop token.
RAW_TEXT = "Tencent's stock price is approximately HKD 453." + IM_END

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "search the web",
            "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
        },
    }
]


def _show(title: str, parsed) -> None:
    print(f"\n=== {title} ===")
    print(f"reasoning = {parsed.reasoning!r}")
    print(f"text      = {parsed.text!r}")
    print(f"tool_uses = {parsed.tool_uses}")
    print(f"text contains <|im_end|>? {IM_END in parsed.text}")


def _check(
    title: str,
    reasoning_parser: str | None,
    tool_parser: str | None,
    *,
    expect_im_end: bool,
    expect_reasoning: bool,
    expect_tool: bool,
    raw: str = RAW,
):
    """Run one case; skip silently if sglang is unavailable."""
    try:
        p = parse_model_output(
            raw, tools_schema=TOOLS, tool_parser_name=tool_parser, reasoning_parser_name=reasoning_parser
        )
    except Exception as e:  # sglang missing, etc.
        print(f"\n=== {title} ===  skipped: {type(e).__name__}: {e}")
        return None
    _show(title, p)
    if expect_tool:
        assert p.tool_uses and p.tool_uses[0]["name"] == "web_search", p.tool_uses
    else:
        assert not p.tool_uses, p.tool_uses
    assert bool(p.reasoning) == expect_reasoning, f"reasoning={p.reasoning!r}"
    assert (IM_END in p.text) == expect_im_end, f"text={p.text!r}"
    return p


# ---------------------------------------------------------------------------
# pytest test cases
# ---------------------------------------------------------------------------
def test_no_parsers():
    """reasoning=None, tool=None: XML fallback parses tool; reasoning stays in
    text; <|im_end|> still leaks."""
    _check("reasoning=None, tool=None", None, None, expect_im_end=True, expect_reasoning=False, expect_tool=True)


def test_tool_only():
    """reasoning=None, tool=qwen3_coder: native parser discards everything after
    tool_call → <|im_end|> does not leak."""
    _check(
        "reasoning=None, tool=qwen3_coder", None, "qwen3_coder",
        expect_im_end=False, expect_reasoning=False, expect_tool=True,
    )


def test_reasoning_only():
    """reasoning=qwen3, tool=None: reasoning is extracted; XML fallback preserves
    the tail → <|im_end|> still leaks."""
    _check(
        "reasoning=qwen3, tool=None", "qwen3", None,
        expect_im_end=True, expect_reasoning=True, expect_tool=True,
    )


def test_both():
    """reasoning=qwen3, tool=qwen3_coder: reasoning extracted + native parser
    discards tail → <|im_end|> does not leak."""
    _check(
        "reasoning=qwen3, tool=qwen3_coder", "qwen3", "qwen3_coder",
        expect_im_end=False, expect_reasoning=True, expect_tool=True,
    )


def test_text_only_turn_still_leaks():
    """Text-only turn (no tool_call): tool_parser is not triggered even when set
    → <|im_end|> still leaks.

    This confirms that EOS-stripping as a fallback is still worthwhile even
    when qwen3_coder is in use.
    """
    _check(
        "text-only turn, tool=qwen3_coder", None, "qwen3_coder",
        expect_im_end=True, expect_reasoning=False, expect_tool=False, raw=RAW_TEXT,
    )


if __name__ == "__main__":
    test_no_parsers()
    test_tool_only()
    test_reasoning_only()
    test_both()
    test_text_only_turn_still_leaks()
    print(
        "\nSummary: tool_parser=None (XML fallback) leaks <|im_end|> on tool turns; "
        "qwen3_coder (native) does not leak on tool turns, but text-only turns still leak "
        "→ EOS-stripping as a fallback should be kept."
    )

Output before this fix:

=== reasoning=None, tool=None ===
reasoning = ''
text      = 'Let me look that up.\n</think>\n\n<|im_end|>'
tool_uses = [{'name': 'web_search', 'input': {'query': 'Tencent stock price'}}]
text contains <|im_end|>? True
[debug] requested func='web_search', available tools=['web_search']

=== reasoning=None, tool=qwen3_coder ===
reasoning = ''
text      = 'Let me look that up.\n</think>'
tool_uses = [{'name': 'web_search', 'input': {'query': 'Tencent stock price'}}]
text contains <|im_end|>? False

=== reasoning=qwen3, tool=None ===
reasoning = 'Let me look that up.\n'
text      = '<|im_end|>'
tool_uses = [{'name': 'web_search', 'input': {'query': 'Tencent stock price'}}]
text contains <|im_end|>? True
[debug] requested func='web_search', available tools=['web_search']

=== reasoning=qwen3, tool=qwen3_coder ===
reasoning = 'Let me look that up.\n'
text      = ''
tool_uses = [{'name': 'web_search', 'input': {'query': 'Tencent stock price'}}]
text contains <|im_end|>? False

=== text-only turn, tool=qwen3_coder ===
reasoning = ''
text      = "Tencent's stock price is approximately HKD 453.<|im_end|>"
tool_uses = []
text contains <|im_end|>? True

Summary: tool_parser=None (XML fallback) leaks <|im_end|> on tool turns; qwen3_coder (native) does not leak on tool turns, but text-only turns still leak → EOS-stripping as a fallback should be kept.

Result

Case Before After
tool turn, tool_parser=None leaks <|im_end|> clean ✅
tool turn, tool_parser=qwen3_coder clean ✅ clean ✅
text-only turn, any tool_parser leaks <|im_end|> clean ✅

Notes

  • The strip runs after tool_parser, so qwen3_coder's existing behaviour is
    unchanged — it becomes a no-op when the token is already gone.
  • Only one trailing EOS token is removed to avoid over-stripping edge cases
    where the token appears in actual content.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant