Skip to content

fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion#52

Open
hallerite wants to merge 4 commits into
mainfrom
fix/qwen35-arg-coercion-vllm-parity
Open

fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion#52
hallerite wants to merge 4 commits into
mainfrom
fix/qwen35-arg-coercion-vllm-parity

Conversation

@hallerite
Copy link
Copy Markdown
Member

@hallerite hallerite commented May 18, 2026

Summary

Closes #47. Renderers' XML-style parsers (Qwen3.5, GLM, MiniMax, Laguna) flagged <parameter=x>True</parameter> as INVALID_JSON for boolean params because json.loads("True") fails — even though both SGLang's Qwen3CoderDetector and vLLM's Qwen3CoderToolParser accept Pythonic literals via case-folded comparison. Models freely emit True/False at inference because the reference parsers normalize them, but the same outputs came back malformed through renderers.

This swaps _coerce_arg_value from a string-or-json.loads dispatch to the full _convert_param_value ladder both reference parsers share:

Declared type Coercion
(none) — no schema passed json.loads, fallback to raw text + INVALID_JSON (historical behavior preserved)
string / str / text / varchar / char / enum return verbatim
int / uint / long / short / unsigned int(text), fall back to string + INVALID_JSON
num / float float(text) with SGLang's source-string int-demotion heuristic, fall back to string + INVALID_JSON
boolean / bool / binary text.lower() == "true"; non-true/falseFalse + INVALID_JSON
object / array / dict* / list* / anyOf json.loads, then ast.literal_eval fallback (Python literals like {'k': 1}), then string + INVALID_JSON
Unknown ast.literal_eval, fall back to string + INVALID_JSON
null literal (any case) Noneexcept for declared string-family types (see deviation below)

INVALID_JSON is preserved as the verifier / RL-loss signal whenever the value had to degenerate (e.g. yes for a bool, abc for an int).

Deliberate deviation from vLLM/SGLang

Both reference parsers null-coerce "null" before checking type, so a string-typed arg of "null" returns Python None. Renderers keeps "null" verbatim for type: "string" so the existing tests/test_tool_arg_type_preservation.py::test_string_arg_preserves_type[*-string-null] contract holds across all five XML renderers. The XML wire format already can't distinguish the string "null" from JSON null, but when the schema explicitly says type: "string" we honour it.

Reference parser sources

The two are functionally near-identical for this code path; vLLM is slightly more permissive (extra anyOf → object branch). Renderers now matches vLLM's superset, with the one string-null deviation noted above.

Test plan

  • pytest tests/test_arg_coercion.py — 43 new unit tests cover every branch of the ladder, including the issue's True/False regression through full parse_qwen35 round-trip.
  • pytest tests/test_tool_arg_type_preservation.py — 30/30 string-preservation cases pass across all five XML renderers (the string-null case still round-trips verbatim).
  • Full suite: 1371 passed, 53 skipped, 1 xfailed.

🤖 Generated with Claude Code


Note

Medium Risk
Changes tool-call extraction and argument coercion for XML-based renderers, which may alter parsed values/statuses (e.g., "null"None, booleans/int/float casting) and affect downstream tool execution or training masks if assumptions differ.

Overview
Brings Qwen3.5-style XML tool-call parsing into closer parity with vLLM/SGLang by replacing the prior json.loads-centric coercion with a schema-driven type ladder (case-insensitive null, bool/int/float handling, and ast.literal_eval fallback for Python literals).

Refactors Qwen3.5 extraction to use shared regex-based <function=…> block parsing (_parse_xml_function_blocks), improving tolerance for malformed/unclosed tags and adding vLLM-style behaviors like unstripped content prefixes, backoff parsing when <tool_call> wrappers are missing, and per-call ids in chatcmpl-tool-… format. Adds comprehensive unit/integration coverage for the new coercion ladder and edge cases, and updates the existing string-type preservation test to account for XML "null" ambiguity.

Reviewed by Cursor Bugbot for commit b58fa63. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Fix XML tool-call argument coercion to match vLLM type ladder in Qwen3.5 parser

  • Replaces json.loads-based arg coercion in parsing.py with a schema-aware type ladder: null short-circuit, string passthrough, int/float parsing, boolean via lower() == 'true', and object/array via json.loads with ast.literal_eval fallback.
  • Adds _parse_xml_function_blocks to parse <function=> blocks without requiring a <tool_call> wrapper, with tolerance for missing closing tags and per-call id/status assignment.
  • Refactors Qwen35ToolParser.extract() in parsers.py to delegate to _parse_xml_function_blocks, enabling multiple ParsedToolCall entries per <tool_call> and more precise status codes (UNCLOSED_BLOCK, MALFORMED_STRUCTURE).
  • Tool-call ids now follow the vLLM chatcmpl-tool-<16 hex> format via _make_tool_call_id.
  • Behavioral Change: content text is no longer stripped (whitespace preserved); null literal is coerced to None even when the declared type is string; on any parse exception, the full post-think text is returned as content with no tool calls.

Macroscope summarized b58fa63.

Renderers' Qwen3.5/GLM/MiniMax/Laguna XML parsers flagged
``<parameter=x>True</parameter>`` as INVALID_JSON for boolean params
because ``json.loads("True")`` fails. Both SGLang's
``Qwen3CoderDetector`` and vLLM's ``Qwen3CoderToolParser`` accept it via
``param_value.lower() == "true"`` — so Pythonic literals the model
freely emits round-trip cleanly at inference but came back malformed
through renderers.

Replaces the string-or-``json.loads`` dispatch with the full
``_convert_param_value`` ladder shared by both reference parsers:
case-insensitive ``null``, ``int()`` / ``float()`` for numeric families
with int demotion, case-folded bool, ``json.loads`` → ``ast.literal_eval``
for objects/arrays/anyOf, ``ast.literal_eval`` catch-all. INVALID_JSON
remains as the verifier / RL-loss signal for values that had to
degenerate (e.g. ``yes`` for bool, ``abc`` for int).

One deliberate deviation from vLLM/SGLang: declared ``string`` types
preserve ``"null"`` verbatim instead of coercing to Python ``None``, so
the existing ``test_tool_arg_type_preservation`` string-verbatim
contract still holds.

Fixes #47.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror vLLM's Qwen3CoderToolParser end to end (chosen over qwen3_xml
because qwen3_coder is a strict superset for offline value coercion;
qwen3_xml's edge over qwen3_coder is its streaming state machine,
which does not apply here yet — module comment notes the TODO to
switch when streaming lands). Closes the three remaining structural
deltas:

- Adopt vLLM's three tolerant regex patterns (tool_call / function /
  parameter) so unclosed </parameter>, </function>, and </tool_call>
  tags are recovered instead of silently dropping their content.
- Add the qwen3coder_tool_parser.py:269-271 back-off: when output has
  no <tool_call> markers, scan the whole completion for raw
  <function=...></function> blocks and recover the call. Content text
  is whatever remains after the function blocks are stripped.
- Strip exactly one leading and one trailing newline from parameter
  values (qwen3coder_tool_parser.py:246-250); interior whitespace
  round-trips verbatim — previous .strip() was too aggressive.
- Fix the no-schema branch of _coerce_arg_value to match vLLM
  (qwen3coder_tool_parser.py:128-137): return the raw text verbatim
  after the case-insensitive null short-circuit instead of attempting
  json.loads.

Qwen3.6 needs no separate parser changes — Qwen36Renderer inherits
parse_response from Qwen35Renderer and shares the same parse_qwen35
plumbing, so the vLLM-parity ladder + structural tolerance applies
to both 3.5 (Python repr literals) and 3.6 (JSON literals) in one
shot.

Qwen35ToolParser in parsers.py (DefaultRenderer + tool_parser=
'qwen3.5' path) is updated to share the new _parse_xml_function_blocks
helper so both code paths produce identical output.

Tests: 7 new structural-parity tests pin the regex tolerance, no-tag
back-off, and whitespace stripping against the vLLM reference shape.
Existing test_no_schema_falls_back_to_json_loads is renamed and
inverted to match the new (vLLM-faithful) behavior.
Comment thread renderers/parsing.py

_TOOL_CALL_BLOCK_RE = re.compile(
r"<tool_call>(.*?)</tool_call>|<tool_call>(.*?)$", re.DOTALL
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused compiled regex _TOOL_CALL_BLOCK_RE is dead code

Low Severity

_TOOL_CALL_BLOCK_RE is defined at module level as a compiled regex but is never referenced anywhere in the codebase. A grep confirms the only occurrence is its definition. The sibling patterns _FUNCTION_BLOCK_RE and _PARAMETER_BLOCK_RE are both actively used, but this one appears to have been carried over from the vLLM source listing without an actual consumer. It adds unnecessary clutter and a small startup cost for the unused re.compile.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit a338f9d. Configure here.

@macroscopeapp
Copy link
Copy Markdown

macroscopeapp Bot commented May 20, 2026

Approvability

Verdict: Needs human review

This PR modifies core parsing logic with significant runtime behavior changes to argument type coercion. An unresolved medium-severity comment identifies a potential regression where Nemotron3 argument parsing may break due to the no-schema path now returning verbatim strings instead of attempting JSON parsing.

You can customize Macroscope's approvability policy. Learn more.

…xception path

Close the four remaining behavioral deltas vs vLLM's
Qwen3CoderToolParser non-streaming extract path:

- Content text is now the raw slice (no .strip()). vLLM
  qwen3coder_tool_parser.py:316-319 explicitly leaves the commented-out
  .rstrip() in place; surrounding whitespace before/after <tool_call>
  round-trips verbatim.
- Backoff content (no <tool_call> markers) is the text up to the first
  <function= position only, matching vLLM line 317-319. Removed
  _strip_function_blocks helper that concatenated leftovers between
  function blocks — different from vLLM.
- Each successfully-parsed tool call is assigned an id in vLLM's
  default format: chatcmpl-tool-<16 hex chars> via
  uuid.uuid4().int & MASK_64_BITS, matching vLLM's make_tool_call_id /
  random_uuid pair. Malformed entries (no function name) intentionally
  stay id-less since vLLM would have dropped them entirely.
- Tool-call extraction is wrapped in try/except: on any unhandled
  error, surface tool_calls=[] and content=full_decoded_text — the
  vLLM qwen3coder_tool_parser.py:327-331 catch-all behavior. Reasoning
  extraction stays outside the wrapper since vLLM doesn't do reasoning
  in this parser.

Tests: 4 new cases pin content-no-strip, backoff content slice,
chatcmpl-tool-<hex> id format, and the exception fallback.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 96a1c98. Configure here.

Comment thread renderers/parsing.py
return None, False

if param_schema is None or declared is None or declared in _STRING_TYPES:
return text, False
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No-schema coercion change breaks Nemotron3 argument parsing

Medium Severity

The no-schema path in _coerce_arg_value changed from attempting json.loads (with text fallback) to returning verbatim strings. Nemotron3Renderer.parse_response deliberately does not forward tools to parse_qwen35 (marked # noqa: ARG002), so every parameter hits the no-schema branch. Previously "42" parsed to 42 and "true" to True via json.loads; now they stay as raw strings "42" and "true". This silently changes the types in the arguments dict for all Nemotron3 tool calls.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 96a1c98. Configure here.

…byte

vLLM's _convert_param_value (qwen3coder_tool_parser.py:125) runs the
case-insensitive "null" short-circuit BEFORE any type check —
including the string family. That means an XML wire value of
<parameter=x>null</parameter> against a {"type": "string"} schema
collapses to Python None, not the string "null".

Previously this repo carried a deliberate deviation to preserve the
string verbatim across the four XML renderers (Qwen3.5/3.6, GLM-5,
MiniMax-M2.5, Laguna-XS.2). The motivation was a tidier round-trip
contract: "string params always round-trip as strings." But the XML
wire format genuinely cannot distinguish the string "null" from a
JSON null, and matching vLLM is more important than matching our
sibling renderers — downstream consumers (verifier, RL-loss) align
with vLLM's interpretation, not ours.

Test updates:
- test_arg_coercion::test_null_is_preserved_verbatim_for_string_type
  is renamed to test_null_short_circuit_applies_even_for_string_type
  and inverted: a string-typed "null" now returns None.
- test_tool_arg_type_preservation::test_string_arg_preserves_type
  branches on _JSON_FORMAT_MODELS (Qwen3 Hermes, Kimi K2 section JSON)
  vs XML renderers for the string-null case only. JSON renderers
  preserve "null" verbatim because the wire bytes quote strings; XML
  renderers null-coerce. Other string cases (bool/int/array/object)
  still round-trip verbatim everywhere.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Qwen3.5 tool-call parser is stricter than SGLang qwen3_coder on boolean args

1 participant