fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion#52
fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion#52hallerite wants to merge 4 commits into
Conversation
Renderers' Qwen3.5/GLM/MiniMax/Laguna XML parsers flagged
``<parameter=x>True</parameter>`` as INVALID_JSON for boolean params
because ``json.loads("True")`` fails. Both SGLang's
``Qwen3CoderDetector`` and vLLM's ``Qwen3CoderToolParser`` accept it via
``param_value.lower() == "true"`` — so Pythonic literals the model
freely emits round-trip cleanly at inference but came back malformed
through renderers.
Replaces the string-or-``json.loads`` dispatch with the full
``_convert_param_value`` ladder shared by both reference parsers:
case-insensitive ``null``, ``int()`` / ``float()`` for numeric families
with int demotion, case-folded bool, ``json.loads`` → ``ast.literal_eval``
for objects/arrays/anyOf, ``ast.literal_eval`` catch-all. INVALID_JSON
remains as the verifier / RL-loss signal for values that had to
degenerate (e.g. ``yes`` for bool, ``abc`` for int).
One deliberate deviation from vLLM/SGLang: declared ``string`` types
preserve ``"null"`` verbatim instead of coercing to Python ``None``, so
the existing ``test_tool_arg_type_preservation`` string-verbatim
contract still holds.
Fixes #47.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror vLLM's Qwen3CoderToolParser end to end (chosen over qwen3_xml because qwen3_coder is a strict superset for offline value coercion; qwen3_xml's edge over qwen3_coder is its streaming state machine, which does not apply here yet — module comment notes the TODO to switch when streaming lands). Closes the three remaining structural deltas: - Adopt vLLM's three tolerant regex patterns (tool_call / function / parameter) so unclosed </parameter>, </function>, and </tool_call> tags are recovered instead of silently dropping their content. - Add the qwen3coder_tool_parser.py:269-271 back-off: when output has no <tool_call> markers, scan the whole completion for raw <function=...></function> blocks and recover the call. Content text is whatever remains after the function blocks are stripped. - Strip exactly one leading and one trailing newline from parameter values (qwen3coder_tool_parser.py:246-250); interior whitespace round-trips verbatim — previous .strip() was too aggressive. - Fix the no-schema branch of _coerce_arg_value to match vLLM (qwen3coder_tool_parser.py:128-137): return the raw text verbatim after the case-insensitive null short-circuit instead of attempting json.loads. Qwen3.6 needs no separate parser changes — Qwen36Renderer inherits parse_response from Qwen35Renderer and shares the same parse_qwen35 plumbing, so the vLLM-parity ladder + structural tolerance applies to both 3.5 (Python repr literals) and 3.6 (JSON literals) in one shot. Qwen35ToolParser in parsers.py (DefaultRenderer + tool_parser= 'qwen3.5' path) is updated to share the new _parse_xml_function_blocks helper so both code paths produce identical output. Tests: 7 new structural-parity tests pin the regex tolerance, no-tag back-off, and whitespace stripping against the vLLM reference shape. Existing test_no_schema_falls_back_to_json_loads is renamed and inverted to match the new (vLLM-faithful) behavior.
|
|
||
| _TOOL_CALL_BLOCK_RE = re.compile( | ||
| r"<tool_call>(.*?)</tool_call>|<tool_call>(.*?)$", re.DOTALL | ||
| ) |
There was a problem hiding this comment.
Unused compiled regex _TOOL_CALL_BLOCK_RE is dead code
Low Severity
_TOOL_CALL_BLOCK_RE is defined at module level as a compiled regex but is never referenced anywhere in the codebase. A grep confirms the only occurrence is its definition. The sibling patterns _FUNCTION_BLOCK_RE and _PARAMETER_BLOCK_RE are both actively used, but this one appears to have been carried over from the vLLM source listing without an actual consumer. It adds unnecessary clutter and a small startup cost for the unused re.compile.
Reviewed by Cursor Bugbot for commit a338f9d. Configure here.
ApprovabilityVerdict: Needs human review This PR modifies core parsing logic with significant runtime behavior changes to argument type coercion. An unresolved medium-severity comment identifies a potential regression where Nemotron3 argument parsing may break due to the no-schema path now returning verbatim strings instead of attempting JSON parsing. You can customize Macroscope's approvability policy. Learn more. |
…xception path Close the four remaining behavioral deltas vs vLLM's Qwen3CoderToolParser non-streaming extract path: - Content text is now the raw slice (no .strip()). vLLM qwen3coder_tool_parser.py:316-319 explicitly leaves the commented-out .rstrip() in place; surrounding whitespace before/after <tool_call> round-trips verbatim. - Backoff content (no <tool_call> markers) is the text up to the first <function= position only, matching vLLM line 317-319. Removed _strip_function_blocks helper that concatenated leftovers between function blocks — different from vLLM. - Each successfully-parsed tool call is assigned an id in vLLM's default format: chatcmpl-tool-<16 hex chars> via uuid.uuid4().int & MASK_64_BITS, matching vLLM's make_tool_call_id / random_uuid pair. Malformed entries (no function name) intentionally stay id-less since vLLM would have dropped them entirely. - Tool-call extraction is wrapped in try/except: on any unhandled error, surface tool_calls=[] and content=full_decoded_text — the vLLM qwen3coder_tool_parser.py:327-331 catch-all behavior. Reasoning extraction stays outside the wrapper since vLLM doesn't do reasoning in this parser. Tests: 4 new cases pin content-no-strip, backoff content slice, chatcmpl-tool-<hex> id format, and the exception fallback.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 96a1c98. Configure here.
| return None, False | ||
|
|
||
| if param_schema is None or declared is None or declared in _STRING_TYPES: | ||
| return text, False |
There was a problem hiding this comment.
No-schema coercion change breaks Nemotron3 argument parsing
Medium Severity
The no-schema path in _coerce_arg_value changed from attempting json.loads (with text fallback) to returning verbatim strings. Nemotron3Renderer.parse_response deliberately does not forward tools to parse_qwen35 (marked # noqa: ARG002), so every parameter hits the no-schema branch. Previously "42" parsed to 42 and "true" to True via json.loads; now they stay as raw strings "42" and "true". This silently changes the types in the arguments dict for all Nemotron3 tool calls.
Reviewed by Cursor Bugbot for commit 96a1c98. Configure here.
…byte
vLLM's _convert_param_value (qwen3coder_tool_parser.py:125) runs the
case-insensitive "null" short-circuit BEFORE any type check —
including the string family. That means an XML wire value of
<parameter=x>null</parameter> against a {"type": "string"} schema
collapses to Python None, not the string "null".
Previously this repo carried a deliberate deviation to preserve the
string verbatim across the four XML renderers (Qwen3.5/3.6, GLM-5,
MiniMax-M2.5, Laguna-XS.2). The motivation was a tidier round-trip
contract: "string params always round-trip as strings." But the XML
wire format genuinely cannot distinguish the string "null" from a
JSON null, and matching vLLM is more important than matching our
sibling renderers — downstream consumers (verifier, RL-loss) align
with vLLM's interpretation, not ours.
Test updates:
- test_arg_coercion::test_null_is_preserved_verbatim_for_string_type
is renamed to test_null_short_circuit_applies_even_for_string_type
and inverted: a string-typed "null" now returns None.
- test_tool_arg_type_preservation::test_string_arg_preserves_type
branches on _JSON_FORMAT_MODELS (Qwen3 Hermes, Kimi K2 section JSON)
vs XML renderers for the string-null case only. JSON renderers
preserve "null" verbatim because the wire bytes quote strings; XML
renderers null-coerce. Other string cases (bool/int/array/object)
still round-trip verbatim everywhere.


Summary
Closes #47. Renderers' XML-style parsers (Qwen3.5, GLM, MiniMax, Laguna) flagged
<parameter=x>True</parameter>asINVALID_JSONfor boolean params becausejson.loads("True")fails — even though both SGLang'sQwen3CoderDetectorand vLLM'sQwen3CoderToolParseraccept Pythonic literals via case-folded comparison. Models freely emitTrue/Falseat inference because the reference parsers normalize them, but the same outputs came back malformed through renderers.This swaps
_coerce_arg_valuefrom a string-or-json.loadsdispatch to the full_convert_param_valueladder both reference parsers share:json.loads, fallback to raw text +INVALID_JSON(historical behavior preserved)string/str/text/varchar/char/enumint/uint/long/short/unsignedint(text), fall back to string +INVALID_JSONnum/floatfloat(text)with SGLang's source-string int-demotion heuristic, fall back to string +INVALID_JSONboolean/bool/binarytext.lower() == "true"; non-true/false→False+INVALID_JSONobject/array/dict*/list*/anyOfjson.loads, thenast.literal_evalfallback (Python literals like{'k': 1}), then string +INVALID_JSONast.literal_eval, fall back to string +INVALID_JSONnullliteral (any case)None— except for declaredstring-family types (see deviation below)INVALID_JSONis preserved as the verifier / RL-loss signal whenever the value had to degenerate (e.g.yesfor a bool,abcfor an int).Deliberate deviation from vLLM/SGLang
Both reference parsers null-coerce
"null"before checking type, so a string-typed arg of"null"returns PythonNone. Renderers keeps"null"verbatim fortype: "string"so the existingtests/test_tool_arg_type_preservation.py::test_string_arg_preserves_type[*-string-null]contract holds across all five XML renderers. The XML wire format already can't distinguish the string"null"from JSON null, but when the schema explicitly saystype: "string"we honour it.Reference parser sources
python/sglang/srt/function_call/qwen3_coder_detector.pyvllm/tool_parsers/qwen3coder_tool_parser.py::_convert_param_valueThe two are functionally near-identical for this code path; vLLM is slightly more permissive (extra
anyOf → objectbranch). Renderers now matches vLLM's superset, with the one string-nulldeviation noted above.Test plan
pytest tests/test_arg_coercion.py— 43 new unit tests cover every branch of the ladder, including the issue'sTrue/Falseregression through fullparse_qwen35round-trip.pytest tests/test_tool_arg_type_preservation.py— 30/30 string-preservation cases pass across all five XML renderers (thestring-nullcase still round-trips verbatim).🤖 Generated with Claude Code
Note
Medium Risk
Changes tool-call extraction and argument coercion for XML-based renderers, which may alter parsed values/statuses (e.g.,
"null"→None, booleans/int/float casting) and affect downstream tool execution or training masks if assumptions differ.Overview
Brings Qwen3.5-style XML tool-call parsing into closer parity with vLLM/SGLang by replacing the prior
json.loads-centric coercion with a schema-driven type ladder (case-insensitivenull, bool/int/float handling, andast.literal_evalfallback for Python literals).Refactors Qwen3.5 extraction to use shared regex-based
<function=…>block parsing (_parse_xml_function_blocks), improving tolerance for malformed/unclosed tags and adding vLLM-style behaviors like unstripped content prefixes, backoff parsing when<tool_call>wrappers are missing, and per-call ids inchatcmpl-tool-…format. Adds comprehensive unit/integration coverage for the new coercion ladder and edge cases, and updates the existing string-type preservation test to account for XML"null"ambiguity.Reviewed by Cursor Bugbot for commit b58fa63. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Fix XML tool-call argument coercion to match vLLM type ladder in Qwen3.5 parser
json.loads-based arg coercion inparsing.pywith a schema-aware type ladder: null short-circuit, string passthrough, int/float parsing, boolean vialower() == 'true', and object/array viajson.loadswithast.literal_evalfallback._parse_xml_function_blocksto parse<function=>blocks without requiring a<tool_call>wrapper, with tolerance for missing closing tags and per-call id/status assignment.Qwen35ToolParser.extract()inparsers.pyto delegate to_parse_xml_function_blocks, enabling multipleParsedToolCallentries per<tool_call>and more precise status codes (UNCLOSED_BLOCK,MALFORMED_STRUCTURE).chatcmpl-tool-<16 hex>format via_make_tool_call_id.nullliteral is coerced toNoneeven when the declared type isstring; on any parse exception, the full post-think text is returned as content with no tool calls.Macroscope summarized b58fa63.