fix(backends): capture vLLM reasoning field in mot._thinking #1063
Open
planetf1 wants to merge 2 commits into
Conversation
vLLM served with --reasoning-parser qwen3 (and other thinking models) surfaces the reasoning trace under the "reasoning" key of the raw message dict, not as a Python attribute on the openai SDK object. The existing hasattr(message, "reasoning_content") probe therefore missed it, and mot._thinking was silently left unpopulated for Qwen3 and similar models.

Add a fallback in both the non-streaming (ChatCompletion) and streaming (ChatCompletionChunk) branches of processing(): when reasoning_content is absent, probe the raw .model_dump() for a "reasoning" key. The existing reasoning_content path is preserved untouched.

Closes generative-computing#1061

Assisted-by: Claude Code
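Condensed, the fallback this first commit describes looks like the sketch below (extract_reasoning is a hypothetical helper name; the real change lives inline in processing()). The second commit below replaces this hasattr/else shape with a value-based probe.

```python
from typing import Any

def extract_reasoning(message: Any) -> str | None:
    """Hypothetical helper mirroring the fallback described above."""
    if hasattr(message, "reasoning_content"):
        # Attribute path used by backends that type the field.
        return message.reasoning_content
    # vLLM with --reasoning-parser only exposes the trace in the raw
    # dict, so probe the serialized form for the "reasoning" key.
    return message.model_dump().get("reasoning")
```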
force-pushed from 7e618c5 to 42cee39
Contributor
The PR description has been updated. Please fill out the template for your PR to be reviewed.
…soning
Replace the `hasattr(message, "reasoning_content") / else model_dump()`
pattern with a unified value-based probe:

    thinking_chunk = getattr(obj, "reasoning_content", None)
    if thinking_chunk is None:
        thinking_chunk = (obj.model_extra or {}).get("reasoning")
This closes two gaps identified in code review:
1. Edge case: if a proxy sends `{"reasoning_content": null, "reasoning":
"trace"}`, the old hasattr guard was True, the None check
short-circuited, and the else branch never fired — silently dropping
the trace. The value-based check falls through correctly.
2. Performance: the old else branch called model_dump() on every streaming
delta for non-thinking models (gpt-4o etc.), allocating a full dict per
token. model_extra is already a plain dict — O(1) lookup, no
serialisation.
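To illustrate both gaps, here is a minimal self-contained sketch; the Message class is a stand-in, assuming the openai SDK models' pydantic config allows extra fields (extra="allow"):

```python
from pydantic import BaseModel, ConfigDict

class Message(BaseModel):
    # Stand-in for the openai SDK message models, assuming their
    # pydantic config allows extra fields: unknown response keys
    # land in .model_extra, a plain dict.
    model_config = ConfigDict(extra="allow")
    role: str
    content: str | None = None

# vLLM shape: the trace arrives under the extra "reasoning" key.
msg = Message(role="assistant", content=None, reasoning="trace")
chunk = getattr(msg, "reasoning_content", None)
if chunk is None:
    chunk = (msg.model_extra or {}).get("reasoning")  # O(1) dict lookup
assert chunk == "trace"

# Proxy edge case: a null reasoning_content falls through correctly.
edge = Message(role="assistant", reasoning_content=None, reasoning="trace")
assert getattr(edge, "reasoning_content", None) is None
assert (edge.model_extra or {}).get("reasoning") == "trace"
```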
Also adds test_processing_reasoning_content_takes_precedence_over_reasoning
to pin that reasoning_content wins when both fields are present on the same
message object.
Closes generative-computing#1061
Assisted-by: Claude Code
psschwei approved these changes May 13, 2026
@@ -1 +1 @@
"""Unit tests for OpenAI backend pure-logic helpers — no API calls required.
Member
Suggested change
| """Unit tests for OpenAI backend pure-logic helpers. |
since we're calling backend.processing() in some of the new tests
thinking_chunk = message.reasoning_content  # type: ignore
if thinking_chunk is not None:
    mot._thinking += thinking_chunk
# reasoning_content (Anthropic/DeepSeek attribute path) takes priority;
Member
nit: I think Anthropic has a different API (?)
Comment on lines +1021 to +1025
thinking_chunk = getattr(message_delta, "reasoning_content", None)
if thinking_chunk is None:
    thinking_chunk = (message_delta.model_extra or {}).get("reasoning")
if thinking_chunk is not None:
    mot._thinking += thinking_chunk
akihikokuroda approved these changes May 13, 2026
Type of PR

Misc
Description
When OpenAIBackend is used with a vLLM-served thinking model (e.g. Qwen3 with --reasoning-parser qwen3), the reasoning trace was silently discarded: mot._thinking was never populated.

OpenAIBackend.processing() probed for the trace via hasattr(message, "reasoning_content"). vLLM 0.20.2 does not expose this as an attribute on the openai SDK message object; the trace lives under the raw key "reasoning", accessible via model_extra. The hasattr check returned False, so mot._thinking was never assigned. Both the non-streaming (ChatCompletion) and streaming (ChatCompletionChunk) branches were affected.

The fix uses a value-based probe in both branches:

This also closes an edge case where a proxy could send {"reasoning_content": null, "reasoning": "trace"}: the old hasattr/else shape would have dropped the trace in that scenario. Using model_extra instead of model_dump() also avoids a full Pydantic serialisation per streaming chunk for non-thinking models.
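The streaming-branch version, reproduced from the review diff above (the non-streaming branch applies the same probe to the message object):

```python
thinking_chunk = getattr(message_delta, "reasoning_content", None)
if thinking_chunk is None:
    thinking_chunk = (message_delta.model_extra or {}).get("reasoning")
if thinking_chunk is not None:
    mot._thinking += thinking_chunk
```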
A follow-up issue (#1070) has been filed for the same gap in LiteLLMBackend, where the fix is more involved as LiteLLM drops the reasoning field during its own normalisation layer.
Testing
New unit tests in test/backends/test_openai_unit.py (25 passed):

- test_processing_captures_vllm_reasoning_field: non-streaming vLLM shape populates mot._thinking
- test_processing_vllm_reasoning_with_null_content: reasoning captured when content is null
- test_processing_streaming_captures_vllm_reasoning_field: streaming deltas with reasoning accumulate correctly
- test_processing_reasoning_content_still_used: regression guard for the reasoning_content attribute path
- test_processing_reasoning_content_takes_precedence_over_reasoning: pins that reasoning_content wins when both fields are present
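As a rough sketch of the shape these tests take (construction details are illustrative and assume the openai SDK models' extra="allow" behaviour; the actual tests are in test/backends/test_openai_unit.py):

```python
from openai.types.chat import ChatCompletionMessage

def test_vllm_reasoning_lands_in_model_extra_sketch() -> None:
    # vLLM shape: trace arrives under the extra "reasoning" key.
    message = ChatCompletionMessage(
        role="assistant", content="answer", reasoning="step by step"
    )
    # reasoning_content was never set, so the old probe found nothing...
    assert getattr(message, "reasoning_content", None) is None
    # ...but the trace is visible in model_extra, which the fix reads.
    assert (message.model_extra or {}).get("reasoning") == "step by step"
```

Attribution

Assisted-by: Claude Code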