fix(backends): capture vLLM reasoning field in mot._thinking #1063
Open
planetf1 wants to merge 2 commits into
Conversation
vLLM served with --reasoning-parser qwen3 (and other thinking models) surfaces the reasoning trace under the "reasoning" key of the raw message dict, not as a Python attribute on the openai SDK object. The existing hasattr(message, "reasoning_content") probe therefore missed it, and mot._thinking was silently left unpopulated for Qwen3 and similar models.

Add a fallback in both the non-streaming (ChatCompletion) and streaming (ChatCompletionChunk) branches of processing(): when reasoning_content is absent, probe the raw .model_dump() for a "reasoning" key. The existing reasoning_content path is preserved untouched.

Closes generative-computing#1061

Assisted-by: Claude Code
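Condensed, the fallback this first commit describes looks like the sketch below (extract_reasoning is a hypothetical helper name; the real change lives inline in processing()). The second commit below replaces this hasattr/else shape with a value-based probe.

```python
from typing import Any

def extract_reasoning(message: Any) -> str | None:
    """Hypothetical helper mirroring the fallback described above."""
    if hasattr(message, "reasoning_content"):
        # Attribute path used by backends that type the field.
        return message.reasoning_content
    # vLLM with --reasoning-parser only exposes the trace in the raw
    # dict, so probe the serialized form for the "reasoning" key.
    return message.model_dump().get("reasoning")
```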
force-pushed from 7e618c5 to 42cee39
Contributor
The PR description has been updated. Please fill out the template for your PR to be reviewed.
…soning
Replace the `hasattr(message, "reasoning_content") / else model_dump()`
pattern with a unified value-based probe:

    thinking_chunk = getattr(obj, "reasoning_content", None)
    if thinking_chunk is None:
        thinking_chunk = (obj.model_extra or {}).get("reasoning")
This closes two gaps identified in code review:
1. Edge case: if a proxy sends `{"reasoning_content": null, "reasoning":
"trace"}`, the old hasattr guard was True, the None check
short-circuited, and the else branch never fired — silently dropping
the trace. The value-based check falls through correctly.
2. Performance: the old else branch called model_dump() on every streaming
delta for non-thinking models (gpt-4o etc.), allocating a full dict per
token. model_extra is already a plain dict — O(1) lookup, no
serialisation.
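To illustrate both gaps, here is a minimal self-contained sketch; the Message class is a stand-in, assuming the openai SDK models' pydantic config allows extra fields (extra="allow"):

```python
from pydantic import BaseModel, ConfigDict

class Message(BaseModel):
    # Stand-in for the openai SDK message models, assuming their
    # pydantic config allows extra fields: unknown response keys
    # land in .model_extra, a plain dict.
    model_config = ConfigDict(extra="allow")
    role: str
    content: str | None = None

# vLLM shape: the trace arrives under the extra "reasoning" key.
msg = Message(role="assistant", content=None, reasoning="trace")
chunk = getattr(msg, "reasoning_content", None)
if chunk is None:
    chunk = (msg.model_extra or {}).get("reasoning")  # O(1) dict lookup
assert chunk == "trace"

# Proxy edge case: a null reasoning_content falls through correctly.
edge = Message(role="assistant", reasoning_content=None, reasoning="trace")
assert getattr(edge, "reasoning_content", None) is None
assert (edge.model_extra or {}).get("reasoning") == "trace"
```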
Also adds test_processing_reasoning_content_takes_precedence_over_reasoning
to pin that reasoning_content wins when both fields are present on the same
message object.
Closes generative-computing#1061
Assisted-by: Claude Code
psschwei approved these changes May 13, 2026
@@ -1 +1 @@
"""Unit tests for OpenAI backend pure-logic helpers — no API calls required.
Member
Suggested change
| """Unit tests for OpenAI backend pure-logic helpers. |
since we're calling backend.processing() in some of the new tests
thinking_chunk = message.reasoning_content  # type: ignore
if thinking_chunk is not None:
    mot._thinking += thinking_chunk
# reasoning_content (Anthropic/DeepSeek attribute path) takes priority;
Member
nit: I think Anthropic has a different API (?)
Comment on lines +1021 to +1025
thinking_chunk = getattr(message_delta, "reasoning_content", None)
if thinking_chunk is None:
    thinking_chunk = (message_delta.model_extra or {}).get("reasoning")
if thinking_chunk is not None:
    mot._thinking += thinking_chunk
akihikokuroda approved these changes May 13, 2026
Type of PR

Misc
Description
When OpenAIBackend is used with a vLLM-served thinking model (e.g. Qwen3 with --reasoning-parser qwen3), the reasoning trace was silently discarded: mot._thinking was never populated.

OpenAIBackend.processing() probed for the trace via hasattr(message, "reasoning_content"). vLLM 0.20.2 does not expose this as an attribute on the openai SDK message object; the trace lives under the raw key "reasoning", accessible via model_extra. The hasattr check returned False, so mot._thinking was never assigned. Both the non-streaming (ChatCompletion) and streaming (ChatCompletionChunk) branches were affected.

The fix uses a value-based probe in both branches:

This also closes an edge case where a proxy could send {"reasoning_content": null, "reasoning": "trace"}: the old hasattr/else shape would have dropped the trace in that scenario. Using model_extra instead of model_dump() also avoids a full Pydantic serialisation per streaming chunk for non-thinking models.
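The streaming-branch version, reproduced from the review diff above (the non-streaming branch applies the same probe to the message object):

```python
thinking_chunk = getattr(message_delta, "reasoning_content", None)
if thinking_chunk is None:
    thinking_chunk = (message_delta.model_extra or {}).get("reasoning")
if thinking_chunk is not None:
    mot._thinking += thinking_chunk
```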
A follow-up issue (#1070) has been filed for the same gap in LiteLLMBackend, where the fix is more involved as LiteLLM drops the reasoning field during its own normalisation layer.
Testing
New unit tests in test/backends/test_openai_unit.py (25 passed):

- test_processing_captures_vllm_reasoning_field: non-streaming vLLM shape populates mot._thinking
- test_processing_vllm_reasoning_with_null_content: reasoning captured when content is null
- test_processing_streaming_captures_vllm_reasoning_field: streaming deltas with reasoning accumulate correctly
- test_processing_reasoning_content_still_used: regression guard for the reasoning_content attribute path
- test_processing_reasoning_content_takes_precedence_over_reasoning: pins that reasoning_content wins when both fields are present
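As a rough sketch of the shape these tests take (construction details are illustrative and assume the openai SDK models' extra="allow" behaviour; the actual tests are in test/backends/test_openai_unit.py):

```python
from openai.types.chat import ChatCompletionMessage

def test_vllm_reasoning_lands_in_model_extra_sketch() -> None:
    # vLLM shape: trace arrives under the extra "reasoning" key.
    message = ChatCompletionMessage(
        role="assistant", content="answer", reasoning="step by step"
    )
    # reasoning_content was never set, so the old probe found nothing...
    assert getattr(message, "reasoning_content", None) is None
    # ...but the trace is visible in model_extra, which the fix reads.
    assert (message.model_extra or {}).get("reasoning") == "step by step"
```

Attribution

Assisted-by: Claude Code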