
fix(backends): capture vLLM reasoning field in mot._thinking #1063

Open

planetf1 wants to merge 2 commits into generative-computing:main from planetf1:fix/1061-openai-reasoning-field

Conversation

Contributor

@planetf1 planetf1 commented May 12, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

When OpenAIBackend is used with a vLLM-served thinking model (e.g. Qwen3
with --reasoning-parser qwen3), the reasoning trace was silently discarded
and mot._thinking was never populated.

OpenAIBackend.processing() probed for the trace via
hasattr(message, "reasoning_content"). vLLM 0.20.2 does not expose this as
an attribute on the openai SDK message object; the trace lives under the raw
key "reasoning", accessible via model_extra. The hasattr check returned
False, so mot._thinking was never assigned. Both the non-streaming
(ChatCompletion) and streaming (ChatCompletionChunk) branches were affected.
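
A minimal stand-alone reproduction of the mismatch (a sketch: the Message
class below is a Pydantic v2 stand-in for the SDK message type, not the
real one):

    from pydantic import BaseModel, ConfigDict

    class Message(BaseModel):
        # Stand-in for the openai SDK message object (extras allowed).
        model_config = ConfigDict(extra="allow")
        content: str | None = None

    # vLLM's raw payload carries the trace under "reasoning".
    msg = Message.model_validate({"content": None, "reasoning": "trace"})

    print(hasattr(msg, "reasoning_content"))         # False — old probe misses it
    print((msg.model_extra or {}).get("reasoning"))  # 'trace' — model_extra has it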

The fix uses a value-based probe in both branches:

    thinking_chunk = getattr(obj, "reasoning_content", None)
    if thinking_chunk is None:
        thinking_chunk = (obj.model_extra or {}).get("reasoning")
    if thinking_chunk is not None:
        mot._thinking += thinking_chunk

This also closes an edge case where a proxy could send
{"reasoning_content": null, "reasoning": "trace"} — the old hasattr/else
shape would have dropped the trace in that scenario. Using model_extra instead
of model_dump() also avoids a full Pydantic serialisation per streaming chunk
for non-thinking models.
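
For concreteness, a hypothetical reconstruction of the old shape against
that proxy payload (the Message stub is illustrative, not the real SDK
type):

    from pydantic import BaseModel, ConfigDict

    class Message(BaseModel):
        model_config = ConfigDict(extra="allow")
        reasoning_content: str | None = None

    msg = Message.model_validate({"reasoning_content": None, "reasoning": "trace"})

    # hasattr is True because the field exists, even though its value is
    # None — so the else branch that reads "reasoning" never fires.
    if hasattr(msg, "reasoning_content"):
        thinking_chunk = msg.reasoning_content              # None
    else:
        thinking_chunk = msg.model_dump().get("reasoning")  # unreachable here

    print(thinking_chunk)  # None — trace silently dropped by the old shape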

A follow-up issue (#1070) has been filed for the same gap in LiteLLMBackend,
where the fix is more involved as LiteLLM drops the reasoning field during its
own normalisation layer.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation once the rest of the PR is populated)

New unit tests in test/backends/test_openai_unit.py (25 passed):

  • test_processing_captures_vllm_reasoning_field — non-streaming vLLM shape populates mot._thinking
  • test_processing_vllm_reasoning_with_null_content — reasoning captured when content is null
  • test_processing_streaming_captures_vllm_reasoning_field — streaming deltas with reasoning accumulate correctly
  • test_processing_reasoning_content_still_used — regression guard for the reasoning_content attribute path
  • test_processing_reasoning_content_takes_precedence_over_reasoning — pins that reasoning_content wins when both fields are present
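
The precedence rule pinned by the last test can be checked stand-alone; a
sketch (the probe function inlines the fix's logic, and the stub object is
hypothetical, not the real SDK type):

    from types import SimpleNamespace

    def probe(obj):
        # Value-based probe from the fix, inlined for illustration.
        thinking = getattr(obj, "reasoning_content", None)
        if thinking is None:
            thinking = (getattr(obj, "model_extra", None) or {}).get("reasoning")
        return thinking

    both = SimpleNamespace(reasoning_content="attr trace",
                           model_extra={"reasoning": "raw trace"})
    assert probe(both) == "attr trace"  # reasoning_content wins when both present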

Attribution

  • AI coding assistants used

@github-actions github-actions Bot added the bug Something isn't working label May 12, 2026
vLLM served with --reasoning-parser qwen3 (and other thinking models)
surfaces the reasoning trace under the "reasoning" key of the raw
message dict, not as a Python attribute on the openai SDK object. The
existing hasattr(message, "reasoning_content") probe therefore missed
it, and mot._thinking was silently left unpopulated for Qwen3 and
similar models.

Add a fallback in both the non-streaming (ChatCompletion) and streaming
(ChatCompletionChunk) branches of processing(): when reasoning_content
is absent, probe the raw .model_dump() for a "reasoning" key. The
existing reasoning_content path is preserved untouched.

Closes generative-computing#1061

Assisted-by: Claude Code
@planetf1 planetf1 force-pushed the fix/1061-openai-reasoning-field branch from 7e618c5 to 42cee39 Compare May 12, 2026 13:31
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

…soning

Replace the `hasattr(message, "reasoning_content") / else model_dump()`
pattern with a unified value-based probe:

    thinking_chunk = getattr(obj, "reasoning_content", None)
    if thinking_chunk is None:
        thinking_chunk = (obj.model_extra or {}).get("reasoning")

This closes two gaps identified in code review:

1. Edge case: if a proxy sends `{"reasoning_content": null, "reasoning":
   "trace"}`, the old hasattr guard was True, the None check
   short-circuited, and the else branch never fired — silently dropping
   the trace. The value-based check falls through correctly.

2. Performance: the old else branch called model_dump() on every streaming
   delta for non-thinking models (gpt-4o etc.), allocating a full dict per
   token. model_extra is already a plain dict — O(1) lookup, no
   serialisation.

Also adds test_processing_reasoning_content_takes_precedence_over_reasoning
to pin that reasoning_content wins when both fields are present on the same
message object.

Closes generative-computing#1061

Assisted-by: Claude Code
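
To make the performance point in (2) concrete, a sketch assuming a Pydantic
v2 model standing in for the streaming delta type (Delta is illustrative,
not the real SDK class):

    from pydantic import BaseModel, ConfigDict

    class Delta(BaseModel):
        model_config = ConfigDict(extra="allow")
        content: str | None = None

    plain = Delta(content="hi")  # non-thinking model: no extra keys
    assert (plain.model_extra or {}).get("reasoning") is None  # one dict lookup
    # The old else branch would call plain.model_dump() here instead,
    # building a fresh dict for every streamed token.
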
@planetf1 planetf1 marked this pull request as ready for review May 13, 2026 11:44
@planetf1 planetf1 requested a review from a team as a code owner May 13, 2026 11:44
Member

@psschwei psschwei left a comment


nits but LGTM otherwise

@@ -1 +1 @@
"""Unit tests for OpenAI backend pure-logic helpers — no API calls required.
Member


Suggested change
"""Unit tests for OpenAI backend pure-logic helpers.

since we're calling backend.processing() in some of the new tests

Comment thread: mellea/backends/openai.py

    thinking_chunk = message.reasoning_content  # type: ignore
    if thinking_chunk is not None:
        mot._thinking += thinking_chunk
    # reasoning_content (Anthropic/DeepSeek attribute path) takes priority;
Member


nit: I think Anthropic has a different API (?)

Comment thread: mellea/backends/openai.py

Comment on lines +1021 to +1025

    thinking_chunk = getattr(message_delta, "reasoning_content", None)
    if thinking_chunk is None:
        thinking_chunk = (message_delta.model_extra or {}).get("reasoning")
    if thinking_chunk is not None:
        mot._thinking += thinking_chunk
Member


DRY (?)
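
One way the duplicated probe could be factored out, as a sketch (the helper
name _extract_reasoning is hypothetical):

    def _extract_reasoning(obj) -> str | None:
        """Value-based probe shared by the non-streaming and streaming branches."""
        thinking_chunk = getattr(obj, "reasoning_content", None)
        if thinking_chunk is None:
            thinking_chunk = (obj.model_extra or {}).get("reasoning")
        return thinking_chunk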


Labels

bug Something isn't working


Development

Successfully merging this pull request may close these issues.

OpenAI backend: mot._thinking never populated for vLLM thinking models (reasoning vs reasoning_content field name)

3 participants