fix(backends): raise error when OpenAI backend receives content=None #1062
planetf1 wants to merge 2 commits into
Conversation
Thinking-mode models (e.g. Qwen3 served via vLLM with --reasoning-parser qwen3) return content=None with finish_reason=stop and non-zero completion_tokens when they emit only reasoning tokens. The OpenAI backend silently skipped content accumulation, leaving the caller with an empty string and no indication that anything went wrong.

Detect this case in post_processing and raise RuntimeError with a message suggesting enable_thinking: False via model_options. The check covers both streaming and non-streaming paths (post_processing is shared) and skips when tool_calls are present, since those legitimately permit empty content.

Closes generative-computing#1060

Assisted-by: Claude Code
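A minimal sketch of the guard this describes, assuming the variable names (content, choice_response, usage, mot) seen in the diff snippets quoted later in the review; the real post_processing has more context around it.

```python
# Sketch only: detect a thinking-mode empty response and raise instead of
# returning "". Names here are assumptions taken from the review snippets.
finish_reason = choice_response.get("finish_reason")
completion_tokens = usage.get("completion_tokens", 0) if usage else 0

if (
    content is None
    and finish_reason == "stop"
    and completion_tokens > 0
    and not mot.tool_calls  # tool calls legitimately permit empty content
):
    raise RuntimeError(
        "Model returned no content but consumed completion tokens. This usually "
        "means a thinking-mode model emitted only reasoning. Try setting "
        "enable_thinking: False via model_options."
    )
```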
force-pushed from b37b587 to 601a224
The PR description has been updated. Please fill out the template for your PR to be reviewed.
…y-content raise

Before this commit, the RuntimeError for thinking-mode empty responses was raised mid-post_processing, leaving the OTel span unclosed (the cleanup block runs later). base.py only catches generation-time exceptions from the chunk queue, not exceptions from _post_process itself, so the span leaked on every thinking-mode failure.

Changes:
- Move model/provider metadata assignment before the guard so all fields satisfy the backend telemetry contract even when raising
- Build the RuntimeError first, close the span via set_span_error/end_backend_span, then raise, so no span leaks on the error path (sketched below)
- Scope the enable_thinking hint to vLLM/Qwen3; add a generic "consult your runtime's docs" hint for other providers
- Include the reasoning content length in the error message when mot._thinking is set
- Add a streaming-path regression test (oai_chat_response_choice absent)
- Add a tool_calls bypass test
- Remove redundant @pytest.mark.asyncio decorators (asyncio_mode=auto in pyproject)
- Fix the tool_calls param type annotation to list[dict] | None

Assisted-by: Claude Code
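A rough sketch of the error-path ordering described in the second bullet; set_span_error and end_backend_span are the helper names from the commit message above, while the surrounding variables (span, mot, err_msg) are assumed for illustration.

```python
# Illustrative ordering only, not the exact upstream code.
err = RuntimeError(err_msg)   # build the error first
set_span_error(span, err)     # record the failure on the OTel span
end_backend_span(span, mot)   # close the span so it never leaks
raise err                     # only then propagate to the caller
```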
ajbozarth left a comment
A few notes inline — nothing on the error-path span cleanup before the raise since that's getting rewritten anyway.
    and not mot.tool_calls
):
    thinking_note = (
        f" Reasoning content ({len(mot._thinking)} chars) is in mot._thinking."
Nit: the error message points users at mot._thinking, which is a private attribute today. #909 item 4 is considering whether to promote it to public — happy to leave this as-is and let the message track whatever shakes out of that issue, but worth flagging.
# content=None with stop+tokens means thinking-only mode; surface it rather than returning "".
finish_reason = choice_response.get("finish_reason")
completion_tokens = usage.get("completion_tokens", 0) if usage else 0
When usage is unavailable (e.g. streaming without stream_options={"include_usage": True}, or an OpenAI-compatible provider that omits it), completion_tokens falls back to 0 and the guard silently passes — same empty-string symptom the fix is meant to surface. Not a regression, but worth a comment here or a line in the error so the next person hitting it knows why the guard didn't fire for them.
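One way that suggestion could look, as a hedged sketch; the variable names follow the snippet above, and whether to note this in a comment or in the error message itself is the author's call.

```python
# Sketch only: explain in place why the guard may not fire.
finish_reason = choice_response.get("finish_reason")
# If the provider omits usage (e.g. streaming without
# stream_options={"include_usage": True}), completion_tokens stays 0 and the
# empty-content guard below never triggers, so callers still see "".
completion_tokens = usage.get("completion_tokens", 0) if usage else 0
```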
    finish_reason: str = "stop",
    content: str | None = None,
    completion_tokens: int = 9,
    tool_calls: list[dict] | None = None,
Optional: the helper hand-assembles mot._meta["oai_chat_response"] and oai_chat_response_choice. Driving a real ChatCompletion fixture through processing() instead would catch future drift in those meta keys. Fine as a unit test; mentioning in case you want the broader coverage.
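For reference, a sketch of what that broader fixture could look like, assuming the openai client's pydantic types and the processing() entry point mentioned above; the exact wiring into the backend is an assumption.

```python
# Hedged sketch: build a real ChatCompletion and derive the meta dict from it,
# instead of hand-assembling mot._meta["oai_chat_response"] in the test helper.
from openai.types import CompletionUsage
from openai.types.chat import ChatCompletion, ChatCompletionMessage
from openai.types.chat.chat_completion import Choice

completion = ChatCompletion(
    id="chatcmpl-test",
    object="chat.completion",
    created=0,
    model="qwen3",
    choices=[
        Choice(
            index=0,
            finish_reason="stop",
            message=ChatCompletionMessage(role="assistant", content=None),
        )
    ],
    usage=CompletionUsage(prompt_tokens=5, completion_tokens=9, total_tokens=14),
)
# completion.model_dump() would then feed the backend's processing()/post_processing
# path; the exact call shape depends on the backend and is not shown here.
```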
Type of PR
Misc PR
Description
Detect silent empty-content responses in the OpenAI backend when the API returns content=None with finish_reason=stop and non-zero completion_tokens. This is the signature of a thinking-mode model (e.g. Qwen3 via vLLM --reasoning-parser qwen3) that emitted only reasoning tokens.

Raises a RuntimeError in post_processing() with an actionable message suggesting enable_thinking: False via model_options. The check covers both streaming and non-streaming paths (post_processing is shared) and skips when tool calls are present, since those legitimately permit empty content.

Before
After
Testing
Attribution
Assisted-by: Claude Code