fix(backends): raise error when OpenAI backend receives content=None #1062

Open

planetf1 wants to merge 2 commits into generative-computing:main from planetf1:fix/1060-openai-empty-content

Conversation

@planetf1
Contributor

@planetf1 planetf1 commented May 12, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Detect silent empty-content responses in the OpenAI backend when the API returns content=None with finish_reason=stop and non-zero completion_tokens. This is the signature of a thinking-mode model (e.g. Qwen3 via vLLM --reasoning-parser qwen3) that emitted only reasoning tokens.

Raises a RuntimeError in post_processing() with an actionable message suggesting enable_thinking: False via model_options. The check covers both streaming and non-streaming paths (post_processing is shared) and skips when tool calls are present, since those legitimately permit empty content.
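
For illustration, a minimal sketch of the guard (variable names such as mot, choice_response, and usage follow the review snippets further down; the surrounding structure of post_processing is assumed rather than copied from the diff):

# Sketch only: `content` stands for the message text extracted earlier in
# post_processing; the exact extraction and error wording live in the diff.
finish_reason = choice_response.get("finish_reason")
completion_tokens = usage.get("completion_tokens", 0) if usage else 0

if (
    content is None
    and finish_reason == "stop"
    and completion_tokens > 0
    and not mot.tool_calls  # tool calls legitimately permit empty content
):
    raise RuntimeError(
        "OpenAI backend received an empty response (content=None) with "
        f"finish_reason=stop and completion_tokens={completion_tokens}. "
        "This typically indicates a thinking-mode model that emitted only "
        "reasoning tokens. ..."  # full hint shown under "After" below
    )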

Before

session = start_session(
    "openai",
    model_id="Qwen/Qwen3-Coder-Next-FP8",
    base_url="http://localhost:59691/v1",
    api_key="unused",
)
result = session.instruct("What is 2 + 2?")
print(repr(result.value))        # '' — silent empty string
print(result.generation.usage)   # {'completion_tokens': 9, ...}

After

RuntimeError: OpenAI backend received an empty response (content=None) with
finish_reason=stop and completion_tokens=9. This typically indicates a
thinking-mode model that emitted only reasoning tokens. For OpenAI-compatible
thinking models, disable thinking via model_options, e.g.:
model_options={"extra_body": {"chat_template_kwargs": {"enable_thinking": False}}}.
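
Applying the suggested option from the calling side might look like this (assuming model_options is accepted on the instruct call; where it attaches in mellea may differ):

result = session.instruct(
    "What is 2 + 2?",
    model_options={
        "extra_body": {"chat_template_kwargs": {"enable_thinking": False}}
    },
)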

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation once the rest of the PR is populated)

Attribution

  • AI coding assistants used

Assisted-by: Claude Code

@github-actions bot added the bug (Something isn't working) label on May 12, 2026
Thinking-mode models (e.g. Qwen3 served via vLLM with --reasoning-parser
qwen3) return content=None with finish_reason=stop and non-zero
completion_tokens when they emit only reasoning tokens. The OpenAI
backend silently skipped content accumulation, leaving the caller with
an empty string and no indication that anything went wrong.

Detect this case in post_processing and raise RuntimeError with a
message suggesting enable_thinking: False via model_options. The check
covers both streaming and non-streaming paths (post_processing is shared)
and skips when tool_calls are present, since those legitimately permit
empty content.

Closes generative-computing#1060

Assisted-by: Claude Code
@planetf1 force-pushed the fix/1060-openai-empty-content branch from b37b587 to 601a224 on May 12, 2026 13:28
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

…y-content raise

Before this commit, the RuntimeError for thinking-mode empty responses was raised
mid-post_processing, leaving the OTel span unclosed (the cleanup block runs later).
base.py only catches generation-time exceptions from the chunk queue, not exceptions
from _post_process itself, so the span leaked on every thinking-mode failure.

Changes:
- Move model/provider metadata assignment before the guard so all fields satisfy
  the backend telemetry contract even when raising
- Build the RuntimeError first, close the span via set_span_error/end_backend_span,
  then raise — no span leaks on the error path
- Scope the enable_thinking hint to vLLM/Qwen3; add generic "consult your runtime's
  docs" for other providers
- Include reasoning content length in the error message when mot._thinking is set
- Add streaming-path regression test (oai_chat_response_choice absent)
- Add tool_calls bypass test
- Remove redundant @pytest.mark.asyncio decorators (asyncio_mode=auto in pyproject)
- Fix tool_calls param type annotation to list[dict] | None

Assisted-by: Claude Code
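
The ordering described above, sketched with the helper names from the commit message (their exact signatures are assumed, not checked against the codebase):

# Build the error, close the span, then raise, so the OTel span cannot leak on
# this path. set_span_error / end_backend_span are the helpers named above;
# how they are called here is illustrative.
err = RuntimeError(error_message)
set_span_error(span, err)   # record the failure on the span
end_backend_span(span)      # close the span before leaving post_processing
raise err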
@planetf1 marked this pull request as ready for review May 13, 2026 11:25
@planetf1 requested a review from a team as a code owner May 13, 2026 11:25
@planetf1 requested reviews from ajbozarth and jakelorocco May 13, 2026 11:25
Contributor

@ajbozarth left a comment


A few notes inline — nothing on the error-path span cleanup before the raise since that's getting rewritten anyway.

Comment thread mellea/backends/openai.py
    and not mot.tool_calls
):
    thinking_note = (
        f" Reasoning content ({len(mot._thinking)} chars) is in mot._thinking."


Nit: the error message points users at mot._thinking, which is a private attribute today. #909 item 4 is considering whether to promote it to public — happy to leave this as-is and let the message track whatever shakes out of that issue, but worth flagging.

Comment thread mellea/backends/openai.py

# content=None with stop+tokens means thinking-only mode; surface it rather than returning "".
finish_reason = choice_response.get("finish_reason")
completion_tokens = usage.get("completion_tokens", 0) if usage else 0


When usage is unavailable (e.g. streaming without stream_options={"include_usage": True}, or an OpenAI-compatible provider that omits it), completion_tokens falls back to 0 and the guard silently passes — same empty-string symptom the fix is meant to surface. Not a regression, but worth a comment here or a line in the error so the next person hitting it knows why the guard didn't fire for them.
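
For context, the usage-missing case this comment describes can be reproduced with the raw OpenAI client (endpoint and model taken from the Before example above; values are illustrative):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:59691/v1", api_key="unused")
stream = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-Next-FP8",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    stream=True,
    # Without stream_options below, streamed chunks report usage=None, so
    # completion_tokens falls back to 0 and the new guard never fires.
    stream_options={"include_usage": True},
)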

finish_reason: str = "stop",
content: str | None = None,
completion_tokens: int = 9,
tool_calls: list[dict] | None = None,


Optional: the helper hand-assembles mot._meta["oai_chat_response"] and oai_chat_response_choice. Driving a real ChatCompletion fixture through processing() instead would catch future drift in those meta keys. Fine as a unit test; mentioning in case you want the broader coverage.
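
A fixture along the lines the reviewer suggests could be built from the real SDK types (field values are illustrative, and how it is driven through processing() depends on the backend's internal API):

from openai.types.chat import ChatCompletion, ChatCompletionMessage
from openai.types.chat.chat_completion import Choice
from openai.types.completion_usage import CompletionUsage

# A real ChatCompletion object reproducing the thinking-only signature:
# content=None, finish_reason=stop, non-zero completion_tokens.
fixture = ChatCompletion(
    id="chatcmpl-test",
    object="chat.completion",
    created=0,
    model="Qwen/Qwen3-Coder-Next-FP8",
    choices=[
        Choice(
            index=0,
            finish_reason="stop",
            message=ChatCompletionMessage(role="assistant", content=None),
        )
    ],
    usage=CompletionUsage(completion_tokens=9, prompt_tokens=5, total_tokens=14),
)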


Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenAI backend silently returns empty string when model produces no text content (content=None)

2 participants