
fix: handle list format content from OpenAI-compatible APIs#5128

Merged
Soulter merged 3 commits into AstrBotDevs:master from NayukiChiba:fix/openai-content-list-format
Feb 15, 2026

Conversation

@NayukiChiba (Contributor) commented Feb 15, 2026

Summary

  • Fix an issue where some LLM providers (e.g., GLM-4.5V via SiliconFlow) return content in the [{'type': 'text', 'text': '...'}] list format, causing the raw list representation to be displayed to users
  • Handle list[dict] content format in both streaming and non-streaming responses
  • Parse string content that looks like JSON-encoded list format
  • Clean up orphan </think> tags that may leak from some models

Test plan

  • Test with SiliconFlow GLM-4.5V model to confirm [{'type': 'text', ...}] format no longer appears
  • Test both streaming and non-streaming responses
  • Verify normal string format content is not affected

Fixes #5124
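The list-of-parts shape described above can be collapsed into a plain string with a small helper. This is an illustrative sketch of the idea, not the code merged in this PR; the function name and fallback behavior are assumptions:

```python
# Hypothetical sketch (not the merged AstrBot code): some providers return
# `content` as a list of content parts instead of a plain string, e.g.
#   [{'type': 'text', 'text': 'Hello'}, {'type': 'text', 'text': ' world'}]
def extract_text(content):
    """Collapse OpenAI-style content parts into a plain string."""
    if isinstance(content, list):
        parts = []
        for part in content:
            # Only pick up parts matching the {'type': 'text', 'text': ...} schema.
            if isinstance(part, dict) and part.get("type") == "text":
                parts.append(str(part.get("text", "")))
        return "".join(parts)
    # Pass plain strings through; coerce anything else defensively.
    return content if isinstance(content, str) else str(content)
```

With this in place, both `[{'type': 'text', 'text': 'Hello'}]` and an ordinary string produce user-visible text rather than a Python list repr.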


Summary by Sourcery

Handle non-standard OpenAI-compatible content formats and clean up leaked reasoning tags in LLM responses.

Bug Fixes:

  • Normalize list-based and JSON-encoded list content formats from OpenAI-compatible providers in both streaming and non-streaming responses to prevent raw list output being shown to users.
  • Strip orphan tags that may leak from some models so they are not displayed in the final completion text.

dosubot bot added the size:M label (This PR changes 30-99 lines, ignoring generated files) on Feb 15, 2026
dosubot bot commented Feb 15, 2026

Related Documentation

Checked 1 published document(s) in 1 knowledge base(s). No updates required.


@sourcery-ai bot left a comment

Hey - I've found 1 issue and left some high-level feedback:

  • The logic for normalizing content (handling list[dict], JSON-like strings, and fallback to str) is duplicated in both the streaming and non-streaming paths; consider extracting this into a shared helper to keep behavior consistent and easier to maintain.
  • You're using ast.literal_eval on model output strings to parse JSON-like lists; it might be safer and more predictable to prefer json.loads (possibly after normalizing quotes) and to guard this parsing behind reasonable size/structure checks to avoid unexpected overhead or parsing of large arbitrary payloads.
  • In the streaming branch, the check for JSON-encoded lists uses startswith("[")/endswith("]") on the raw string without trimming whitespace, whereas the non-streaming branch calls .strip() first; aligning these behaviors (e.g., always trimming before the check) would avoid subtle inconsistencies.
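The whitespace mismatch in the last bullet is easy to reproduce. A tiny illustration (variable names assumed, not taken from the patch):

```python
# A content string with incidental surrounding whitespace.
raw = '  [{"type": "text", "text": "hi"}]  '

# Streaming-style check on the raw string misses padded payloads:
streaming_hit = raw.startswith("[") and raw.endswith("]")

# Non-streaming-style check trims first, so the same payload matches:
trimmed = raw.strip()
non_streaming_hit = trimmed.startswith("[") and trimmed.endswith("]")
```

Here `streaming_hit` is False while `non_streaming_hit` is True, so the same provider output would be normalized in one path but passed through raw in the other.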
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The logic for normalizing `content` (handling `list[dict]`, JSON-like strings, and fallback to `str`) is duplicated in both the streaming and non-streaming paths; consider extracting this into a shared helper to keep behavior consistent and easier to maintain.
- You're using `ast.literal_eval` on model output strings to parse JSON-like lists; it might be safer and more predictable to prefer `json.loads` (possibly after normalizing quotes) and to guard this parsing behind reasonable size/structure checks to avoid unexpected overhead or parsing of large arbitrary payloads.
- In the streaming branch, the check for JSON-encoded lists uses `startswith("[")`/`endswith("]")` on the raw string without trimming whitespace, whereas the non-streaming branch calls `.strip()` first; aligning these behaviors (e.g., always trimming before the check) would avoid subtle inconsistencies.

## Individual Comments

### Comment 1
<location> `astrbot/core/provider/sources/openai_source.py:326-335` </location>
<code_context>
+                # Handle both string and list[dict] content formats
</code_context>

<issue_to_address>
**suggestion:** Factor out shared content-normalization logic to avoid duplication and divergence.

The normalization of `raw_content` into `completion_text` in `_query_stream` mirrors the logic in `_parse_openai_completion` (list/dict/str handling, nested list parsing, etc.). To avoid divergence between streaming and non-streaming paths, consider extracting a helper like `normalize_completion_content(raw_content: Any) -> str` and using it in both places.
</issue_to_address>


dosubot bot added the area:provider label (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner) on Feb 15, 2026
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d75b35b8d5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

NayukiChiba force-pushed the fix/openai-content-list-format branch from d75b35b to dfea718 on February 15, 2026 at 14:49
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: dfea718338

NayukiChiba force-pushed the fix/openai-content-list-format branch from dfea718 to 1d8195e on February 15, 2026 at 14:54
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 1d8195ed19

Some LLM providers (e.g., GLM-4.5V via SiliconFlow) return content as
list[dict] format like [{'type': 'text', 'text': '...'}] instead of
plain string. This causes the raw list representation to be displayed
to users.

Changes:
- Add _normalize_content() helper to extract text from various content formats
- Use json.loads instead of ast.literal_eval for safer parsing
- Add size limit check (8KB) before attempting JSON parsing
- Only convert lists that match OpenAI content-part schema (has 'type': 'text')
  to avoid collapsing legitimate list-literal replies like ['foo', 'bar']
- Add strip parameter to preserve whitespace in streaming chunks
- Clean up orphan </think> tags that may leak from some models

Fixes AstrBotDevs#5124
NayukiChiba force-pushed the fix/openai-content-list-format branch from 1d8195e to 8e5223e on February 15, 2026 at 14:59
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 8e5223e7c7

- Try json.loads first, fallback to ast.literal_eval for single-quoted
  Python literals to avoid corrupting apostrophes (e.g., "don't")
- Coerce text values to str to handle null or non-string text fields
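The parsing order this commit describes can be sketched as follows; the function name is assumed for illustration:

```python
import ast
import json

def parse_list_literal(text):
    """Try JSON first; fall back to a Python literal for single-quoted lists.

    ast.literal_eval handles [{'type': 'text', 'text': "don't"}] without the
    apostrophe corruption a naive single-to-double quote replacement causes.
    """
    try:
        return json.loads(text)
    except ValueError:
        pass  # Not JSON (e.g., single-quoted Python repr); try literal_eval.
    try:
        value = ast.literal_eval(text)
        return value if isinstance(value, list) else None
    except (ValueError, SyntaxError):
        return None
```

`json.loads` covers the spec-compliant case, while `ast.literal_eval` safely evaluates Python literals only (no arbitrary code), which makes it an acceptable fallback for repr-style output.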
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 8af47d0987

    if (
        check_content.startswith("[")
        and check_content.endswith("]")
        and len(check_content) < 8192
    )

P2: Parse long list-encoded content beyond 8KB

The normalization path stops attempting list decoding when a bracketed string reaches 8192 characters, so long responses from OpenAI-compatible providers that serialize content parts (the exact format this patch targets) will fall back to raw [{"type":"text",...}] text instead of plain assistant output. This creates a size-dependent regression: short answers are fixed, but long-form answers or accumulated streamed completions are still corrupted.
    completion_text = reasoning_pattern.sub("", completion_text).strip()
    # Also clean up orphan </think> tags that may leak from some models
    completion_text = re.sub(r"</think>\s*$", "", completion_text).strip()

P2: Restrict orphan stripping to leakage cases

The new unconditional </think> tail removal truncates valid model output whenever a response intentionally ends with that literal tag (for example XML/HTML or prompt-echoed text), because _parse_openai_completion now deletes it even when no leaked <think>...</think> reasoning block exists. This silently alters user-visible content and should be gated to confirmed leakage scenarios.
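One possible gate along the lines of this review note, sketched under assumptions (this is not the merged code, and "no opener remains" is only a heuristic for leakage):

```python
import re

def strip_leaked_think(text):
    """Remove leaked reasoning, but only strip a trailing orphan </think>
    when no <think> opener remains in the text (heuristic for leakage)."""
    reasoning_pattern = re.compile(r"<think>.*?</think>", re.DOTALL)
    cleaned = reasoning_pattern.sub("", text)
    # An orphan closer with no opener anywhere is the streaming-leak shape;
    # text that still contains a literal <think> opener is left alone.
    if cleaned.rstrip().endswith("</think>") and "<think>" not in cleaned:
        cleaned = re.sub(r"</think>\s*$", "", cleaned)
    return cleaned.strip()
```

This still cannot distinguish a reply that legitimately ends with a bare `</think>` literal from a leak, so any gate here is a trade-off rather than a complete fix.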

dosubot bot added the lgtm label (This PR has been approved by a maintainer) on Feb 15, 2026
Soulter merged commit 79e239a into AstrBotDevs:master on Feb 15, 2026
6 checks passed

Labels

  • area:provider: The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner.
  • lgtm: This PR has been approved by a maintainer
  • size:M: This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] The bot's replies sometimes contain [{'type': 'text', 'text': '

2 participants