fix: handle list format content from OpenAI-compatible APIs #5128
Soulter merged 3 commits into AstrBotDevs:master from
Conversation
Hey - I've found 1 issue, and left some high-level feedback:
- The logic for normalizing `content` (handling `list[dict]`, JSON-like strings, and fallback to `str`) is duplicated in both the streaming and non-streaming paths; consider extracting it into a shared helper to keep behavior consistent and easier to maintain.
- You're using `ast.literal_eval` on model output strings to parse JSON-like lists; from a safety and predictability standpoint, preferring `json.loads` may be better (possibly after normalizing quotes first), and the parsing should be guarded behind reasonable size/structure checks to avoid unexpected overhead or parsing of large arbitrary payloads.
- In the streaming branch, the check for JSON-encoded lists uses `startswith("[")`/`endswith("]")` on the raw string without trimming whitespace, whereas the non-streaming branch calls `.strip()` first; aligning these behaviors (e.g., always trimming before the check) would avoid subtle inconsistencies.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The logic for normalizing `content` (handling `list[dict]`, JSON-like strings, and fallback to `str`) is duplicated in both the streaming and non-streaming paths; consider extracting this into a shared helper to keep behavior consistent and easier to maintain.
- You're using `ast.literal_eval` on model output strings to parse JSON-like lists; it might be safer and more predictable to prefer `json.loads` (possibly after normalizing quotes) and to guard this parsing behind reasonable size/structure checks to avoid unexpected overhead or parsing of large arbitrary payloads.
- In the streaming branch, the check for JSON-encoded lists uses `startswith("[")`/`endswith("]")` on the raw string without trimming whitespace, whereas the non-streaming branch calls `.strip()` first; aligning these behaviors (e.g., always trimming before the check) would avoid subtle inconsistencies.
## Individual Comments
### Comment 1
<location> `astrbot/core/provider/sources/openai_source.py:326-335` </location>
<code_context>
+ # Handle both string and list[dict] content formats
</code_context>
<issue_to_address>
**suggestion:** Factor out shared content-normalization logic to avoid duplication and divergence.
The normalization of `raw_content` into `completion_text` in `_query_stream` mirrors the logic in `_parse_openai_completion` (list/dict/str handling, nested list parsing, etc.). To avoid divergence between streaming and non-streaming paths, consider extracting a helper like `normalize_completion_content(raw_content: Any) -> str` and using it in both places.
</issue_to_address>
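The whitespace-trimming inconsistency flagged in the overall comments above can be seen with a short check (illustrative; the variable name is not from the patch):

```python
# A streamed chunk may carry surrounding whitespace
chunk = ' [{"type": "text", "text": "hi"}] '

# Streaming branch: the raw check misses the list because of the whitespace
assert not (chunk.startswith("[") and chunk.endswith("]"))

# Non-streaming branch: the stripped check matches
assert chunk.strip().startswith("[") and chunk.strip().endswith("]")
```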
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d75b35b8d5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Force-pushed d75b35b to dfea718
💡 Codex Review
Reviewed commit: dfea718338
Force-pushed dfea718 to 1d8195e
💡 Codex Review
Reviewed commit: 1d8195ed19
Some LLM providers (e.g., GLM-4.5V via SiliconFlow) return content in
list[dict] format like [{'type': 'text', 'text': '...'}] instead of a
plain string. This causes the raw list representation to be displayed
to users.
Changes:
- Add _normalize_content() helper to extract text from various content formats
- Use json.loads instead of ast.literal_eval for safer parsing
- Add size limit check (8KB) before attempting JSON parsing
- Only convert lists that match OpenAI content-part schema (has 'type': 'text')
to avoid collapsing legitimate list-literal replies like ['foo', 'bar']
- Add strip parameter to preserve whitespace in streaming chunks
- Clean up orphan </think> tags that may leak from some models
Fixes AstrBotDevs#5124
Force-pushed 1d8195e to 8e5223e
💡 Codex Review
Reviewed commit: 8e5223e7c7
- Try json.loads first, fall back to ast.literal_eval for single-quoted Python literals to avoid corrupting apostrophes (e.g., "don't")
- Coerce text values to str to handle null or non-string text fields
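The JSON-first fallback described in this commit can be sketched roughly like this (the function name is illustrative, not the project's):

```python
import ast
import json


def parse_part_list(text):
    """Parse a stringified content-part list, trying JSON first.

    Falls back to ast.literal_eval for single-quoted Python reprs such as
    "[{'type': 'text', 'text': \"don't\"}]". A naive quote replacement
    (swapping ' for ") would corrupt the apostrophe in "don't", which is
    why a real literal parser is used instead.
    """
    try:
        return json.loads(text)
    except ValueError:
        pass
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None
    # Only accept lists; anything else means the string was not a part list
    return value if isinstance(value, list) else None
```

Callers would then coerce each `text` field with `str(...)` so a null or numeric value cannot crash the join, per the second bullet.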
💡 Codex Review
Reviewed commit: 8af47d0987
```python
if (
    check_content.startswith("[")
    and check_content.endswith("]")
    and len(check_content) < 8192
```
There was a problem hiding this comment.
Parse long list-encoded content beyond 8KB
The normalization path stops attempting list decoding when a bracketed string reaches 8192 characters, so long responses from OpenAI-compatible providers that serialize content parts (the exact format this patch targets) will fall back to raw [{"type":"text",...}] text instead of plain assistant output. This creates a size-dependent regression where short answers are fixed but long-form answers or accumulated streamed completions are still corrupted.
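One possible way to address this finding (an illustrative sketch, not the project's code) is to replace the flat size cap with a cheap structural pre-check: inspect only a short prefix of the string, so an 8KB+ serialized part list still gets decoded while unrelated bracketed payloads are skipped without paying the parse cost:

```python
import json


def looks_like_part_list(text):
    """Cheap structural pre-check that avoids a hard size cap.

    Inspects only the first 64 characters, so the cost is constant
    regardless of payload length.
    """
    head = text.lstrip()[:64]
    return head.startswith("[") and ('"type"' in head or "'type'" in head)


def decode_parts(text):
    """Decode a serialized content-part list of any length, or return None."""
    if not looks_like_part_list(text):
        return None
    try:
        parsed = json.loads(text)
    except ValueError:
        return None
    if isinstance(parsed, list) and parsed and all(
        isinstance(p, dict) and p.get("type") == "text" for p in parsed
    ):
        return "".join(str(p.get("text", "")) for p in parsed)
    return None
```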
```python
)
completion_text = reasoning_pattern.sub("", completion_text).strip()
# Also clean up orphan </think> tags that may leak from some models
completion_text = re.sub(r"</think>\s*$", "", completion_text).strip()
```
There was a problem hiding this comment.
Restrict orphan stripping to leakage cases
The new unconditional </think> tail removal truncates valid model output whenever a response intentionally ends with that literal tag (for example XML/HTML or prompt-echoed text), because _parse_openai_completion now deletes it even when no leaked <think>...</think> reasoning block exists. This silently alters user-visible content and should be gated to confirmed leakage scenarios.
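One way to gate the removal as this comment suggests (a sketch under an assumed heuristic: a trailing closer with no matching opener is treated as leakage):

```python
import re


def strip_leaked_think_tail(text):
    """Remove a trailing </think> only when it looks like reasoning leakage.

    A closer without a matching <think> opener is treated as leaked;
    balanced tags (e.g. echoed XML/HTML) are left untouched.
    """
    if text.rstrip().endswith("</think>"):
        opens = len(re.findall(r"<think>", text))
        closes = len(re.findall(r"</think>", text))
        if closes > opens:  # orphan closer: no opener accounts for it
            return re.sub(r"</think>\s*$", "", text).rstrip()
    return text
```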
Summary
- Some OpenAI-compatible providers return content in `[{'type': 'text', 'text': '...'}]` format, causing the raw list representation to be displayed to users
- Clean up orphan `</think>` tags that may leak from some models

Test plan
- Verify content in `[{'type': 'text', ...}]` format no longer appears

Fixes #5124
Summary by Sourcery
Handle non-standard OpenAI-compatible content formats and clean up leaked reasoning tags in LLM responses.
Bug Fixes:
- Strip leaked `</think>` tags, ensuring they do not appear in the final returned text.
Summary by Sourcery
Handle non-standard OpenAI-compatible content formats and clean up leaked reasoning tags in LLM responses.
Bug Fixes: