fix: handle list format content from OpenAI-compatible APIs #5128
Soulter merged 3 commits into AstrBotDevs:master from
Conversation
Hey - I've found 1 issue, and left some high-level feedback:
- The logic for normalizing `content` (handling `list[dict]`, JSON-like strings, and fallback to `str`) is duplicated in both the streaming and non-streaming paths; consider extracting it into a shared helper to keep behavior consistent and easier to maintain.
- You're using `ast.literal_eval` on model output strings to parse JSON-like lists; from a safety and predictability standpoint, preferring `json.loads` may be better (possibly after normalizing quotes first), and the parsing should be guarded behind reasonable size/structure checks to avoid unexpected overhead or parsing of large arbitrary payloads.
- In the streaming branch, the check for JSON-encoded lists uses `startswith("[")`/`endswith("]")` on the raw string without trimming whitespace, whereas the non-streaming branch calls `.strip()` first; aligning these behaviors (e.g., always trimming before the check) would avoid subtle inconsistencies.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The logic for normalizing `content` (handling `list[dict]`, JSON-like strings, and fallback to `str`) is duplicated in both the streaming and non-streaming paths; consider extracting this into a shared helper to keep behavior consistent and easier to maintain.
- You're using `ast.literal_eval` on model output strings to parse JSON-like lists; it might be safer and more predictable to prefer `json.loads` (possibly after normalizing quotes) and to guard this parsing behind reasonable size/structure checks to avoid unexpected overhead or parsing of large arbitrary payloads.
- In the streaming branch, the check for JSON-encoded lists uses `startswith("[")`/`endswith("]")` on the raw string without trimming whitespace, whereas the non-streaming branch calls `.strip()` first; aligning these behaviors (e.g., always trimming before the check) would avoid subtle inconsistencies.
## Individual Comments
### Comment 1
<location> `astrbot/core/provider/sources/openai_source.py:326-335` </location>
<code_context>
+ # Handle both string and list[dict] content formats
</code_context>
<issue_to_address>
**suggestion:** Factor out shared content-normalization logic to avoid duplication and divergence.
The normalization of `raw_content` into `completion_text` in `_query_stream` mirrors the logic in `_parse_openai_completion` (list/dict/str handling, nested list parsing, etc.). To avoid divergence between streaming and non-streaming paths, consider extracting a helper like `normalize_completion_content(raw_content: Any) -> str` and using it in both places.
</issue_to_address>
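The whitespace-trimming inconsistency flagged in the overall comments above can be seen with a short check (illustrative; the variable name is not from the patch):

```python
# A streamed chunk may carry surrounding whitespace
chunk = ' [{"type": "text", "text": "hi"}] '

# Streaming branch: the raw check misses the list because of the whitespace
assert not (chunk.startswith("[") and chunk.endswith("]"))

# Non-streaming branch: the stripped check matches
assert chunk.strip().startswith("[") and chunk.strip().endswith("]")
```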
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d75b35b8d5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Force-pushed d75b35b to dfea718
💡 Codex Review
Reviewed commit: dfea718338
Force-pushed dfea718 to 1d8195e
💡 Codex Review
Reviewed commit: 1d8195ed19
Some LLM providers (e.g., GLM-4.5V via SiliconFlow) return content in
list[dict] format like [{'type': 'text', 'text': '...'}] instead of a
plain string. This causes the raw list representation to be displayed
to users.
Changes:
- Add _normalize_content() helper to extract text from various content formats
- Use json.loads instead of ast.literal_eval for safer parsing
- Add size limit check (8KB) before attempting JSON parsing
- Only convert lists that match OpenAI content-part schema (has 'type': 'text')
to avoid collapsing legitimate list-literal replies like ['foo', 'bar']
- Add strip parameter to preserve whitespace in streaming chunks
- Clean up orphan </think> tags that may leak from some models
Fixes AstrBotDevs#5124
Force-pushed 1d8195e to 8e5223e
💡 Codex Review
Reviewed commit: 8e5223e7c7
- Try json.loads first, fall back to ast.literal_eval for single-quoted Python literals to avoid corrupting apostrophes (e.g., "don't")
- Coerce text values to str to handle null or non-string text fields
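The JSON-first fallback described in this commit can be sketched roughly like this (the function name is illustrative, not the project's):

```python
import ast
import json


def parse_part_list(text):
    """Parse a stringified content-part list, trying JSON first.

    Falls back to ast.literal_eval for single-quoted Python reprs such as
    "[{'type': 'text', 'text': \"don't\"}]". A naive quote replacement
    (swapping ' for ") would corrupt the apostrophe in "don't", which is
    why a real literal parser is used instead.
    """
    try:
        return json.loads(text)
    except ValueError:
        pass
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None
    # Only accept lists; anything else means the string was not a part list
    return value if isinstance(value, list) else None
```

Callers would then coerce each `text` field with `str(...)` so a null or numeric value cannot crash the join, per the second bullet.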
💡 Codex Review
Reviewed commit: 8af47d0987
```python
if (
    check_content.startswith("[")
    and check_content.endswith("]")
    and len(check_content) < 8192
```
There was a problem hiding this comment.
Parse long list-encoded content beyond 8KB
The normalization path stops attempting list decoding when a bracketed string reaches 8192 characters, so long responses from OpenAI-compatible providers that serialize content parts (the exact format this patch targets) will fall back to raw [{"type":"text",...}] text instead of plain assistant output. This creates a size-dependent regression where short answers are fixed but long-form answers or accumulated streamed completions are still corrupted.
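One possible way to address this finding (an illustrative sketch, not the project's code) is to replace the flat size cap with a cheap structural pre-check: inspect only a short prefix of the string, so an 8KB+ serialized part list still gets decoded while unrelated bracketed payloads are skipped without paying the parse cost:

```python
import json


def looks_like_part_list(text):
    """Cheap structural pre-check that avoids a hard size cap.

    Inspects only the first 64 characters, so the cost is constant
    regardless of payload length.
    """
    head = text.lstrip()[:64]
    return head.startswith("[") and ('"type"' in head or "'type'" in head)


def decode_parts(text):
    """Decode a serialized content-part list of any length, or return None."""
    if not looks_like_part_list(text):
        return None
    try:
        parsed = json.loads(text)
    except ValueError:
        return None
    if isinstance(parsed, list) and parsed and all(
        isinstance(p, dict) and p.get("type") == "text" for p in parsed
    ):
        return "".join(str(p.get("text", "")) for p in parsed)
    return None
```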
```python
)
completion_text = reasoning_pattern.sub("", completion_text).strip()
# Also clean up orphan </think> tags that may leak from some models
completion_text = re.sub(r"</think>\s*$", "", completion_text).strip()
```
There was a problem hiding this comment.
Restrict orphan stripping to leakage cases
The new unconditional </think> tail removal truncates valid model output whenever a response intentionally ends with that literal tag (for example XML/HTML or prompt-echoed text), because _parse_openai_completion now deletes it even when no leaked <think>...</think> reasoning block exists. This silently alters user-visible content and should be gated to confirmed leakage scenarios.
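One way to gate the removal as this comment suggests (a sketch under an assumed heuristic: a trailing closer with no matching opener is treated as leakage):

```python
import re


def strip_leaked_think_tail(text):
    """Remove a trailing </think> only when it looks like reasoning leakage.

    A closer without a matching <think> opener is treated as leaked;
    balanced tags (e.g. echoed XML/HTML) are left untouched.
    """
    if text.rstrip().endswith("</think>"):
        opens = len(re.findall(r"<think>", text))
        closes = len(re.findall(r"</think>", text))
        if closes > opens:  # orphan closer: no opener accounts for it
            return re.sub(r"</think>\s*$", "", text).rstrip()
    return text
```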
Summary
- Some OpenAI-compatible providers return content in `[{'type': 'text', 'text': '...'}]` format, causing the raw list representation to be displayed to users
- Clean up orphan `</think>` tags that may leak from some models

Test plan
- Verify content in `[{'type': 'text', ...}]` format no longer appears

Fixes #5124
Summary by Sourcery
Handle non-standard OpenAI-compatible content formats and clean up leaked reasoning tags in LLM responses.
Bug Fixes:
- Strip leaked `</think>` tags, ensuring they do not appear in the final returned text.
Summary by Sourcery
Handle non-standard OpenAI-compatible content formats and clean up leaked reasoning tags in LLM responses.
Bug Fixes: