[Feature] Add OpenAI-compatible tool_choice support for chat completions#7882
[Feature] Add OpenAI-compatible tool_choice support for chat completions#7882luukunn wants to merge 10 commits into
Conversation
|
Thanks for your contribution! |
There was a problem hiding this comment.
Pull request overview
本 PR 为 FastDeploy 的 OpenAI 兼容 Chat Completions 增加 tool_choice(none/auto/required/指定函数工具)协议支持,并在强制工具调用(prompt 尾部被 chat template 注入未闭合的 tool-call 前缀)时,通过“前缀检测 + 补偿拼接”让现有 tool parser 在流式/非流式输出下都能继续正确解析工具调用。
Changes:
- 在 OpenAI 协议层新增
tool_choice相关模型定义,并把tool_choice接入ChatCompletionRequest。 - 在 chat serving → response processor → data processor 链路中透传渲染后的
prompt_tokens,用于检测 prompt 尾部注入的 tool-call 前缀。 - ToolParser 抽象层新增
detect_tool_prefix与 per-request 前缀状态;BaseTextProcessor 增加 forced tool choice 的前缀补偿逻辑,并补充对应单测覆盖。
建议同时确认是否需要在对外文档/示例(如 OpenAI 兼容 API 文档或用例)中补充 tool_choice 用法说明。
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/input/test_text_processor.py | 补齐 mock logger 方法;更新/新增用例覆盖 forced tool choice 识别与前缀补偿(流式/非流式)。 |
| tests/entrypoints/openai/tool_parsers/test_abstract_tool_parser.py | 新增 ToolParser.detect_tool_prefix 行为单测。 |
| fastdeploy/input/base_processor.py | 增加 _is_forced_tool_choice 与 _prepare_tool_prefix,并在工具解析前对输入做 prefix 补偿。 |
| fastdeploy/entrypoints/openai/tool_parsers/abstract_tool_parser.py | 为 ToolParser 增加 sentinel token 声明、前缀检测方法与 per-request 前缀状态字段。 |
| fastdeploy/entrypoints/openai/serving_chat.py | 将渲染后的 prompt_tokens 透传给 response processor。 |
| fastdeploy/entrypoints/openai/response_processors.py | 扩展 process_response_chat 签名以接收并下传 prompt_tokens。 |
| fastdeploy/entrypoints/openai/protocol.py | 新增 ChatCompletionNamedFunction/ChatCompletionNamedToolChoiceParam,并在 ChatCompletionRequest 中加入 tool_choice 字段。 |
Comments suppressed due to low confidence (1)
fastdeploy/entrypoints/openai/response_processors.py:111
- 在
process_response_chat的非异步分支里调用data_processor.process_response_dict(...)时没有透传prompt_tokens。这样BaseTextProcessor._prepare_tool_prefix无法检测 chat template 注入的 tool-call 前缀,导致tool_choice=required等场景在默认同步 processor 下仍然解析失败。建议在同步分支也传入prompt_tokens=prompt_tokens(与异步分支保持一致)。
response = self.data_processor.process_response_dict(
response_dict=request_output,
stream=stream,
include_stop_str_in_output=include_stop_str_in_output,
audio_tokens=all_audio_tokens,
| if prefix: | ||
| stream_previous = prefix + stream_previous | ||
| stream_current = prefix + stream_current | ||
| if not tool_parser._tool_prefix_injected_to_delta: | ||
| stream_delta = prefix + stream_delta |
CI报告基于以下代码生成(30分钟更新一次): 1 任务总览所有 required 任务已通过,建议通过;当前存在 3 个 optional 失败任务和 1 个 optional 等待任务,不阻塞合并。
2 任务状态汇总日志列说明:失败任务直接使用 CI 日志链接;optional 失败仅供参考。 2.1 Required任务 : 10/10 通过
2.2 可选任务 — 32/36 通过
3 失败详情(仅 required)无 required 失败任务。本次不对 optional 失败任务做深度分析。 |
| def _is_forced_tool_choice(request) -> bool: | ||
| """Return True iff the request asks the chat template to inject a | ||
| tool-call prefix into the prompt. Two ways are recognized: | ||
|
|
||
| 1. ``request.tool_choice`` is a named-tool choice (a | ||
| ``ChatCompletionNamedToolChoiceParam`` pydantic model with | ||
| ``type == "function"``). The plain ``"required"`` string does NOT | ||
| trigger prefix injection in the chat template. | ||
| 2. ``request.chat_template_kwargs.options.tool_choice.mode == "force"`` | ||
| — used by chat templates that drive forced tool calls through their | ||
| own ``options`` dict instead of the OpenAI-style ``tool_choice`` | ||
| field. | ||
| """ | ||
| if request is None: | ||
| return False | ||
|
|
||
| tool_choice = getattr(request, "tool_choice", None) | ||
| # Named-tool choices are pydantic ``ChatCompletionNamedToolChoiceParam`` | ||
| # objects (``type == "function"``); plain string values such as | ||
| # ``"required"`` / ``"auto"`` / ``"none"`` are skipped here. | ||
| if not isinstance(tool_choice, str) and getattr(tool_choice, "type", None) == "function": | ||
| return True | ||
|
|
||
| chat_template_kwargs = getattr(request, "chat_template_kwargs", None) or {} | ||
| options = chat_template_kwargs.get("options") if isinstance(chat_template_kwargs, dict) else None | ||
| inner = options.get("tool_choice") if isinstance(options, dict) else None | ||
| if isinstance(inner, dict) and inner.get("mode") == "force": | ||
| return True | ||
| return False |
| try: | ||
| tool_parser._tool_prefix_token_ids = list(self.tokenizer.encode(prefix, add_special_tokens=False)) | ||
| except Exception: | ||
| data_processor_logger.exception("encode tool prefix to token ids failed; token-id splice disabled") | ||
| tool_parser._tool_prefix_token_ids = [] |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #7882 +/- ##
==========================================
Coverage ? 63.64%
==========================================
Files ? 462
Lines ? 64574
Branches ? 9898
==========================================
Hits ? 41095
Misses ? 20702
Partials ? 2777
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| previous_token_ids = list(status[2]) | ||
| status[2].extend(token_id) | ||
| decode_str, prefix_offset, read_offset = self.tokenizer.decode_token(status[2], status[0], status[1]) | ||
| status[0] = prefix_offset | ||
| status[1] = read_offset |
| self.data_processor.process_response_dict( | ||
| response_dict=request_output, | ||
| stream=stream, | ||
| include_stop_str_in_output=include_stop_str_in_output, | ||
| request=request, |
| response_dict=request_output, | ||
| stream=stream, | ||
| include_stop_str_in_output=include_stop_str_in_output, | ||
| request=request, | ||
| prompt_tokens=prompt_tokens, |
| ``previous_token_ids`` and ``previous_texts`` are **snapshots of the | ||
| accumulated state BEFORE this call's tokens were appended** — | ||
| symmetric pre-delta views of what the caller had decoded so far. | ||
| Both are owned by the caller (no aliasing of internal state). |
| Finds the **last** :attr:`tool_call_start_token` in ``prompt`` that is | ||
| not closed by a later :attr:`tool_call_end_token` and reaches the | ||
| prompt end (modulo trailing whitespace). Returns ``""`` otherwise. | ||
| Subclasses with non-paired tag formats may override. |
PaddlePaddle-bot
left a comment
There was a problem hiding this comment.
🤖 Paddle-CI-Agent | pr_review |
2026-05-22 17:43:48
📋 Review 摘要
PR 概述:增强 OpenAI 兼容 chat completions 的 tool-call 前缀补偿能力,修复 forced tool_choice 场景下 streaming / non-streaming 路径的 tool parser 解析问题
变更范围:fastdeploy/entrypoints/openai/、fastdeploy/input/base_processor.py、fastdeploy/engine/common_engine.py
影响面 Tag:[APIServer] [Engine] [DataProcessor]
问题
| 级别 | 文件 | 概述 |
|---|---|---|
| 🟡 建议 | abstract_tool_parser.py:107 |
detect_tool_prefix 未限制前缀长度,最后一个未闭合 start token 不在 prompt 末尾时会将大量历史内容误当作前缀 |
| 📝 PR 规范 | — | Usage or Command 和 Accuracy Tests section 仅含 HTML 占位注释,未填写 N/A |
📝 PR 规范检查
标题格式合规([Feature] 为官方 Tag)。Usage or Command 与 Accuracy Tests 段仅包含 HTML 占位符注释,未按模板要求填写 N/A。
标题建议(可直接复制):
[Feature] Add OpenAI-compatible tool_choice support for chat completions(标题合规,无需修改)
PR 描述建议(可直接复制,必须复刻 checklist §D2 模板的完整结构):
## Motivation
本 PR 增强了 OpenAI 兼容 chat completions 场景下的工具调用解析能力,完善了 tool-call 输出在流式 / 非流式场景下的处理链路。
在部分 chat template 实现中,渲染后的 prompt 尾部会携带一个未闭合的 tool-call 前缀。此时模型生成结果并不是从完整的 tool call 起始边界开始,而是从该前缀之后继续生成。如果仍按原有方式解析,tool parser 在部分场景下无法正确识别工具调用内容,尤其是在 streaming 场景下更容易出现解析不完整的问题。
本 PR 通过在响应处理阶段感知 prompt 尾部注入的 tool-call 前缀,并在 parser 输入侧补齐对应的 text / token ids,使现有 tool parser 能够正确处理这类由 chat template 驱动的 tool-call 输出。同时补充了对应单元测试,覆盖 normal / streaming 两条路径。
## Modifications
1. **将渲染后的 prompt 透传到响应处理流程**:更新 `process_response_chat(...)` 签名新增 `prompt_tokens` 可选参数,在各响应处理分支中统一透传
2. **增强 ToolParser prefix 检测**:新增 `tool_call_start_token`/`tool_call_end_token` 类属性及 `detect_tool_prefix(prompt)` 方法,通过 rfind 检测 prompt 尾部未闭合的 tool-call start token
3. **BaseTextProcessor 增加 prefix 补偿**:新增 `_text_to_token_ids`、`_prepare_tool_prefix`,在非流式路径中拼接前缀,在流式路径中补偿 previous/current/delta 的文本与 token ids(delta 仅首个 chunk 注入)
4. **调整 `ids2tokens` 返回语义**:返回 pre-delta snapshot 避免别名问题,在 `common_engine.py` 中通过 `previous_token_ids + token_ids` 重建 cumulative token ids
5. **补充单元测试**:新增 `test_abstract_tool_parser.py` 覆盖 `detect_tool_prefix` 多场景;扩展 `test_text_processor.py` 覆盖 normal/streaming prefix compensation 路径
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.总体评价
整体实现思路清晰,通过 prompt prefix 感知 + 注入修复了 forced tool_choice 场景下 tool parser 解析不完整的问题,流式/非流式两条路径均有覆盖并补充了单元测试。detect_tool_prefix 基类默认实现对未闭合 start token 的位置判断较宽松,建议添加长度上限保护以降低误检风险。
| # override this method to apply stricter validation. | ||
| return prompt[last_start:] | ||
|
|
||
| def extract_tool_calls(self, model_output: str, request: ChatCompletionRequest) -> ExtractedToolCallInformation: |
There was a problem hiding this comment.
🟡 建议 detect_tool_prefix 对未闭合 start token 的判断过于宽松
prompt.rfind(start) 找到最后一个 start token 后,只检查其后是否有 end token,但未验证该 start token 是否真正位于 prompt 末尾附近。若历史对话中存在格式异常的未闭合 start token,而 prompt 末尾还有其他内容,则从该位置到 prompt 末尾的所有内容(可能非常长)会被当作 prefix 注入,导致 tool parser 接收到错误的解析输入。
建议修复方式:在 return prompt[last_start:] 前添加长度保护,超过合理阈值时视为误检返回 "":
_MAX_PREFIX_LEN = 512
tail = prompt[last_start:]
if len(tail) > _MAX_PREFIX_LEN:
return ""
return tail
Motivation
本 PR 增强了 OpenAI 兼容 chat completions 场景下的工具调用解析能力,完善了 tool-call 输出在流式 / 非流式场景下的处理链路。
在部分 chat template 实现中,渲染后的 prompt 尾部会携带一个未闭合的 tool-call 前缀。此时模型生成结果并不是从完整的 tool call 起始边界开始,而是从该前缀之后继续生成。如果仍按原有方式解析,tool parser 在部分场景下无法正确识别工具调用内容,尤其是在 streaming 场景下更容易出现解析不完整的问题。
本 PR 通过在响应处理阶段感知 prompt 尾部注入的 tool-call 前缀,并在 parser 输入侧补齐对应的 text / token ids,使现有 tool parser 能够正确处理这类由 chat template 驱动的 tool-call 输出。同时补充了对应单元测试,覆盖 normal / streaming 两条路径。
Modifications
将渲染后的 prompt 透传到响应处理流程
process_response_chat(...)方法签名,新增prompt_tokens参数serving_chat.py中将prompt_tokens传递到 response processorprompt_tokens,为后续 parser prefix 检测提供上下文增强 ToolParser,支持检测 prompt 尾部的 tool-call prefix
tool_call_start_tokentool_call_end_token_tool_prefix_tool_prefix_token_ids_tool_prefix_computed_tool_prefix_injected_to_deltadetect_tool_prefix(prompt)方法tool_call_start_token,识别由 chat template 注入的 tool-call prefix在 BaseTextProcessor 中增加 prefix 补偿逻辑
_text_to_token_ids(...),统一文本到 token ids 的编码逻辑,兼容不同 tokenizer 分支_prepare_tool_prefix(...)中:detect_tool_prefix(...)检测 prefixextract_tool_calls(...)previous/current/delta的文本输入previous/current/delta的 token ids 输入delta_*注入 prefix,避免后续重复拼接调整
ids2tokens(...)返回语义并修正引擎侧累计 token 使用方式ids2tokens(...)返回的previous_token_ids/previous_texts语义为“本次 token 追加之前的快照”common_engine.py中根据previous_token_ids + token_ids重建本次调用后的 cumulative token ids补充并更新测试
tests/entrypoints/openai/tool_parsers/test_abstract_tool_parser.pydetect_tool_prefix(...)的多种场景tests/input/test_text_processor.py中新增 prefix compensation 相关测试prompt_tokens新参数和ids2tokens(...)新语义exception/error方法Usage or Command
Accuracy Tests
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.