[Feature] Add OpenAI-compatible tool_choice support for chat completions by luukunn · Pull Request #7882 · PaddlePaddle/FastDeploy

luukunn · 2026-05-21T11:13:50Z

Motivation

本 PR 增强了 OpenAI 兼容 chat completions 场景下的工具调用解析能力，完善了 tool-call 输出在流式 / 非流式场景下的处理链路。

在部分 chat template 实现中，渲染后的 prompt 尾部会携带一个未闭合的 tool-call 前缀。此时模型生成结果并不是从完整的 tool call 起始边界开始，而是从该前缀之后继续生成。如果仍按原有方式解析，tool parser 在部分场景下无法正确识别工具调用内容，尤其是在 streaming 场景下更容易出现解析不完整的问题。

本 PR 通过在响应处理阶段感知 prompt 尾部注入的 tool-call 前缀，并在 parser 输入侧补齐对应的 text / token ids，使现有 tool parser 能够正确处理这类由 chat template 驱动的 tool-call 输出。同时补充了对应单元测试，覆盖 normal / streaming 两条路径。

Modifications

将渲染后的 prompt 透传到响应处理流程
- 更新 process_response_chat(...) 方法签名，新增 prompt_tokens 参数
- 在 serving_chat.py 中将 prompt_tokens 传递到 response processor
- 在不同响应处理分支中统一透传 prompt_tokens，为后续 parser prefix 检测提供上下文
增强 ToolParser，支持检测 prompt 尾部的 tool-call prefix
- 在抽象 parser 中新增以下字段：
  - tool_call_start_token
  - tool_call_end_token
- 新增 parser 级别状态：
  - _tool_prefix
  - _tool_prefix_token_ids
  - _tool_prefix_computed
  - _tool_prefix_injected_to_delta
- 新增 detect_tool_prefix(prompt) 方法
- 该方法通过检测 prompt 尾部最后一个未闭合的 tool_call_start_token，识别由 chat template 注入的 tool-call prefix
在 BaseTextProcessor 中增加 prefix 补偿逻辑
- 新增 _text_to_token_ids(...)，统一文本到 token ids 的编码逻辑，兼容不同 tokenizer 分支
- 在 _prepare_tool_prefix(...) 中：
  - 调用 parser 的 detect_tool_prefix(...) 检测 prefix
  - 缓存 prefix 文本
  - 同时将 prefix 编码为 token ids，供 streaming parser 使用
- 在非流式路径中：
  - 将检测到的 prefix 拼接回完整输出文本后，再交给 extract_tool_calls(...)
- 在流式路径中：
  - 同时补偿 previous/current/delta 的文本输入
  - 同时补偿 previous/current/delta 的 token ids 输入
  - 仅在首个 chunk 中向 delta_* 注入 prefix，避免后续重复拼接
- 这样可以兼容既依赖文本边界、也依赖 token id 边界进行 tool-call 检测的 parser
调整 ids2tokens(...) 返回语义并修正引擎侧累计 token 使用方式
- 明确 ids2tokens(...) 返回的 previous_token_ids / previous_texts 语义为“本次 token 追加之前的快照”
- 在非 HF tokenizer 路径下，返回 pre-delta snapshot，避免状态别名问题
- 在 common_engine.py 中根据 previous_token_ids + token_ids 重建本次调用后的 cumulative token ids
- 保证流式解析链路中 previous/current/delta token ids 的语义一致
补充并更新测试
- 新增 tests/entrypoints/openai/tool_parsers/test_abstract_tool_parser.py
  - 覆盖 detect_tool_prefix(...) 的多种场景
- 在 tests/input/test_text_processor.py 中新增 prefix compensation 相关测试
  - 覆盖 normal / streaming 路径
  - 覆盖 prefix 仅首个 delta 注入的逻辑
  - 覆盖 token ids splice 行为
- 更新若干已有测试以适配 prompt_tokens 新参数和 ids2tokens(...) 新语义
- 在测试 mock logger 中补充 exception / error 方法

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-21T11:14:00Z

Thanks for your contribution!

Copilot

Pull request overview

本 PR 为 FastDeploy 的 OpenAI 兼容 Chat Completions 增加 tool_choice（none/auto/required/指定函数工具）协议支持，并在强制工具调用（prompt 尾部被 chat template 注入未闭合的 tool-call 前缀）时，通过“前缀检测 + 补偿拼接”让现有 tool parser 在流式/非流式输出下都能继续正确解析工具调用。

Changes:

在 OpenAI 协议层新增 tool_choice 相关模型定义，并把 tool_choice 接入 ChatCompletionRequest。
在 chat serving → response processor → data processor 链路中透传渲染后的 prompt_tokens，用于检测 prompt 尾部注入的 tool-call 前缀。
ToolParser 抽象层新增 detect_tool_prefix 与 per-request 前缀状态；BaseTextProcessor 增加 forced tool choice 的前缀补偿逻辑，并补充对应单测覆盖。

建议同时确认是否需要在对外文档/示例（如 OpenAI 兼容 API 文档或用例）中补充 tool_choice 用法说明。

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
tests/input/test_text_processor.py	补齐 mock logger 方法；更新/新增用例覆盖 forced tool choice 识别与前缀补偿（流式/非流式）。
tests/entrypoints/openai/tool_parsers/test_abstract_tool_parser.py	新增 `ToolParser.detect_tool_prefix` 行为单测。
fastdeploy/input/base_processor.py	增加 `_is_forced_tool_choice` 与 `_prepare_tool_prefix`，并在工具解析前对输入做 prefix 补偿。
fastdeploy/entrypoints/openai/tool_parsers/abstract_tool_parser.py	为 ToolParser 增加 sentinel token 声明、前缀检测方法与 per-request 前缀状态字段。
fastdeploy/entrypoints/openai/serving_chat.py	将渲染后的 `prompt_tokens` 透传给 response processor。
fastdeploy/entrypoints/openai/response_processors.py	扩展 `process_response_chat` 签名以接收并下传 `prompt_tokens`。
fastdeploy/entrypoints/openai/protocol.py	新增 `ChatCompletionNamedFunction`/`ChatCompletionNamedToolChoiceParam`，并在 `ChatCompletionRequest` 中加入 `tool_choice` 字段。

Comments suppressed due to low confidence (1)

fastdeploy/entrypoints/openai/response_processors.py:111

在 process_response_chat 的非异步分支里调用 data_processor.process_response_dict(...) 时没有透传 prompt_tokens。这样 BaseTextProcessor._prepare_tool_prefix 无法检测 chat template 注入的 tool-call 前缀，导致 tool_choice=required 等场景在默认同步 processor 下仍然解析失败。建议在同步分支也传入 prompt_tokens=prompt_tokens（与异步分支保持一致）。

                            response = self.data_processor.process_response_dict(
                                response_dict=request_output,
                                stream=stream,
                                include_stop_str_in_output=include_stop_str_in_output,
                                audio_tokens=all_audio_tokens,

+                if prefix:
+                    stream_previous = prefix + stream_previous
+                    stream_current = prefix + stream_current
+                    if not tool_parser._tool_prefix_injected_to_delta:
+                        stream_delta = prefix + stream_delta


PaddlePaddle-bot · 2026-05-21T11:43:13Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-22 19:42:41

CI报告基于以下代码生成（30分钟更新一次）:

PR commit: 0cafd3e
Merge base: f254649 (branch: develop)
查看完整 Diff
CI 详情

1 任务总览

所有 required 任务已通过，建议通过；当前存在 3 个 optional 失败任务和 1 个 optional 等待任务，不阻塞合并。

总执行（rerun次数）	总任务	✅ 通过	❌ 失败	⏳ 运行中	⏸️ 等待中	跳过
46(0)	46	42	3	0	1	0

2 任务状态汇总

日志列说明：失败任务直接使用 CI 日志链接；optional 失败仅供参考。

2.1 Required任务 : 10/10 通过

必选任务阻塞合并，失败需优先处理。

状态	任务	耗时	根因	修复建议	日志	重跑
✅	其余 10 个必选任务通过	-	-	-	-	-

2.2 可选任务 — 32/36 通过

可选任务不阻塞合并，失败仅供参考。

状态	任务	耗时	日志	重跑
❌	`Run iluvatar Tests / run_iluvatar_cases`	1m41s	Job	-
❌	`Check PR Template`	16s	Job	-
❌	`Trigger Jenkins for PR`	7m30s	Job	-
⏸️	`CI_HPU`	-	-	-
✅	其余 32 个可选任务通过	-	-	-

3 失败详情（仅 required）

无 required 失败任务。本次不对 optional 失败任务做深度分析。

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

+def _is_forced_tool_choice(request) -> bool:
+    """Return True iff the request asks the chat template to inject a
+    tool-call prefix into the prompt. Two ways are recognized:
+
+    1. ``request.tool_choice`` is a named-tool choice (a
+       ``ChatCompletionNamedToolChoiceParam`` pydantic model with
+       ``type == "function"``). The plain ``"required"`` string does NOT
+       trigger prefix injection in the chat template.
+    2. ``request.chat_template_kwargs.options.tool_choice.mode == "force"``
+       — used by chat templates that drive forced tool calls through their
+       own ``options`` dict instead of the OpenAI-style ``tool_choice``
+       field.
+    """
+    if request is None:
+        return False
+
+    tool_choice = getattr(request, "tool_choice", None)
+    # Named-tool choices are pydantic ``ChatCompletionNamedToolChoiceParam``
+    # objects (``type == "function"``); plain string values such as
+    # ``"required"`` / ``"auto"`` / ``"none"`` are skipped here.
+    if not isinstance(tool_choice, str) and getattr(tool_choice, "type", None) == "function":
+        return True
+
+    chat_template_kwargs = getattr(request, "chat_template_kwargs", None) or {}
+    options = chat_template_kwargs.get("options") if isinstance(chat_template_kwargs, dict) else None
+    inner = options.get("tool_choice") if isinstance(options, dict) else None
+    if isinstance(inner, dict) and inner.get("mode") == "force":
+        return True
+    return False


+        try:
+            tool_parser._tool_prefix_token_ids = list(self.tokenizer.encode(prefix, add_special_tokens=False))
+        except Exception:
+            data_processor_logger.exception("encode tool prefix to token ids failed; token-id splice disabled")
+            tool_parser._tool_prefix_token_ids = []


codecov-commenter · 2026-05-21T12:48:08Z

Codecov Report

❌ Patch coverage is 89.02439% with 9 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@f254649). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/input/base_processor.py	85.48%	5 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7882   +/-   ##
==========================================
  Coverage           ?   63.64%           
==========================================
  Files              ?      462           
  Lines              ?    64574           
  Branches           ?     9898           
==========================================
  Hits               ?    41095           
  Misses             ?    20702           
  Partials           ?     2777

Flag	Coverage Δ
GPU	`72.76% <89.02%> (?)`
XPU	`7.11% <3.65%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

+            previous_token_ids = list(status[2])
            status[2].extend(token_id)
            decode_str, prefix_offset, read_offset = self.tokenizer.decode_token(status[2], status[0], status[1])
            status[0] = prefix_offset
            status[1] = read_offset


                                self.data_processor.process_response_dict(
                                    response_dict=request_output,
                                    stream=stream,
                                    include_stop_str_in_output=include_stop_str_in_output,
                                    request=request,


Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

                                    response_dict=request_output,
                                    stream=stream,
                                    include_stop_str_in_output=include_stop_str_in_output,
                                    request=request,
+                                    prompt_tokens=prompt_tokens,


+        ``previous_token_ids`` and ``previous_texts`` are **snapshots of the
+        accumulated state BEFORE this call's tokens were appended** —
+        symmetric pre-delta views of what the caller had decoded so far.
+        Both are owned by the caller (no aliasing of internal state).


+        Finds the **last** :attr:`tool_call_start_token` in ``prompt`` that is
+        not closed by a later :attr:`tool_call_end_token` and reaches the
+        prompt end (modulo trailing whitespace). Returns ``""`` otherwise.
+        Subclasses with non-paired tag formats may override.


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-22 17:43:48

📋 Review 摘要

PR 概述：增强 OpenAI 兼容 chat completions 的 tool-call 前缀补偿能力，修复 forced tool_choice 场景下 streaming / non-streaming 路径的 tool parser 解析问题
变更范围：fastdeploy/entrypoints/openai/、fastdeploy/input/base_processor.py、fastdeploy/engine/common_engine.py
影响面 Tag：[APIServer] [Engine] [DataProcessor]

问题

级别	文件	概述
🟡 建议	`abstract_tool_parser.py:107`	`detect_tool_prefix` 未限制前缀长度，最后一个未闭合 start token 不在 prompt 末尾时会将大量历史内容误当作前缀
📝 PR 规范	—	`Usage or Command` 和 `Accuracy Tests` section 仅含 HTML 占位注释，未填写 N/A

📝 PR 规范检查

标题格式合规（[Feature] 为官方 Tag）。Usage or Command 与 Accuracy Tests 段仅包含 HTML 占位符注释，未按模板要求填写 N/A。

标题建议（可直接复制）：

[Feature] Add OpenAI-compatible tool_choice support for chat completions（标题合规，无需修改）

PR 描述建议（可直接复制，必须复刻 checklist §D2 模板的完整结构）：

## Motivation

本 PR 增强了 OpenAI 兼容 chat completions 场景下的工具调用解析能力，完善了 tool-call 输出在流式 / 非流式场景下的处理链路。

在部分 chat template 实现中，渲染后的 prompt 尾部会携带一个未闭合的 tool-call 前缀。此时模型生成结果并不是从完整的 tool call 起始边界开始，而是从该前缀之后继续生成。如果仍按原有方式解析，tool parser 在部分场景下无法正确识别工具调用内容，尤其是在 streaming 场景下更容易出现解析不完整的问题。

本 PR 通过在响应处理阶段感知 prompt 尾部注入的 tool-call 前缀，并在 parser 输入侧补齐对应的 text / token ids，使现有 tool parser 能够正确处理这类由 chat template 驱动的 tool-call 输出。同时补充了对应单元测试，覆盖 normal / streaming 两条路径。

## Modifications

1. **将渲染后的 prompt 透传到响应处理流程**：更新 `process_response_chat(...)` 签名新增 `prompt_tokens` 可选参数，在各响应处理分支中统一透传
2. **增强 ToolParser prefix 检测**：新增 `tool_call_start_token`/`tool_call_end_token` 类属性及 `detect_tool_prefix(prompt)` 方法，通过 rfind 检测 prompt 尾部未闭合的 tool-call start token
3. **BaseTextProcessor 增加 prefix 补偿**：新增 `_text_to_token_ids`、`_prepare_tool_prefix`，在非流式路径中拼接前缀，在流式路径中补偿 previous/current/delta 的文本与 token ids（delta 仅首个 chunk 注入）
4. **调整 `ids2tokens` 返回语义**：返回 pre-delta snapshot 避免别名问题，在 `common_engine.py` 中通过 `previous_token_ids + token_ids` 重建 cumulative token ids
5. **补充单元测试**：新增 `test_abstract_tool_parser.py` 覆盖 `detect_tool_prefix` 多场景；扩展 `test_text_processor.py` 覆盖 normal/streaming prefix compensation 路径

## Usage or Command

N/A

## Accuracy Tests

N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

总体评价

整体实现思路清晰，通过 prompt prefix 感知 + 注入修复了 forced tool_choice 场景下 tool parser 解析不完整的问题，流式/非流式两条路径均有覆盖并补充了单元测试。detect_tool_prefix 基类默认实现对未闭合 start token 的位置判断较宽松，建议添加长度上限保护以降低误检风险。

PaddlePaddle-bot · 2026-05-22T09:51:15Z

+        # override this method to apply stricter validation.
+        return prompt[last_start:]
+
    def extract_tool_calls(self, model_output: str, request: ChatCompletionRequest) -> ExtractedToolCallInformation:


🟡 建议 detect_tool_prefix 对未闭合 start token 的判断过于宽松

prompt.rfind(start) 找到最后一个 start token 后，只检查其后是否有 end token，但未验证该 start token 是否真正位于 prompt 末尾附近。若历史对话中存在格式异常的未闭合 start token，而 prompt 末尾还有其他内容，则从该位置到 prompt 末尾的所有内容（可能非常长）会被当作 prefix 注入，导致 tool parser 接收到错误的解析输入。

建议修复方式：在 return prompt[last_start:] 前添加长度保护，超过合理阈值时视为误检返回 ""：

_MAX_PREFIX_LEN = 512 tail = prompt[last_start:] if len(tail) > _MAX_PREFIX_LEN: return "" return tail

luukunn added 3 commits May 20, 2026 20:07

first commit

6b2f806

fix bug

47a7e23

fix unit test

7041f47

Copilot AI review requested due to automatic review settings May 21, 2026 11:13

luukunn had a problem deploying to Metax_ci May 21, 2026 11:13 — with GitHub Actions Failure

Copilot started reviewing on behalf of luukunn May 21, 2026 11:14 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread fastdeploy/input/base_processor.py Outdated

Comment on lines +428 to +432

if prefix:

stream_previous = prefix + stream_previous

stream_current = prefix + stream_current

if not tool_parser._tool_prefix_injected_to_delta:

stream_delta = prefix + stream_delta