[Cherry-Pick] [Optimization] TopP=1.0 using _random_sample (#7892) and Triton SamplerBackend (#7639) by ckl117 · Pull Request #7910 · PaddlePaddle/FastDeploy

ckl117 · 2026-05-25T05:06:56Z

Motivation

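Support the Triton sampling backend (#7639)
Optimize top_p=1.0 sampling (#7892)
Add the environment variable FD_ENABLE_TOP_P_ONE_OPT (set it to 1 to enable the top_p=1.0 optimization) to make RL validation easier; the variable will be removed once validation passes.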
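When every request in a batch uses top_p = 1.0, nucleus (top-p) filtering keeps the whole vocabulary. Plain random sampling from the softmax distribution is therefore equivalent, and it skips the sort and cumulative-sum work. The pure-Python sketch below shows why the fast path is safe and one way it might be gated. It rests on these assumptions: the gate requires FD_ENABLE_TOP_P_ONE_OPT=1 (per the later commit, the default is 0), and it also requires top_p == 1.0 for every request. `use_random_sample_path` and `top_p_filter` are hypothetical helpers, not FastDeploy functions.

```python
import os
from typing import List, Mapping, Optional, Sequence


def use_random_sample_path(
    top_p_list: Optional[Sequence[float]],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Hypothetical gate: take the fast path only when the env flag is on
    and every request in the batch uses top_p == 1.0."""
    if env.get("FD_ENABLE_TOP_P_ONE_OPT", "0") != "1":
        return False
    # A host-side Python list can be checked without reading back a GPU tensor.
    return bool(top_p_list) and all(p == 1.0 for p in top_p_list)


def top_p_filter(probs: Sequence[float], top_p: float) -> List[float]:
    """Reference nucleus filter: keep the smallest set of most likely tokens
    whose cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]
```

For example, `top_p_filter([0.5, 0.25, 0.125, 0.125], 1.0)` returns the input unchanged, so sampling from it is the same as sampling from the raw distribution. With `top_p=0.7`, only the first two tokens survive and are renormalized to 2/3 and 1/3.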

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot · 2026-05-25T05:07:02Z

Thanks for your contribution!

PaddlePaddle-bot · 2026-05-25T06:03:08Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-25 19:42:44

CI report generated from the following code (updated every 30 minutes):

PR commit: b73b574
Merge base: 85399db (branch: release/2.6)
View full diff
CI details

1 Task Overview

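Not all required jobs have passed, so merging is not recommended yet. Failed required jobs: 2. Pending required jobs: 0 (running 0, waiting 0). The main blockers are the failed unit-test coverage gate and the pending manual Approval.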

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped
37(0)	37	31	5	0	1	0

2 Task Status Summary

Log column: failed jobs link directly to their logs; running jobs link to the Job page.

2.1 Required jobs: 8/10 passed

Required jobs block merging, so their failures should be handled first.

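Status	Job	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	1h13m	PR issue: diff coverage is only 20%, below the 80% threshold	Add sampling/Triton coverage or exempt the JIT kernels	Job	-
❌	`Approval`	19s	Approval required	Approve manually	Job	-
✅	Remaining 8 required jobs passed	-	-	-	-	-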

2.2 Optional jobs: 23/27 passed

Optional jobs do not block merging; their failures are for reference only.

Status	Job	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	15m48s	Job	-
❌	`Check PR Template`	21s	Job	-
❌	`Trigger Jenkins for PR`	19s	Job	-
⏸️	`CI_HPU`	-	-	-
✅	Remaining 23 optional jobs passed	-	-	-

3 Failure Details (required jobs only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage gate (confidence: high)

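Status: ❌ Failed
Error type: coverage gate
Confidence: high
Root-cause summary: diff coverage is only 20%, below the 80% threshold
Analyzer: ci_analyze_unittest_fastdeploy

Failed test cases: none. The log shows All tests passed; the failure happened at the coverage-threshold check.

Root-cause details:
This PR modifies or adds sampling code (e.g. top_k_top_p_triton.py, sampler.py, top_k_top_p_sampling.py, input_batch.py), but the diff-coverage report shows total coverage of only 20%, below the 80% required by CI. In particular, fastdeploy/model_executor/layers/sample/ops/top_k_top_p_triton.py is only 11.49% covered. Large sections of Triton JIT kernel code (such as _update_min_larger_stats and _topk_topp_kernel) are counted toward diff coverage, but the existing tests/layers/test_triton_sampler.py exercises only a few paths in the Python wrapper, which produces 508 violation lines.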

Key log lines:

All tests passed
Coverage generation failed (exit code 9)
GPU Patch Coverage Details:
total_percent_covered: 20
total_num_violations: 508
fastdeploy/model_executor/layers/sample/ops/top_k_top_p_triton.py: 11.49%
fastdeploy/model_executor/layers/sample/ops/top_k_top_p_sampling.py: 70.0% (violations: 40,45,46)
fastdeploy/worker/input_batch.py: 66.67% (violation: 552)

Suggested fixes:

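Extend tests/layers/test_triton_sampler.py to cover the triton branch and the top_p=1.0 optimization branch in sampler.py, the ImportError/dispatch branches in top_k_top_p_sampling.py, and the top_p_list reset logic in input_batch.py::reset_share_inputs.
Some Triton JIT kernel regions in fastdeploy/model_executor/layers/sample/ops/top_k_top_p_triton.py cannot be traced by Python coverage. Examples are the @triton.jit helpers after line 92 and _topk_topp_kernel after line 115. Add GPU tests that actually execute these regions. If this kind of kernel source is not suitable for Python diff-coverage accounting, add a coverage exclusion or exemption per project conventions, so that untraceable kernel lines do not count toward the 80% gate.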
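The ImportError/dispatch branch mentioned above can usually be covered without a GPU by hiding the optional module from the import system. This sketch relies on standard-library behavior: a `None` entry in `sys.modules` makes `import` raise `ImportError`, and a fake module object placed there is returned as-is. It assumes a dispatcher that prefers Triton when requested and falls back otherwise. `pick_sampling_impl` is hypothetical; it is not the function in top_k_top_p_sampling.py.

```python
import sys
import unittest
from unittest import mock


def pick_sampling_impl(backend: str) -> str:
    """Hypothetical dispatcher: use Triton when requested and importable."""
    if backend == "triton":
        try:
            import triton  # noqa: F401  (only checking availability)
        except ImportError:
            return "base"  # fallback branch that coverage must also reach
        return "triton"
    return "base"


class PickSamplingImplTest(unittest.TestCase):
    def test_fallback_when_triton_missing(self):
        # A None entry in sys.modules makes "import triton" raise ImportError.
        with mock.patch.dict(sys.modules, {"triton": None}):
            self.assertEqual(pick_sampling_impl("triton"), "base")

    def test_triton_selected_when_importable(self):
        # Any object in sys.modules is returned by "import triton".
        with mock.patch.dict(sys.modules, {"triton": mock.MagicMock()}):
            self.assertEqual(pick_sampling_impl("triton"), "triton")

    def test_non_triton_backend(self):
        self.assertEqual(pick_sampling_impl("base"), "base")
```

Run it with `python -m unittest`. `mock.patch.dict` restores `sys.modules` when each `with` block exits, so the fake or missing module does not leak into other tests.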

Fix summary: add sampling/Triton coverage or exempt the JIT kernels

Related changes: fastdeploy/model_executor/layers/sample/ops/top_k_top_p_triton.py, fastdeploy/model_executor/layers/sample/sampler.py, fastdeploy/model_executor/layers/sample/ops/top_k_top_p_sampling.py, fastdeploy/worker/input_batch.py, tests/layers/test_triton_sampler.py

Link: View log

Approval: manual approval required (confidence: high)

This job requires manual approval; CI continues once it has been approved.

codecov-commenter · 2026-05-25T08:05:15Z

Codecov Report

❌ Patch coverage is 18.11024% with 520 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@85399db). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...l_executor/layers/sample/ops/top_k_top_p_triton.py	10.95%	493 Missing and 3 partials ⚠️
fastdeploy/model_executor/layers/sample/sampler.py	69.35%	11 Missing and 8 partials ⚠️
...executor/layers/sample/ops/top_k_top_p_sampling.py	60.00%	3 Missing and 1 partial ⚠️
fastdeploy/worker/input_batch.py	66.66%	1 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.6    #7910   +/-   ##
==============================================
  Coverage               ?   71.83%           
==============================================
  Files                  ?      383           
  Lines                  ?    55071           
  Branches               ?     8620           
==============================================
  Hits                   ?    39560           
  Misses                 ?    12732           
  Partials               ?     2779

Flag	Coverage Δ
GPU	`71.83% <18.11%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.
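The Codecov figures can be cross-checked with a little arithmetic. Codecov computes coverage as hits / (hits + misses + partials), and its "lines missing coverage" count on this PR equals misses plus partials. The sketch below uses only numbers reported above. The final estimate of roughly 635 changed coverable lines is an inference from rounded percentages; neither tool reports it.

```python
# Project-level totals from the "Coverage Diff" table.
hits, misses, partials = 39560, 12732, 2779
lines = hits + misses + partials
project_coverage = 100 * hits / lines

# (missing, partials) per changed file, from "Files with missing lines".
patch_gaps = {
    "top_k_top_p_triton.py": (493, 3),
    "sampler.py": (11, 8),
    "top_k_top_p_sampling.py": (3, 1),
    "input_batch.py": (1, 0),
}
patch_missing = sum(m for m, _ in patch_gaps.values())
patch_partials = sum(p for _, p in patch_gaps.values())

# Estimated changed coverable lines: uncovered / (1 - coverage fraction).
est_from_codecov = (patch_missing + patch_partials) / (1 - 0.1811)
est_from_ci_gate = 508 / (1 - 0.20)

print(lines, f"{project_coverage:.2f}%")  # 55071 71.83%
print(patch_missing, patch_partials, patch_missing + patch_partials)  # 508 12 520
print(round(est_from_codecov), round(est_from_ci_gate))  # 635 635
```

Codecov's 520 uncovered lines split into 508 missing and 12 partial. The 508 matches the CI gate's violation count, and both reports imply a patch of about the same size.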



PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-25 16:44:39

📋 Review Summary

PR overview: cherry-picks two sampling optimizations: a new Triton sampling backend and a fast path for top_p=1.0
Scope of changes: model_executor/layers/sample/, worker/gpu_model_runner.py, envs.py
Affected tags: [OP] [Optimization]

Issues

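Level	File	Summary
🟡 Suggestion	`meta_data.py:46`	`top_p_list` uses a bare `list` type with no element-type annotation
❓ Question	`top_k_top_p_sampling.py:108`	The CPU→GPU copy of `topp_seed` was removed; confirm the caller already guarantees it is on the GPU
🟡 Suggestion	`gpu_model_runner.py`	A6: the shared code path was changed; confirm whether other hardware ModelRunners need the same change
📝 PR conventions	PR description	The `Modifications`, `Usage or Command`, and `Accuracy Tests` sections are empty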

📝 PR Convention Check

The PR title has an extra space between [Cherry-Pick] and [Optimization]; the convention is [Cherry-Pick][Tag] with no space. In addition, the Modifications, Usage or Command, and Accuracy Tests sections are all empty.

Suggested title (ready to copy):

[Cherry-Pick][Optimization] TopP=1.0 using _random_sample and Triton SamplerBackend (#7892, #7639)

Suggested PR description (ready to copy):

## Motivation
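1. Support a Triton sampling backend (`FD_SAMPLING_CLASS=triton`), with a high-performance fused Top-K + Top-P sampling kernel based on the Qrita algorithm (#7639).
2. Optimize the top_p=1.0 sampling path: when `FD_ENABLE_TOP_P_ONE_OPT=1`, sampling goes straight to random_sample and skips top-p filtering, which reduces latency in RL validation scenarios (#7892).
3. Add the environment variable `FD_ENABLE_TOP_P_ONE_OPT`, to be removed once validation in RL scenarios passes.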

## Modifications
- `fastdeploy/model_executor/layers/sample/ops/top_k_top_p_triton.py`: add a Triton fused Top-K/Top-P sampling kernel based on the pivot-based truncation algorithm from the Qrita paper
- `fastdeploy/model_executor/layers/sample/ops/top_k_top_p_sampling.py`: extract the shared function `dispatch_top_k_renorm_probs` and simplify how `topp_seed` is passed
- `fastdeploy/model_executor/layers/sample/sampler.py`: integrate the triton sampling-backend branch and the top_p=1.0 optimized path
- `fastdeploy/model_executor/layers/sample/meta_data.py`: add a `top_p_list` field used by the triton path
- `fastdeploy/worker/gpu_model_runner.py` / `input_batch.py`: pass top_p_list through to the sampling layer
- `fastdeploy/envs.py`: add the `FD_ENABLE_TOP_P_ONE_OPT` environment variable
- `tests/layers/test_triton_sampler.py`: add unit tests for the triton sampling ops

## Usage or Command
```bash
# Enable the Triton sampling backend
FD_SAMPLING_CLASS=triton python -m fastdeploy.entrypoints.openai.api_server ...

# Enable the top_p=1.0 optimization (RL validation scenario)
FD_ENABLE_TOP_P_ONE_OPT=1 python -m fastdeploy.entrypoints.openai.api_server ...
```

## Accuracy Tests
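N/A (sampling output is stochastic, so there is no deterministic accuracy comparison; distributional equivalence between the triton backend and the base backend is covered by `tests/layers/test_triton_sampler.py`)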

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall Assessment

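The code is clearly structured overall. The Triton kernel follows the Qrita paper and its logic is complete, and extracting dispatch_top_k_renorm_probs improves code reuse. The PR can be merged once the topp_seed device guarantee and the multi-hardware sync question are confirmed.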
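A simple CPU oracle can help when testing the Triton kernel. The sketch below is a sort-based reference for joint Top-K + Top-P renormalization. It is a plain sorting approach, not the pivot-based Qrita algorithm the kernel implements. It rests on three assumptions: top-k is applied before top-p, the top-p threshold is measured against the mass that survives top-k, and `top_k <= 0` means top-k is disabled. `top_k_top_p_renorm` is a hypothetical name.

```python
from typing import List, Sequence


def top_k_top_p_renorm(probs: Sequence[float], top_k: int, top_p: float) -> List[float]:
    """Sort-based reference: keep the top_k most likely tokens, then the
    smallest prefix of those whose mass reaches top_p, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:  # assumption: top_k <= 0 disables top-k
        order = order[:top_k]
    # Assumption: the top-p threshold is relative to the mass kept by top-k.
    candidate_mass = sum(probs[i] for i in order)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p * candidate_mass:
            break
    total = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / total
    return out
```

With `probs = [0.5, 0.25, 0.125, 0.125]`, `top_k=3` and `top_p=0.7` keep only the first two tokens (renormalized to 2/3 and 1/3). `top_k=1` puts all mass on the first token. Comparing empirical kernel sample frequencies against such reference probabilities is one way to test distributional equivalence.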

PaddlePaddle-bot · 2026-05-25T08:47:44Z


    top_p: paddle.Tensor
+    top_p_list: Optional[list] = None
    # only GPU used


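🟡 Suggestion: the top_p_list field is annotated with a bare list. This is inconsistent with the style of the adjacent field (top_p: paddle.Tensor), and it carries no element-type information, so IDEs and type checkers cannot infer the element type.

Suggested change:

top_p_list: Optional[List[float]] = None

(and import List at the top of the file if it is not already imported)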
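The sketch below shows the suggested annotation on a stand-in dataclass. `SamplingMetadataSketch` is hypothetical; the real class lives in meta_data.py. `top_p` is typed as `Any` here only so the snippet runs without Paddle.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, get_type_hints


@dataclass
class SamplingMetadataSketch:
    # Stand-in for paddle.Tensor so the snippet runs without Paddle.
    top_p: Any
    # Explicit element type: type checkers now know each entry is a float.
    top_p_list: Optional[List[float]] = None


hints = get_type_hints(SamplingMetadataSketch)
meta = SamplingMetadataSketch(top_p=None, top_p_list=[1.0, 0.9])
```

With the element type declared, a checker such as mypy can flag something like `meta.top_p_list.append("0.9")`, whereas a bare `list` annotation would accept it.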

PaddlePaddle-bot · 2026-05-25T08:47:44Z

-                topp_seed_device = paddle.empty(shape=topp_seed.shape, dtype=topp_seed.dtype)
-                topp_seed_device.copy_(topp_seed, False)
            _, ids = paddle.tensor.top_p_sampling(
                x,


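❓ Question: the original code explicitly copied topp_seed from CPU to GPU (paddle.empty(...).copy_(topp_seed, False)) before passing it to paddle.tensor.top_p_sampling. Here, the raw topp_seed is passed directly.

If the caller already guarantees that topp_seed is on the GPU, this simplification is correct. If it can still be on the CPU (for example, if it is built on the CPU side of input_batch), it will cause a device-mismatch error at runtime.

Please confirm that topp_seed is guaranteed to reside on the GPU, or add a comment explaining why.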

zeroRains and others added 2 commits May 25, 2026 12:54

[CP][Feature] support new sampler backend with triton (PaddlePaddle#7639)

27b9c26

[Optimization] TopP=1.0 using _random_sample (PaddlePaddle#7892)

e52ec74

ckl117 had a problem deploying to Metax_ci May 25, 2026 05:07 — with GitHub Actions Failure


Merge branch 'release/2.6' into 26_topp1

be0f9cf

EmmonsCurse had a problem deploying to Metax_ci May 25, 2026 05:34 — with GitHub Actions Failure


code check

6d6f6cb

ckl117 had a problem deploying to Metax_ci May 25, 2026 06:10 — with GitHub Actions Failure


add env FD_ENABLE_TOP_P_ONE_OPT control top_p=1 opt

204c426

ckl117 had a problem deploying to Metax_ci May 25, 2026 06:29 — with GitHub Actions Failure


defalut FD_ENABLE_TOP_P_ONE_OPT=0

755453a

Merge branch 'release/2.6' of https://github.com/PaddlePaddle/FastDeploy into 26_topp1

b73b574

ckl117 had a problem deploying to Metax_ci May 25, 2026 08:11 — with GitHub Actions Failure

ckl117 wants to merge 7 commits into PaddlePaddle:release/2.6 from ckl117:26_topp1


5 participants
