[just test] tmp fix k3 in step0 #7897
Thanks for your contribution!
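CI report, generated from the following code (updated every 30 minutes):
1 Task overview
All Required tasks have passed (10/10). The required CI for this PR does not block merging, and approval is recommended. A further 4 Optional tasks failed; these are for reference only.
2 Task status summary
2.1 Required tasks: 10/10 passed
2.2 Optional tasks: 27/31 passed
3 Failure details (required only)
No required tasks failed.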
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7897 +/- ##
==========================================
Coverage ? 63.60%
==========================================
Files ? 462
Lines ? 64507
Branches ? 9887
==========================================
Hits ? 41030
Misses ? 20701
Partials ? 2776
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-22 18:42:07
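📋 Review summary
PR overview: Moves capture_model() and capture_block_wise_graphs() to after the deterministic-mode RNG/share_inputs reset, fixing the k3 computation error caused by CUDAGraph capture using stale state in deterministic mode.
Scope of change: fastdeploy/worker/gpu_worker.py (graph_optimize_and_warm_up_model)
Impact tag: [Graph Optimization]
Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | gpu_worker.py:255 | Whether other hardware Workers need the same capture-order fix |
| 📝 PR conventions | - | Title lacks an official tag; every section of the PR description is empty |
📝 PR convention check
There are two convention issues:
- The title "[just test] tmp fix k3 in step0" uses an unofficial tag and contains temporary wording;
- In the PR description, Motivation / Modifications / Usage or Command / Accuracy Tests are all empty, and no Checklist item is checked.
Suggested title (can be copied directly):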
[BugFix] Fix CUDAGraph capture order in deterministic mode to use clean RNG state
Suggested PR description (can be copied directly):
## Motivation
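In deterministic mode, `graph_optimize_and_warm_up_model` runs `capture_model()` first and only then resets the RNG and `share_inputs`. As a result, CUDAGraph capture happens on the stale post-warmup state (a consumed CUDA RNG and old data such as infer_seed/stop_flags/seq_lens), which makes the k3 (step0) results nondeterministic.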
## Modifications
- `fastdeploy/worker/gpu_worker.py`: move the calls to `capture_model()` and `capture_block_wise_graphs()` from before the deterministic-mode RNG/share_inputs reset block to after it, so that CUDAGraph capture completes from a clean initial state.
## Usage or Command
N/A
## Accuracy Tests
N/A (this fix only adjusts execution order and does not affect accuracy; output consistency in deterministic mode still needs to be verified in the full CI)
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests. (This is currently a temporary verification PR; unit tests under `tests/graph_optimization/` will be added before the formal merge.)
- [ ] Provide accuracy results.
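- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The direction of the code change is correct: in deterministic mode, resetting the RNG and share_inputs before running CUDAGraph capture ensures that the captured graph starts from a deterministic initial state, and the reasoning behind the fix is clear. Before the formal merge, we recommend bringing the PR in line with the conventions, adding the multi-hardware sync check, and removing the temporary [just test] prefix.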
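To make the ordering argument concrete, here is a minimal toy sketch in plain Python. It is not FastDeploy code: `ToyRunner` and its methods are hypothetical stand-ins, Python's `random.Random` stands in for the CUDA RNG, and "capture" simply snapshots the RNG state. It only models the claim above, namely that whatever state exists at capture time is what the capture sees.

```python
import random


class ToyRunner:
    """Hypothetical stand-in for a model runner; not FastDeploy's API."""

    def __init__(self, seed: int = 0) -> None:
        self.seed = seed
        self.rng = random.Random(seed)
        self.captured_state = None

    def warm_up(self) -> None:
        # Warmup consumes random numbers, so the RNG state drifts.
        for _ in range(3):
            self.rng.random()

    def reset(self) -> None:
        # Toy version of the deterministic-mode reset: reseed the RNG.
        self.rng.seed(self.seed)

    def capture(self) -> None:
        # Toy "capture": record whatever RNG state exists right now.
        self.captured_state = self.rng.getstate()


def capture_before_reset(seed: int = 0):
    """Old order: warm up, capture, then reset."""
    runner = ToyRunner(seed)
    runner.warm_up()
    runner.capture()
    runner.reset()
    return runner.captured_state


def capture_after_reset(seed: int = 0):
    """New order: warm up, reset, then capture."""
    runner = ToyRunner(seed)
    runner.warm_up()
    runner.reset()
    runner.capture()
    return runner.captured_state


clean_state = random.Random(0).getstate()
print(capture_before_reset() == clean_state)  # False: capture saw stale state
print(capture_after_reset() == clean_state)   # True: capture saw clean state
```

In this toy model, only the "reset, then capture" order sees the same state as a freshly seeded generator, which is the property the PR is trying to restore. Whether the real CUDAGraph path behaves the same way still has to be confirmed by the deterministic-mode CI runs.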
self.model_runner.share_inputs.reset_share_inputs()

# Capture CUDAGraph for decode phase (all modes)
self.model_runner.capture_model()
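🟡 Suggestion: multi-hardware sync check
gpu_worker.py changes the execution order of CUDAGraph capture (capture_model() moves to after the deterministic reset). If other hardware Workers (dcu_worker.py, xpu_worker.py, iluvatar_worker.py, metax_worker.py, etc.) contain the same CUDAGraph capture logic, do they have the same deterministic-mode problem?
Recommendation: check fastdeploy/worker/{xpu,dcu,gcu,hpu,iluvatar,metax}_worker.py for a similar capture-order issue and, if one exists, apply the same fix.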
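One possible way to run a first pass of this check is a small text scan. The helper names `capture_precedes_reset` and `scan_workers` below are hypothetical. The heuristic only compares the position of the first `capture_model(` against the first `reset_share_inputs(` in each file, so it looks at text order, not execution order, and it would also match a `def capture_model(` definition. Treat any hit as a prompt for manual review, not as proof of a bug.

```python
import re
from pathlib import Path
from typing import List


def capture_precedes_reset(source: str) -> bool:
    """True if the first capture_model( appears before the first reset_share_inputs(."""
    capture = re.search(r"\bcapture_model\s*\(", source)
    reset = re.search(r"\breset_share_inputs\s*\(", source)
    # Both calls must be present, and the capture must come first in the text.
    return bool(capture and reset and capture.start() < reset.start())


def scan_workers(worker_dir: str) -> List[str]:
    """Return names of *_worker.py files in worker_dir that the heuristic flags."""
    flagged = []
    for path in sorted(Path(worker_dir).glob("*_worker.py")):
        if capture_precedes_reset(path.read_text(encoding="utf-8")):
            flagged.append(path.name)
    return flagged
```

For example, `scan_workers("fastdeploy/worker")` run from the repository root would list the worker files worth opening by hand. Files that do not contain both calls are never flagged.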