[Cherry-Pick] [BugFix] fix all reduce fusion accurate issue (#7923) #7922
BingooYang wants to merge 2 commits into
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-25 19:56:20
📋 Review Summary
PR overview: Fixes an accuracy issue that occurred when FD_USE_PHI_RMSNORM=1 and --enable-flashinfer-allreduce-fusion were both enabled, because the proxy_rmsnorm branch did not take the fused allreduce path.
Scope of changes: fastdeploy/model_executor/layers/normalization.py, tests/layers/trtllm_allreduce_rms_fusion.py
Impact tag: [OP]
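Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | normalization.py:288 | assert is used for runtime state validation; it silently becomes a no-op under Python -O |
| 📝 PR convention | - | Target branch is release/2.6; the title lacks the [Cherry-Pick] prefix |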
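📝 PR Convention Check
The target branch is release/2.6 (not develop). Per the convention, the title must use the Cherry-Pick format: [Cherry-Pick][Tag] title description(#original PR number). The current title lacks the [Cherry-Pick] prefix and the original PR number.
Suggested title (can be copied as-is; replace <original-PR-number> with the original PR number):
[Cherry-Pick][BugFix] fix all reduce fusion accurate issue(#<original-PR-number>)
The PR description is structurally complete: all required sections (Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist) are filled in, so no changes are needed.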
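Overall Assessment
The fix is well reasoned. It consolidates the conditions for enabling allreduce fusion (tp_size > 1, the shape upper bound, a non-empty residual, and the CUDA platform) into a single use_allreduce_fused variable, and makes the proxy_rmsnorm branch take the fused path as well, which removes the root cause of the accuracy mismatch. The new TestRMSNormProxyAllreduceFused test covers the three key paths of the proxy branch (fused / fusion-disabled / token-too-large) and is of good quality. The main suggestion is to replace the assert in the hot path with an explicit raise, so the check does not silently disappear in Python's optimized mode.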
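The consolidation described above can be sketched roughly as follows. This is a minimal illustration, not the code from normalization.py: the FusionContext class, its field names, the select_path helper, and the MAX_FUSED_TOKENS value are all hypothetical, and only the four conditions themselves come from the review text.

```python
from dataclasses import dataclass, replace

# Hypothetical ceiling; the real shape upper bound is not given on this page.
MAX_FUSED_TOKENS = 2048


@dataclass(frozen=True)
class FusionContext:
    """Hypothetical bundle of the inputs that decide whether fusion applies."""
    tp_size: int          # tensor-parallel world size
    enable_fusion: bool   # stands in for --enable-flashinfer-allreduce-fusion
    is_cuda: bool         # running on the CUDA platform
    num_tokens: int       # leading dimension of the input tensor
    has_residual: bool    # a residual tensor was passed in


def use_allreduce_fused(ctx: FusionContext, max_tokens: int = MAX_FUSED_TOKENS) -> bool:
    """Single predicate that every RMSNorm branch consults."""
    return (
        ctx.enable_fusion
        and ctx.is_cuda
        and ctx.tp_size > 1
        and ctx.has_residual
        and ctx.num_tokens <= max_tokens
    )


def select_path(ctx: FusionContext) -> str:
    """Return which path a branch would take under this sketch."""
    return "fused" if use_allreduce_fused(ctx) else "unfused"


base = FusionContext(tp_size=2, enable_fusion=True, is_cuda=True,
                     num_tokens=16, has_residual=True)
fusion_disabled = replace(base, enable_fusion=False)
token_too_large = replace(base, num_tokens=MAX_FUSED_TOKENS + 1)
```

The three contexts at the end mirror the three paths the review says the new test covers. Keeping the decision in one predicate means a newly added branch cannot drift from the others by re-deriving the conditions differently.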
    input_tensor=x,
    residual=residual_input,
    weight=self.weight,
    eps=self.eps,
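🟡 Suggestion: assert is used to validate internal runtime state
In Python's -O (optimized) mode, all assert statements are removed entirely, so a failure here would be silently swallowed and the following code would keep running.
Consider raising an exception explicitly instead:
if norm_out[0] is None:
    raise RuntimeError("Trtllm-all-reduce fusion failed!")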
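The difference can be reproduced without restarting the interpreter, because the built-in compile() accepts an optimize argument that mimics -O. The sketch below is an illustration under assumptions: the assert form in SRC is a guess at what the original check looked like, and the function names are hypothetical.

```python
# Assumed shape of the original check; the actual line in normalization.py
# is not shown on this page.
SRC = '''
def check(norm_out):
    assert norm_out[0] is not None, "Trtllm-all-reduce fusion failed!"
    return norm_out
'''


def load_check(optimize: int):
    """Compile SRC at the given optimization level and return its check()."""
    namespace = {}
    exec(compile(SRC, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace["check"]


def check_with_raise(norm_out):
    """Explicit check that survives every optimization level."""
    if norm_out[0] is None:
        raise RuntimeError("Trtllm-all-reduce fusion failed!")
    return norm_out


check_debug = load_check(0)      # behaves like plain `python`
check_optimized = load_check(1)  # behaves like `python -O`

failed_output = (None, None)     # stand-in for a failed fused call


def raises(fn, exc_type) -> bool:
    """Report whether fn(failed_output) raises exc_type."""
    try:
        fn(failed_output)
    except exc_type:
        return True
    return False
```

Here raises(check_debug, AssertionError) is true, while check_optimized returns the failed output unchanged; check_with_raise fails loudly in both modes.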
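The CI report is generated from the following code (updated every 30 minutes):
1 Task overview
Required tasks: 9/10 currently pass. One required task is still failing (Approval, which needs manual approval), and no required tasks are running or pending. CI can only meet the merge requirements once the approval is complete. Two optional tasks failed; they are for reference only and do not block the required result.
2 Task status summary
2.1 Required tasks: 9/10 passed
2.2 Optional tasks: 24/26 passed
3 Failure details (required only)
Approval: manual approval needed (confidence: high). This job requires manual approval; CI continues once the approval is complete.
4 Code and test context check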
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## release/2.6 #7922 +/- ##
==============================================
Coverage ? 72.45%
==============================================
Files ? 382
Lines ? 54464
Branches ? 8523
==============================================
Hits ? 39460
Misses ? 12236
Partials ? 2768
Motivation
Fixes the accuracy issue caused by enabling export FD_USE_PHI_RMSNORM=1 together with --enable-flashinfer-allreduce-fusion.
Modifications
With the environment variable set via export FD_USE_PHI_RMSNORM=1, one of the RMSNorm branches is adapted to support all reduce fusion.
Usage or Command
NA
Accuracy Tests
done
Checklist
Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
For a release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.