[Feature]support computing entropy in fd-runner #7889
Thanks for your contribution!
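CI report generated from the following code (updated every 30 minutes):
1 Task overview: 1 Required task is currently failing, 0 are running and 0 are pending. Failures block merging, so fix the Required unit-test failure first and then re-trigger CI. Among the Optional tasks, 1 more has failed and 1 is still running. These are for reference only.
2 Task status summary (log column: failed tasks link directly to their Job; running tasks link to their Workflow)
2.1 Required tasks: 9/10 passed
2.2 Optional tasks: 29/31 passed
3 Failure details (required only): Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test case failure (confidence: high)
Failed test cases:
Root cause details: Key logs: Fix suggestions:
Fix suggestion summary: map logits indices by valid sequence. Related changes: Link: View logs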
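The CI agent's suggested fix, "map logits indices by valid sequence," is illustrated below in one plausible reading. The report does not show what the failing test expects. This minimal sketch assumes a compacted logits tensor with one row per batch slot whose `seq_lens_this_time` is non-zero, in slot order. Under that assumption, batch slot `i` must be translated to a compact row index instead of being used directly. NumPy stands in for Paddle, and `valid_row_index` is a hypothetical helper, not a FastDeploy API.

```python
import numpy as np


def valid_row_index(seq_lens_this_time):
    """Map each batch slot to its row in a compacted logits tensor.

    Hypothetical helper: assumes logits hold one row per slot whose
    seq_lens_this_time is > 0, in slot order. Empty slots map to -1.
    """
    seq_lens = np.asarray(seq_lens_this_time).reshape(-1)  # accept [N] or [N, 1]
    valid = seq_lens > 0
    # A running count of valid slots gives each valid slot its compact row.
    rows = np.cumsum(valid) - 1
    return np.where(valid, rows, -1)


print(valid_row_index(np.array([3, 0, 1, 0, 2])).tolist())  # [0, -1, 1, -1, 2]
```

If the runner instead keeps one logits row per slot (empty slots included), no remapping is needed. Which layout applies is exactly what the failing test would have to pin down.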
Codecov Report
❌ Patch coverage is
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             develop    #7889   +/-   ##
==========================================
  Coverage           ?   63.57%
==========================================
  Files              ?      462
  Lines              ?    64510
  Branches           ?     9894
==========================================
  Hits               ?    41012
  Misses             ?    20713
  Partials           ?     2785
```
Flags with carried forward coverage won't be shown.
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review
2026-05-22 02:32:16
📋 Review Summary
PR overview: adds entropy computation support on the GPU runner path, fixes a hard-coded assumption about seq_lens tensor shapes in entropy_utils.py, and fixes a buffer slicing issue in gpu_model_runner.py.
Scope of changes: fastdeploy/model_executor/entropy_utils.py, fastdeploy/worker/gpu_model_runner.py
Impact tags: [Feature] [Executor]
Issues
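| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | entropy_utils.py:56 | The logits[:real_bsz] slice assumes logits are [batch, vocab]; compatibility with the HPU/GCU prefill scenario needs to be confirmed |
| 🟡 Suggestion | gpu_model_runner.py:1244 | A6 multi-hardware sync: check whether the seq_lens_this_time_buffer[:batch_size] fix should also be applied to the other model_runners |
| 📝 PR conventions | None | Title is missing a space; Checklist item 5 is missing; pre-commit was not run |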
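📝 PR Convention Check
The following convention issues were found: ① the title [Feature]support is missing a space after ]; ② the Checklist is missing item 5 (the release-branch cherry-pick reminder); ③ pre-commit was not run (its box is still [ ]).
Suggested title (ready to copy):
[Feature] Support computing entropy in fd-runner
Suggested PR description (ready to copy; it must reproduce the full structure of the checklist §D2 template):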
## Motivation
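The `.squeeze(1)` calls on `seq_lens_encoder` / `seq_lens_this_time` in `entropy_utils.py` hard-code the assumption that these are 2D tensors (the HPU/GCU runner format). When the GPU runner supplies 1D tensors, the call fails, so entropy cannot be computed on the GPU runner path. This PR fixes that compatibility issue. It also fixes a shape mismatch in `gpu_model_runner.py`, which was caused by `seq_lens_this_time_buffer` not being sliced to batch_size.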
## Modifications
- `fastdeploy/model_executor/entropy_utils.py`: add an `ndim` check on `seq_lens_encoder` / `seq_lens_this_time` in `calculate_logits_entropy` and `speculate_calculate_logits_entropy`, so that both the GPU runner shape (1D `[N]`) and the HPU/GCU runner shape (2D `[N, 1]`) are supported.
- `fastdeploy/model_executor/entropy_utils.py`: simplify the temperature-scaling loop in `calculate_logits_entropy` to iterate over the batch dimension and skip zero-length sequences, so that no spurious entropy values are appended.
- `fastdeploy/worker/gpu_model_runner.py`: in `_dummy_prefill_inputs`, slice `seq_lens_this_time_buffer` to the actual `batch_size` to prevent a shape mismatch.
## Usage or Command
```bash
python -m fastdeploy.entrypoints.openai.api_server --model <model> --enable-entropy
```
## Accuracy Tests
Verified entropy output matches expected values on GPU runner with batch decode and prefill mixed workloads.
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
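- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall assessment
The overall direction of the change is correct. The ndim compatibility handling is clear, and the buffer slicing fix in gpu_model_runner.py is also reasonable. The author should confirm how the logits shape convention affects the HPU/GCU prefill scenario. The author should also check whether the other hardware model_runners have the same buffer slicing issue.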
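The `calculate_logits_entropy` changes described in this PR (the ndim check, temperature scaling, and skipping zero-length sequences) can be sketched as follows. This is not the PR's code. NumPy stands in for Paddle, and `normalize_seq_lens`, `entropy` and `logits_entropy` are hypothetical names. The sketch assumes logits are `[batch_size, vocab_size]` with one row per batch slot (the decode layout discussed in the review). It divides by `t` only when `t > 0 and t != 1.0`, mirroring the loop in the diff, and it reports Shannon entropy in nats.

```python
import numpy as np


def normalize_seq_lens(seq_lens):
    """Return seq_lens as 1D [N], accepting GPU-style [N] or HPU/GCU-style [N, 1]."""
    seq_lens = np.asarray(seq_lens)
    if seq_lens.ndim == 2:
        # Squeeze only a real trailing axis; squeeze(1) on a 1D array raises.
        seq_lens = seq_lens.squeeze(1)
    return seq_lens


def entropy(row):
    """Shannon entropy (in nats) of softmax(row), via a numerically stable log-softmax."""
    z = row - row.max()
    log_p = z - np.log(np.exp(z).sum())
    return float(-(np.exp(log_p) * log_p).sum())


def logits_entropy(logits, seq_lens_this_time, temperature):
    """Per-sequence entropy, assuming logits are [batch_size, vocab_size] (one row per slot)."""
    seq_lens = normalize_seq_lens(seq_lens_this_time)
    result = []
    for i in range(seq_lens.shape[0]):
        if seq_lens[i] == 0:
            continue  # empty slot this step: append nothing for it
        row = logits[i].astype(np.float64)
        t = temperature[i]
        if t > 0 and t != 1.0:
            row = row / t  # temperature scaling, applied out of place
        result.append(entropy(row))
    return result


logits = np.zeros((3, 4))  # uniform logits over a 4-token vocabulary
print(logits_entropy(logits, np.array([[1], [0], [1]]), np.array([1.0, 1.0, 0.5])))
# two values, each log(4) ~= 1.386: uniform logits stay uniform under any temperature
```

Normalizing the shape once at the top keeps the loop body identical for both runner layouts. Scaling a copy avoids mutating the caller's logits, unlike the in-place `scale_` in the diff.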
```python
t = temperature[i]
if t > 0 and t != 1.0:
    logits[i] = logits[i].scale_(1 / t)
```
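🟡 Suggestion: get_entropy(logits[:real_bsz]) implicitly assumes that logits have shape [batch_size, vocab_size].
For a GPU runner decode step (one token per sequence), logits are indeed [batch_size, vocab_size], so the slice is correct here.
In the HPU/GCU prefill scenario, however, logits may be laid out as [total_tokens, vocab_size]. In that case logits[:real_bsz] takes only the first real_bsz rows (the logits of the first real_bsz tokens), not the logits of each sequence's last token. The computed entropy would then be wrong.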
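The difference between slicing the first real_bsz rows and gathering each sequence's last token can be reproduced in a small sketch. It assumes a packed prefill layout: logits are `[total_tokens, vocab_size]` and each sequence's tokens are contiguous in batch order. NumPy stands in for Paddle, and `last_token_logits` is a hypothetical helper, not a FastDeploy API.

```python
import numpy as np


def last_token_logits(logits, seq_lens_this_time):
    """Gather the last-token row of each non-empty sequence from packed logits.

    Assumes logits is [total_tokens, vocab] with sequences stored contiguously
    in batch order, and seq_lens_this_time is [N] or [N, 1].
    """
    seq_lens = np.asarray(seq_lens_this_time).reshape(-1)
    ends = np.cumsum(seq_lens)         # one past each sequence's last token
    last_idx = ends[seq_lens > 0] - 1  # skip empty sequences
    return logits[last_idx]


# Two sequences with 3 and 2 prefill tokens; row r is filled with the value r.
seq_lens = np.array([3, 2])
logits = np.arange(5)[:, None] * np.ones((1, 4))
real_bsz = len(seq_lens)

print(logits[:real_bsz][:, 0].tolist())                    # [0.0, 1.0]: both rows belong to sequence 0
print(last_token_logits(logits, seq_lens)[:, 0].tolist())  # [2.0, 4.0]: last token of each sequence
```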
It is suggested to document the logits shape convention at the function entry or in a comment, or to add an assertion:
```python
# logits expected shape: [batch_size, vocab_size] (GPU runner)
# For HPU/GCU prefill, caller should pass only the last-token logits per sequence
assert logits.shape[0] >= real_bsz, f"logits dim0 {logits.shape[0]} < real_bsz {real_bsz}"
```

```diff
     idx * block_num, (idx + 1) * block_num, 1
 )
-self.share_inputs["seq_lens_this_time"] = self.share_inputs["seq_lens_this_time_buffer"]
+self.share_inputs["seq_lens_this_time"] = self.share_inputs["seq_lens_this_time_buffer"][:batch_size]
```
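🟡 Suggestion (A6 multi-hardware sync check): this change adds the seq_lens_this_time_buffer[:batch_size] slice. Please also check the other _dummy_prefill_inputs implementations, in hpu_model_runner.py, gcu_model_runner.py, dcu_model_runner.py and elsewhere, for the same unsliced buffer assignment. Otherwise other hardware may hit the same shape mismatch when entropy is enabled.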
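The shape mismatch from an unsliced buffer is shown in the minimal sketch below. It assumes, as the name suggests, that `seq_lens_this_time_buffer` is preallocated for a maximum batch size. It also assumes that downstream code such as the entropy path expects exactly `batch_size` entries. `MAX_NUM_SEQS` and `dummy_prefill_seq_lens` are illustrative names, and NumPy stands in for Paddle.

```python
import numpy as np

MAX_NUM_SEQS = 8  # illustrative preallocated capacity, not a FastDeploy default


def dummy_prefill_seq_lens(buffer, batch_size, slice_to_batch):
    """Return the seq_lens_this_time array a dummy prefill step would expose."""
    return buffer[:batch_size] if slice_to_batch else buffer


buffer = np.zeros(MAX_NUM_SEQS, dtype=np.int32)  # allocated once, sized for the max batch
batch_size = 3
buffer[:batch_size] = 4                # 4 dummy tokens for each real slot
temperature = np.ones(batch_size)      # a per-request array sized to the real batch

for slice_to_batch in (False, True):
    seq_lens = dummy_prefill_seq_lens(buffer, batch_size, slice_to_batch)
    print(slice_to_batch, seq_lens.shape, seq_lens.shape == temperature.shape)
# False (8,) False
# True (3,) True
```

Any runner that assigns the whole buffer would expose the same `MAX_NUM_SEQS`-length array, which is why the review asks for the other `_dummy_prefill_inputs` implementations to be checked.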
Motivation
Support computing entropy in the GPU model runner (fd-runner) path. Previously, `entropy_utils.py` assumed 2D tensors from the HPU/GCU runners. This caused failures when it was used with the GPU runner, which provides 1D tensors.
Modifications
- … `entropy_utils.py` to handle both 1D (GPU runner) and 2D (HPU/GCU) shapes for `seq_lens_encoder` and `seq_lens_this_time`.
- … `repeat_interleave`.
- `gpu_model_runner.py`: slice `seq_lens_this_time_buffer` to the actual `batch_size` to prevent a shape mismatch.
Usage or Command
Accuracy Tests
Verified entropy output matches expected values on GPU runner with batch decode and prefill mixed workloads.
Checklist
- … `[BugFix]`, `[Feature]`
- … `pre-commit` before commit.