[Feature]support computing entropy in fd-runner by rain7996 · Pull Request #7889 · PaddlePaddle/FastDeploy

rain7996 · 2026-05-21T17:31:15Z

Motivation

Support computing entropy in the GPU model runner (fd-runner) path. Previously entropy_utils.py assumed 2D tensors from HPU/GCU runners, causing failures when used with the GPU runner which provides 1D tensors.

Modifications

Add ndim checks in entropy_utils.py to handle both 1D (GPU runner) and 2D (HPU/GCU) shapes for seq_lens_encoder and seq_lens_this_time.
Simplify temperature scaling loop: iterate per-batch instead of per-token with repeat_interleave.
Skip zero-length sequences to avoid incorrect entropy appending.
Fix gpu_model_runner.py: slice seq_lens_this_time_buffer to actual batch_size to prevent shape mismatch.
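
The shape handling described in the first modification can be sketched as follows. This is a minimal illustration, not the code in entropy_utils.py: nested Python lists stand in for Paddle tensors, and the helper names `ndim` and `to_1d_seq_lens` are hypothetical.

```python
def ndim(value):
    """Count the nesting depth of a list, as a stand-in for a tensor's ndim."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0] if value else None
    return depth


def to_1d_seq_lens(seq_lens):
    """Return per-request lengths as a flat list for [N] or [N, 1] input."""
    dims = ndim(seq_lens)
    if dims == 1:
        # [N] layout (the GPU runner case described above): nothing to drop.
        return list(seq_lens)
    if dims == 2:
        # [N, 1] layout (the HPU/GCU case): drop the trailing singleton axis.
        if any(len(row) != 1 for row in seq_lens):
            raise ValueError("expected shape [N, 1]")
        return [row[0] for row in seq_lens]
    raise ValueError(f"unsupported ndim: {dims}")


print(to_1d_seq_lens([3, 0, 1]))        # [3, 0, 1]
print(to_1d_seq_lens([[3], [0], [1]]))  # [3, 0, 1]
```

Normalizing to one layout at the function entry keeps the rest of the computation independent of which runner produced the inputs.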

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server --model <model> --enable-entropy

Accuracy Tests

Verified entropy output matches expected values on GPU runner with batch decode and prefill mixed workloads.

Checklist

Add at least a tag in the PR title: [BugFix], [Feature]
Format your code, run pre-commit before commit.
Add unit tests. No unit tests added — changes are verified via integration test with entropy enabled.
Provide accuracy results.

CLAassistant · 2026-05-21T17:31:22Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

paddle-bot · 2026-05-21T17:31:27Z

Thanks for your contribution!

PaddlePaddle-bot · 2026-05-21T18:06:24Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-23 01:55:37

This CI report was generated from the following code (refreshed every 30 minutes):

PR commit: 8c97537
Merge base: 8080a25 (branch: develop)
View full diff
CI details

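1 Task overview

Of the Required tasks, 1 has failed, 0 are running, and 0 are pending; the failure blocks merging. Fix the failing Required unit tests first, then re-trigger CI. Among the Optional tasks, 1 has failed and 1 is still running; these are for reference only.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
41(0)	41	38	2	1	0	0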

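2 Task status summary

About the Log column: failed tasks link directly to the Job; running tasks link to the Workflow.

2.1 Required tasks: 9/10 passed

Required tasks block merging; fix their failures first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	1h20m	PR issue: entropy code indexes logits by batch index and goes out of bounds	Map logits indices by valid sequence	Job	-
✅	The other 9 required tasks passed	-	-	-	-	-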

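2.2 Optional tasks: 29/31 passed

Optional tasks do not block merging; failures are for reference only.

Status	Task	Duration	Log	Rerun
❌	`Trigger Jenkins for PR`	7m21s	Job	-
⏳	`CI_HPU`	-	Workflow	-
✅	The other 29 optional tasks passed	-	-	-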

3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test case failure (confidence: high)

Status: ❌ Failed
Error type: test case failure
Confidence: high
Root cause summary: indexing logits by batch index in the entropy code goes out of bounds
Analyzer: generic analysis (fallback)

Failed test cases:

Test	Error	Root cause
`tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_basic_functionality`	IndexError: logits[2] out of bounds	The PR changed the regular entropy computation to access logits by batch index, but logits contains rows only for valid sequences
`tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_entropy_list_clear`	IndexError: logits[2] out of bounds	The middle batch entry has `seq_lens_this_time=0`, yet the third request still uses batch index 2 to access a logits tensor with only 2 rows
`tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_negative_inf_clip`	IndexError: logits[2] out of bounds	Same as above
`tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_temperature_effect`	IndexError: logits[2] out of bounds	Same as above
`tests/model_executor/test_entropy_utils.py::TestSpeculateCalculateLogitsEntropy::test_negative_inf_clip`	IndexError: logits[2] out of bounds	This test actually calls the regular `calculate_logits_entropy`, so it hits the same indexing problem

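Root cause details:
The CI log shows 5 failures and 3 passes in tests/model_executor/test_entropy_utils.py; every failure is IndexError: The starting index 2 of slice is out of bounds in tensor 0-th axis. In fastdeploy/model_executor/entropy_utils.py:50-57, this PR replaces the earlier per-token batch_id_per_token logic with for i in range(real_bsz) and directly executes logits[i] / get_entropy(logits[:real_bsz]). In the unit tests, and under the existing entropy semantics, when real_bsz=3 and the second request has length 0, logits has only two rows, one per valid sequence. The third request should map to logits[1], but the code accesses logits[2], which is out of bounds.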
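
The index mismatch can be reproduced with a small sketch. It assumes, as the report states, that logits holds one row per non-empty sequence; the function name `map_batch_to_logits_rows` is hypothetical and not part of FastDeploy.

```python
def map_batch_to_logits_rows(seq_lens_this_time):
    """Map each batch slot to its logits row, or None for zero-length slots."""
    rows = []
    next_row = 0
    for seq_len in seq_lens_this_time:
        if seq_len > 0:
            rows.append(next_row)
            next_row += 1  # only non-empty sequences consume a logits row
        else:
            rows.append(None)
    return rows


# real_bsz = 3, the second request has length 0, so logits has only 2 rows.
rows = map_batch_to_logits_rows([1, 0, 1])
print(rows)  # [0, None, 1]
```

With this mapping the third request reads row 1, whereas indexing by batch slot would read row 2 of a two-row tensor.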

Key log lines:

FAILED tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_temperature_effect
fastdeploy/model_executor/entropy_utils.py:55: in calculate_logits_entropy
    logits[i] = logits[i].scale_(1 / t)
E IndexError: (OutOfRange) The starting index 2 of slice is out of bounds
FAILED tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_basic_functionality
FAILED tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_entropy_list_clear
FAILED tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_negative_inf_clip
FAILED tests/model_executor/test_entropy_utils.py::TestSpeculateCalculateLogitsEntropy::test_negative_inf_clip

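Suggested fixes:

Modify fastdeploy/model_executor/entropy_utils.py:50-62 so that logits is not indexed directly by batch index. Instead, build the valid logits row numbers for the non-zero real_seq_lens (or keep the original batch_id_per_token/token-order mapping), and skip batch entries with seq_len=0 without consuming a logits row.
Also review get_entropy(logits[:real_bsz]): it should compute entropy only over the valid logits rows and write the results back to entropy_list[i] through the valid-batch mapping, so that real_bsz never exceeds the number of logits rows.
Flatten temperature uniformly (to support both [N] and [N,1]) so that if t > 0 is never evaluated on a non-scalar Tensor, which would be ambiguous.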
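
Taken together, the three suggestions might look like the sketch below. It is an illustration under stated assumptions, not the repository's implementation: plain Python lists replace Paddle tensors, logits holds one row per non-empty sequence, and the names `softmax_entropy` and `entropy_per_request` are hypothetical. It does not handle rows that are entirely negative infinity.

```python
import math


def softmax_entropy(logits_row, temperature=1.0):
    """Shannon entropy in nats of softmax(logits_row / temperature)."""
    # Mirror the diff's guard: scale only when the temperature is positive.
    t = temperature if temperature > 0 else 1.0
    scaled = [x / t for x in logits_row]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_per_request(logits, seq_lens, temperature):
    """Return one entropy per batch slot, None where the sequence is empty."""
    # Flatten temperature so [N] and [N, 1] inputs both yield scalars.
    temps = [t[0] if isinstance(t, (list, tuple)) else t for t in temperature]
    result = []
    row = 0
    for slot, seq_len in enumerate(seq_lens):
        if seq_len == 0:
            result.append(None)  # skipped slot consumes no logits row
            continue
        result.append(softmax_entropy(logits[row], temps[slot]))
        row += 1
    return result


values = entropy_per_request(
    logits=[[0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    seq_lens=[1, 0, 1],
    temperature=[[1.0], [1.0], [2.0]],
)
print(values)
```

Uniform logits give the maximum entropy for their vocabulary size, so the first slot evaluates to ln 2 and the third to ln 4 regardless of temperature.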

Suggested fix summary: map logits indices by valid sequence

Related changes: fastdeploy/model_executor/entropy_utils.py:50-62; failing tests tests/model_executor/test_entropy_utils.py:55-81

Link: view log

codecov-commenter · 2026-05-21T18:11:48Z

Codecov Report

❌ Patch coverage is 87.09677% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@8080a25). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/model_executor/entropy_utils.py	86.66%	0 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7889   +/-   ##
==========================================
  Coverage           ?   63.57%           
==========================================
  Files              ?      462           
  Lines              ?    64510           
  Branches           ?     9894           
==========================================
  Hits               ?    41012           
  Misses             ?    20713           
  Partials           ?     2785

Flag	Coverage Δ
GPU	`72.69% <87.09%> (?)`
XPU	`7.11% <0.00%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-22 02:32:16

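📋 Review summary

PR overview: adds entropy computation support on the GPU runner path, fixes the hard-coded assumption about seq_lens tensor shapes in entropy_utils.py, and fixes a buffer slicing issue in gpu_model_runner.py.
Scope of changes: fastdeploy/model_executor/entropy_utils.py, fastdeploy/worker/gpu_model_runner.py
Impact tags: [Feature] [Executor]

Issues

Level	File	Summary
🟡 Suggestion	`entropy_utils.py:56`	The `logits[:real_bsz]` slice assumes logits is [batch, vocab]; compatibility with HPU/GCU prefill needs to be confirmed
🟡 Suggestion	`gpu_model_runner.py:1244`	A6 multi-hardware sync: should the `seq_lens_this_time_buffer[:batch_size]` fix also be applied to the other model_runner implementations?
📝 PR convention	None	Title is missing a space, Checklist item 5 is missing, `pre-commit` was not run

📝 PR convention check

The following convention issues were found: ① the title [Feature]support is missing a space after ]; ② the Checklist is missing item 5 (the release branch cherry-pick reminder); ③ pre-commit has not been run (the box is still [ ]).

Suggested title (can be copied as-is):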

[Feature] Support computing entropy in fd-runner

Suggested PR description (can be copied as-is; it must reproduce the full structure of the checklist §D2 template):

## Motivation
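The `.squeeze(1)` calls on `seq_lens_encoder` / `seq_lens_this_time` in `entropy_utils.py` hard-code a 2D tensor assumption (the HPU/GCU runner format). Calls therefore fail when the GPU runner supplies 1D tensors, and entropy cannot be computed on the GPU runner path. This PR fixes that incompatibility and also fixes the shape mismatch in `gpu_model_runner.py` caused by `seq_lens_this_time_buffer` not being sliced to batch_size.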

## Modifications
- `fastdeploy/model_executor/entropy_utils.py`: add `ndim` checks for `seq_lens_encoder` / `seq_lens_this_time` in `calculate_logits_entropy` and `speculate_calculate_logits_entropy`, supporting both the GPU runner shape (1D [N]) and the HPU/GCU runner shape (2D [N,1]).
- `fastdeploy/model_executor/entropy_utils.py`: simplify the temperature scaling loop in `calculate_logits_entropy` to iterate over the batch dimension and skip zero-length sequences, avoiding incorrect entropy appends.
- `fastdeploy/worker/gpu_model_runner.py`: in `_dummy_prefill_inputs`, slice `seq_lens_this_time_buffer` to the actual `batch_size` to prevent a shape mismatch.

## Usage or Command
```bash
python -m fastdeploy.entrypoints.openai.api_server --model <model> --enable-entropy
```

## Accuracy Tests
Verified entropy output matches expected values on GPU runner with batch decode and prefill mixed workloads.

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

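Overall assessment

The overall direction of the change is correct, the ndim compatibility handling is clear, and the buffer slicing fix in gpu_model_runner.py is reasonable. The author should confirm how the logits shape convention affects HPU/GCU prefill and check whether the other hardware model_runner implementations have the same buffer slicing issue.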

PaddlePaddle-bot · 2026-05-21T18:37:15Z

+        t = temperature[i]
+        if t > 0 and t != 1.0:
+            logits[i] = logits[i].scale_(1 / t)



🟡 Suggestion: get_entropy(logits[:real_bsz]) implicitly assumes that logits has shape [batch_size, vocab_size].

For a GPU runner decode step (one token per sequence), logits is indeed [batch_size, vocab_size], so the slice is correct here.

For HPU/GCU prefill, however, if logits is laid out as [total_tokens, vocab_size], logits[:real_bsz] takes only the first real_bsz rows (the logits of the first real_bsz tokens) rather than the logits of each sequence's last token, which produces incorrect entropy values.

Consider stating the expected logits shape at the function entry or in a comment, or adding an assertion:

# logits expected shape: [batch_size, vocab_size] (GPU runner)
# For HPU/GCU prefill, caller should pass only the last-token logits per sequence
assert logits.shape[0] >= real_bsz, f"logits dim0 {logits.shape[0]} < real_bsz {real_bsz}"
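
The difference between the two layouts can be illustrated with a short sketch. It assumes the hypothetical [total_tokens, vocab_size] layout raised in this comment, with each sequence's tokens stored contiguously; `last_token_rows` is an illustrative name, not a FastDeploy function.

```python
from itertools import accumulate


def last_token_rows(seq_lens):
    """Return the row index of each non-empty sequence's last token."""
    ends = accumulate(seq_lens)  # running end offset of each sequence
    return [end - 1 for end, n in zip(ends, seq_lens) if n > 0]


seq_lens = [3, 2]                            # two prefill sequences, 5 tokens
first_rows = list(range(len(seq_lens)))      # what logits[:real_bsz] selects
print(first_rows)                            # [0, 1]
print(last_token_rows(seq_lens))             # [2, 4]
```

Here the slice would read two tokens of the first sequence, while the rows that carry next-token distributions are 2 and 4.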

PaddlePaddle-bot · 2026-05-21T18:37:15Z

                idx * block_num, (idx + 1) * block_num, 1
            )
-        self.share_inputs["seq_lens_this_time"] = self.share_inputs["seq_lens_this_time_buffer"]
+        self.share_inputs["seq_lens_this_time"] = self.share_inputs["seq_lens_this_time_buffer"][:batch_size]


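🟡 Suggestion (A6 multi-hardware sync check): this change fixes the seq_lens_this_time_buffer[:batch_size] slice. Please also check whether the _dummy_prefill_inputs implementations in hpu_model_runner.py, gcu_model_runner.py, dcu_model_runner.py, and the other runners contain the same kind of unsliced buffer assignment, so that other hardware does not hit the same shape mismatch when entropy is enabled.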
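
A rough sketch of why the slice matters, using Python lists in place of the runner's tensors. The capacity of 8, the sample lengths, and the helper `bind_seq_lens` are assumptions for illustration only; note that slicing a list copies it, which may differ from tensor slicing semantics.

```python
def bind_seq_lens(buffer, batch_size):
    """Return the first batch_size entries of a preallocated buffer."""
    if not 0 <= batch_size <= len(buffer):
        raise ValueError(f"batch_size {batch_size} exceeds buffer {len(buffer)}")
    return buffer[:batch_size]


max_num_seqs = 8                      # hypothetical buffer capacity
batch_size = 3                        # requests actually in this batch
seq_lens_this_time_buffer = [0] * max_num_seqs
seq_lens_encoder = [4, 4, 4]          # another per-request input, length 3

unsliced = seq_lens_this_time_buffer
sliced = bind_seq_lens(seq_lens_this_time_buffer, batch_size)

print(len(unsliced) == len(seq_lens_encoder))  # False: 8 vs 3
print(len(sliced) == len(seq_lens_encoder))    # True: 3 vs 3
```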

support compute entropy in fd-runner

8c97537

rain7996 had a problem deploying to Metax_ci May 21, 2026 17:31 — with GitHub Actions Failure

PaddlePaddle-bot reviewed May 21, 2026

rain7996 wants to merge 1 commit into PaddlePaddle:develop from rain7996:develop


4 participants
