[Feature]support computing entropy in fd-runner #7889

Open
rain7996 wants to merge 1 commit into PaddlePaddle:develop from rain7996:develop

@rain7996
Contributor

Motivation

Support computing entropy in the GPU model runner (fd-runner) path. Previously, entropy_utils.py assumed that seq_lens_encoder and seq_lens_this_time were 2D tensors ([N, 1]), as produced by the HPU/GCU runners, so it failed when used with the GPU runner, which provides 1D tensors ([N]).

Modifications

  • Add ndim checks in entropy_utils.py to handle both 1D (GPU runner) and 2D (HPU/GCU) shapes for seq_lens_encoder and seq_lens_this_time.
  • Simplify the temperature-scaling loop: iterate once per batch entry instead of expanding temperature per token with repeat_interleave.
  • Skip zero-length sequences so that no spurious entropy value is appended for them.
  • Fix gpu_model_runner.py: slice seq_lens_this_time_buffer to the actual batch_size to prevent a shape mismatch.
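The ndim checks in the first item can be sketched as follows. This is an illustration under assumptions, not the PR's code: NumPy arrays stand in for Paddle tensors, and `to_1d_seq_lens` is a hypothetical helper name.

```python
import numpy as np


def to_1d_seq_lens(seq_lens):
    """Return seq_lens as a 1D array of shape [N].

    Accepts both the GPU runner layout ([N]) and the HPU/GCU layout ([N, 1]).
    Hypothetical helper; NumPy stands in for Paddle tensors here.
    """
    seq_lens = np.asarray(seq_lens)
    if seq_lens.ndim == 2:
        # HPU/GCU runners: [N, 1] -> [N]
        seq_lens = seq_lens.squeeze(1)
    elif seq_lens.ndim != 1:
        raise ValueError(f"unexpected seq_lens shape {seq_lens.shape}")
    return seq_lens


gpu_style = np.array([5, 0, 1])        # GPU runner: shape [3]
hpu_style = np.array([[5], [0], [1]])  # HPU/GCU runner: shape [3, 1]
print(to_1d_seq_lens(gpu_style).tolist())  # [5, 0, 1]
print(to_1d_seq_lens(hpu_style).tolist())  # [5, 0, 1]
```

Normalizing once at the function entry keeps the rest of the entropy code free of per-runner branches.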

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server --model <model> --enable-entropy

Accuracy Tests

Verified that the entropy output matches expected values on the GPU runner under mixed batched-decode and prefill workloads.

Checklist

  • Add at least one tag in the PR title: [BugFix], [Feature]
  • Format your code and run pre-commit before committing.
  • Add unit tests. No unit tests were added; the changes were verified via an integration test with entropy enabled.
  • Provide accuracy results.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot

paddle-bot Bot commented May 21, 2026

Thanks for your contribution!

@PaddlePaddle-bot

PaddlePaddle-bot commented May 21, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-23 01:55:37

This CI report is generated from the following code (updated every 30 minutes):


1 Job Overview

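Currently 1 Required job has failed, 0 are running, and 0 are pending; the failure blocks merging. Fix the failing Required unit tests first, then re-trigger CI. Among the Optional jobs, 1 more has failed and 1 is still running; these are for reference only.

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 41(0) | 41 | 38 | 2 | 1 | 0 | 0 |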

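2 Job Status Summary

Log column: failed jobs link directly to their Job; running jobs link to their Workflow.

2.1 Required jobs: 9/10 passed

Required jobs block merging; failures must be addressed first.

| Status | Job | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| ❌ | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | 1h20m | PR issue: entropy indexes logits by batch index, going out of bounds | Map logits indices by valid sequence | Job | - |
| | The other 9 required jobs passed | - | - | - | - | - |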

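2.2 Optional jobs: 29/31 passed

Optional jobs do not block merging; failures are for reference only.

| Status | Job | Duration | Log | Rerun |
|---|---|---|---|---|
| ❌ | Trigger Jenkins for PR | 7m21s | Job | - |
| ⏳ | CI_HPU | - | Workflow | - |
| | The other 29 optional jobs passed | - | - | - |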

3 Failure Details (Required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test case failure (confidence: high)


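  • Status: ❌ Failed
  • Error type: test case failure
  • Confidence: high
  • Root cause summary: entropy indexes logits by batch index, causing out-of-bounds access
  • Analyzer: generic analysis (fallback)

Failed test cases:

| Test | Error | Root cause |
|---|---|---|
| tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_basic_functionality | IndexError: logits[2] out of bounds | The PR changed the plain entropy calculation to index logits by batch index, but logits only contains rows for valid sequences |
| tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_entropy_list_clear | IndexError: logits[2] out of bounds | After the middle batch entry gets seq_lens_this_time=0, the third request still uses batch index 2 to access a 2-row logits tensor |
| tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_negative_inf_clip | IndexError: logits[2] out of bounds | Same as above |
| tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_temperature_effect | IndexError: logits[2] out of bounds | Same as above |
| tests/model_executor/test_entropy_utils.py::TestSpeculateCalculateLogitsEntropy::test_negative_inf_clip | IndexError: logits[2] out of bounds | This test actually calls the plain calculate_logits_entropy and hits the same indexing issue |

Root cause details:
The CI log shows 5 failures and 3 passes in tests/model_executor/test_entropy_utils.py; every failure is IndexError: The starting index 2 of slice is out of bounds in tensor 0-th axis. In fastdeploy/model_executor/entropy_utils.py:50-57, this PR replaces the original per-token batch_id_per_token logic with for i in range(real_bsz) and directly executes logits[i] / get_entropy(logits[:real_bsz]). However, in the unit tests and under the existing entropy semantics, when real_bsz=3 the second request has length 0, so logits has only two valid sequence rows. The third request should map to logits[1], but the code now accesses logits[2] directly, which is out of bounds.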
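To make the index mismatch concrete, here is a small sketch of the batch-index-to-logits-row mapping for the scenario just described (real_bsz=3, the second request has length 0). It assumes logits holds exactly one row per non-empty sequence, in batch order; NumPy stands in for Paddle, and `valid_row_index` is a hypothetical helper name.

```python
import numpy as np


def valid_row_index(seq_lens_this_time):
    """Map each batch index to its row in the compacted logits tensor.

    Batch entries with seq_len == 0 own no logits row and map to -1.
    Assumes logits holds exactly one row per non-empty sequence, in batch order.
    """
    seq_lens = np.asarray(seq_lens_this_time).reshape(-1)
    nonzero = seq_lens > 0
    # A running count of non-empty sequences gives the compacted row number.
    rows = np.cumsum(nonzero) - 1
    return np.where(nonzero, rows, -1)


seq_lens = [1, 0, 1]  # real_bsz = 3; logits has only 2 rows
print(valid_row_index(seq_lens).tolist())  # [0, -1, 1]: batch 2 lives in logits[1]
```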

关键日志:

FAILED tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_temperature_effect
fastdeploy/model_executor/entropy_utils.py:55: in calculate_logits_entropy
    logits[i] = logits[i].scale_(1 / t)
E IndexError: (OutOfRange) The starting index 2 of slice is out of bounds
FAILED tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_basic_functionality
FAILED tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_entropy_list_clear
FAILED tests/model_executor/test_entropy_utils.py::TestCalculateLogitsEntropy::test_negative_inf_clip
FAILED tests/model_executor/test_entropy_utils.py::TestSpeculateCalculateLogitsEntropy::test_negative_inf_clip

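Suggested fixes:

  1. Modify fastdeploy/model_executor/entropy_utils.py:50-62 so that logits is not indexed directly by batch index. Instead, build the valid logits row indices for the nonzero real_seq_lens (or keep the original batch_id_per_token / token-order mapping), and skip batch entries with seq_len=0 without consuming a logits row.
  2. Likewise review get_entropy(logits[:real_bsz]): it should compute entropy only over the valid logits rows and write results back to entropy_list[i] through the valid-batch mapping, so that real_bsz can never exceed the number of logits rows.
  3. Flatten temperature uniformly (handling both [N] and [N,1]) so that the later if t > 0 check never operates on a non-scalar Tensor, which is ambiguous.

Fix summary: map logits indices by valid sequence

Related changes: fastdeploy/model_executor/entropy_utils.py:50-62; failing tests: tests/model_executor/test_entropy_utils.py:55-81

Link: View logs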
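Taken together, the three suggestions could produce a loop like the sketch below. It is an illustration under stated assumptions, not FastDeploy's implementation: NumPy replaces Paddle; logits is assumed to be [num_nonempty_seqs, vocab_size] with one row per non-empty sequence in batch order; `calculate_entropy_sketch` is a hypothetical name; and entropy is taken as the Shannon entropy of softmax(logits / t), skipping scaling when t <= 0 or t == 1.0, mirroring the condition in the PR.

```python
import numpy as np


def calculate_entropy_sketch(logits, seq_lens_this_time, temperature):
    """Return one entropy value per batch entry, or None for empty sequences.

    Hypothetical helper; assumes logits is [num_nonempty_seqs, vocab_size],
    one row per non-empty sequence, in batch order.
    """
    seq_lens = np.asarray(seq_lens_this_time).reshape(-1)         # [N] or [N, 1] -> [N]
    temps = np.asarray(temperature, dtype=np.float64).reshape(-1)  # [N] or [N, 1] -> [N]
    logits = np.asarray(logits, dtype=np.float64)

    results = [None] * len(seq_lens)
    row = 0  # next unconsumed row of the compacted logits tensor
    for i, seq_len in enumerate(seq_lens):
        if seq_len == 0:
            continue  # empty sequence: skip it without consuming a logits row
        x = logits[row]
        row += 1
        t = float(temps[i])  # a Python scalar, so `t > 0` is unambiguous
        if t > 0 and t != 1.0:
            x = x / t
        # Numerically stable log-softmax.
        x = x - x.max()
        log_p = x - np.log(np.exp(x).sum())
        p = np.exp(log_p)
        # H = -sum(p * log p); entries with p == 0 (e.g. -inf logits) contribute 0.
        mask = p > 0
        results[i] = float(-(p[mask] * log_p[mask]).sum())
    if row != logits.shape[0]:
        raise ValueError("number of logits rows does not match non-empty sequences")
    return results


logits = np.zeros((2, 4))  # 2 rows for the 2 non-empty sequences, vocab_size = 4
print([None if h is None else round(h, 4)
       for h in calculate_entropy_sketch(logits, [1, 0, 1], [1.0, 1.0, 0.5])])
# [1.3863, None, 1.3863]  (uniform over 4 tokens: ln 4)
```

Walking a separate `row` cursor, rather than reusing the batch index, is what keeps the third request pointed at logits[1]. The final row-count check turns a silent misalignment into an explicit error.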

@codecov-commenter

codecov-commenter commented May 21, 2026

Codecov Report

❌ Patch coverage is 87.09677% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@8080a25). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/model_executor/entropy_utils.py | 86.66% | 0 Missing and 4 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7889   +/-   ##
==========================================
  Coverage           ?   63.57%           
==========================================
  Files              ?      462           
  Lines              ?    64510           
  Branches           ?     9894           
==========================================
  Hits               ?    41012           
  Misses             ?    20713           
  Partials           ?     2785           
| Flag | Coverage Δ |
|---|---|
| GPU | 72.69% <87.09%> (?) |
| XPU | 7.11% <0.00%> (?) |

Flags with carried forward coverage won't be shown.


@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-22 02:32:16

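📋 Review Summary

PR overview: Adds entropy computation support for the GPU runner path, fixes the hard-coded assumption about seq_lens tensor shapes in entropy_utils.py, and fixes a buffer slicing issue in gpu_model_runner.py.
Scope of changes: fastdeploy/model_executor/entropy_utils.py, fastdeploy/worker/gpu_model_runner.py
Impact tags: [Feature] [Executor]

Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | entropy_utils.py:56 | logits[:real_bsz] slicing assumes logits is [batch, vocab]; compatibility with HPU/GCU prefill needs confirmation |
| 🟡 Suggestion | gpu_model_runner.py:1244 | A6 multi-hardware sync: whether the seq_lens_this_time_buffer[:batch_size] fix must also be applied to the other model_runners |
| 📝 PR conventions | - | Title is missing a space, Checklist item 5 is missing, pre-commit has not been run |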

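📝 PR Convention Check

The following convention issues were found: ① the title [Feature]support is missing a space after ]; ② the Checklist is missing item 5 (the reminder about cherry-picking to the release branch); ③ pre-commit has not been run (its item is still unchecked, [ ]).

Suggested title (ready to copy):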

  • [Feature] Support computing entropy in fd-runner

Suggested PR description (ready to copy; it must reproduce the full structure of the checklist §D2 template):

## Motivation
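In `entropy_utils.py`, the `.squeeze(1)` calls on `seq_lens_encoder` / `seq_lens_this_time` hard-code the assumption of 2D tensors (the HPU/GCU runner format), so they fail when the GPU runner provides 1D tensors, and entropy cannot be computed on the GPU runner path. This PR fixes that compatibility issue and also fixes a shape mismatch in `gpu_model_runner.py` caused by `seq_lens_this_time_buffer` not being sliced to batch_size.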

## Modifications
- `fastdeploy/model_executor/entropy_utils.py`: add `ndim` checks for `seq_lens_encoder` / `seq_lens_this_time` in `calculate_logits_entropy` and `speculate_calculate_logits_entropy`, supporting both the GPU runner (1D [N]) and HPU/GCU runner (2D [N,1]) shapes.
- `fastdeploy/model_executor/entropy_utils.py`: simplify the temperature-scaling loop in `calculate_logits_entropy`, iterating over the batch dimension and skipping zero-length sequences to avoid appending spurious entropy values.
- `fastdeploy/worker/gpu_model_runner.py`: in `_dummy_prefill_inputs`, slice `seq_lens_this_time_buffer` to the actual `batch_size` to prevent a shape mismatch.

## Usage or Command
```bash
python -m fastdeploy.entrypoints.openai.api_server --model <model> --enable-entropy
```

## Accuracy Tests
Verified entropy output matches expected values on GPU runner with batch decode and prefill mixed workloads.

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

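Overall Assessment

The overall direction is correct: the ndim compatibility handling is clear, and the buffer slicing fix in gpu_model_runner.py is reasonable. The author should confirm how the logits shape convention affects HPU/GCU prefill scenarios and check whether the other hardware model_runners have the same buffer slicing issue.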

t = temperature[i]
if t > 0 and t != 1.0:
    logits[i] = logits[i].scale_(1 / t)

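🟡 Suggestion: get_entropy(logits[:real_bsz]) implicitly assumes that logits has shape [batch_size, vocab_size].

For GPU runner decode steps (one token per sequence), logits is indeed [batch_size, vocab_size], so the slice here is correct.

In HPU/GCU prefill scenarios, however, if logits is laid out as [total_tokens, vocab_size], logits[:real_bsz] takes only the first real_bsz rows (that is, the logits of the first real_bsz tokens) rather than the logits of each sequence's last token, and the entropy result will be wrong.

We suggest documenting the logits shape convention at the function entry or in a comment, or adding an assertion: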

# logits expected shape: [batch_size, vocab_size] (GPU runner)
# For HPU/GCU prefill, caller should pass only the last-token logits per sequence
assert logits.shape[0] >= real_bsz, f"logits dim0 {logits.shape[0]} < real_bsz {real_bsz}"
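If a caller does hold token-level logits laid out as [total_tokens, vocab_size] (the HPU/GCU prefill case raised above), the last-token row of each sequence can be selected with a cumulative sum over the sequence lengths before computing entropy. This sketch assumes tokens are packed contiguously in batch order and uses NumPy in place of Paddle; `last_token_logits` is a hypothetical name.

```python
import numpy as np


def last_token_logits(token_logits, seq_lens_this_time):
    """Select the last-token logits of each non-empty sequence.

    token_logits: [total_tokens, vocab_size], tokens packed in batch order.
    Returns [num_nonempty_seqs, vocab_size].
    """
    seq_lens = np.asarray(seq_lens_this_time).reshape(-1)
    ends = np.cumsum(seq_lens)          # exclusive end offset of each sequence
    last_rows = ends[seq_lens > 0] - 1  # index of each non-empty sequence's last token
    return np.asarray(token_logits)[last_rows]


# Three sequences with 3, 0 and 2 tokens -> 5 packed token rows.
token_logits = np.arange(5 * 2).reshape(5, 2)
print(last_token_logits(token_logits, [3, 0, 2]).tolist())  # [[4, 5], [8, 9]]
```

In the same example, logits[:real_bsz] with real_bsz=3 would return rows 0 to 2, all of which belong to the first sequence.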

idx * block_num, (idx + 1) * block_num, 1
)
-self.share_inputs["seq_lens_this_time"] = self.share_inputs["seq_lens_this_time_buffer"]
+self.share_inputs["seq_lens_this_time"] = self.share_inputs["seq_lens_this_time_buffer"][:batch_size]
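🟡 Suggestion: A6 multi-hardware sync check. This change slices seq_lens_this_time_buffer[:batch_size]; please also check whether the other _dummy_prefill_inputs implementations, such as hpu_model_runner.py, gcu_model_runner.py, and dcu_model_runner.py, contain the same unsliced buffer assignment, so that other hardware does not hit the same shape mismatch when entropy is enabled.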
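A minimal NumPy sketch of why the slice matters: a preallocated buffer sized for the maximum number of sequences has more entries than the current batch, so any element-wise operation that pairs it with batch-sized tensors fails unless it is sliced first. The names `max_num_seqs` and the per-sequence `temperature` array are illustrative assumptions, not the runner's actual code.

```python
import numpy as np

max_num_seqs = 8   # capacity of the preallocated buffer (assumed value)
batch_size = 3     # sequences actually present in this dummy prefill

seq_lens_this_time_buffer = np.zeros(max_num_seqs, dtype=np.int32)
seq_lens_this_time_buffer[:batch_size] = 4

temperature = np.ones(batch_size, dtype=np.float32)  # one value per real sequence

unsliced = seq_lens_this_time_buffer              # shape [8]
sliced = seq_lens_this_time_buffer[:batch_size]   # shape [3]

print(unsliced.shape, sliced.shape)  # (8,) (3,)
try:
    unsliced * temperature           # shapes (8,) and (3,) do not broadcast
except ValueError as err:
    print("shape mismatch:", err)
print((sliced * temperature).tolist())  # [4.0, 4.0, 4.0]
```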


Labels

None yet

Projects

None yet

4 participants