[Scheduler] Increase sleep interval in fetch loops and cancel schedule threashold for prefill instance by liyonghua0910 · Pull Request #7871 · PaddlePaddle/FastDeploy

liyonghua0910 · 2026-05-21T03:07:22Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-21T03:07:28Z

Thanks for your contribution!

PaddlePaddle-bot · 2026-05-21T03:21:17Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-23 10:01:57

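This CI report is generated from the following code (refreshed every 30 minutes):

PR commit: 6473a75
Merge base: 31b12ee (branch: release/2.6)
View full diff
CI details

1 Job Overview

Two required jobs failed and must be resolved before merging: Approval needs manual approval, and Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage failed because diff coverage is below the 80% threshold.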

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
37(0)	37	31	5	1	0	0

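2 Job Status Summary

2.1 Required jobs: 8/10 passed

Required jobs block merging; fix their failures first.

Status	Job	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Approval`	8s	Approval required	Obtain manual approval	Job	-
❌	`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	1h12m	PR issue: diff coverage 62%, 3 lines uncovered	Add tests for the 3 uncovered branches	Job	-
✅	Remaining 8 required jobs passed	-	-	-	-	-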

2.2 Optional jobs: 23/27 passed

Optional jobs do not block merging; their failures are informational only.

Status	Job	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	20m16s	Job	-
❌	`Check PR Template`	15s	Job	-
❌	`Trigger Jenkins for PR`	1m4s	Job	-
⏳	`CI_HPU`	-	Workflow	-
✅	Remaining 23 optional jobs passed	-	-	-

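3 Failure Details (required jobs only)

Approval: manual approval required (confidence: high)

This job requires manual approval; CI continues only after the approval is granted.

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage threshold not met (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

Status: ❌ Failed
Error type: coverage threshold not met
Confidence: high
Root cause summary: diff coverage 62%, 3 lines uncovered
Analyzer: ci_analyze_unittest_fastdeploy

Failed test cases: none. The log shows TEST_EXIT_CODE=0, so the unit tests passed and the failure occurred in the coverage threshold check.

Root cause details:
This PR modifies 13 lines of code. The coverage report counts 8 of them toward diff coverage, and only 5 of those are covered by tests, giving 62% overall, below the 80% threshold. The Verify Code Coverage Threshold (80%) step therefore failed with exit code 9. The uncovered lines are the normal-path and exception-path sleep branches of _fetch_loop() at fastdeploy/engine/common_engine_prepare_mixin.py:251,254, and the prefill branch that waits for the cache to be sent at fastdeploy/output/token_processor.py:671. All of them are lines modified by this PR, so this is a coverage problem in the PR itself.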
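The arithmetic behind the failure can be checked with a short sketch. The line counts and the 80% threshold come from the report; the reported 62 is consistent with 62.5 rounded down, but how the coverage tool rounds is an assumption here.

```python
import math

total_lines = 8     # lines counted toward diff coverage (from the report)
missing_lines = 3   # uncovered lines (from the report)
threshold = 80      # percent, from the "Verify Code Coverage Threshold (80%)" step

covered = total_lines - missing_lines
percent = 100 * covered / total_lines   # 5 of 8 lines
passes = percent >= threshold

# Fewest covered lines that would clear the threshold, and how many more are needed.
needed = math.ceil(threshold * total_lines / 100)
extra_lines_to_cover = needed - covered

print(f"{covered}/{total_lines} covered = {percent}% (passes: {passes})")
print(f"need {needed} covered lines, i.e. {extra_lines_to_cover} more")
```

With only 8 counted lines, covering two of the three missing lines would already reach 7/8 = 87.5%.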

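Code context verification:
The changed files and related tests were read. tests/engine/test_common_engine.py currently covers the _fetch_request_prefill() path but does not directly cover the normal-path and exception-path sleep branches of _fetch_loop(). tests/output/test_token_processor.py covers the prefill success and failure paths that return immediately, but not the time.sleep(0.005) in the wait branch.

Key logs: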

Coverage generation failed (exit code 9)
Diff Coverage
fastdeploy/engine/common_engine_prepare_mixin.py (0.0%): Missing lines 251,254
fastdeploy/engine/sched/resource_manager_v1.py (100%)
fastdeploy/output/token_processor.py (0.0%): Missing lines 671
fastdeploy/splitwise/splitwise_connector.py (100%)
Total:   8 lines
Missing: 3 lines
"total_percent_covered": 62

Suggested fixes:

In tests/engine/test_common_engine.py, add unit tests for _fetch_loop(): patch fastdeploy.engine.common_engine_prepare_mixin.time.sleep and drive fetch_fn through both the normal path and the exception path, covering fastdeploy/engine/common_engine_prepare_mixin.py:251 and 254.
In tests/output/test_token_processor.py, next to the existing test_recycle_resources_prefill_* tests, add a case where prefill waits for the cache to be sent: mock engine_worker_queue.get_finished_req() to return an empty list first and then [(task_id, "finished")], and patch fastdeploy.output.token_processor.time.sleep, covering fastdeploy/output/token_processor.py:671.
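The first suggestion can be sketched without importing FastDeploy. Here `fetch_loop` and `run_two_iterations` are hypothetical stand-ins, not the real `_fetch_loop()`; in particular, the assumption that the exception path sleeps for the same interval as the normal path is not confirmed by this page. Only `unittest.mock.patch` and the 0.02s interval are taken as given.

```python
import time
from unittest import mock


def fetch_loop(fetch_fn, should_stop, interval=0.02):
    """Hypothetical stand-in for _fetch_loop: poll fetch_fn and sleep on both paths."""
    while not should_stop():
        try:
            fetch_fn()
            time.sleep(interval)   # normal-path sleep
        except Exception:
            time.sleep(interval)   # exception-path sleep (interval assumed)


def run_two_iterations(fetch_fn):
    """Run the loop for exactly two iterations with time.sleep patched out."""
    ticks = iter([False, False, True])
    with mock.patch("time.sleep") as fake_sleep:
        fetch_loop(fetch_fn, lambda: next(ticks))
    return fake_sleep


def failing_fetch():
    raise RuntimeError("fetch failed")


ok_sleep = run_two_iterations(lambda: None)
err_sleep = run_two_iterations(failing_fetch)
```

Patching `time.sleep` keeps the test fast and lets it assert that both branches were reached. In the real test the patch target would be the module path named in the suggestion rather than the global `time.sleep`.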

Fix summary: add tests for the 3 uncovered branches

Related changes: fastdeploy/engine/common_engine_prepare_mixin.py:251,254; fastdeploy/output/token_processor.py:671
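For the token_processor.py:671 branch, the suggested mock sequence (empty list first, then a finished entry) might look like the sketch below. `wait_cache_sent` is a hypothetical stand-in for the real wait loop, whose surrounding code is not shown on this page; only `get_finished_req()`, the `(task_id, "finished")` tuple shape, and the 0.005s sleep come from the report.

```python
import time
from unittest import mock


def wait_cache_sent(queue, task_id, interval=0.005):
    """Hypothetical stand-in: poll the queue until task_id shows up as finished."""
    while True:
        for finished_id, status in queue.get_finished_req():
            if finished_id == task_id:
                return status
        time.sleep(interval)   # the wait branch the coverage report flags


queue = mock.Mock()
# First poll returns nothing, second poll reports the task as finished.
queue.get_finished_req.side_effect = [[], [("req-1", "finished")]]

with mock.patch("time.sleep") as fake_sleep:
    status = wait_cache_sent(queue, "req-1")
```

Because the first poll is empty, the loop must pass through the sleep exactly once before returning, which is what exercises the uncovered line.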

codecov-commenter · 2026-05-21T04:29:23Z

Codecov Report

❌ Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@31b12ee). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/engine/common_engine_prepare_mixin.py	0.00%	2 Missing ⚠️
fastdeploy/engine/sched/resource_manager_v1.py	75.00%	1 Missing ⚠️
fastdeploy/output/token_processor.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.6    #7871   +/-   ##
==============================================
  Coverage               ?   72.37%           
==============================================
  Files                  ?      381           
  Lines                  ?    54235           
  Branches               ?     8475           
==============================================
  Hits                   ?    39250           
  Misses                 ?    12224           
  Partials               ?     2761

Flag	Coverage Δ
GPU	`72.37% <50.00%> (?)`

PaddlePaddle-bot · 2026-05-22T07:40:52Z

🤖 Paddle-CI-Agent | pr_review | 2026-05-22 15:36:40

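📋 Review Summary

PR overview: adjusts polling sleep intervals (increased in several places), fixes a bug where block_num could become negative, removes the schedulable-block threshold for Prefill instances, and lowers the default number of prepare threads.

Scope of changes: engine/, engine/sched/, output/, splitwise/, envs.py

Affected tags: [Engine] [Scheduler] [PD Disaggregation] [DataProcessor]

Issues

Severity	File	Summary
❓ Question	`engine/common_engine_prepare_mixin.py:251`	sleep increased from 0.002s to 0.02s (10x), the opposite of the PR title's "reduce sleep time"
❓ Question	`envs.py:195`	Default of FD_PREFILL_PREPARE_REQ_THREAD_NUM lowered from 5 to 3 with no performance benchmark data
📝 PR conventions	-	Title lacks the [Cherry-Pick] format and an official tag; every section of the description is empty

🔍 Inline Issue Notes

engine/common_engine_prepare_mixin.py ~L251

❓ Question: the sleep interval grows from 0.002s to 0.02s (10x), but the PR title says "reduce sleep time", which is the opposite direction.

The change lowers the polling frequency of _fetch_loop by 10x, raising the maximum delay between a request being enqueued and being fetched from ~2ms to ~20ms. If the intent is to reduce CPU polling overhead, please say so in the PR description and correct the title wording (for example, "reduce CPU polling overhead by increasing fetch loop interval").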
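The trade-off in this comment can be made concrete with a small sketch. It assumes the loop does nothing but call `fetch_fn` and sleep, and it ignores the time `fetch_fn` itself takes; both helper names are illustrative only.

```python
def polls_per_second(interval_ms: int) -> float:
    """Upper bound on loop wake-ups per second for a given sleep interval."""
    return 1000 / interval_ms


def worst_case_wait_ms(interval_ms: int) -> int:
    """A request enqueued just after a poll waits roughly one full interval."""
    return interval_ms


old_ms, new_ms = 2, 20   # 0.002s and 0.02s from the diff

for label, ms in (("old", old_ms), ("new", new_ms)):
    print(f"{label}: {polls_per_second(ms):.0f} polls/s, "
          f"worst-case wait ~{worst_case_wait_ms(ms)} ms")
```

Under these assumptions the wake-up rate drops by a factor of ten while the worst-case pickup delay grows by the same factor, which is the cost the reviewer asks the author to justify.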

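envs.py ~L195

❓ Question: the default of FD_PREFILL_PREPARE_REQ_THREAD_NUM drops from 5 to 3, which may lower the request preparation throughput of Prefill instances under high concurrency. Please provide experimental data (for example, a QPS / TTFT comparison) showing that 3 threads are enough, or note in the PR description that the value can be tuned through the environment variable.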
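As a rough illustration of tuning the value through the environment, the sketch below reads the variable with a fallback default. `prepare_thread_num` is a hypothetical helper; how fastdeploy/envs.py actually parses the variable is not shown on this page, and the variable would presumably need to be set before the service starts.

```python
import os


def prepare_thread_num(default: int = 3) -> int:
    """Hypothetical reader: use the environment value if set, else the default."""
    return int(os.getenv("FD_PREFILL_PREPARE_REQ_THREAD_NUM", str(default)))


# Restore the previous default of 5 for this process.
os.environ["FD_PREFILL_PREPARE_REQ_THREAD_NUM"] = "5"
restored = prepare_thread_num()

# With the variable unset, the new default of 3 applies.
del os.environ["FD_PREFILL_PREPARE_REQ_THREAD_NUM"]
fallback = prepare_thread_num()
```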

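📝 PR Convention Check

This PR targets the release/2.6 branch (not develop), so under rule D1 it must use the Cherry-Pick format [Cherry-Pick][Tag] description(#original PR number). The title also contains a typo ("threashold" → "threshold"), and all five sections of the description template are empty placeholders.

Suggested title (copy as-is):

[Cherry-Pick][BugFix] fix negative block_num and adjust polling intervals for prefill instance(#XXXX)

(Replace XXXX with the number of the original PR on the develop branch.)

Suggested PR description (copy as-is):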

## Motivation
1. Fix a bug in `get_new_block_nums` where `block_num` could be negative, so that `min(block_num+1, max)` can no longer return a negative value on the speculative decode path
2. Remove the `can_schedule_block_num_threshold` limit for Prefill instances, so that running requests are not blocked by the threshold check
3. Adjust the sleep intervals of the polling loops to reduce CPU spinning overhead
4. Change the default of `FD_PREFILL_PREPARE_REQ_THREAD_NUM` from 5 to 3

## Modifications
- `fastdeploy/engine/common_engine_prepare_mixin.py`: sleep interval in `_fetch_loop` 0.002s → 0.02s
- `fastdeploy/engine/sched/resource_manager_v1.py`:
  - `get_new_block_nums` adds `block_num = max(block_num, 0)` to prevent negative values
  - For Prefill instances (`splitwise_role == "prefill"`), `can_schedule_block_num_threshold` is fixed at 0
- `fastdeploy/envs.py`: default of `FD_PREFILL_PREPARE_REQ_THREAD_NUM` 5 → 3
- `fastdeploy/output/token_processor.py`: sleep while waiting for the cache transfer 0.002s → 0.005s
- `fastdeploy/splitwise/splitwise_connector.py`: polling interval in `check_decode_allocated` 0.001s → 0.005s

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

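Overall Assessment

block_num = max(block_num, 0) effectively fixes the negative-value bug, and removing the threshold for Prefill instances is reasonable. However, the PR title's "reduce sleep time" contradicts the code, where every sleep is increased (by up to 10x). The author should confirm the intent, correct the title, and fill in every section of the PR description.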

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-22 15:58:11

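📋 Review Summary

PR overview: adjusts the sleep duration of several loops in the PD disaggregation scenario, fixes the scheduling threshold logic for prefill instances, and fixes a bug where block_num could be negative.
Scope of changes: fastdeploy/engine/ (fetch loop), fastdeploy/engine/sched/ (resource_manager), fastdeploy/splitwise/, fastdeploy/output/, fastdeploy/envs.py
Affected tags: [PD Disaggregation] [Scheduler] [Engine]

Issues

Severity	File	Summary
❓ Question	`fastdeploy/engine/common_engine_prepare_mixin.py:251`	The PR title says "reduce sleep time", but the `_fetch_loop` sleep grows from 2ms to 20ms (10x), contradicting the title
🟡 Suggestion	`fastdeploy/engine/sched/resource_manager_v1.py:248`	A negative `block_num` is silently clamped to 0 with no warning log; adding one would help diagnose over-allocation
📝 PR conventions	-	Title lacks a tag; the target branch `release/2.6` requires the Cherry-Pick format; the Motivation / Modifications / Usage / Accuracy sections are all empty

📝 PR Convention Check

Problems:

The title has no [Tag], and the PR targets release/2.6 (not develop), so by convention it must use the Cherry-Pick format: [Cherry-Pick][Tag] description(#original PR number);
The description sections (Motivation, Modifications, Usage or Command, Accuracy Tests) are all empty or placeholders, and no Checklist item is ticked.

Suggested title (copy as-is):

[Cherry-Pick][PD Disaggregation] adjust sleep time in loops and cancel schedule threshold for prefill instance(#XXXX)

⚠️ Replace #XXXX with the number of the original PR on the develop branch.

Suggested PR description (copy as-is):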

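## Motivation
Stability and performance improvements for the PD disaggregation (Prefill-Decode disaggregation) scenario:
1. Fix a bug in `get_new_block_nums` where a negative `block_num` could cause incorrect allocation;
2. Remove the scheduling threshold for prefill instances in `_allocate_decode_and_extend`, so that running requests are not blocked for lack of blocks;
3. Adjust the sleep duration of the polling loops to reduce idle CPU spinning;
4. Lower the default of `FD_PREFILL_PREPARE_REQ_THREAD_NUM` from 5 to 3, reducing the number of prepare threads on prefill instances.

## Modifications
- `fastdeploy/engine/common_engine_prepare_mixin.py`: sleep in `_fetch_loop` changed from 0.002s to 0.02s
- `fastdeploy/engine/sched/resource_manager_v1.py`: `get_new_block_nums` adds `block_num = max(block_num, 0)` to prevent negative values; in `_allocate_decode_and_extend`, prefill instances skip the scheduling threshold check (threshold=0)
- `fastdeploy/envs.py`: default of `FD_PREFILL_PREPARE_REQ_THREAD_NUM` 5 → 3
- `fastdeploy/output/token_processor.py`: sleep while waiting for the cache to be sent changed from 0.002s to 0.005s
- `fastdeploy/splitwise/splitwise_connector.py`: polling sleep in `check_decode_allocated` changed from 0.001s to 0.005s

## Usage or Command
N/A

## Accuracy Tests
N/A (this change only touches scheduling logic and sleep intervals and does not affect model accuracy)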

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

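Overall Assessment

The change is understandable overall, and both the max(block_num, 0) fix and the removal of the scheduling threshold for prefill instances go in a reasonable direction. The author mainly needs to resolve the contradiction between the PR title and the actual direction of the sleep changes, and to supply a complete description along with the number of the source PR for the Cherry-Pick.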

PaddlePaddle-bot · 2026-05-22T08:02:19Z

                    self._pause_cond.wait_for(lambda: not self.is_paused)
                fetch_fn()
-                time.sleep(0.002)
+                time.sleep(0.02)


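❓ Question: the PR title says "reduce sleep time in loops", but here the sleep grows from 0.002s (2ms) to 0.02s (20ms), a 10x increase.

Likewise, token_processor.py (2ms→5ms) and splitwise_connector.py (1ms→5ms) are both increases. Please confirm:

Is the PR title worded incorrectly (should it say "adjust" rather than "reduce")?

What is the expected effect of the longer sleep in _fetch_loop (less CPU polling overhead)? A lower polling frequency increases the delay before an enqueued request is picked up; has the impact been verified under high concurrency?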

PaddlePaddle-bot · 2026-05-22T08:02:19Z

            request.num_computed_tokens + num_new_tokens + self.config.cache_config.block_size - 1
        ) // self.config.cache_config.block_size - len(request.block_tables)
-
+        block_num = max(block_num, 0)


🟡 Suggestion: when block_num evaluates to a negative value (more blocks are already allocated than are currently needed), it is silently clamped to 0 here.

The clamping itself is correct, but a warning log would make it easier to track down the root cause of any block over-allocation:

    if block_num < 0:
        self.llm_logger.warning(
            f"block_num negative ({block_num}) for req {request.request_id}, "
            f"num_computed={request.num_computed_tokens}, num_new={num_new_tokens}, "
            f"allocated={len(request.block_tables)}, clamping to 0"
        )
    block_num = max(block_num, 0)
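A worked example shows how the expression in the diff can go negative. The sketch mirrors the ceiling-division formula from the diff in a standalone function; the function name, the plain-integer parameters, and the block size of 64 are illustrative assumptions rather than FastDeploy's actual signature or configuration.

```python
def new_block_num(num_computed_tokens: int, num_new_tokens: int,
                  block_size: int, allocated_blocks: int) -> int:
    """Blocks still to allocate: ceil(total_tokens / block_size) minus blocks held."""
    needed = (num_computed_tokens + num_new_tokens + block_size - 1) // block_size
    raw = needed - allocated_blocks   # negative when more blocks are held than needed
    return max(raw, 0)                # the clamp added by this PR


# 101 tokens fit in 2 blocks of 64; a request already holding 3 blocks gives 2 - 3 = -1.
over_allocated = new_block_num(100, 1, 64, allocated_blocks=3)

# The same request holding only 1 block still needs 1 more.
under_allocated = new_block_num(100, 1, 64, allocated_blocks=1)
```

Without the clamp the first case would ask the allocator for -1 blocks, which is the situation the suggested warning log is meant to surface.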

liyonghua0910 had a problem deploying to Metax_ci May 21, 2026 03:07 — with GitHub Actions Failure

liyonghua0910 force-pushed the release/2.6+20260521_pd_test branch from 2fcfbeb to 130fb03 Compare May 22, 2026 07:19

liyonghua0910 had a problem deploying to Metax_ci May 22, 2026 07:19 — with GitHub Actions Failure

[Scheduler] Increase sleep interval in fetch loops and cancel schedul…

6473a75

…e threashold for prefill instance

liyonghua0910 force-pushed the release/2.6+20260521_pd_test branch from 130fb03 to 6473a75 Compare May 22, 2026 07:47

liyonghua0910 had a problem deploying to Metax_ci May 22, 2026 07:47 — with GitHub Actions Failure

liyonghua0910 changed the title ~~reduce sleep time in loops and cancel schedule threashold for prefill instance~~ [Scheduler] Increase sleep interval in fetch loops and cancel schedule threashold for prefill instance May 22, 2026

liyonghua0910 marked this pull request as ready for review May 22, 2026 07:58

PaddlePaddle-bot reviewed May 22, 2026

[Scheduler] Increase sleep interval in fetch loops and cancel schedule threashold for prefill instance#7871
liyonghua0910 wants to merge 1 commit into PaddlePaddle:release/2.6 from liyonghua0910:release/2.6+20260521_pd_test

3 participants
