[Scheduler] Increase sleep interval in fetch loops and cancel schedule threashold for prefill instance#7871

Open
liyonghua0910 wants to merge 1 commit into PaddlePaddle:release/2.6 from liyonghua0910:release/2.6+20260521_pd_test

@liyonghua0910
Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented May 21, 2026

Thanks for your contribution!

@PaddlePaddle-bot

PaddlePaddle-bot commented May 21, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-23 10:01:57

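This CI report is generated from the following code (refreshed every 30 minutes):


1 Task Overview

2 Required tasks failed and must be resolved before merging: Approval requires manual approval; Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage failed because diff coverage is below the 80% threshold.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 37 (0) | 37 | 31 | 5 | 1 | 0 | 0 |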

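2 Task Status Summary

2.1 Required tasks: 8/10 passed

Required tasks block merging; fix their failures first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| | Approval | 8s | Approval required | Obtain manual approval | Job | - |
| | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | 1h12m | PR issue: diff coverage 62%, 3 lines uncovered | Add tests for the 3 uncovered branches | Job | - |
| | The other 8 required tasks passed | - | - | - | - | - |

2.2 Optional tasks: 23/27 passed

Optional tasks do not block merging; their failures are for reference only.

| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| | Run iluvatar Tests / run_iluvatar_cases | 20m16s | Job | - |
| | Check PR Template | 15s | Job | - |
| | Trigger Jenkins for PR | 1m4s | Job | - |
| | CI_HPU | - | Workflow | - |
| | The other 23 optional tasks passed | - | - | - |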

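3 Failure Details (required tasks only)

Approval: manual approval required (confidence: high)

This job requires manual approval; CI continues only after the approval is granted.

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage threshold not met (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

  • Status: ❌ Failed
  • Error type: coverage threshold not met
  • Confidence: high
  • Root cause summary: diff coverage 62%, 3 lines uncovered
  • Analyzer: ci_analyze_unittest_fastdeploy

Failed test cases: none. The log shows TEST_EXIT_CODE=0, so the unit tests passed; the failure occurred in the coverage threshold check.

Root cause details:
This PR modifies 13 lines of code. The coverage report counts 8 of them toward diff coverage, and only 5 of those are covered by tests, so the total diff coverage is 62%, below the 80% threshold. The Verify Code Coverage Threshold (80%) step therefore fails with exit code 9. The uncovered lines are fastdeploy/engine/common_engine_prepare_mixin.py:251,254 (the normal and exception sleep branches of _fetch_loop()) and fastdeploy/output/token_processor.py:671 (the branch where prefill waits for the cache to be sent). All of them are lines modified by this PR, so this is a coverage issue in the PR itself.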
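The threshold failure follows from simple arithmetic on the figures above. A minimal sketch, assuming the reported percentage is the covered fraction truncated to an integer (5 of 8 lines showing as 62 is consistent with truncation, but the coverage tool's actual rounding rule is not stated in this report; `diff_coverage_percent` is a hypothetical helper name):

```python
def diff_coverage_percent(total_lines: int, missing_lines: int) -> int:
    """Covered lines as an integer percentage (truncation is an assumption)."""
    covered = total_lines - missing_lines
    return covered * 100 // total_lines


THRESHOLD = 80  # from the "Verify Code Coverage Threshold (80%)" step

# Figures from the report: 8 counted lines, 3 missing.
percent = diff_coverage_percent(total_lines=8, missing_lines=3)
print(percent, percent >= THRESHOLD)

# How many of the 3 missing lines must be covered to pass?
for still_missing in (2, 1, 0):
    p = diff_coverage_percent(8, still_missing)
    print(still_missing, p, p >= THRESHOLD)
```

Under this assumption, covering only one more line still leaves the patch below 80%; at least 7 of the 8 counted lines need coverage.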

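Code context check:
The changed files and related tests were read. tests/engine/test_common_engine.py currently covers the _fetch_request_prefill() path but does not directly cover the normal and exception sleep branches of _fetch_loop(). tests/output/test_token_processor.py covers the prefill success and failure immediate-return paths but not the time.sleep(0.005) call in the waiting branch.

Key log: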

Coverage generation failed (exit code 9)
Diff Coverage
fastdeploy/engine/common_engine_prepare_mixin.py (0.0%): Missing lines 251,254
fastdeploy/engine/sched/resource_manager_v1.py (100%)
fastdeploy/output/token_processor.py (0.0%): Missing lines 671
fastdeploy/splitwise/splitwise_connector.py (100%)
Total:   8 lines
Missing: 3 lines
"total_percent_covered": 62

Suggested fixes:

  1. In tests/engine/test_common_engine.py, add a unit test for _fetch_loop(): patch fastdeploy.engine.common_engine_prepare_mixin.time.sleep and build two paths, one where fetch_fn runs normally and one where it raises, to cover fastdeploy/engine/common_engine_prepare_mixin.py:251,254.
  2. In tests/output/test_token_processor.py, near the existing test_recycle_resources_prefill_* tests, add a case where prefill waits for the cache to be sent: mock engine_worker_queue.get_finished_req() to return an empty list first and then [(task_id, "finished")], and patch fastdeploy.output.token_processor.time.sleep, to cover fastdeploy/output/token_processor.py:671.

Fix summary: add tests for the 3 uncovered branches

Related changes: fastdeploy/engine/common_engine_prepare_mixin.py:251,254; fastdeploy/output/token_processor.py:671
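The first suggestion can be sketched roughly as follows. This is not the FastDeploy code: `fetch_loop` is a hypothetical stand-in written only so the example runs on its own, and its `should_stop`, `ok_sleep`, and `err_sleep` parameters are assumptions. In the real test the patch target would be the one named in the report, fastdeploy.engine.common_engine_prepare_mixin.time.sleep.

```python
import time
from unittest import mock


def fetch_loop(fetch_fn, should_stop, ok_sleep=0.02, err_sleep=0.02):
    """Hypothetical stand-in for _fetch_loop: poll fetch_fn until should_stop()."""
    while not should_stop():
        try:
            fetch_fn()
            time.sleep(ok_sleep)   # sleep on the normal path
        except Exception:
            time.sleep(err_sleep)  # sleep on the exception path


def run_once(fetch_fn):
    """Run exactly one loop iteration with time.sleep patched; return the mock."""
    stops = iter([False, True])  # first check enters the loop, second exits
    with mock.patch("time.sleep") as fake_sleep:
        fetch_loop(fetch_fn, lambda: next(stops), ok_sleep=0.02, err_sleep=0.5)
    return fake_sleep


def test_normal_path_sleeps():
    # fetch_fn succeeds, so only the normal-path sleep should run.
    run_once(mock.Mock()).assert_called_once_with(0.02)


def test_exception_path_sleeps():
    # fetch_fn raises, so only the exception-path sleep should run.
    failing = mock.Mock(side_effect=RuntimeError("boom"))
    run_once(failing).assert_called_once_with(0.5)
```

Patching sleep keeps the test fast, and giving the two branches distinct durations in the stand-in makes it obvious which branch executed. The same `side_effect` technique (a list of successive return values) would fit the second suggestion, where get_finished_req() returns an empty list before the finished entry.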

PaddlePaddle-bot

This comment was marked as outdated.

@codecov-commenter

codecov-commenter commented May 21, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@31b12ee). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/engine/common_engine_prepare_mixin.py | 0.00% | 2 Missing ⚠️ |
| fastdeploy/engine/sched/resource_manager_v1.py | 75.00% | 1 Missing ⚠️ |
| fastdeploy/output/token_processor.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #7871   +/-   ##
==============================================
  Coverage               ?   72.37%           
==============================================
  Files                  ?      381           
  Lines                  ?    54235           
  Branches               ?     8475           
==============================================
  Hits                   ?    39250           
  Misses                 ?    12224           
  Partials               ?     2761           
Flag Coverage Δ
GPU 72.37% <50.00%> (?)


@PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-22 15:36:40

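📋 Review Summary

PR overview: Adjusts polling sleep intervals (increased in several places), fixes a bug where block_num could become negative, removes the schedulable-block threshold for Prefill instances, and lowers the default number of prepare threads.

Change scope: engine/, engine/sched/, output/, splitwise/, envs.py

Impact tags: [Engine] [Scheduler] [PD Disaggregation] [DataProcessor]


Issues

| Level | File | Summary |
|---|---|---|
| ❓ Question | engine/common_engine_prepare_mixin.py:251 | sleep increases from 0.002s to 0.02s (10x), the opposite of the PR title's "reduce sleep time" |
| ❓ Question | envs.py:195 | FD_PREFILL_PREPARE_REQ_THREAD_NUM default thread count 5→3, with no performance benchmark data |
| 📝 PR conventions | | Title lacks the [Cherry-Pick] format and an official tag; every section of the description is empty |

🔍 Inline issue notes

engine/common_engine_prepare_mixin.py ~L251

❓ Question: The sleep interval increases from 0.002s to 0.02s (10x), but the PR title describes the change as "reduce sleep time", which is the opposite direction.

This change lowers the polling frequency of _fetch_loop by 10x, raising the maximum delay between a request being enqueued and being fetched from ~2ms to ~20ms. If the intent is to cut CPU polling overhead, please say so in the PR description and correct the title wording (for example, "reduce CPU polling overhead by increasing fetch loop interval").

envs.py ~L195

❓ Question: The default of FD_PREFILL_PREPARE_REQ_THREAD_NUM drops from 5 to 3, which may reduce the request-preparation throughput of Prefill instances under high concurrency. Consider providing experimental data (such as a QPS / TTFT comparison) showing that 3 threads are enough, or note in the PR description that the value can be tuned through the environment variable.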
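For readers who want the previous thread count back, a minimal sketch of an environment override is shown here. It assumes the variable is read from the process environment and parsed as an integer; how fastdeploy/envs.py actually parses it is not shown in this PR, and `prefill_prepare_thread_num` is a hypothetical helper, not a FastDeploy function.

```python
import os

ENV_NAME = "FD_PREFILL_PREPARE_REQ_THREAD_NUM"


def prefill_prepare_thread_num(env=None, default=3):
    """Return the thread count from env, falling back to the new default of 3."""
    env = os.environ if env is None else env
    return int(env.get(ENV_NAME, default))


# No override: the new default applies.
print(prefill_prepare_thread_num(env={}))
# Override restoring the previous default of 5.
print(prefill_prepare_thread_num(env={ENV_NAME: "5"}))
```

Passing the mapping explicitly keeps the example free of side effects on the real environment; in a deployment the variable would be exported before the service starts.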


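📝 PR Convention Check

This PR targets the release/2.6 branch (not develop), so under rule D1 it must use the Cherry-Pick format: [Cherry-Pick][Tag] description(#original PR number). In addition, the title contains a typo ("threashold" → "threshold"), and all five sections of the description template are empty placeholders.

Suggested title (can be copied directly):

[Cherry-Pick][BugFix] fix negative block_num and adjust polling intervals for prefill instance(#XXXX)

(Replace XXXX with the number of the original PR on the develop branch)

Suggested PR description (can be copied directly):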

## Motivation
1. Fix a bug where `block_num` in `get_new_block_nums` could be negative, so that `min(block_num+1, max)` on the speculative decode path can no longer return a negative value
2. Remove the `can_schedule_block_num_threshold` limit for Prefill instances, so that running requests are not blocked by the threshold check
3. Adjust the sleep interval of each polling loop to reduce CPU spin overhead
4. Change the default thread count of `FD_PREFILL_PREPARE_REQ_THREAD_NUM` from 5 to 3

## Modifications
- `fastdeploy/engine/common_engine_prepare_mixin.py`: sleep interval in `_fetch_loop` 0.002s → 0.02s
- `fastdeploy/engine/sched/resource_manager_v1.py`:
  - Add `block_num = max(block_num, 0)` in `get_new_block_nums` to prevent negative values
  - Fix `can_schedule_block_num_threshold` at 0 for Prefill instances (`splitwise_role == "prefill"`)
- `fastdeploy/envs.py`: `FD_PREFILL_PREPARE_REQ_THREAD_NUM` default 5 → 3
- `fastdeploy/output/token_processor.py`: sleep while waiting for cache transfer 0.002s → 0.005s
- `fastdeploy/splitwise/splitwise_connector.py`: `check_decode_allocated` polling interval 0.001s → 0.005s

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

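Overall assessment

block_num = max(block_num, 0) effectively fixes the negative-value bug, and removing the threshold for Prefill instances is reasonable. However, the PR title says "reduce sleep time" while every sleep in the code is increased (by up to 10x), which is the opposite direction. The author should confirm the intent, correct the title, and fill in each section of the PR description.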

@liyonghua0910 liyonghua0910 force-pushed the release/2.6+20260521_pd_test branch from 130fb03 to 6473a75 Compare May 22, 2026 07:47
@liyonghua0910 liyonghua0910 changed the title from "reduce sleep time in loops and cancel schedule threashold for prefill instance" to "[Scheduler] Increase sleep interval in fetch loops and cancel schedule threashold for prefill instance" May 22, 2026
@liyonghua0910 liyonghua0910 marked this pull request as ready for review May 22, 2026 07:58
@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-22 15:58:11

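📋 Review Summary

PR overview: Adjusts sleep durations in several loops for the PD disaggregation scenario, fixes the scheduling-threshold logic for prefill instances, and fixes a bug where block_num could be negative.
Change scope: fastdeploy/engine/ (fetch loops), fastdeploy/engine/sched/ (resource_manager), fastdeploy/splitwise/, fastdeploy/output/, fastdeploy/envs.py
Impact tags: [PD Disaggregation] [Scheduler] [Engine]

Issues

| Level | File | Summary |
|---|---|---|
| ❓ Question | fastdeploy/engine/common_engine_prepare_mixin.py:251 | The PR title says "reduce sleep time", but the _fetch_loop sleep increases from 2ms to 20ms (10x), contradicting the title |
| 🟡 Suggestion | fastdeploy/engine/sched/resource_manager_v1.py:248 | A negative block_num is silently clamped to 0 with no warning log; adding one would help trace over-allocation |
| 📝 PR conventions | | Title lacks a tag; the PR targets release/2.6 without the Cherry-Pick format; the Motivation / Modifications / Usage / Accuracy sections are all empty |

📝 PR Convention Check

Issues

  1. The title has no [Tag], and the PR targets release/2.6 (not develop), so the convention requires the Cherry-Pick format: [Cherry-Pick][Tag] description(#original PR number).
  2. The PR description sections (Motivation, Modifications, Usage or Command, Accuracy Tests) are all empty or placeholders, and no Checklist item is ticked.

Suggested title (can be copied directly):

  • [Cherry-Pick][PD Disaggregation] adjust sleep time in loops and cancel schedule threshold for prefill instance(#XXXX)

⚠️ Replace #XXXX with the number of the original PR on the develop branch.

Suggested PR description (can be copied directly):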

## Motivation
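Stability and performance optimizations for the PD disaggregation (Prefill-Decode disaggregation) scenario:
1. Fix a bug in `get_new_block_nums` where `block_num` could be negative and cause abnormal allocation;
2. Remove the scheduling threshold for prefill instances in `_allocate_decode_and_extend`, so that running requests are not blocked when blocks are insufficient;
3. Adjust the sleep duration of each polling loop to reduce idle CPU spinning;
4. Lower the default of `FD_PREFILL_PREPARE_REQ_THREAD_NUM` from 5 to 3, reducing the number of prepare threads on prefill instances.

## Modifications
- `fastdeploy/engine/common_engine_prepare_mixin.py`: sleep in `_fetch_loop` changed from 0.002s to 0.02s
- `fastdeploy/engine/sched/resource_manager_v1.py`: `get_new_block_nums` adds `block_num = max(block_num, 0)` to prevent negative values; in `_allocate_decode_and_extend`, prefill instances skip the scheduling-threshold check (threshold=0)
- `fastdeploy/envs.py`: `FD_PREFILL_PREPARE_REQ_THREAD_NUM` default 5 → 3
- `fastdeploy/output/token_processor.py`: sleep while waiting to send cache changed from 0.002s to 0.005s
- `fastdeploy/splitwise/splitwise_connector.py`: `check_decode_allocated` polling sleep changed from 0.001s to 0.005s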

## Usage or Command
N/A

## Accuracy Tests
N/A (this change adjusts scheduling logic and sleep durations and does not affect model accuracy)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

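Overall assessment

The change is understandable overall, and both the max(block_num, 0) fix and the removal of the scheduling threshold for prefill instances go in a reasonable direction. The author mainly needs to resolve the contradiction between the PR title and the actual direction of the sleep changes, and to add a complete description along with the number of the source PR for the Cherry-Pick.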

  self._pause_cond.wait_for(lambda: not self.is_paused)
  fetch_fn()
- time.sleep(0.002)
+ time.sleep(0.02)

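❓ Question: The PR title says "reduce sleep time in loops", but here the sleep increases from 0.002s (2ms) to 0.02s (20ms), a 10x increase.

Likewise, token_processor.py (2ms→5ms) and splitwise_connector.py (1ms→5ms) are both increases. Please confirm:

  1. Is the PR title worded incorrectly (should it be "adjust" rather than "reduce")?
  2. What is the expected effect of the longer sleep in _fetch_loop (less CPU polling overhead)? A lower polling frequency adds latency before an enqueued request is picked up; has the impact been verified under high concurrency?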
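The trade-off raised here can be made concrete with a small calculation. This is a sketch under simplifying assumptions: each loop iteration does negligible work besides sleeping, and sleeps last exactly as long as requested. `polling_cost` is a hypothetical helper; the intervals are the old and new values quoted in this review.

```python
# (old interval, new interval) in seconds, as quoted in the review.
INTERVALS = {
    "_fetch_loop": (0.002, 0.02),
    "token_processor cache wait": (0.002, 0.005),
    "check_decode_allocated": (0.001, 0.005),
}


def polling_cost(interval_s):
    """Return (upper bound on wakeups per second, worst-case extra wait in ms)."""
    return 1.0 / interval_s, interval_s * 1000.0


for name, (old, new) in INTERVALS.items():
    old_rate, old_wait = polling_cost(old)
    new_rate, new_wait = polling_cost(new)
    print(
        f"{name}: wakeups/s {old_rate:.0f} -> {new_rate:.0f}, "
        f"worst-case wait {old_wait:.0f} ms -> {new_wait:.0f} ms"
    )
```

A longer interval cuts wakeups proportionally, while an item that arrives just after a poll waits up to one full interval, which matches the ~2ms to ~20ms figure given for _fetch_loop.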

  request.num_computed_tokens + num_new_tokens + self.config.cache_config.block_size - 1
  ) // self.config.cache_config.block_size - len(request.block_tables)

+ block_num = max(block_num, 0)

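🟡 Suggestion: When block_num computes to a negative value (more blocks are already allocated than currently needed), it is silently clamped to 0 here.

The clamping itself is correct, but consider adding a warning log to help trace the root cause of potential block over-allocation: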

if block_num < 0:
    self.llm_logger.warning(
        f"block_num negative ({block_num}) for req {request.request_id}, "
        f"num_computed={request.num_computed_tokens}, num_new={num_new_tokens}, "
        f"allocated={len(request.block_tables)}, clamping to 0"
    )
block_num = max(block_num, 0)
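To see when the value goes negative, the arithmetic from the diff can be isolated in a standalone function. This is a sketch rather than the FastDeploy method: `new_block_num` and its flat integer parameters are hypothetical, standing in for the request and cache-config attributes used in the real code.

```python
def new_block_num(num_computed_tokens, num_new_tokens, block_size, num_allocated_blocks):
    """Blocks still to allocate: ceil((computed + new) / block_size) minus blocks held, clamped at 0."""
    needed = (num_computed_tokens + num_new_tokens + block_size - 1) // block_size
    return max(needed - num_allocated_blocks, 0)


# 128 tokens at block_size 64 need 2 blocks; 1 is already held, so 1 more is needed.
print(new_block_num(100, 28, 64, 1))
# The request already holds 4 blocks: the raw result would be 2 - 4 = -2, clamped to 0.
print(new_block_num(100, 28, 64, 4))
```

The second call is the over-allocation case the suggested warning would surface: the clamp hides it from callers, so without a log line nothing records that it happened.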
