[Scheduler] Increase sleep interval in fetch loops and cancel schedule threshold for prefill instance #7871
Thanks for your contribution!
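CI report generated from the following code (refreshed every 30 minutes):
1 Task overview
2 Required tasks have failed and must be resolved before merging:
2 Task status summary
2.1 Required tasks: 8/10 passed
2.2 Optional tasks: 23/27 passed
3 Failure details (required only)
Approval: manual approval required (confidence: high). This job needs a manual approval; CI continues only after the approval is granted.
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage threshold not met (confidence: high)
Failed cases: none. The log shows Root cause details: Code context check: Key logs: Fix suggestion:
Fix suggestion summary: add tests for 3 uncovered branches. Related changes: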
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## release/2.6 #7871 +/- ##
==============================================
Coverage ? 72.37%
==============================================
Files ? 381
Lines ? 54235
Branches ? 8475
==============================================
Hits ? 39250
Misses ? 12224
Partials ? 2761
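The totals in the coverage diff above are internally consistent, which a few lines of arithmetic can confirm. This is only a sanity check on the figures quoted in the report, assuming coverage is computed as hits divided by total lines and that every tracked line is counted as exactly one of hit, miss, or partial.

```python
# Figures copied from the Codecov report above.
hits, misses, partials = 39250, 12224, 2761
lines = 54235

# Assumption: every tracked line is a hit, a miss, or a partial.
assert hits + misses + partials == lines

# Assumption: coverage percentage = hits / lines, rounded to two decimals.
coverage_pct = round(100 * hits / lines, 2)
print(coverage_pct)
```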
2fcfbeb to 130fb03
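📋 Review summary
PR overview: adjusts polling sleep intervals (increased in several places), fixes Change scope: Impact tags: Issues
🔍 Inline issue notes
📝 PR convention check
The current PR targets Suggested title (copy-ready):
(Replace XXXX with the original PR number on the develop branch.)
Suggested PR description (copy-ready):
## Motivation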
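1. Fix a bug in `get_new_block_nums` where `block_num` could be negative, so that `min(block_num+1, max)` no longer returns a negative value on the speculative decode path
2. Prefill instances drop the `can_schedule_block_num_threshold` limit, so running requests are not blocked by the threshold check
3. Adjust the sleep intervals of the polling loops to reduce CPU spin overhead
4. Change the default thread count of `FD_PREFILL_PREPARE_REQ_THREAD_NUM` from 5 to 3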
## Modifications
- `fastdeploy/engine/common_engine_prepare_mixin.py`: sleep interval in `_fetch_loop` 0.002s → 0.02s
- `fastdeploy/engine/sched/resource_manager_v1.py`:
  - `get_new_block_nums`: add `block_num = max(block_num, 0)` to prevent negative values
  - Prefill instances (`splitwise_role == "prefill"`): `can_schedule_block_num_threshold` fixed at 0
- `fastdeploy/envs.py`: `FD_PREFILL_PREPARE_REQ_THREAD_NUM` default 5 → 3
- `fastdeploy/output/token_processor.py`: sleep while waiting for cache transfer 0.002s → 0.005s
- `fastdeploy/splitwise/splitwise_connector.py`: `check_decode_allocated` polling interval 0.001s → 0.005s
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
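The default change for `FD_PREFILL_PREPARE_REQ_THREAD_NUM` only affects deployments that do not set the variable explicitly. The sketch below illustrates that behaviour under stated assumptions: the helper name `prefill_prepare_thread_num` and the clamp to at least one worker are hypothetical and are not taken from `fastdeploy/envs.py`.

```python
import os

ENV_NAME = "FD_PREFILL_PREPARE_REQ_THREAD_NUM"


def prefill_prepare_thread_num(default: int = 3) -> int:
    """Hypothetical helper: read the thread count, falling back to the default."""
    raw = os.getenv(ENV_NAME)
    if raw is None or not raw.strip():
        # Variable unset or empty: the new default (3, previously 5) applies.
        return default
    # Assumption for this sketch: a pool needs at least one worker.
    return max(int(raw), 1)


if __name__ == "__main__":
    print(prefill_prepare_thread_num())
```

Deployments that relied on the previous default of 5 can keep it by exporting the variable with that value before starting the prefill instance.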
…e threshold for prefill instance
130fb03 to 6473a75
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-22 15:58:11
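📋 Review summary
PR overview: adjusts the sleep durations of several loops in the PD disaggregation scenario, fixes the scheduling-threshold logic for prefill instances, and fixes a bug where block_num could be negative.
Change scope: fastdeploy/engine/ (fetch loop), fastdeploy/engine/sched/ (resource_manager), fastdeploy/splitwise/, fastdeploy/output/, fastdeploy/envs.py
Impact tags: [PD Disaggregation] [Scheduler] [Engine]
Issues
| Level | File | Summary |
|---|---|---|
| ❓ Question | fastdeploy/engine/common_engine_prepare_mixin.py:251 | The PR title says "reduce sleep time", but the _fetch_loop sleep increases from 2ms to 20ms (10x), which contradicts the title |
| 🟡 Suggestion | fastdeploy/engine/sched/resource_manager_v1.py:248 | A negative block_num is silently clamped to 0 with no warning log; consider adding one to help diagnose over-allocation |
| 📝 PR conventions | - | Title lacks a tag; the target branch release/2.6 does not use the Cherry-Pick format; the Motivation / Modifications / Usage / Accuracy sections are all empty |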
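📝 PR convention check
Issues:
- The title has no [Tag], and the PR targets release/2.6 (not develop); by convention it must use the Cherry-Pick format: [Cherry-Pick][Tag] description(#original PR number);
- All sections of the PR description (Motivation, Modifications, Usage or Command, Accuracy Tests) are empty or placeholders, and no Checklist item is checked.
Suggested title (copy-ready):
[Cherry-Pick][PD Disaggregation] adjust sleep time in loops and cancel schedule threshold for prefill instance(#XXXX)
⚠️ Replace #XXXX with the original PR number on the develop branch.
Suggested PR description (copy-ready):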
## Motivation
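Stability and performance improvements for the PD disaggregation (Prefill-Decode disaggregation) scenario:
1. Fix a bug in `get_new_block_nums` where a negative `block_num` could cause incorrect allocation;
2. Prefill instances drop the scheduling threshold in `_allocate_decode_and_extend`, so running requests are not blocked by insufficient blocks;
3. Adjust the sleep durations of the polling loops to reduce CPU idle-spin overhead;
4. Lower the default of `FD_PREFILL_PREPARE_REQ_THREAD_NUM` from 5 to 3, reducing the number of prepare threads on prefill instances.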
## Modifications
- `fastdeploy/engine/common_engine_prepare_mixin.py`: sleep in `_fetch_loop` changed from 0.002s to 0.02s
- `fastdeploy/engine/sched/resource_manager_v1.py`: `get_new_block_nums` adds `block_num = max(block_num, 0)` to prevent negative values; prefill instances skip the scheduling-threshold check in `_allocate_decode_and_extend` (threshold=0)
- `fastdeploy/envs.py`: `FD_PREFILL_PREPARE_REQ_THREAD_NUM` default 5 → 3
- `fastdeploy/output/token_processor.py`: sleep while waiting to send cache changed from 0.002s to 0.005s
- `fastdeploy/splitwise/splitwise_connector.py`: `check_decode_allocated` polling sleep changed from 0.001s to 0.005s
## Usage or Command
N/A
## Accuracy Tests
N/A (this change only touches scheduling logic and sleep durations and does not affect model accuracy)
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
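- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The change is understandable overall; the max(block_num, 0) fix and dropping the scheduling threshold for prefill instances are reasonable directions. The author mainly needs to clarify the contradiction between the PR title and the actual direction of the sleep changes, and to complete the description and the source PR number for the Cherry-Pick.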
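The threshold change discussed in the review can be pictured as a role-dependent choice of how many blocks must stay free. The sketch below is an illustration only: `schedule_threshold` and `can_allocate` are hypothetical names, and the exact comparison used in `resource_manager_v1.py` is assumed rather than quoted from the PR.

```python
def schedule_threshold(splitwise_role: str, configured_threshold: int) -> int:
    """Return the free-block threshold to enforce for running requests."""
    # Per the PR description, prefill instances use a threshold fixed at 0.
    if splitwise_role == "prefill":
        return 0
    return configured_threshold


def can_allocate(free_blocks: int, needed_blocks: int, threshold: int) -> bool:
    # Hypothetical check: at least `threshold` blocks must remain free
    # after the allocation for the request to be scheduled.
    return free_blocks - needed_blocks >= threshold


if __name__ == "__main__":
    for role in ("prefill", "decode"):
        threshold = schedule_threshold(role, configured_threshold=12)
        print(role, threshold, can_allocate(10, 4, threshold))
```

With the same free-block count, the request that a non-prefill role would hold back is allowed through on a prefill instance, which is the blocking behaviour the PR sets out to remove.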
  self._pause_cond.wait_for(lambda: not self.is_paused)
  fetch_fn()
- time.sleep(0.002)
+ time.sleep(0.02)
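❓ Question: The PR title says "reduce sleep time in loops", but here the sleep increases from 0.002s (2ms) to 0.02s (20ms), a 10x increase.
Likewise, token_processor.py (2ms→5ms) and splitwise_connector.py (1ms→5ms) are also increases. Please confirm:
- Is the PR title worded incorrectly (should it be "adjust" rather than "reduce")?
- What is the expected effect of increasing the sleep in _fetch_loop (reducing CPU polling overhead)? A lower polling frequency increases request enqueue latency; has the impact been verified under high concurrency?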
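The trade-off raised in this question can be estimated with simple arithmetic. The sketch below assumes an idle loop whose per-iteration work is negligible next to the sleep, and requests that arrive at a uniformly random point within a polling period; under those assumptions the mean extra wait is half the interval and the worst case is one full interval. `polling_cost` is a hypothetical helper, and the numbers are estimates, not measurements from this PR.

```python
def polling_cost(interval_s: float) -> tuple[float, float]:
    """Return (wakeups per second, mean extra wait in ms) for an idle polling loop."""
    wakeups_per_s = 1.0 / interval_s
    # Uniform arrival within a period: the expected wait is half the interval.
    mean_wait_ms = interval_s / 2 * 1000
    return wakeups_per_s, mean_wait_ms


# Old and new sleep intervals, in seconds, as listed in the review.
INTERVALS = {
    "_fetch_loop": (0.002, 0.02),
    "token_processor": (0.002, 0.005),
    "check_decode_allocated": (0.001, 0.005),
}

if __name__ == "__main__":
    for name, (old, new) in INTERVALS.items():
        old_rate, old_wait = polling_cost(old)
        new_rate, new_wait = polling_cost(new)
        print(
            f"{name}: {old_rate:.0f} -> {new_rate:.0f} wakeups/s, "
            f"mean extra wait {old_wait:.1f} -> {new_wait:.1f} ms"
        )
```

Whether the added wait matters depends on how it compares with end-to-end request latency, which is why the reviewer asks for validation under high concurrency.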
      request.num_computed_tokens + num_new_tokens + self.config.cache_config.block_size - 1
  ) // self.config.cache_config.block_size - len(request.block_tables)

+ block_num = max(block_num, 0)
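🟡 Suggestion: When block_num evaluates to a negative value (more blocks are already allocated than currently needed), it is silently clamped to 0 here.
The clamping logic itself is correct, but consider adding a warning log to help trace the root cause of potential block over-allocation: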
    if block_num < 0:
        self.llm_logger.warning(
            f"block_num negative ({block_num}) for req {request.request_id}, "
            f"num_computed={request.num_computed_tokens}, num_new={num_new_tokens}, "
            f"allocated={len(request.block_tables)}, clamping to 0"
        )
    block_num = max(block_num, 0)
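The computation under review is a ceiling division followed by a subtraction and a clamp. It can be exercised in isolation with the standalone sketch below, where the function name `new_block_num`, its parameters, and the logger are hypothetical stand-ins for the request and config fields used in `get_new_block_nums`.

```python
import logging

logger = logging.getLogger("resource_manager_sketch")


def new_block_num(num_computed_tokens: int, num_new_tokens: int,
                  block_size: int, allocated_blocks: int) -> int:
    """Blocks still to allocate so that all tokens fit; never negative."""
    total_tokens = num_computed_tokens + num_new_tokens
    # Ceiling division: blocks needed to hold total_tokens.
    needed_blocks = (total_tokens + block_size - 1) // block_size
    block_num = needed_blocks - allocated_blocks
    if block_num < 0:
        # More blocks are already allocated than needed; log before clamping.
        logger.warning(
            "block_num negative (%d): needed=%d, allocated=%d, clamping to 0",
            block_num, needed_blocks, allocated_blocks,
        )
    return max(block_num, 0)


if __name__ == "__main__":
    print(new_block_num(0, 65, 64, 0))
    print(new_block_num(64, 1, 64, 3))
```

The second call covers the over-allocated case the reviewer describes: two blocks are needed, three are already held, so the raw result is negative and gets clamped.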