bugfix AS block leaks #7890
Conversation
Thanks for your contribution!
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review
2026-05-22 11:16:02
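📋 Review summary
PR overview: Fixes an AS block leak on the enable_prefix_caching + hierarchical KV Cache path, where can_allocate_gpu_blocks was called without first computing the reserved-block threshold via _get_can_schedule_prefill_threshold_block.
Scope of change: fastdeploy/engine/sched/resource_manager_v1.py (scheduling resource management)
Impact tags: [Scheduler] [KVCache]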
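Issues
| Level | File | Summary |
|---|---|---|
| 📝 PR conventions | — | Title lacks an official tag; every section of the description template is empty |
| ❓ Question | resource_manager_v1.py:1205 | Typo in the newly added comment: Warnig → Warning |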
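📝 PR convention check
The PR title bugfix AS block leaks uses the unofficial lowercase form bugfix and should be normalized to [BugFix]; the Motivation, Modifications, and other sections of the description template are all placeholders or blank.
Suggested title (ready to copy):
[BugFix] Fix AS block leaks when enable_prefix_caching with hierarchical kvcache
Suggested PR description (ready to copy; it must reproduce the full structure of the checklist §D2 template):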
## Motivation
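When `enable_prefix_caching` is on and a hierarchical KV Cache is configured (`num_cpu_blocks > 0` or `kvcache_storage_backend`), the scheduler's GPU block availability check before admitting a new prefill request used the raw block count (`need_prefill_tokens / block_size`) directly, without calling `_get_can_schedule_prefill_threshold_block` to compute a threshold that includes the blocks reserved for running requests. With the threshold too low, the hierarchical cache allocates blocks to the request for prefix matching, but the later real allocation runs out of GPU blocks, which causes the AS block leak (storage blocks leak).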
## Modifications
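- `fastdeploy/engine/sched/resource_manager_v1.py`: in the two `can_allocate_gpu_blocks` checks on the `enable_prefix_caching` + hierarchical KV Cache path, first call `_get_can_schedule_prefill_threshold_block` to compute the threshold including reserved blocks, then pass the result to `can_allocate_gpu_blocks`, consistent with the other call sites in the file. Also add a comment before `_free_blocks` describing the potential storage block leak risk.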
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
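- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The fix is logically sound: the two overlooked can_allocate_gpu_blocks calls now compute their threshold via _get_can_schedule_prefill_threshold_block, consistent with the other call sites in the file, which effectively prevents deadlocks and block leaks in the hierarchical cache scenario. Please fill in the PR description, fix the title format, and correct the comment typo Warnig → Warning.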
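To make the difference between the two admission checks concrete, here is a minimal toy sketch. It is not the FastDeploy code: `ToyScheduler`, `raw_blocks`, `threshold_blocks`, and the per-request reservation policy are all hypothetical, and the real `_get_can_schedule_prefill_threshold_block` may compute its headroom quite differently. The sketch only shows why checking the raw block count can admit a request that a threshold with reserved blocks would reject.

```python
from dataclasses import dataclass


@dataclass
class ToyScheduler:
    """Toy stand-in for a scheduler; NOT the FastDeploy resource manager."""

    free_gpu_blocks: int
    block_size: int
    num_running: int
    reserved_per_running: int  # hypothetical reservation policy

    def can_allocate_gpu_blocks(self, num_blocks: int) -> bool:
        # Admission check: are there at least num_blocks free GPU blocks?
        return self.free_gpu_blocks >= num_blocks

    def raw_blocks(self, need_prefill_tokens: int) -> int:
        # Blocks needed by the new request alone (ceiling division, an assumption).
        return -(-need_prefill_tokens // self.block_size)

    def threshold_blocks(self, need_prefill_tokens: int) -> int:
        # Hypothetical analogue of _get_can_schedule_prefill_threshold_block:
        # the new request's blocks plus headroom kept for running requests.
        reserved = self.num_running * self.reserved_per_running
        return self.raw_blocks(need_prefill_tokens) + reserved


sched = ToyScheduler(free_gpu_blocks=10, block_size=64, num_running=4, reserved_per_running=2)
need_prefill_tokens = 512  # 8 blocks at block_size=64

# Before the fix: only the raw block count is checked, so the request is admitted.
admit_before_fix = sched.can_allocate_gpu_blocks(sched.raw_blocks(need_prefill_tokens))
# After the fix: the threshold includes reserved blocks, so the request waits.
admit_after_fix = sched.can_allocate_gpu_blocks(sched.threshold_blocks(need_prefill_tokens))
print(admit_before_fix, admit_after_fix)  # True False
```

In this toy setup the raw check passes with 10 free blocks against 8 needed, while the threshold check asks for 16 and defers the request, which is the kind of premature admission the review describes.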
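The CI report is generated from the following code (updated every 30 minutes):
1 Task overview
✅ All Required tasks have passed, so CI does not currently block merging; 3 Optional tasks have failed and 1 Optional task is pending, for reference only.
2 Task status summary
2.1 Required tasks: 10/10 passed
2.2 Optional tasks: 27/31 passed
3 Failure details (required only)
No required tasks failed.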
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7890 +/- ##
==========================================
Coverage ? 63.98%
==========================================
Files ? 462
Lines ? 64482
Branches ? 9880
==========================================
Hits ? 41256
Misses ? 20450
Partials ? 2776
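As a quick sanity check of the figures above, the short sketch below recomputes the totals. It assumes the reported coverage is hits divided by total lines (hits + misses + partials), which matches the numbers in this report.

```python
# Figures copied from the Codecov table above.
hits, misses, partials = 41256, 20450, 2776

lines = hits + misses + partials  # should match the reported "Lines" value
coverage = 100 * hits / lines     # assumed definition: hits / total lines

print(lines, f"{coverage:.2f}%")  # 64482 63.98%
```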