
bugfix AS block leaks #7890

Open
zccjjj wants to merge 1 commit into PaddlePaddle:develop from zccjjj:ASBlockLeaks

Contributor

@zccjjj zccjjj commented May 22, 2026

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot Bot commented May 22, 2026

Thanks for your contribution!

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-22 11:16:02

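📋 Review Summary

PR overview: Fixes an AS block leak on the enable_prefix_caching + hierarchical KV Cache path, where the can_allocate_gpu_blocks check did not use _get_can_schedule_prefill_threshold_block to compute the reserved-block threshold.
Scope of changes: fastdeploy/engine/sched/resource_manager_v1.py (scheduler resource management)
Affected tags: [Scheduler] [KVCache]

Issues

| Level | File | Summary |
| --- | --- | --- |
| 📝 PR convention | - | Title lacks an official tag; every section of the description template is empty |
| ❓ Question | resource_manager_v1.py:1205 | Typo in the newly added comment: Warnig → Warning |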

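📝 PR Convention Check

The PR title bugfix AS block leaks uses the unofficial lowercase form bugfix and should be normalized to [BugFix]; the Motivation, Modifications, and other sections of the description template are all placeholders or blank.

Suggested title (can be copied directly):

  • [BugFix] Fix AS block leaks when enable_prefix_caching with hierarchical kvcache

Suggested PR description (can be copied directly; it must reproduce the full structure of the checklist §D2 template):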

## Motivation
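When `enable_prefix_caching` is on and a hierarchical KV Cache is configured (`num_cpu_blocks > 0` or `kvcache_storage_backend`), the scheduler's GPU block availability check before admitting a new prefill request used the raw block count (`need_prefill_tokens / block_size`) directly, without calling `_get_can_schedule_prefill_threshold_block` to compute a threshold that includes the blocks reserved for running requests. With the threshold too low, the hierarchical cache allocates blocks to the request for prefix matching, but the later actual allocation runs short of GPU blocks, which causes the AS block leak (storage blocks leak).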

## Modifications
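- `fastdeploy/engine/sched/resource_manager_v1.py`: in the two `can_allocate_gpu_blocks` checks on the `enable_prefix_caching + hierarchical KV Cache` path, first call `_get_can_schedule_prefill_threshold_block` to compute a threshold that includes reserved blocks, then pass that threshold to `can_allocate_gpu_blocks`, consistent with the other call sites in the file. Also add a comment before `_free_blocks` describing the potential storage block leak risk.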

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

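Overall Assessment

The fix is logically correct: the two missed can_allocate_gpu_blocks calls now compute their threshold through _get_can_schedule_prefill_threshold_block, consistent with the other call sites in the file, which effectively prevents deadlocks and block leaks in the hierarchical cache scenario. Recommended follow-ups: fill in the PR description, fix the title format, and correct the comment typo Warnig → Warning.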
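
To make the difference between the two admission checks concrete, here is a minimal toy sketch. It is not FastDeploy code: the `ToyScheduler` class, its fields, the ceiling division, and the reserve formula are all assumptions made for illustration. Only the names `can_allocate_gpu_blocks`, `need_prefill_tokens`, and `block_size` come from the review above, and the real signatures may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the scheduler's resource manager."""

    free_gpu_blocks: int      # GPU blocks currently unallocated
    block_size: int           # tokens per KV cache block
    num_running: int          # requests already running
    reserve_per_running: int  # assumed headroom (in blocks) per running request

    def can_allocate_gpu_blocks(self, num_blocks: int) -> bool:
        # Admission passes only if the requested number of blocks is free.
        return self.free_gpu_blocks >= num_blocks

    def raw_prefill_blocks(self, need_prefill_tokens: int) -> int:
        # Pre-fix style check: blocks for the prompt only, no reserve.
        return math.ceil(need_prefill_tokens / self.block_size)

    def prefill_threshold_blocks(self, need_prefill_tokens: int) -> int:
        # Post-fix idea: prompt blocks plus headroom for running requests.
        # This reserve formula is an assumption, not the library's logic.
        reserve = self.num_running * self.reserve_per_running
        return self.raw_prefill_blocks(need_prefill_tokens) + reserve


sched = ToyScheduler(free_gpu_blocks=10, block_size=64, num_running=4, reserve_per_running=1)
need_prefill_tokens = 512  # 8 blocks at 64 tokens per block

admit_raw = sched.can_allocate_gpu_blocks(sched.raw_prefill_blocks(need_prefill_tokens))
admit_threshold = sched.can_allocate_gpu_blocks(sched.prefill_threshold_blocks(need_prefill_tokens))
print(admit_raw, admit_threshold)  # True False
```

In this toy setup the raw check admits the request (8 blocks needed, 10 free), while the threshold check rejects it (8 + 4 reserved = 12 > 10). Admitting on the raw count is the situation in which prefix-matching blocks would be taken before the later GPU allocation fails.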

PaddlePaddle-bot commented May 22, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-22 15:35:45

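The CI report is generated from the following code (updated every 30 minutes):


1 Task Overview

✅ All Required tasks have passed, so CI does not currently block merging; 3 Optional tasks have failed and 1 Optional task is waiting, for reference only.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 54 (13) | 41 | 37 | 3 | 0 | 1 | 0 |

2 Task Status Summary

2.1 Required tasks: 10/10 passed

Required tasks block merging; failures should be handled first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
| ✅ | The remaining 10 required tasks passed | - | - | - | - | - |

2.2 Optional tasks: 27/31 passed

Optional tasks do not block merging; failures and waiting tasks are for reference only.

| Status | Task | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 15m51s | Job | - |
| ❌ | Check PR Template | 15s | Job | - |
| ❌ | Trigger Jenkins for PR | 7m22s | Job | - |
| ⏸️ | CI_HPU | - | - | - |
| ✅ | The remaining 27 optional tasks passed | - | - | - |

3 Failure Details (required only)

No required tasks failed.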

codecov-commenter commented May 22, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@3ec5011). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/engine/sched/resource_manager_v1.py | 50.00% | 0 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7890   +/-   ##
==========================================
  Coverage           ?   63.98%           
==========================================
  Files              ?      462           
  Lines              ?    64482           
  Branches           ?     9880           
==========================================
  Hits               ?    41256           
  Misses             ?    20450           
  Partials           ?     2776           

| Flag | Coverage Δ |
| --- | --- |
| GPU | 73.04% <50.00%> (?) |
| XPU | 16.01% <0.00%> (?) |

Flags with carried forward coverage won't be shown.
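
As a quick sanity check, the figures in the report above can be reproduced with a few lines of arithmetic. This sketch assumes coverage is computed as hits divided by hits plus misses plus partials (so partial lines count as not covered); the patch line count is an inference from the reported percentage, not a number stated in the report.

```python
# Project-level numbers copied from the coverage diff above.
hits, misses, partials = 41256, 20450, 2776

lines = hits + misses + partials
project_coverage = round(100 * hits / lines, 2)
print(lines, project_coverage)

# Patch-level numbers: 0 missing and 2 partial lines at 50% patch coverage.
# Inferred (assumption): total patch lines = uncovered / (1 - coverage).
patch_uncovered = 0 + 2
patch_coverage = 0.50
patch_lines = round(patch_uncovered / (1 - patch_coverage))
print(patch_lines)
```

Under that assumption the totals agree with the report: the three counts sum to the reported 64482 lines, the ratio rounds to 63.98%, and the patch would consist of 4 tracked lines, 2 of them only partially covered.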
