[Scheduler] Defer block recycling to accelerate LRU node freeing by liyonghua0910 · Pull Request #7885 · PaddlePaddle/FastDeploy

liyonghua0910 · 2026-05-21T12:55:47Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


In the LRU eviction loop of free_block_ids_async, each iteration calls recycle_gpu_blocks individually, which causes frequent heap operations and slows down the overall freeing process. This PR defers block recycling to a single batch call after the loop completes.

Modifications

- Defer `recycle_gpu_blocks` calls inside the LRU freeing loop to a single batch call after the loop, reducing the overhead of repeated heap operations.
- Add a `defer_recycle` parameter to `_handle_free_gpu_node_without_cpu` to support deferred block recycling.
- Fix the LRU leaf node freeing logic: disconnect the child node from its parent first, then check whether the parent should be added to the LRU heap, avoiding duplicate freeing.
- Add warning logs to help diagnose duplicate node issues in the LRU heap.
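A minimal sketch of the deferral pattern described above, assuming the GPU free pool behaves like a min-heap of block ids (the PR only says that recycling involves heap operations). `GpuBlockPool`, `evict_eager`, and `evict_deferred` are hypothetical names, and this `recycle_gpu_blocks` is a stand-in that accepts a list, not FastDeploy's actual signature. The sketch shows the loop collecting ids and making one recycle call instead of one per evicted node, so any fixed per-call cost in the real method is paid once.

```python
import heapq
from typing import Iterable, List, Union


class GpuBlockPool:
    """Hypothetical stand-in for the GPU free-block pool: a min-heap of block ids."""

    def __init__(self, free_ids: Iterable[int]):
        self.free_ids: List[int] = list(free_ids)
        heapq.heapify(self.free_ids)
        self.recycle_calls = 0  # how many times the recycle entry point was invoked

    def recycle_gpu_blocks(self, block_ids: Union[int, List[int]]) -> None:
        # Assumption: the entry point accepts one id or a list of ids.
        if isinstance(block_ids, int):
            block_ids = [block_ids]
        self.recycle_calls += 1
        for bid in block_ids:
            heapq.heappush(self.free_ids, bid)


def evict_eager(pool: GpuBlockPool, evicted: List[List[int]]) -> None:
    """Old pattern: recycle inside the loop, once per evicted node."""
    for blocks in evicted:
        pool.recycle_gpu_blocks(blocks)


def evict_deferred(pool: GpuBlockPool, evicted: List[List[int]]) -> None:
    """New pattern: only collect ids in the loop, recycle once afterwards."""
    blocks_to_recycle: List[int] = []
    for blocks in evicted:
        blocks_to_recycle.extend(blocks)
    if blocks_to_recycle:
        pool.recycle_gpu_blocks(blocks_to_recycle)


evicted_nodes = [[5], [3, 7], [1]]  # block ids freed per evicted leaf, in LRU order
eager, deferred = GpuBlockPool([10, 11]), GpuBlockPool([10, 11])
evict_eager(eager, evicted_nodes)
evict_deferred(deferred, evicted_nodes)
print(sorted(eager.free_ids) == sorted(deferred.free_ids))  # True
print(eager.recycle_calls, deferred.recycle_calls)  # 3 1
```

Both strategies leave the same set of free blocks; only the number of recycle calls differs.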

Usage or Command

No additional configuration required. The optimization takes effect automatically.

Accuracy Tests

Only affects KV Cache block recycling timing, no impact on model output accuracy.

Checklist

- Add at least a tag in the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run pre-commit before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-21T12:55:58Z

Thanks for your contribution!

codecov-commenter · 2026-05-21T13:36:36Z

Codecov Report

❌ Patch coverage is 68.18182% with 7 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@8080a25).

Files with missing lines	Patch %	Lines
fastdeploy/cache_manager/prefix_cache_manager.py	68.18%	4 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7885   +/-   ##
==========================================
  Coverage           ?   63.58%           
==========================================
  Files              ?      462           
  Lines              ?    64487           
  Branches           ?     9882           
==========================================
  Hits               ?    41007           
  Misses             ?    20704           
  Partials           ?     2776
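The reported percentages can be reproduced with a short calculation, assuming Codecov's default settings (partial lines count as not covered, and percentages are rounded down to two decimals). The helper name `codecov_pct` is hypothetical. The patch line count of 22 is inferred rather than stated: 7 uncovered lines at 68.18182% coverage implies 15 covered lines out of 22.

```python
def codecov_pct(hits: int, misses: int, partials: int) -> str:
    """Coverage as a percentage string, partial lines counted as not covered,
    truncated (rounded down) to two decimals."""
    total = hits + misses + partials
    hundredths = hits * 10000 // total  # integer floor avoids float rounding surprises
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# Project totals from the coverage diff above: 41007 + 20704 + 2776 = 64487 lines.
print(codecov_pct(41007, 20704, 2776))  # 63.58%
# Patch: 15 covered, 4 missing, 3 partial (22 lines, inferred from the header).
print(codecov_pct(15, 4, 3))  # 68.18%
```

41007 / 64487 is about 63.5896%, so the reported 63.58% is consistent with rounding down rather than rounding to nearest.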

Flag	Coverage Δ
GPU	`72.71% <68.18%> (?)`
XPU	`7.11% <0.00%> (?)`

Flags with carried forward coverage won't be shown.


PaddlePaddle-bot · 2026-05-21T13:58:24Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-23 01:53:10

This CI report was generated from the following code (updated every 30 minutes):

PR commit: 3c777a7
Merge base: 8080a25 (branch: develop)
View full diff
CI details

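1 Job overview

9 of 10 Required jobs currently pass; 1 Required job still fails: Approval. This failure is due to a pending manual approval, not a failing code test; the code-related Required tests (including Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage) have passed. Another 3 Optional jobs failed; they are for reference only.

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped
42(0)	42	38	4	0	0	0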

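2 Job status summary

Log column: failed jobs link directly to their logs; running jobs link to the Job page.

2.1 Required jobs: 9/10 passed

Required jobs block merging; fix failures here first.

Status	Job	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Approval`	6s	Approval required: manual approval not yet completed	Complete the manual approval	Job	-
✅	Remaining 9 required jobs passed	-	-	-	-	-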

2.2 Optional jobs: 29/32 passed

Optional jobs do not block merging; failures are for reference only.

Status	Job	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	1m27s	Job	-
❌	`CI_HPU`	1h4m	Job	-
❌	`Trigger Jenkins for PR`	7m34s	Job	-
✅	Remaining 29 optional jobs passed	-	-	-

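3 Failure details (required only)

Approval: manual approval required (confidence: high)

Root cause summary

This job requires manual approval; CI continues only after the approval is granted.

Suggested fix summary

Complete the manual approval; once approved, wait for CI to continue automatically or rerun as needed.

Key logs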

Job
Process completed with exit code 6.

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-22 11:00:55

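📋 Review summary

PR overview: defers the per-iteration `recycle_gpu_blocks` calls in the LRU eviction loop to a single batch call after the loop ends, reducing heap operation overhead, and also fixes the order in which a child node is disconnected from its parent.
Scope of changes: fastdeploy/cache_manager/prefix_cache_manager.py
Affected-area tag: [KVCache]

Issues

Severity	File	Summary
🔴 Bug	`prefix_cache_manager.py:1360`	The `defer_recycle=True` path does not clear `node.reverved_dec_block_ids`, risking double recycling
🔴 Bug	`prefix_cache_manager.py:1360`	The `node.cache_status = CacheStatus.CPU` assignment was removed, so the node's status is not updated after eviction
❓ Question	`prefix_cache_manager.py:1488`	When the parent is already in the heap, `continue` skips it; could inserting the parent into the heap a second time cause a real double free?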

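📝 PR convention check

The PR title uses the [Scheduler] tag, but the only changed file, fastdeploy/cache_manager/prefix_cache_manager.py, belongs to the cache_manager/ module. According to the affected-area table in architecture.md, fastdeploy/cache_manager/ maps to the [KVCache] tag. Consider correcting the title tag.

Suggested title (ready to copy):

[KVCache] Defer block recycling to accelerate LRU node freeing

Suggested PR description (ready to copy; it must reproduce the full structure of the checklist §D2 template):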

## Motivation
In the LRU eviction loop of `free_block_ids_async`, each iteration calls `recycle_gpu_blocks` individually, which causes frequent heap operations and slows down the overall freeing process. This PR defers block recycling to a single batch call after the loop completes.

## Modifications
- Defer `recycle_gpu_blocks` calls inside the LRU freeing loop to a single batch call after the loop, reducing the overhead of repeated heap operations.
- Add `defer_recycle` parameter to `_handle_free_gpu_node_without_cpu` to support deferred block recycling.
- Fix the LRU leaf node freeing logic: disconnect the child node from its parent first, then check whether the parent should be added to the LRU heap, avoiding duplicate freeing.
- Add warning logs to help diagnose duplicate node issues in the LRU heap.

## Usage or Command
No additional configuration required. The optimization takes effect automatically.

## Accuracy Tests
Only affects KV Cache block recycling timing, no impact on model output accuracy.

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

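Overall assessment

Deferring recycling into a single batch is a sound optimization, but the PR removes the `node.cache_status` update and the `node.reverved_dec_block_ids = []` reset without fully assessing the side effects. This creates a P0 risk of stale node state and double block recycling; we recommend fixing both issues and resubmitting.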

PaddlePaddle-bot · 2026-05-22T03:02:55Z

```diff
+            )
+            return []
+        else:
+            return blocks_to_recycle
```


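🔴 Bug: the `defer_recycle=True` path does not clear `node.reverved_dec_block_ids`, risking double recycling

The original `_handle_free_gpu_node_without_cpu` contained `node.reverved_dec_block_ids = []`, which ensured the node's references were cleared once it was freed. The PR removes this line. When `defer_recycle=True`, the caller receives the block list and recycles it later, but the node's `reverved_dec_block_ids` field still holds the old values. If another path (for example `free_nodes_directly` or the swap path) reaches the node before the batch recycle completes, the same block may be recycled twice.

Suggested fix: clear the field before returning in the `defer_recycle=True` branch: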

```python
blocks_to_recycle = list(node.reverved_dec_block_ids) + [node.block_id]
node.reverved_dec_block_ids = []  # clear immediately to prevent double recycling
if not defer_recycle:
    ...
else:
    return blocks_to_recycle
```
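The double-recycle risk can be illustrated with a toy model. Everything here is hypothetical except the field name `reverved_dec_block_ids`, taken from the review: `Node` is a minimal stand-in for the radix-tree node, `collect_blocks` plays the role of the `defer_recycle=True` branch, and `FreeList` raises if a block is returned twice. The sketch only models the reserved decode blocks the review mentions.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Node:
    """Hypothetical minimal node; the field name follows the review comment."""
    block_id: int
    reverved_dec_block_ids: List[int] = field(default_factory=list)


def collect_blocks(node: Node) -> List[int]:
    """defer_recycle=True branch: hand blocks to the caller and clear the field at once."""
    blocks_to_recycle = list(node.reverved_dec_block_ids) + [node.block_id]
    node.reverved_dec_block_ids = []  # later readers of this node find nothing to recycle
    return blocks_to_recycle


class FreeList:
    """Hypothetical free list that rejects a block returned twice."""

    def __init__(self) -> None:
        self.free: Set[int] = set()

    def recycle(self, block_ids: Iterable[int]) -> None:
        for bid in block_ids:
            if bid in self.free:
                raise RuntimeError(f"block {bid} recycled twice")
            self.free.add(bid)


pool = FreeList()
node = Node(block_id=7, reverved_dec_block_ids=[20, 21])
pending = collect_blocks(node)             # deferred: not recycled yet
pool.recycle(node.reverved_dec_block_ids)  # another path touches the node first: no-op
pool.recycle(pending)                      # the batch recycle later runs without conflict
print(sorted(pool.free))  # [7, 20, 21]
```

Without the clearing line, the second path would return blocks 20 and 21, and the later batch recycle would raise on the first of them.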

PaddlePaddle-bot · 2026-05-22T03:02:55Z

```diff
+            )
+            return []
+        else:
+            return blocks_to_recycle
```


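🔴 Bug: `node.cache_status` is not updated in `_handle_free_gpu_node_without_cpu`

The original code set `node.cache_status = CacheStatus.CPU` at the function entry (marking the node as evicted). The PR deletes this line entirely, so neither path (`defer_recycle=True` or `False`) updates `cache_status`. The node has been removed from `node_map`, but its `cache_status` is still GPU. If external code reaches the node through another reference (for example the `node.cache_status.value == CacheStatus.GPU.value` check in `recv_data_transfer_result` on the PD disaggregation transfer path), it will wrongly treat the node as still resident on the GPU.

Suggested fix: restore the status update at the start of the function (or switch to a more appropriate eviction-state enum value):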

```python
node.cache_status = CacheStatus.CPU  # or add a new EVICTED status
```
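A toy illustration of why the status update matters when another reference to the node outlives its removal from `node_map`. `CacheStatus` here is a hypothetical two-member enum, and `free_gpu_node` and `looks_resident_on_gpu` are hypothetical names; the check mimics the `cache_status.value == CacheStatus.GPU.value` comparison the review cites, not the real `recv_data_transfer_result` code.

```python
from enum import Enum


class CacheStatus(Enum):
    # Hypothetical two-member subset; the real enum may differ.
    GPU = 0
    CPU = 1


class Node:
    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.cache_status = CacheStatus.GPU


def free_gpu_node(node: Node, node_map: dict) -> None:
    """Evict a GPU-only node: update its status, then drop it from the index."""
    node.cache_status = CacheStatus.CPU  # the assignment the review asks to restore
    node_map.pop(node.node_id, None)


def looks_resident_on_gpu(node: Node) -> bool:
    # Same shape as the comparison the review cites on the PD transfer path.
    return node.cache_status.value == CacheStatus.GPU.value


node = Node(1)
node_map = {1: node}
held_elsewhere = node  # e.g. a reference kept by an in-flight transfer
free_gpu_node(node, node_map)
print(1 in node_map, looks_resident_on_gpu(held_elsewhere))  # False False
```

Without the first line of `free_gpu_node`, the last check would return True for a node that no longer exists in `node_map`, which is the misjudgment the review describes.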

PaddlePaddle-bot · 2026-05-22T03:02:55Z

```diff
+                                    logger.warning(
+                                        f"Node {parent.node_id} is already in gpu lru leaf heap, duplicated node free may occured!"
+                                    )
                                     continue
```


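❓ Question: when `continue` runs because the parent is already in the heap, `total_gpu_free_count` is not incremented. Could this cause the loop to exit early?

When `parent in self.gpu_lru_leaf_set`, `continue` is executed and this iteration does not increment `total_gpu_free_count` (the count for the evicted leaf was already incremented by 1 above), which is logically correct. However, the warning log says "duplicated node free may occured": if the parent was already popped from the heap and processed (and removed from `node_map`), adding it to the heap again would cause a genuine double free.

Please confirm: before the parent is added to `gpu_lru_leaf_heap`, should the code first check whether it has already been removed from `node_map` (that is, whether it has already gone through `_handle_free_gpu_node_without_cpu`)? If it has been removed, skip the insertion.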

[KVCache] Defer block recycling to accelerate LRU node freeing

1ba1273

liyonghua0910 had a problem deploying to Metax_ci May 21, 2026 12:55 — with GitHub Actions Failure

liyonghua0910 changed the title ~~[KVCache] Defer block recycling to accelerate LRU node freeing~~ [Scheduler] Defer block recycling to accelerate LRU node freeing May 21, 2026

This comment was marked as outdated.


liyonghua0910 mentioned this pull request May 21, 2026

[Cherry-Pick][Scheduler] Defer block recycling to accelerate LRU node freeing #7886 (Open, 5 tasks)

[style] precommit

3c777a7

liyonghua0910 had a problem deploying to Metax_ci May 22, 2026 02:33 — with GitHub Actions Failure

PaddlePaddle-bot suggested changes May 22, 2026


liyonghua0910 wants to merge 2 commits into PaddlePaddle:develop from liyonghua0910:develop+20260521_free_blocks


3 participants
