[PD Disaggregation] Write the cache of preempted req to storage and refine PD Disaggregation#7107
Conversation
Thanks for your contribution!
Pull request overview
This PR targets the PD Disaggregation / v1 scheduling path: when a preemption occurs, the request's KV cache (optionally including output tokens) is written to the storage backend so it can be reused/restored later. It also makes small adjustments to the scheduling resource checks and the cache allocation strategy to reduce the risk of stalls.
Changes:
- Add an environment variable switch controlling whether a preempted request's cache is written to storage.
- In `ResourceManagerV1._trigger_preempt()`, write the preempted request's cache to storage according to its role (P/D).
- Add a parameter to `PrefixCacheManager.can_allocate_gpu_blocks()` to control whether to actively try to free up GPU blocks, and fix how `token_ids` is constructed when writing to storage.
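The preempt-and-persist flow described above can be sketched as follows. This is a simplified stand-in, not FastDeploy's real implementation: only the idea of `write_cache_to_storage_decode` and the save-cache switch come from the PR; the classes and argument names here are illustrative.

```python
class FakeCacheManager:
    """Illustrative stand-in for PrefixCacheManager's storage write-back."""

    def __init__(self):
        self.written = []

    def write_cache_to_storage_decode(self, req):
        # Record which request's KV cache was persisted.
        self.written.append(req["request_id"])


def trigger_preempt(preempted_req, cache_manager, save_cache_enabled, storage_backend):
    # Persist the preempted request's KV cache before its blocks are freed,
    # but only when both the env switch and a storage backend are configured.
    if save_cache_enabled and storage_backend:
        cache_manager.write_cache_to_storage_decode(preempted_req)
    # Blocks are recycled only after the write-back call has returned.
    preempted_req["block_ids"] = []


mgr = FakeCacheManager()
req = {"request_id": "req-1", "block_ids": [3, 7]}
trigger_preempt(req, mgr, save_cache_enabled=True, storage_backend="attention_store")
```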
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| fastdeploy/envs.py | Adds the environment variable switch for writing a preempted request's cache to storage. |
| fastdeploy/engine/sched/resource_manager_v1.py | Adds the write-to-storage behavior on the preemption path, and adjusts the prefill-threshold function signature and some resource checks. |
| fastdeploy/engine/common_engine.py | Lowers the log level for decode resource-allocation failures from error to warning. |
| fastdeploy/cache_manager/prefix_cache_manager.py | Extends the GPU-block allocatability check; fixes the token_ids concatenation when writing to storage so the original list is not modified in place. |
```diff
  gpu_recv_block_ids = []
  match_cpu_blocks_num = len(match_cpu_block_ids)
- if self.can_allocate_gpu_blocks(num_blocks=match_cpu_blocks_num):
+ if self.can_allocate_gpu_blocks(num_blocks=match_cpu_blocks_num, try_free_gpu_blocks=False):
```
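The semantics of the new `try_free_gpu_blocks` parameter can be modeled roughly as below. This is a hypothetical sketch, not FastDeploy's actual implementation; `free_blocks` and `reclaimable_blocks` are illustrative inputs standing in for the manager's internal state.

```python
def can_allocate_gpu_blocks(free_blocks, reclaimable_blocks, num_blocks,
                            try_free_gpu_blocks=True):
    # With try_free_gpu_blocks=False, only currently free blocks count.
    # Otherwise, reclaimable blocks (e.g. evictable cached blocks) may also
    # be counted toward satisfying the request.
    if free_blocks >= num_blocks:
        return True
    if try_free_gpu_blocks:
        return free_blocks + reclaimable_blocks >= num_blocks
    return False
```

Passing `try_free_gpu_blocks=False` on this path makes the check conservative: it refuses rather than evicting other cached blocks to make room.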
```diff
      task_id=req_id,
      keys=no_match_block_keys,
-     token_ids=input_token_ids,
+     token_ids=input_token_ids if self.kvcache_storage_backend == "attention_store" else None,
```
```diff
  if self.config.cache_config.enable_output_caching:
-     token_ids += request.output_token_ids
+     input_token_ids = token_ids + request.output_token_ids
```
Fixes a bug where the request's `prompt_token_ids` was modified in place.
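The fix above swaps an in-place `+=` for list concatenation with `+`. A minimal, self-contained demonstration of why that matters when the local variable aliases the request's list (plain Python, illustrative names):

```python
prompt_token_ids = [1, 2, 3]
output_token_ids = [4, 5]

# Fixed pattern: `+` builds a new list, the aliased prompt list is untouched.
token_ids = prompt_token_ids                 # alias, not a copy
input_token_ids = token_ids + output_token_ids
assert prompt_token_ids == [1, 2, 3]

# Buggy pattern: `+=` extends the list in place through the alias,
# so the "prompt" silently gains the output tokens.
buggy = prompt_token_ids
buggy += output_token_ids
assert prompt_token_ids == [1, 2, 3, 4, 5]
```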
```diff
      self._free_blocks(preempted_req)
      llm_logger.info(f"Preemption is triggered! Preempted request id: {preempted_req.request_id}")
  else:
+     if envs.FD_SAVE_OUTPUT_CACHE_FOR_PREEMPTED_REQUEST:
```
Write the cache of rescheduled requests out to storage.
```diff
  if self.available_batch() == 0:
      return False
- if not self.cache_manager.can_allocate_gpu_blocks(need_prealloc_prefill_blocks):
+ total_need_blocks = self._get_can_schedule_prefill_threshold_block(need_prealloc_prefill_blocks)
```
When a request on the P instance asks the D instance for blocks, the D instance should account for block ids reserved for its already-running requests.
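One way to model that reservation, as a hedged sketch: the D instance admits a prefill-block request only if enough free blocks remain after setting some aside for running (decoding) requests. All names here are illustrative; in particular `reserve_blocks_per_req` is an assumed knob, not a FastDeploy parameter.

```python
def can_schedule_prefill(free_blocks, need_prealloc_prefill_blocks,
                         num_running_reqs, reserve_blocks_per_req):
    # Total demand = prefill preallocation + headroom reserved for
    # requests that are already running on the D instance.
    total_need_blocks = (need_prealloc_prefill_blocks
                         + num_running_reqs * reserve_blocks_per_req)
    return free_blocks >= total_need_blocks
```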
Codecov Report
❌ Patch coverage is …
Additional details and impacted files:

```
@@           Coverage Diff            @@
##           develop    #7107   +/- ##
========================================
  Coverage         ?   73.93%
========================================
  Files            ?      402
  Lines            ?    56582
  Branches         ?     8945
========================================
  Hits             ?    41833
  Misses           ?    11811
  Partials         ?     2938
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
```diff
+ if envs.FD_SAVE_OUTPUT_CACHE_FOR_PREEMPTED_REQUEST:
+     if self.config.cache_config.kvcache_storage_backend:
+         self.cache_manager.write_cache_to_storage_decode(preempted_req)
  self._free_blocks(preempted_req)
```
Here `write_cache_to_storage_decode()` is called synchronously while the ResourceManager lock is held; the call waits for an acknowledgement from the cache_transfer thread (`is_sync=True`), which can block the scheduling thread and amplify tail latency on the preemption path. Consider moving the write-back out of the lock (or submitting it to a thread pool for asynchronous execution), while making sure the corresponding GPU blocks are not recycled before the write-back completes (for example, defer `_free_blocks`, or have the write-back task hold a snapshot of the required block_ids).
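The suggested restructuring can be sketched like this: snapshot the block ids under the lock, submit the synchronous write-back to a worker pool, and free the blocks only in the task's completion path. All class and method names here are illustrative stand-ins, not FastDeploy's real API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeCacheManager:
    def __init__(self):
        self.written = []

    def write_cache_to_storage_decode(self, req, block_ids):
        # Simulated synchronous write-back (is_sync=True in the real code).
        self.written.append((req["request_id"], tuple(block_ids)))


class Scheduler:
    def __init__(self, cache_manager):
        self.lock = threading.Lock()
        self.cache_manager = cache_manager
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.freed = []

    def preempt(self, req):
        # Hold the lock only long enough to snapshot the block ids,
        # keeping them alive for the background write-back task.
        with self.lock:
            block_snapshot = list(req["block_ids"])
        # The slow, blocking write-back runs outside the scheduler lock.
        return self.pool.submit(self._write_back_and_free, req, block_snapshot)

    def _write_back_and_free(self, req, block_snapshot):
        self.cache_manager.write_cache_to_storage_decode(req, block_snapshot)
        # Recycle the blocks only after the write-back has completed.
        with self.lock:
            self.freed.extend(block_snapshot)


mgr = FakeCacheManager()
sched = Scheduler(mgr)
sched.preempt({"request_id": "r1", "block_ids": [2, 5]}).result()
```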
…efine PD Disaggregation (PaddlePaddle#7107) * Write the cache of preempted req to storage * up * fix
Motivation
Modifications
See the review comments above.
Usage or Command
Accuracy Tests
Checklist
- Add at least one tag from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a `release` branch PR, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.