[Models] add fleet model fallback by xiaoguoguo626807 · Pull Request #7732 · PaddlePaddle/FastDeploy

xiaoguoguo626807 · 2026-05-07T07:07:32Z

Motivation

Add PaddleFleet as a model inference backend (--model-impl paddlefleet). Replacing core_attention in PaddleFleet's TransformerLayer with the FastDeploy Attention kernel lets the PaddleFleet model structure reuse FastDeploy's KV Cache and high-performance attention computation.

Modifications

config.py: add paddlefleet to the ModelImpl type definition
engine/args_utils.py: support the --model-impl paddlefleet CLI argument and add validation logic
model_executor/models/paddleformers/base_fleet.py: add the PaddleFleetModelBase base class, the FastDeployAttention layer, and the patch_paddlefleet_core_attention replacement function
model_executor/models/paddleformers/__init__.py: register the PaddleFleetForCausalLM model class

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --model-impl paddlefleet

Accuracy Tests

N/A (this PR adds the PaddleFleet inference backend; logits alignment data against a reference implementation has not been provided yet)

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-07T07:07:39Z

Thanks for your contribution!

PaddlePaddle-bot · 2026-05-08T09:59:31Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-25 20:36:45

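CI report generated from the following code (refreshed every 30 minutes):

PR commit: 5de13a6
Merge base: cb2d7c0 (branch: develop)
View full diff
CI details

1 Task overview

Required tasks: 8/10 currently passing. 2 required tasks are still failing and need attention: run_tests_with_coverage failed the coverage gate, and Approval is waiting for manual approval.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped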
42(0)	42	37	4	1	0	0

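2 Task status summary

Log column note: failed tasks link directly to the CI log; for failed required tasks, the table gives the root cause and a suggested fix.

2.1 Required tasks: 8/10 passed

Required tasks block merging, so their failures must be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	1h25m	PR issue: diff coverage of the new PaddleFleet code is only 5%	Add unit tests for base_fleet	Job	-
❌	`Approval`	20s	Approval required	Obtain manual approval	Job	-
✅	The other 8 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 29/32 passed

Optional tasks do not block merging; their failures are for reference only. This report does not analyze optional failures in depth.

Status	Task	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	1m23s	Job	-
❌	`Trigger Jenkins for PR`	14s	Job	-
⏳	`CI_HPU`	-	Workflow	-
✅	The other 29 optional tasks passed	-	-	-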

3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage gate (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

Status: ❌ Failed
Error type: coverage gate
Confidence: high
Root cause summary: diff coverage of the new PaddleFleet code is only 5%
Analyzer: ci_analyze_unittest_fastdeploy

Failed test cases: none. The log shows TEST_EXIT_CODE=0, so all unit tests passed; the failure occurred in the coverage check step.

Root cause details:
This PR adds the PaddleFleet fallback backend, but the new code has no matching unit tests, so diff-cover --fail-under=80 did not pass. The diff_coverage.json generated by CI reports a total diff coverage of total_percent_covered=5 with 278 violations; fastdeploy/model_executor/models/paddleformers/base_fleet.py is only 1.84% covered, and most of the new logic (FastDeployAttention, PaddleFleetModelBase, patch_paddlefleet_core_attention) is not exercised by any test.

Key log lines:

All tests passed
Coverage generation failed (exit code 9)
GPU Patch Coverage Details:
"fastdeploy/model_executor/models/paddleformers/base_fleet.py": {"percent_covered": 1.838235294117652, ...}
"fastdeploy/model_executor/models/model_base.py": {"percent_covered": 20.0, "violation_lines": [198, 200, 201, 203]}
"fastdeploy/model_executor/models/paddleformers/__init__.py": {"percent_covered": 22.222222222222214, ...}
"total_num_violations": 278,
"total_percent_covered": 5

Suggested fixes:

Add PaddleFleet backend unit tests under tests/ that cover the core paths in fastdeploy/model_executor/models/paddleformers/base_fleet.py: FastDeployAttention.forward (including 3D/4D inputs and the error branches), PaddleFleetModelBase.compute_logits/embed_input_ids/forward, and patch_paddlefleet_core_attention.
Add lightweight mock tests covering the model_impl == "paddlefleet" branch at fastdeploy/model_executor/models/model_base.py L197-L209 and the registration logic at fastdeploy/model_executor/models/paddleformers/__init__.py L44-L57; mocking is_paddlefleet_available() and using a fake PaddleFleet layer/model avoids the real dependency.
If effective unit tests for this backend cannot be built in the CI environment for now, request a diff coverage exemption or adjust the coverage scope according to project rules, and explain the reason in the PR.
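
The mock-based suggestion above could look roughly like the sketch below. It does not import FastDeploy: `resolve_backend_arch` is a hypothetical reduction of the `model_impl == "paddlefleet"` branch, the body of `is_paddlefleet_available` is an assumption, and `"UnchangedArch"` and `"other"` are placeholder values. Only the name `PaddleFleetForCausalLM` and the `ImportError` branch come from the PR.

```python
from typing import Callable
from unittest import mock


def is_paddlefleet_available() -> bool:
    """Stand-in for the real availability check (this body is an assumption)."""
    try:
        import paddlefleet  # noqa: F401
    except ImportError:
        return False
    return True


def resolve_backend_arch(
    model_impl: str,
    available: Callable[[], bool] = is_paddlefleet_available,
) -> str:
    """Hypothetical reduction of the model_impl == "paddlefleet" branch."""
    if model_impl != "paddlefleet":
        return "UnchangedArch"  # placeholder for every other code path
    if available():
        return "PaddleFleetForCausalLM"
    raise ImportError("paddlefleet is required for --model-impl paddlefleet")


def test_paddlefleet_selected_when_available() -> None:
    fake_check = mock.Mock(return_value=True)
    assert resolve_backend_arch("paddlefleet", fake_check) == "PaddleFleetForCausalLM"
    fake_check.assert_called_once_with()


def test_import_error_when_unavailable() -> None:
    fake_check = mock.Mock(return_value=False)
    try:
        resolve_backend_arch("paddlefleet", fake_check)
    except ImportError as exc:
        assert "paddlefleet" in str(exc)
    else:
        raise AssertionError("expected ImportError")


def test_other_impl_skips_availability_check() -> None:
    fake_check = mock.Mock(return_value=False)
    assert resolve_backend_arch("other", fake_check) == "UnchangedArch"
    fake_check.assert_not_called()
```

The functions follow the `test_*` naming convention, so pytest would collect them as written. In the real code base the availability check would more likely be replaced with `mock.patch` on the module that defines it rather than passed in as an argument.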

Suggested fix summary: add unit tests for base_fleet

Related changes: fastdeploy/model_executor/models/paddleformers/base_fleet.py L51-L685, fastdeploy/model_executor/models/model_base.py L197-L209, fastdeploy/model_executor/models/paddleformers/__init__.py L44-L57

Approval: manual approval required (confidence: high)

This job requires manual approval; CI continues only after the approval is granted.

codecov-commenter · 2026-05-08T11:49:57Z

Codecov Report

❌ Patch coverage is 3.38983% with 285 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@cb2d7c0). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
.../model_executor/models/paddleformers/base_fleet.py	1.47%	267 Missing and 1 partial ⚠️
...oy/model_executor/models/paddleformers/__init__.py	11.11%	7 Missing and 1 partial ⚠️
fastdeploy/model_executor/models/model_base.py	0.00%	4 Missing and 1 partial ⚠️
fastdeploy/model_executor/utils.py	0.00%	4 Missing ⚠️
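
The headline numbers in this report can be cross-checked with a little arithmetic. The sketch below assumes patch coverage is covered lines divided by total changed lines, and that partially covered lines count toward the 285 missing. The resulting totals (295 changed lines, 10 covered) are inferred from the report, not stated by Codecov.

```python
# Per-file (missing, partial) counts copied from the table above.
per_file = {
    "base_fleet.py": (267, 1),
    "paddleformers/__init__.py": (7, 1),
    "model_base.py": (4, 1),
    "utils.py": (4, 0),
}

missing_lines = 285          # "285 lines in your changes missing coverage"
patch_coverage_pct = 3.38983  # "Patch coverage is 3.38983%"

# The headline count matches missing + partial summed over the listed files.
listed_uncovered = sum(missing + partial for missing, partial in per_file.values())

# coverage = covered / total and total = covered + missing, so
# total = missing / (1 - coverage).
total_lines = round(missing_lines / (1 - patch_coverage_pct / 100))
covered_lines = total_lines - missing_lines

print(listed_uncovered, total_lines, covered_lines)
print(round(covered_lines / total_lines * 100, 5))
```

Under these assumptions only about ten changed lines are executed by any test, far short of the 80% threshold that diff-cover enforces in CI.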

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7732   +/-   ##
==========================================
  Coverage           ?   63.65%           
==========================================
  Files              ?      468           
  Lines              ?    65247           
  Branches           ?    10015           
==========================================
  Hits               ?    41536           
  Misses             ?    20886           
  Partials           ?     2825

Flag	Coverage Δ
GPU	`72.67% <3.38%> (?)`
XPU	`7.04% <0.33%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-25 14:54:00

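📋 Review summary

PR overview: adds PaddleFleet as a model inference backend. Replacing core_attention in PaddleFleet's TransformerLayer with the FastDeploy Attention kernel reuses FastDeploy's KV Cache and high-performance attention computation.

Scope of changes: model_executor/models/, config.py, engine/args_utils.py, graph_optimization/decorator.py

Affected tags: [Models] [FDConfig] [Engine]

Issues

Severity	File	Summary
🔴 Bug	`model_base.py:203`	The pip install command is built from concatenated strings with a missing space, so the command cannot be run
🟡 Suggestion	`base_fleet.py`	`assert` is used for runtime validation and is skipped under Python's `-O` optimized mode
🟡 Suggestion	`base_fleet.py`	`load_weights` is an empty implementation (pass), so a framework call to it would be silently skipped
❓ Question	`base_fleet.py`	`fd_layer_id = layer_number`: PaddleFleet's `layer_number` starts at 1; does it line up with FastDeploy's 0-indexed `layer_id`?
📝 PR convention	-	The checklist item `[x] Provide accuracy results` is inconsistent with `Accuracy Tests: N/A`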

🟡 `base_fleet.py`: `assert` used for runtime validation

The two asserts in FastDeployAttention.forward perform defensive input validation and are silently skipped when Python runs with -O:

# Current (risky)
assert forward_meta is not None, "forward_meta must be provided"
assert kv_compressed is not None, "kv_compressed must be provided when use"

# Suggested replacement
if forward_meta is None:
    raise ValueError("forward_meta must be provided in config")
if kv_compressed is None:
    raise ValueError("kv_compressed must be provided when multi_latent_attention is enabled")
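
The difference can be reproduced without restarting the interpreter. Python's built-in `compile()` accepts an `optimize` argument, and level 1 corresponds to running with `-O`. The function names below are illustrative only and are not taken from base_fleet.py.

```python
SOURCE = '''
def check_with_assert(forward_meta):
    assert forward_meta is not None, "forward_meta must be provided"
    return "ran"


def check_with_raise(forward_meta):
    if forward_meta is None:
        raise ValueError("forward_meta must be provided")
    return "ran"
'''


def load(optimize: int) -> dict:
    """Compile SOURCE at the given optimization level and return its namespace."""
    namespace: dict = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace


def outcome(func, arg) -> str:
    """Return the function's result, or the name of the exception it raised."""
    try:
        return func(arg)
    except (AssertionError, ValueError) as exc:
        return type(exc).__name__


normal = load(0)     # asserts kept, as in a plain `python` run
optimized = load(1)  # asserts stripped, as in `python -O`

print(outcome(normal["check_with_assert"], None))     # AssertionError
print(outcome(optimized["check_with_assert"], None))  # ran
print(outcome(optimized["check_with_raise"], None))   # ValueError
```

With asserts stripped, the `None` input passes validation and would only fail later, typically with a less helpful error deeper in the attention call.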

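🟡 `base_fleet.py`: empty `load_weights` implementation

The load_weights method is just pass, but the framework may call it in dynamic-loading or hot-update scenarios, in which case the weights would be silently skipped. The weights are already loaded in __init__ via AutoModelForCausalLM.from_pretrained, so either document this behavior explicitly or guard against calls from the framework:

@paddle.no_grad()
def load_weights(self, weights: Iterable[tuple[str, paddle.Tensor]]):
    # Weights are loaded in __init__ via AutoModelForCausalLM.from_pretrained.
    # This method is intentionally a no-op for PaddleFleet backend.
    pass

At minimum, add a comment so that future maintainers do not mistake this for a missing implementation.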

❓ `base_fleet.py`: aligning the 1-indexed `layer_number` with the 0-indexed `layer_id`

In patch_paddlefleet_core_attention:

fd_layer_id = layer_number  # PaddleFleet layer_number starts at 1
fd_attn_instance = Attention(fd_config=fd_config, layer_id=fd_layer_id)

PaddleFleet's layer_number starts at 1, whereas FastDeploy's KV Cache typically uses a 0-indexed layer_id. If Attention's layer_id is 0-indexed, the KV Cache slot for layer 0 is never used and the last layer indexes out of bounds.

Please confirm the layer_id semantics of the Attention class; if it is 0-indexed, change the assignment to:

fd_layer_id = layer_number - 1
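
A toy model makes the off-by-one concrete. It assumes one KV Cache slot per layer, addressed by a 0-indexed position; whether FastDeploy's cache is really laid out this way is exactly what the question above asks the author to confirm. The names `NUM_LAYERS` and `classify_layer_ids` are hypothetical.

```python
NUM_LAYERS = 4  # illustrative layer count, not taken from any real model


def classify_layer_ids(offset: int) -> tuple[list[int], list[int]]:
    """Map 1-indexed layer numbers to cache slots using layer_number - offset."""
    in_range: list[int] = []
    out_of_range: list[int] = []
    for layer_number in range(1, NUM_LAYERS + 1):  # PaddleFleet-style numbering
        layer_id = layer_number - offset
        if 0 <= layer_id < NUM_LAYERS:  # valid slots are 0 .. NUM_LAYERS - 1
            in_range.append(layer_id)
        else:
            out_of_range.append(layer_id)
    return in_range, out_of_range


# fd_layer_id = layer_number: slot 0 is never used and the last layer overflows.
print(classify_layer_ids(offset=0))  # ([1, 2, 3], [4])

# fd_layer_id = layer_number - 1: every slot is used exactly once.
print(classify_layer_ids(offset=1))  # ([0, 1, 2, 3], [])
```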

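📝 PR convention check

In the checklist, [x] Provide accuracy results. is checked, but the ## Accuracy Tests section says N/A, so the two are inconsistent. For a newly introduced inference backend, either add logits alignment results against a reference implementation, or uncheck the item and explain the reason in the Accuracy Tests section.

Suggested PR description (can be copied as-is; only the checklist states are changed to fix the inconsistency):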

## Motivation
Add PaddleFleet as a model inference backend (`--model-impl paddlefleet`). Replacing `core_attention` in PaddleFleet's TransformerLayer with the FastDeploy Attention kernel lets the PaddleFleet model structure reuse FastDeploy's KV Cache and high-performance attention computation.

## Modifications
- `config.py`: add `paddlefleet` to the `ModelImpl` type definition
- `engine/args_utils.py`: support the `--model-impl paddlefleet` CLI argument and add validation logic
- `model_executor/models/paddleformers/base_fleet.py`: add the `PaddleFleetModelBase` base class, the `FastDeployAttention` layer, and the `patch_paddlefleet_core_attention` replacement function
- `model_executor/models/paddleformers/__init__.py`: register the `PaddleFleetForCausalLM` model class

## Usage or Command
```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --model-impl paddlefleet
```

## Accuracy Tests
N/A (this PR adds the PaddleFleet inference backend; logits alignment data against a reference implementation has not been provided yet)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

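Overall assessment

The overall approach is clear: replacing core_attention connects PaddleFleet to FastDeploy. There is one 🔴 Bug (a missing space in the concatenated pip command string) that must be fixed before the error message can point users to a working install command. The 1-indexed layer_id alignment question also deserves the author's confirmation, to rule out an out-of-bounds KV Cache access.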

PaddlePaddle-bot · 2026-05-25T06:57:24Z

+            if is_paddlefleet_available():
+                backend_arch = "PaddleFleetForCausalLM"
+            else:
+                raise ImportError(


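🔴 Bug: the pip install command is built from concatenated strings with a missing space, so the command cannot be run

The string on line 3 has no trailing space, so concatenating it with line 4 yields ...dev20260507--extra-index-url..., and the command fails when a user copies and runs it.

Suggested fix:

"python -m pip install paddlefleet==0.3.0.dev20260507 " "--extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/ " "--extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu126/"

That is, add a space after dev20260507.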
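
The failure mode follows from how Python joins adjacent string literals: they are concatenated with no separator. In the sketch below, `broken` is reconstructed from the description above rather than copied from the diff, and `shlex.split` stands in for the way a shell would tokenize the command.

```python
import shlex

# Adjacent literals are joined as-is, so the version and the flag fuse together.
broken = (
    "python -m pip install paddlefleet==0.3.0.dev20260507"
    "--extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/"
)

# A trailing space inside the first literal keeps the two arguments apart.
fixed = (
    "python -m pip install paddlefleet==0.3.0.dev20260507 "
    "--extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/"
)

print(shlex.split(broken)[4])   # paddlefleet==0.3.0.dev20260507--extra-index-url
print(shlex.split(fixed)[4:6])  # ['paddlefleet==0.3.0.dev20260507', '--extra-index-url']
```

In the broken form the package requirement itself absorbs the flag text, so pip receives a malformed version specifier followed by a stray URL argument.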

add fleet fallback

6898863

xiaoguoguo626807 had a problem deploying to Metax_ci May 7, 2026 07:07 — with GitHub Actions Failure

remove fleet depend

18cc86b

xiaoguoguo626807 had a problem deploying to Metax_ci May 8, 2026 09:34 — with GitHub Actions Failure

change import juage

5e81aaf

xiaoguoguo626807 had a problem deploying to Metax_ci May 8, 2026 09:57 — with GitHub Actions Failure

change import juage

4acab59

xiaoguoguo626807 had a problem deploying to Metax_ci May 8, 2026 10:22 — with GitHub Actions Error

change import juage

856ffc4

xiaoguoguo626807 temporarily deployed to Metax_ci May 8, 2026 10:25 — with GitHub Actions Inactive

update base fleet for mla

24cd474

xiaoguoguo626807 had a problem deploying to Metax_ci May 21, 2026 10:09 — with GitHub Actions Failure

revert print

e28b10f

xiaoguoguo626807 had a problem deploying to Metax_ci May 25, 2026 06:20 — with GitHub Actions Failure

PaddlePaddle-bot suggested changes May 25, 2026

fix fleet import in base_fleet.py

5de13a6

xiaoguoguo626807 had a problem deploying to Metax_ci May 25, 2026 08:06 — with GitHub Actions Failure

xiaoguoguo626807 wants to merge 8 commits into PaddlePaddle:develop from xiaoguoguo626807:fleet

3 participants
