Skip to content

Ck#7869

Open
lizexu123 wants to merge 38 commits into
PaddlePaddle:developfrom
lizexu123:ck
Open

Ck#7869
lizexu123 wants to merge 38 commits into
PaddlePaddle:developfrom
lizexu123:ck

Conversation

@lizexu123
Copy link
Copy Markdown
Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

💡 如若此PR是Cherry Pick,PR标题需遵循格式,在最开始加上[Cherry-Pick]标签,以及最后面加上原PR ID,例如[Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot
Copy link
Copy Markdown

paddle-bot Bot commented May 20, 2026

Thanks for your contribution!

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ lonelygsh
✅ lizexu123
❌ root


root seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown

@PaddlePaddle-bot PaddlePaddle-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Paddle-CI-Agent | pr_review | 2026-05-20 19:17:48

📋 Review 摘要

PR 概述:为 benchmark-compare skill 新增 vLLM 作为第三对比框架,支持 fd/sg/vllm 三框架中任意两个的灵活对比,允许单框架结果缺省。
变更范围.claude/skills/benchmark-compare/scripts/(benchmark 工具脚本)
影响面 Tag[Benchmark]

问题

级别 文件 概述
🟡 建议 extract_metrics.py baseline 不在 active 框架中时,diff_pct 静默为空,无任何警告日志
🟡 建议 generate_report.py:929 html.replace("FastDeploy", ...) 全文替换可能误替换 HTML 注释、JS 变量名等非展示内容
❓ 疑问 launch_service.sh:310 vLLM --data-parallel-size 参数跨版本兼容性需确认
📝 PR 规范 标题 "Ck" 缺少 Tag 且无实质描述;Motivation / Modifications / Usage 节均为空

📝 PR 规范检查

标题 Ck 不符合 [Tag] 标题描述 格式;PR 描述中 Motivation、Modifications、Usage or Command 三节均为空,仅剩模板注释。

标题建议(可直接复制):

  • [Benchmark] Add vLLM as third benchmark framework for comparison

PR 描述建议(可直接复制,完整结构):

## Motivation

benchmark-compare skill 原仅支持 FastDeploy 与 SGLang 两框架对比,无法将 vLLM 纳入对比场景。本 PR 扩展为三框架(fd/sg/vllm)支持,任意框架的结果均可缺省,提升工具灵活性。

## Modifications

- `extract_metrics.py`- 新增模块级常量 `HIGHER_IS_BETTER` / `LOWER_IS_BETTER`,取代原函数内部局部集合;
  - `compute_comparison` 接口改为 `(all_metrics, baseline="sg")`,支持任意框架子集对比;新增 `--baseline` CLI 参数控制 diff_pct 基准框架;
  - `--fd-result` / `--sg-result``required=True` 改为 `default=None`,新增 `--vllm-result` / `--vllm-config` 参数;至少提供一个结果文件才允许运行。
- `generate_report.py`- `scan_log_dir` 文件名正则扩展,支持 `_vllm.txt` 后缀识别;
  - 新增 `--frameworks`(默认 `fd,sg`)支持三选二对比,新增 `--vllm-attention` / `--vllm-version` 参数;
  - 通过 left/right 框架映射把数据绑定到 HTML 模板 fd/sg 槽位,并在非默认框架组合时替换 HTML 显示文本标签。
- `launch_service.sh`- 新增 `launch_vllm()` 函数,支持 vLLM 服务启动(TP/DP/EP/量化映射);
  - `--framework` 参数扩展支持 `vllm` 选项。

## Usage or Command

```bash
# 提取 FD 与 vLLM 对比指标(以 vLLM 为 baseline)
python3 extract_metrics.py \
    --fd-result fd_result.txt --vllm-result vllm_result.txt \
    --model-path /path/to/model --baseline vllm --output metrics.json

# 生成 vLLM vs SGLang 对比报告
python3 generate_report.py \
    --log-dir ./logs --frameworks vllm,sg \
    --model-name DeepSeek-R1 --output report.html

# 单卡启动 vLLM 服务
bash launch_service.sh --framework vllm --model /path/to/model \
    --port 8380 --gpus 2 --venv /path/to/vllm_env/.venv
```

## Accuracy Tests

N/A(工具脚本变更,不涉及模型推理精度)

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

总体评价

工具脚本逻辑清晰,vLLM 三框架扩展思路合理,代码结构优化(常量提升、多框架循环)值得肯定。建议关注 baseline 缺省时的静默处理和 HTML 字符串替换边界问题,PR 标题和描述需按规范补全后合入。

left_name = framework_display[left_fw]
right_name = framework_display[right_fw]
if left_name != "FastDeploy" or right_name != "SGLang":
# 用 placeholder 中转避免 FastDeploy→X 后再被 SGLang 替换误伤
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 建议 html.replace("FastDeploy", "__LEFT_FW__") 对整个 HTML 字符串做全文替换,可能误替换 HTML 注释、JavaScript 变量名或其他非显示文本中的 "FastDeploy"/"SGLang" 字符串。

虽然当前三个框架名称在 CSS 类名(.fd/.sg)中不含全名,实际误触发风险较低,但建议改用更精确的正则替换或在 generate_html 阶段通过参数传入框架名,避免后处理字符串替换的脆弱性。

建议修复方式:在 generate_html(valid_data, config)config 中传入 left_name/right_name,在模板渲染阶段用 {left_name} 占位符替换,彻底消除后处理替换。


# DP (data parallelism)
if [[ "$DP" -gt 1 ]]; then
CMD+=" --data-parallel-size $DP"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ 疑问 vLLM 的 --data-parallel-size 参数是较新版本才支持的特性(部分版本通过 VLLM_DP_SIZE 环境变量或多进程方式实现),如果目标环境 vLLM 版本不支持该参数,启动时会报错。

建议在脚本或文档中注明支持此参数的最低 vLLM 版本要求,或添加版本检测:

# 示例:检测 vLLM 是否支持 --data-parallel-size
python -c "import vllm; assert vllm.__version__ >= '0.6.0'" 2>/dev/null || echo '[WARN] vLLM 版本可能不支持 --data-parallel-size'

@PaddlePaddle-bot
Copy link
Copy Markdown

PaddlePaddle-bot commented May 20, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-21 12:48:43

CI报告基于以下代码生成(30分钟更新一次):


1 任务总览

Required 任务当前 9/10 通过,仍有 1 个 required 任务失败Approval。该失败为人工审批类阻塞项,需完成审批后再继续观察 CI。Optional 任务有 4 个失败,不阻塞合并。

总执行(rerun次数) 总任务 ✅ 通过 ❌ 失败 ⏳ 运行中 ⏸️ 等待中 跳过
42(0) 42 37 5 0 0 0

2 任务状态汇总

日志列说明:失败任务直接使用 CI 日志链接;运行中任务使用 Job 链接。

2.1 Required任务 : 9/10 通过

必选任务阻塞合并,失败需优先处理。

状态 任务 耗时 根因 修复建议 日志 重跑
Approval 7s 需要 Approval 请通过人工审批 Job -
其余 9 个必选任务通过 - - - - -

2.2 可选任务 — 28/32 通过

可选任务不阻塞合并,失败仅供参考。

状态 任务 耗时 日志 重跑
Run iluvatar Tests / run_iluvatar_cases 15m8s Job -
Check PR Template 13s Job -
CI_HPU 1h47m Job -
Trigger Jenkins for PR 2m39s Job -
其余 28 个可选任务通过 - - -

3 失败详情(仅 required)

Approval — 需要 Approval(置信度: 高)

该 Job 需要人工 Approval,完成审批后 CI 才会继续执行。

  • 根因摘要:需要人工审批。
  • 修复建议摘要:请通过人工审批。

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@f254649). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7869   +/-   ##
==========================================
  Coverage           ?   63.69%           
==========================================
  Files              ?      462           
  Lines              ?    64435           
  Branches           ?     9881           
==========================================
  Hits               ?    41039           
  Misses             ?    20601           
  Partials           ?     2795           
Flag Coverage Δ
GPU 72.84% <ø> (?)
XPU 7.12% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants