[CI] DEBUG validate installation of paddleformers[paddlefleet]==1.1.0.dev20260507 #7916

Open: xiaoguoguo626807 wants to merge 3 commits into PaddlePaddle:develop from xiaoguoguo626807:updatefleet

xiaoguoguo626807 commented May 25, 2026

Motivation

Validate the installation behavior and compatibility of paddleformers[paddlefleet]==1.1.0.dev20260507 in the current CI environment for debugging and verification purposes.

Modifications

Added temporary CI changes to verify installation of:
paddleformers[paddlefleet]==1.1.0.dev20260507
Used for debugging package installation and dependency compatibility issues.

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot Bot commented May 25, 2026

Thanks for your contribution!

PaddlePaddle-bot

This comment was marked as outdated.

codecov-commenter commented May 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@8a4ac65). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7916   +/-   ##
==========================================
  Coverage           ?   63.64%           
==========================================
  Files              ?      467           
  Lines              ?    64965           
  Branches           ?     9962           
==========================================
  Hits               ?    41347           
  Misses             ?    20810           
  Partials           ?     2808           
| Flag | Coverage Δ |
|---|---|
| GPU | 72.70% <ø> (?) |
| XPU | 7.07% <ø> (?) |

PaddlePaddle-bot commented May 25, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-26 00:12:44

This CI report is generated from the following code (refreshed every 30 minutes):


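1 Task overview

1 required task failed and 0 are running or waiting, so merging is not recommended at this time. 2 optional tasks failed and 1 is waiting; these are for reference only.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 42(0) | 42 | 38 | 3 | 0 | 1 | 0 |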

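2 Task status summary

2.1 Required tasks: 9/10 passed

Required tasks block merging; failures should be handled first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| ❌ | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | 1h29m | PR issue: CI dependency changes introduce version conflicts | Remove the conflicting install or align dependency versions | Job | - |
| ✅ | The other 9 required tasks passed | - | - | - | - | - |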

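2.2 Optional tasks: 29/32 passed

Optional tasks do not block merging; failures are for reference only.

| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 1m41s | Job | - |
| ❌ | Trigger Jenkins for PR | 1m47s | Job | - |
| ⏸️ | CI_HPU | - | - | - |
| ✅ | The other 29 optional tasks passed | - | - | - |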

3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: dependency issue / test failure (confidence: high)

  • Status: ❌ Failed
  • Error type: dependency issue / test failure
  • Confidence: high
  • Root cause summary: CI dependency changes introduce version conflicts
  • Analyzer: ci_analyze_unittest_fastdeploy

Failed cases:

| Test | Error | Root cause |
|---|---|---|
| tests/e2e/test_Qwen3VLMoe_serving.py::test_consistency_between_runs | AssertionError: 500 != 200 | Video-decoding dependency failure makes the service return 500 |
| tests/e2e/test_Qwen3VLMoe_serving.py::test_non_streaming_chat | openai.InternalServerError | torchcodec.decoders cannot be imported |
| tests/e2e/test_Qwen3VL_serving.py::test_consistency_between_runs | AssertionError: 500 != 200 | Video-decoding dependency failure makes the service return 500 |
| tests/e2e/test_Qwen3VL_serving.py::test_non_streaming_chat | openai.InternalServerError | torchcodec.decoders cannot be imported |
| tests/e2e/test_pd_reorder.py::test_model_against_baseline[...] | RuntimeError: Failed to get result from worker | Engine worker disconnected / returned no result |
| tests/model_loader/test_model_cache.py::test_model_cache[...] | RuntimeError: Failed to get result from worker | Engine worker disconnected / returned no result |
| tests/model_loader/test_torch_model.py::test_model_against_baseline[...] | RuntimeError: Failed to get result from worker | Engine worker disconnected / returned no result |

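Root cause details:
In .github/workflows/_unit_test_coverage.yml:242-247, the PR adds an installation of paddleformers[paddlefleet]==1.1.0.dev20260507 before the tests, then uninstalls and reinstalls the paddlepaddle-gpu nightly. The logs show that this install chain replaces paddleformers 1.1.1 with 1.1.0.dev20260507 and transformers 4.57.6 with 5.3.0, both of which conflict with the dependency constraints of fastdeploy-gpu. In addition, paddlefleet requires paddlepaddle_gpu==3.4.0.post20260429+..., but the package is subsequently reinstalled as paddlepaddle-gpu 3.5.0.dev20260524. In this environment, the VL tests fail when fastdeploy/input/utils/video.py:112-116 executes "from torchcodec.decoders import VideoDecoder", and the service returns 500; the model-loading tests fail with an engine worker broken pipe or an empty result queue.

Key logs: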

fastdeploy-gpu 0.0.0 requires paddleformers>=1.1.1, but you have paddleformers 1.1.0.dev20260507 which is incompatible.
fastdeploy-gpu 0.0.0 requires transformers<5.0.0,>=4.55.1, but you have transformers 5.3.0 which is incompatible.
Successfully installed ... paddlefleet-0.3.0.dev20260506 paddleformers-1.1.0.dev20260507 paddlepaddle_gpu-3.4.0.post20260429+f2d27632b14 ... transformers-5.3.0
paddlefleet 0.3.0.dev20260506 requires paddlepaddle_gpu==3.4.0.post20260429+f2d27632b14, but you have paddlepaddle-gpu 3.5.0.dev20260524 which is incompatible.
ModuleNotFoundError: No module named 'torchcodec.decoders'; 'torchcodec' is not a package
FAILED tests/e2e/test_Qwen3VL_serving.py::test_consistency_between_runs - assert 500 == 200
FAILED tests/model_loader/test_model_cache.py::test_model_cache[...] - RuntimeError: Failed to get result from worker:
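
The conflict lines in the log above follow pip's "X requires Y, but you have Z which is incompatible." wording, so they can be pulled out of a long install log mechanically. The following is only a sketch: `summarize_conflicts` and the file name `install.log` are hypothetical names, not part of the FastDeploy CI, and the pattern assumes exactly the wording shown in these log lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the FastDeploy workflows):
# read a pip install log on stdin and print one summary line per
# dependency-conflict message, ignoring every other line.
summarize_conflicts() {
  # Capture groups: 1 = package declaring the requirement, 2 = the requirement,
  # 3 = the installed distribution that violates it.
  sed -nE 's/^(.+) requires (.+), but you have (.+) which is incompatible\.$/\3 violates "\2" (required by \1)/p'
}
```

Usage would look like `summarize_conflicts < install.log`; for the first log line above it prints `paddleformers 1.1.0.dev20260507 violates "paddleformers>=1.1.1" (required by fastdeploy-gpu 0.0.0)`.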

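Suggested fixes:

  1. .github/workflows/_unit_test_coverage.yml:242-247: remove the forced install/uninstall chain added for this debugging, or move it into a separate non-required validation job so that it does not pollute the required unit-test environment.
  2. If this paddleformers[paddlefleet] version must be validated, align the dependency constraints as well: do not upgrade transformers to 5.x; make sure paddleformers satisfies fastdeploy-gpu's paddleformers>=1.1.1 requirement; and avoid force-replacing the paddlepaddle_gpu version that paddlefleet requires after paddlefleet has been installed.
  3. For the VL cases, once the dependencies are fixed, recheck whether the torchcodec.decoders import at fastdeploy/input/utils/video.py:112-116 works again.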
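
One way to keep such a validation from touching the required unit-test environment is to run it in a throwaway virtual environment and finish with `pip check`, which exits non-zero when installed packages have conflicting requirements. The sketch below is an assumption-laden illustration, not the project's workflow: `run`, `validate_in_venv`, `VENV_DIR` and `DRY_RUN` are hypothetical names, and the index URLs are the ones used in this PR's install command.

```shell
#!/usr/bin/env bash
# Sketch only: validate a pinned package inside a separate virtualenv so the
# interpreter used by the required unit tests is never modified.
# run, validate_in_venv, VENV_DIR and DRY_RUN are hypothetical names.

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

validate_in_venv() {
  local spec="$1"
  local venv_dir="${VENV_DIR:-/tmp/pf-validate-venv}"

  # Create the isolated environment.
  run python -m venv "$venv_dir" || return 1

  # Install the package under test with the venv's own interpreter.
  run "$venv_dir/bin/python" -m pip install "$spec" \
    --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/ \
    --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu126/ || return 1

  # Report unmet or conflicting requirements among the installed packages.
  run "$venv_dir/bin/python" -m pip check
}
```

Calling `DRY_RUN=1 validate_in_venv 'paddleformers[paddlefleet]==1.1.0.dev20260507'` only prints the three commands. A fresh environment does not contain fastdeploy-gpu, so checking against its constraints would presumably require installing the FastDeploy wheel into the same environment first.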

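Fix summary: remove the conflicting install or align dependency versions

Related changes: .github/workflows/_unit_test_coverage.yml:242-247; the same kind of install change also appears in _base_test.yml, _stable_test.yml, _accuracy_test.yml, _logprob_test_linux.yml, _gpu_4cards_case_test.yml, _pre_ce_test.yml, and _golang_router_test.yml

Link: view logs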

PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-25 20:08:07

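📋 Review summary

PR overview: a temporary debugging PR that validates the installation compatibility of paddleformers[paddlefleet]==1.1.0.dev20260507 in 8 CI workflows
Change scope: .github/workflows/ (all 8 CI workflow files)
Impact tag: [CI]

Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | .github/workflows/_accuracy_test.yml:187 | The install order risks dependency conflicts; the combination actually validated differs from the intended one |
| ❓ Question | .github/workflows/_unit_test_coverage.yml:242 | pip cache remove appears only in this workflow; it clears the global cache and slows down subsequent CI runs |
| 📝 PR convention | - | Checklist check states do not match reality |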

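📝 PR convention check

The PR title contains "DEBUG" and the description explicitly says "temporary CI changes", yet the target branch is develop; please confirm whether a follow-up revert is planned.

In the Checklist, "Add unit tests" and "Provide accuracy results" are both checked as [x], but this PR is a pure CI workflow debugging change that needs neither unit tests nor accuracy results; they should be changed to [ ] with the reason stated in the PR.

Suggested title (can be copied directly):

  • [CI] Validate paddleformers[paddlefleet]==1.1.0.dev20260507 installation compatibility

Suggested PR description (can be copied directly; it must reproduce the full structure of the checklist §D2 template):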

## Motivation
Validate the installation behavior and compatibility of paddleformers[paddlefleet]==1.1.0.dev20260507 in the current CI environment for debugging and verification purposes.

## Modifications
Added temporary CI changes to verify installation of paddleformers[paddlefleet]==1.1.0.dev20260507 across 8 CI workflow files:
- Install `paddleformers[paddlefleet]==1.1.0.dev20260507` with extra index URLs
- Uninstall existing `paddlepaddle-gpu` and reinstall nightly version
- (`_unit_test_coverage.yml` only) Clear pip cache for transformers/paddleformers/fastdeploy-gpu before installation

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests. (CI workflow changes do not require unit tests)
- [ ] Provide accuracy results. (N/A for CI-only changes)
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

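Overall assessment

This PR is a temporary debugging change with an install-order problem: installing paddleformers first and then uninstalling and reinstalling paddlepaddle means the validated dependency combination differs from the intended one. Adjust the install order before merging, and revert promptly once validation is complete.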


python -m pip install ${fastdeploy_wheel_url}
python -m pip install paddleformers[paddlefleet]==1.1.0.dev20260507 --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/ --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu126/
python -m pip uninstall paddlepaddle-gpu -y || true

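🟡 Suggestion: the install order risks dependency conflicts

Current order: paddleformers[paddlefleet]==1.1.0.dev20260507 is installed first (its dependency resolution pulls in a specific paddlepaddle-gpu version), then paddlepaddle-gpu is immediately uninstalled and the nightly version is reinstalled.

As a result, the paddlepaddle version that paddleformers declared at install time is overwritten by the nightly build, so what is actually validated is the combination "paddleformers + nightly paddlepaddle" rather than the dependency set that paddleformers itself declares.

Suggested fix: if the goal is to verify that paddleformers can coexist with nightly paddlepaddle, install nightly paddlepaddle first and then install paddleformers (dropping the uninstall step in between):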

python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/
python -m pip install paddleformers[paddlefleet]==1.1.0.dev20260507 --extra-index-url ...

else
echo "Warning: tests/plugins directory not found, skipping setup.py install"
fi
python -m pip cache remove transformers

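❓ Question: pip cache remove clears the CI build cache, so all subsequent CI runs must re-download these packages, which significantly increases CI time.

None of the other 7 workflows adds this step; only _unit_test_coverage.yml has it. Is this intentional? If the goal is only to force a reinstall, use the --no-cache-dir option instead of clearing the global cache.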
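
A minimal sketch of the `--no-cache-dir` alternative mentioned above, assuming the goal is only to bypass the cache for a single install. `install_fresh` and the `PIP` override variable are hypothetical names introduced for illustration; `--no-cache-dir` itself is a standard pip option that disables the cache for that invocation and leaves existing cache entries in place.

```shell
#!/usr/bin/env bash
# Sketch: bypass pip's cache for one install instead of removing shared cache entries.
# PIP is a hypothetical override hook; it defaults to "python -m pip".
PIP="${PIP:-python -m pip}"

install_fresh() {
  # $PIP is intentionally unquoted so that "python -m pip" splits into words.
  # --no-cache-dir affects only this invocation; nothing is deleted from the cache.
  $PIP install --no-cache-dir "$@"
}
```

For example, `install_fresh 'paddleformers[paddlefleet]==1.1.0.dev20260507'` would download the package again without touching what other jobs have cached.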
