Pull request overview
This PR unifies and migrates the multimodal (VL) image processor and processor routing logic: the image_processor / image_preprocessor modules previously scattered across the VL subdirectories are moved to fastdeploy/input/image_processors/, and a new unified MultiModalProcessor is added for InputPreprocessor to dispatch to in multimodal scenarios, while compatibility import entry points are kept at the old paths.

Changes:
- Added fastdeploy/input/multimodal_processor.py, which uses model_type to uniformly wrap request-processing dispatch for QwenVL/Qwen3VL/PaddleOCRVL/Ernie4.5VL.
- Added several processor implementations under fastdeploy/input/image_processors/, and turned the image_processor modules in the old VL directories into compatibility shims that forward imports.
- Updated the patch paths in the related unit tests to point at the new image_processors module location.
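The dispatch-by-model_type idea can be sketched as follows. This is a minimal illustration only, not the actual FastDeploy implementation: the constant values and handler bodies are simplified stand-ins.

```python
# Illustrative sketch of dispatch-by-model_type; constants and handler bodies
# are simplified stand-ins, not the real FastDeploy implementation.
ERNIE4_5_VL = "ernie4_5_vl"
QWEN_VL = "qwen_vl"
QWEN3_VL = "qwen3_vl"
PADDLEOCR_VL = "paddleocr_vl"


class MultiModalProcessor:
    """Routes request processing to a model-specific handler chosen by model_type."""

    def __init__(self, model_type: str):
        self._handlers = {
            ERNIE4_5_VL: self._process_ernie4_5_vl,
            QWEN_VL: self._process_qwen_vl,
            QWEN3_VL: self._process_qwen3_vl,
            PADDLEOCR_VL: self._process_paddleocr_vl,
        }
        if model_type not in self._handlers:
            raise ValueError(f"Unsupported model_type: {model_type}")
        self.model_type = model_type

    def process_request_dict(self, request: dict) -> dict:
        # Dispatch to the handler registered for this model type.
        return self._handlers[self.model_type](request)

    # Placeholder handlers: the real ones run the model-specific VL pipeline.
    def _process_ernie4_5_vl(self, request):
        return {**request, "handled_by": ERNIE4_5_VL}

    def _process_qwen_vl(self, request):
        return {**request, "handled_by": QWEN_VL}

    def _process_qwen3_vl(self, request):
        return {**request, "handled_by": QWEN3_VL}

    def _process_paddleocr_vl(self, request):
        return {**request, "handled_by": PADDLEOCR_VL}
```

A table of handlers keyed by model_type keeps the per-model branching in one place, so adding a new VL model means registering one handler rather than editing the caller.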
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/input/test_image_preprocessor_adaptive.py | Updated patch target paths to match the module location after the image preprocessor migration |
| fastdeploy/input/preprocess.py | Multimodal branch now uniformly creates a MultiModalProcessor (replacing the per-architecture processor imports) |
| fastdeploy/input/multimodal_processor.py | New unified multimodal processor that wraps the VL request-processing flow and dispatches by model type |
| fastdeploy/input/image_processors/__init__.py | Re-exports the new image processors for unified importing |
| fastdeploy/input/image_processors/adaptive_processor.py | Added/migrated the AdaptiveImageProcessor implementation (moved from the original Ernie4.5 VL path) |
| fastdeploy/input/image_processors/qwen_processor.py | Added/migrated the QwenVL ImageProcessor implementation |
| fastdeploy/input/image_processors/qwen3_processor.py | Added/migrated the Qwen3VL ImageProcessor implementation |
| fastdeploy/input/image_processors/paddleocr_processor.py | Added/migrated the PaddleOCRVL ImageProcessor implementation |
| fastdeploy/input/qwen_vl_processor/image_processor.py | Legacy-path compatibility shim: forwards imports to the new image_processors.qwen_processor |
| fastdeploy/input/qwen3_vl_processor/image_processor.py | Legacy-path compatibility shim: forwards imports to the new image_processors.qwen3_processor |
| fastdeploy/input/paddleocr_vl_processor/image_processor.py | Legacy-path compatibility shim: forwards imports to the new image_processors.paddleocr_processor |
| fastdeploy/input/ernie4_5_vl_processor/image_preprocessor/image_preprocessor_adaptive.py | Legacy-path compatibility shim: forwards imports to the new image_processors.adaptive_processor |
| fastdeploy/input/ernie4_5_vl_processor/image_preprocessor/get_image_preprocessor.py | Legacy-path compatibility shim: forwards imports to the new image_processors.adaptive_processor |
| fastdeploy/input/ernie4_5_vl_processor/image_preprocessor/__init__.py | Legacy-path compatibility shim: re-exports AdaptiveImageProcessor/get_image_preprocessor from the new module |
```python
from fastdeploy.input.multimodal_processor import (
    ERNIE4_5_VL,
    PADDLEOCR_VL,
    QWEN3_VL,
    QWEN_VL,
    MultiModalProcessor,
)
```
The PR title does not include a tag as required by the template (e.g. [DataProcessor] ...), and the Motivation/Modifications/Usage/Tests sections of the PR description are still empty. Please complete the title and description so that the CI/review process can correctly identify the change category and ease future maintenance.
```python
List["np.ndarray"],
List["paddle.Tensor"],
List[List["PIL.Image.Image"]],
List[List["np.ndarrray"]],
```
The VideoInput type annotation spells np.ndarrray (with an extra r), an obvious typo that can mislead type checkers and readers. It should be corrected to np.ndarray.
Suggested change:
```diff
-            List[List["np.ndarrray"]],
+            List[List["np.ndarray"]],
```
```python
def make_batched_images(images) -> List[List[ImageInput]]:
    """
    Accepts images in list or nested list format, and makes a list of images for preprocessing.

    images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
        The input image.

    Returns:
        list: A list of images.
    """
```
make_batched_images actually returns a flat List[ImageInput] (the nested-list branch, for example, flattens its input), but the signature is annotated as List[List[ImageInput]]. Please fix the return type annotation and docstring so that callers do not mistakenly treat the result as a nested structure, which would make the types and semantics inconsistent.
Suggested change:
```diff
-def make_batched_images(images) -> List[List[ImageInput]]:
-    """
-    Accepts images in list or nested list format, and makes a list of images for preprocessing.
-    images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
-        The input image.
-    Returns:
-        list: A list of images.
+def make_batched_images(images) -> List[ImageInput]:
+    """
+    Accepts images in list or nested list format, and returns a flat list of images for preprocessing.
+    images (`Union[ImageInput, List[ImageInput], List[List[ImageInput]]]`):
+        The input image or images.
+    Returns:
+        List[ImageInput]: A flat list of images.
```
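As a self-contained illustration of the flattening behavior described above (using plain objects as stand-in images, and omitting the is_valid_image check from the original):

```python
def make_batched_images(images) -> list:
    """Return a flat list of images from a single image, a list, or a nested list."""
    if isinstance(images, (list, tuple)) and images and isinstance(images[0], (list, tuple)):
        # Nested list: flatten one level.
        return [img for img_list in images for img in img_list]
    if isinstance(images, (list, tuple)):
        # Already a flat list.
        return list(images)
    # A single image.
    return [images]
```

All three input shapes come out as the same flat list, which is exactly why the flat `List[ImageInput]` annotation is the accurate one.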
```python
# Copied from transformers.models.llava_next_video.image_processing_llava_next_video.make_batched_videos
def make_batched_videos(videos) -> List[VideoInput]:
    """dummy"""
```
This file adds/keeps Chinese comments and placeholder docstrings (e.g. """设定pixels""", """dummy""", and Chinese TODO comments). The repository convention is English-only code comments, and "dummy" does not convey intent; please rewrite these comments/docstrings in English and describe the actual behavior.
Suggested change:
```diff
-    """dummy"""
+    """
+    Normalize different video input formats into a list of batched videos.
+
+    This helper accepts multiple representations of videos (nested lists of images,
+    4D NumPy arrays, or 4D Paddle tensors) and converts them into a common format
+    that downstream image processors can consume.
+
+    Args:
+        videos (`VideoInput` or nested structures of `VideoInput`):
+            The input video data. It can be:
+            - A list of videos, where each video is a list/tuple of frames (PIL images,
+              NumPy arrays, or Paddle tensors).
+            - A single video represented as a list/tuple of frames.
+            - A 4D array or tensor representing a single video, which will be split
+              into a list of frames.
+
+    Returns:
+        List[VideoInput]: A list where each element represents one video, and each
+        video is a list-like collection of frames.
+
+    Raises:
+        ValueError: If the provided input cannot be interpreted as batched videos.
+    """
```
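A runnable sketch of that normalization, using NumPy arrays and plain objects as stand-in frames (the real helper also accepts Paddle tensors and nested array inputs):

```python
import numpy as np


def make_batched_videos(videos) -> list:
    """Normalize video input into a list of videos, each a list of frames."""
    if isinstance(videos, (list, tuple)) and videos and isinstance(videos[0], (list, tuple)):
        # Already a batch: a list of videos, each a list/tuple of frames.
        return [list(v) for v in videos]
    if isinstance(videos, (list, tuple)):
        # A single video given as a list/tuple of frames.
        return [list(videos)]
    if isinstance(videos, np.ndarray) and videos.ndim == 4:
        # A single video as a 4D (frames, H, W, C) array: split along the frame axis.
        return [list(videos)]
    raise ValueError(f"Could not make batched videos from {videos!r}")
```

Every accepted shape ends up as "list of videos, each a list of frames", so downstream processors only have to handle one representation.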
```python
assert isinstance(min_pixels, int) and min_pixels >= 0, "min_pixels must be positive int"
data_processor_logger.info(f"{msg} AdaptiveImageProcessor set min_pixels = {min_pixels}")
self.min_pixels = min_pixels
self.size["min_pixels"] = int(min_pixels)
if max_pixels is not None:
    assert isinstance(max_pixels, int) and max_pixels > 0, "max_pixels must be positive int"
```
set_pixels uses assert for input validation (min/max_pixels). Asserts are stripped when Python runs in optimized mode (-O), so invalid values could be written without any check. Please raise ValueError/TypeError explicitly with a clear error message instead.
Suggested change:
```diff
-        assert isinstance(min_pixels, int) and min_pixels >= 0, "min_pixels must be positive int"
+        if not isinstance(min_pixels, int):
+            raise TypeError("min_pixels must be an int")
+        if min_pixels < 0:
+            raise ValueError("min_pixels must be a non-negative integer")
         data_processor_logger.info(f"{msg} AdaptiveImageProcessor set min_pixels = {min_pixels}")
         self.min_pixels = min_pixels
         self.size["min_pixels"] = int(min_pixels)
     if max_pixels is not None:
-        assert isinstance(max_pixels, int) and max_pixels > 0, "max_pixels must be positive int"
+        if not isinstance(max_pixels, int):
+            raise TypeError("max_pixels must be an int")
+        if max_pixels <= 0:
+            raise ValueError("max_pixels must be a positive integer")
```
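The same checks can be factored into a small helper; a sketch under the assumption that such a helper is acceptable here (the function name is hypothetical, not part of the PR):

```python
def check_pixel_bounds(min_pixels=None, max_pixels=None):
    """Validate pixel bounds with explicit exceptions that survive `python -O`."""
    if min_pixels is not None:
        # Reject bools too, since bool is a subclass of int.
        if not isinstance(min_pixels, int) or isinstance(min_pixels, bool):
            raise TypeError("min_pixels must be an int")
        if min_pixels < 0:
            raise ValueError("min_pixels must be a non-negative integer")
    if max_pixels is not None:
        if not isinstance(max_pixels, int) or isinstance(max_pixels, bool):
            raise TypeError("max_pixels must be an int")
        if max_pixels <= 0:
            raise ValueError("max_pixels must be a positive integer")
    return min_pixels, max_pixels
```

Unlike assert, these raises still fire under `python -O`, and the TypeError/ValueError distinction tells callers whether the type or the range was wrong.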
```python
def make_batched_images(images) -> List[List[ImageInput]]:
    """
    Accepts images in list or nested list format, and makes a list of images for preprocessing.

    Args:
        images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
            The input image.

    Returns:
        list: A list of images.
    """
    if isinstance(images, (list, tuple)) and isinstance(images[0], (list, tuple)) and is_valid_image(images[0][0]):
        return [img for img_list in images for img in img_list]
```
make_batched_images actually returns a flat List[ImageInput], but the annotation says List[List[ImageInput]]. Please align the annotation with the real return type to avoid confusing static checkers and callers.
```python
max_tokens = max_model_len - len(request["prompt_token_ids"])
if request.get("max_tokens") is None:
    request["max_tokens"] = max(1, max_tokens)
else:
    request["max_tokens"] = min(max_tokens, request["max_tokens"])
```
process_request_dict allows max_model_len=None, but computing max_tokens = max_model_len - len(prompt_token_ids) here raises TypeError when max_model_len is None. Either make max_model_len a required parameter, or skip the calculation / use a default for max_tokens when it is None.
Suggested change:
```diff
-        max_tokens = max_model_len - len(request["prompt_token_ids"])
-        if request.get("max_tokens") is None:
-            request["max_tokens"] = max(1, max_tokens)
-        else:
-            request["max_tokens"] = min(max_tokens, request["max_tokens"])
+        if max_model_len is not None:
+            max_tokens = max_model_len - len(request["prompt_token_ids"])
+            if request.get("max_tokens") is None:
+                request["max_tokens"] = max(1, max_tokens)
+            else:
+                request["max_tokens"] = min(max_tokens, request["max_tokens"])
+        else:
+            # When max_model_len is None, do not derive max_tokens from it.
+            # If caller did not provide max_tokens, fall back to a minimal safe default.
+            if request.get("max_tokens") is None:
+                request["max_tokens"] = 1
```
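The suggested clamping behavior can be exercised in isolation; a sketch (the function name and standalone shape are hypothetical, extracted from the in-place logic above):

```python
def resolve_max_tokens(request: dict, max_model_len=None) -> dict:
    """Clamp request['max_tokens'] to the remaining context budget, if known."""
    if max_model_len is None:
        # No model length to derive a budget from; fall back to a minimal default.
        if request.get("max_tokens") is None:
            request["max_tokens"] = 1
        return request
    budget = max_model_len - len(request["prompt_token_ids"])
    if request.get("max_tokens") is None:
        # Caller gave no limit: use the whole remaining budget (at least 1).
        request["max_tokens"] = max(1, budget)
    else:
        # Caller gave a limit: never exceed the remaining budget.
        request["max_tokens"] = min(budget, request["max_tokens"])
    return request
```

With max_model_len=10 and a 3-token prompt, an unset max_tokens becomes 7, while a caller-supplied 100 is clamped down to the budget; with max_model_len=None no subtraction is attempted.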
```python
from fastdeploy.input.multimodal_processor import (
    ERNIE4_5_VL,
    PADDLEOCR_VL,
    QWEN3_VL,
    QWEN_VL,
    MultiModalProcessor,
)

self.processor = Ernie4_5_VLProcessor(
    model_name_or_path=self.model_name_or_path,
    limit_mm_per_prompt=self.limit_mm_per_prompt,
    mm_processor_kwargs=self.mm_processor_kwargs,
    reasoning_parser_obj=reasoning_parser_obj,
    tool_parser_obj=tool_parser_obj,
    enable_processor_cache=self.enable_processor_cache,
)
if ErnieArchitectures.contains_ernie_arch(architecture):
    model_type = ERNIE4_5_VL
elif "PaddleOCRVL" in architecture:
    if not envs.ENABLE_V1_DATA_PROCESSOR:
        from fastdeploy.input.paddleocr_vl_processor import (
            PaddleOCRVLProcessor,
        )
    else:
        from fastdeploy.input.v1.paddleocr_vl_processor import (
            PaddleOCRVLProcessor,
        )

    self.processor = PaddleOCRVLProcessor(
        config=self.model_config,
        model_name_or_path=self.model_name_or_path,
        limit_mm_per_prompt=self.limit_mm_per_prompt,
        mm_processor_kwargs=self.mm_processor_kwargs,
        reasoning_parser_obj=reasoning_parser_obj,
    )
    model_type = PADDLEOCR_VL
elif "Qwen2_5_VL" in architecture:
    if not envs.ENABLE_V1_DATA_PROCESSOR:
        from fastdeploy.input.qwen_vl_processor import QwenVLProcessor
    else:
        from fastdeploy.input.v1.qwen_vl_processor import (
            QwenVLProcessor,
        )

    self.processor = QwenVLProcessor(
        config=self.model_config,
        model_name_or_path=self.model_name_or_path,
        limit_mm_per_prompt=self.limit_mm_per_prompt,
        mm_processor_kwargs=self.mm_processor_kwargs,
        reasoning_parser_obj=reasoning_parser_obj,
        enable_processor_cache=self.enable_processor_cache,
    )
    model_type = QWEN_VL
elif "Qwen3VL" in architecture:
    if not envs.ENABLE_V1_DATA_PROCESSOR:
        from fastdeploy.input.qwen3_vl_processor import Qwen3VLProcessor
    else:
        from fastdeploy.input.v1.qwen3_vl_processor import (
            Qwen3VLProcessor,
        )

    self.processor = Qwen3VLProcessor(
        config=self.model_config,
        model_name_or_path=self.model_name_or_path,
        limit_mm_per_prompt=self.limit_mm_per_prompt,
        mm_processor_kwargs=self.mm_processor_kwargs,
        reasoning_parser_obj=reasoning_parser_obj,
        enable_processor_cache=self.enable_processor_cache,
    )
    model_type = QWEN3_VL
else:
    raise ValueError(f"Unsupported model processor architecture: {architecture}. ")

self.processor = MultiModalProcessor(
    model_name_or_path=self.model_name_or_path,
    model_type=model_type,
    config=self.model_config,
    limit_mm_per_prompt=self.limit_mm_per_prompt,
    mm_processor_kwargs=self.mm_processor_kwargs,
    reasoning_parser_obj=reasoning_parser_obj,
    tool_parser_obj=tool_parser_obj,
    enable_processor_cache=self.enable_processor_cache,
)
```
The multimodal branch of InputPreprocessor.create_processor() now uniformly goes through MultiModalProcessor, but the existing unit tests (tests/input/test_preprocess.py) only cover the non-MM branch and the unsupported-architecture exception branch; they do not cover the MM architecture routing (ERNIE/PaddleOCR/Qwen/Qwen3) or the forwarded arguments (limit_mm_per_prompt, mm_processor_kwargs, etc.). Please add tests for those paths to prevent routing regressions in future refactors.
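The routing itself is easy to cover without loading any model. Below is a sketch of the kind of test this asks for, written against a simplified stand-in for the architecture-to-model_type mapping (the architecture strings are illustrative examples; a real test would patch MultiModalProcessor and call InputPreprocessor.create_processor() directly):

```python
def route_model_type(architecture: str) -> str:
    """Simplified stand-in for the architecture routing in create_processor()."""
    if "Ernie" in architecture:
        return "ernie4_5_vl"
    if "PaddleOCRVL" in architecture:
        return "paddleocr_vl"
    if "Qwen2_5_VL" in architecture:
        return "qwen_vl"
    if "Qwen3VL" in architecture:
        return "qwen3_vl"
    raise ValueError(f"Unsupported model processor architecture: {architecture}. ")


def test_routing():
    # One assertion per supported architecture family, plus the error branch.
    assert route_model_type("Ernie4_5_VLMoeForConditionalGeneration") == "ernie4_5_vl"
    assert route_model_type("PaddleOCRVLForConditionalGeneration") == "paddleocr_vl"
    assert route_model_type("Qwen2_5_VLForConditionalGeneration") == "qwen_vl"
    assert route_model_type("Qwen3VLForConditionalGeneration") == "qwen3_vl"
    try:
        route_model_type("GPT2")
    except ValueError:
        return
    raise AssertionError("expected ValueError for unsupported architecture")
```

Testing the mapping as a pure function keeps the test fast and independent of processor construction; a second test can then patch MultiModalProcessor to verify the forwarded keyword arguments.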
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             develop    #7109   +/-   ##
==========================================
  Coverage           ?   73.07%
==========================================
  Files              ?      408
  Lines              ?    56839
  Branches           ?     9001
==========================================
  Hits               ?    41535
  Misses             ?    12368
  Partials           ?     2936
```

Flags with carried forward coverage won't be shown.
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Add a tag in the PR title, chosen from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the [Cherry-Pick] PR tag.