
Merge processor #7109

Open
luukunn wants to merge 4 commits into PaddlePaddle:develop from luukunn:merge_processor_3

Conversation

@luukunn
Collaborator

luukunn commented Mar 31, 2026

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code; run pre-commit before committing.
  • Add unit tests, or explain in this PR why they are omitted.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings March 31, 2026 08:34
@paddle-bot

paddle-bot bot commented Mar 31, 2026

Thanks for your contribution!

Contributor

Copilot AI left a comment


Pull request overview

This PR unifies and migrates the multimodal (VL) image processor and processor routing logic: the image_processor / image_preprocessor modules previously scattered across the individual VL subdirectories are progressively moved into fastdeploy/input/image_processors/, a new unified MultiModalProcessor is introduced that InputPreprocessor now uses for dispatch in multimodal scenarios, and the old paths are kept as compatibility import entry points.

Changes:

  • Add fastdeploy/input/multimodal_processor.py, which wraps request-handling dispatch for QwenVL/Qwen3VL/PaddleOCRVL/Ernie4.5VL behind a single model_type-keyed interface.
  • Add several processor implementations under fastdeploy/input/image_processors/, and turn the image_processor modules in the old VL directories into compatibility layers that forward imports.
  • Update the patch paths in the affected unit tests to point at the new image_processors module locations.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.

Show a summary per file

  • tests/input/test_image_preprocessor_adaptive.py: update the patch target paths to match the image preprocessor's post-migration module location
  • fastdeploy/input/preprocess.py: the multimodal branch now uniformly creates a MultiModalProcessor (replacing the per-architecture processor imports)
  • fastdeploy/input/multimodal_processor.py: new unified multimodal processor that wraps the VL request-handling flow and model-type dispatch
  • fastdeploy/input/image_processors/__init__.py: aggregate exports for the new image processors (for unified imports)
  • fastdeploy/input/image_processors/adaptive_processor.py: add/migrate the AdaptiveImageProcessor implementation (moved from the original Ernie4.5 VL path)
  • fastdeploy/input/image_processors/qwen_processor.py: add/migrate the QwenVL ImageProcessor implementation
  • fastdeploy/input/image_processors/qwen3_processor.py: add/migrate the Qwen3VL ImageProcessor implementation
  • fastdeploy/input/image_processors/paddleocr_processor.py: add/migrate the PaddleOCRVL ImageProcessor implementation
  • fastdeploy/input/qwen_vl_processor/image_processor.py: old-path compatibility layer that forwards imports to the new image_processors.qwen_processor
  • fastdeploy/input/qwen3_vl_processor/image_processor.py: old-path compatibility layer that forwards imports to the new image_processors.qwen3_processor
  • fastdeploy/input/paddleocr_vl_processor/image_processor.py: old-path compatibility layer that forwards imports to the new image_processors.paddleocr_processor
  • fastdeploy/input/ernie4_5_vl_processor/image_preprocessor/image_preprocessor_adaptive.py: old-path compatibility layer that forwards imports to the new image_processors.adaptive_processor
  • fastdeploy/input/ernie4_5_vl_processor/image_preprocessor/get_image_preprocessor.py: old-path compatibility layer that forwards imports to the new image_processors.adaptive_processor
  • fastdeploy/input/ernie4_5_vl_processor/image_preprocessor/__init__.py: old-path compatibility layer that re-exports AdaptiveImageProcessor/get_image_preprocessor from the new module
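The compatibility layers listed above all follow the same re-export pattern: the old module simply imports the public names from the new location, so both import paths resolve to the very same objects. A minimal, self-contained sketch of that pattern (module names here are illustrative stand-ins, not the real fastdeploy paths):

```python
import sys
import types

# Stand-in for the new module, e.g. fastdeploy/input/image_processors/qwen_processor.py.
new_mod = types.ModuleType("image_processors_demo")
new_mod.ImageProcessor = type("ImageProcessor", (), {})
sys.modules["image_processors_demo"] = new_mod

# Stand-in for the old module, e.g. fastdeploy/input/qwen_vl_processor/image_processor.py.
# The compatibility layer just re-exports the class from the new location.
shim = types.ModuleType("old_image_processor_demo")
shim.ImageProcessor = sys.modules["image_processors_demo"].ImageProcessor
sys.modules["old_image_processor_demo"] = shim

# Both import paths yield the identical class object.
from old_image_processor_demo import ImageProcessor  # old path still works
from image_processors_demo import ImageProcessor as NewImageProcessor

assert ImageProcessor is NewImageProcessor
```

In the real shims the body is just a one-line `from ... import ...` re-export; the sys.modules plumbing above exists only to make the demo runnable without fastdeploy installed.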

Comment on lines +109 to +115
from fastdeploy.input.multimodal_processor import (
ERNIE4_5_VL,
PADDLEOCR_VL,
QWEN3_VL,
QWEN_VL,
MultiModalProcessor,
)

Copilot AI Mar 31, 2026


The PR title does not include a tag as the template requires (e.g. [DataProcessor] ...), and the Motivation/Modifications/Usage/Tests sections of the PR description are still empty. Please complete the title and description so the CI/review process can identify the change category correctly and the PR stays easy to maintain.

Copilot uses AI. Check for mistakes.
List["np.ndarray"],
List["paddle.Tensor"],
List[List["PIL.Image.Image"]],
List[List["np.ndarrray"]],

Copilot AI Mar 31, 2026


The VideoInput type annotation is written as np.ndarrray (an extra r), a clear typo that will mislead type checkers and readers. It should be corrected to np.ndarray.

Suggested change
List[List["np.ndarrray"]],
List[List["np.ndarray"]],

Comment on lines +80 to +87
def make_batched_images(images) -> List[List[ImageInput]]:
"""
Accepts images in list or nested list format, and makes a list of images for preprocessing.
images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
The input image.

Returns:
list: A list of images.

Copilot AI Mar 31, 2026


make_batched_images actually returns a flat List[ImageInput] (for example, the nested-list branch flattens its input), but the signature is annotated as List[List[ImageInput]]. Fix the return annotation and docstring so callers do not consume the result as a nested structure, which would make the types and semantics inconsistent.

Suggested change
def make_batched_images(images) -> List[List[ImageInput]]:
"""
Accepts images in list or nested list format, and makes a list of images for preprocessing.
images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
The input image.
Returns:
list: A list of images.
def make_batched_images(images) -> List[ImageInput]:
"""
Accepts images in list or nested list format, and returns a flat list of images for preprocessing.
images (`Union[ImageInput, List[ImageInput], List[List[ImageInput]]]`):
The input image or images.
Returns:
List[ImageInput]: A flat list of images.


# Copied from transformers.models.llava_next_video.image_processing_llava_next_video.make_batched_videos
def make_batched_videos(videos) -> List[VideoInput]:
"""dummy"""

Copilot AI Mar 31, 2026


This file adds or keeps Chinese comments and placeholder docstrings (such as """设定pixels""", """dummy""", and a Chinese TODO comment). The repository convention is English code comments, and "dummy" conveys no intent; please rewrite these comments/docstrings in English and state what they actually mean.

Suggested change
"""dummy"""
"""
Normalize different video input formats into a list of batched videos.
This helper accepts multiple representations of videos (nested lists of images,
4D NumPy arrays, or 4D Paddle tensors) and converts them into a common format
that downstream image processors can consume.
Args:
videos (`VideoInput` or nested structures of `VideoInput`):
The input video data. It can be:
- A list of videos, where each video is a list/tuple of frames (PIL images,
NumPy arrays, or Paddle tensors).
- A single video represented as a list/tuple of frames.
- A 4D array or tensor representing a single video, which will be split
into a list of frames.
Returns:
List[VideoInput]: A list where each element represents one video, and each
video is a list-like collection of frames.
Raises:
ValueError: If the provided input cannot be interpreted as batched videos.
"""

Comment on lines +197 to +202
assert isinstance(min_pixels, int) and min_pixels >= 0, "min_pixels must be positive int"
data_processor_logger.info(f"{msg} AdaptiveImageProcessor set min_pixels = {min_pixels}")
self.min_pixels = min_pixels
self.size["min_pixels"] = int(min_pixels)
if max_pixels is not None:
assert isinstance(max_pixels, int) and max_pixels > 0, "max_pixels must be positive int"

Copilot AI Mar 31, 2026


set_pixels validates its arguments (min_pixels/max_pixels) with assert. Asserts are stripped when Python runs in -O optimized mode, so invalid values could be written without any check. Raise ValueError/TypeError explicitly with a clear error message instead.

Suggested change
assert isinstance(min_pixels, int) and min_pixels >= 0, "min_pixels must be positive int"
data_processor_logger.info(f"{msg} AdaptiveImageProcessor set min_pixels = {min_pixels}")
self.min_pixels = min_pixels
self.size["min_pixels"] = int(min_pixels)
if max_pixels is not None:
assert isinstance(max_pixels, int) and max_pixels > 0, "max_pixels must be positive int"
if not isinstance(min_pixels, int):
raise TypeError("min_pixels must be an int")
if min_pixels < 0:
raise ValueError("min_pixels must be a non-negative integer")
data_processor_logger.info(f"{msg} AdaptiveImageProcessor set min_pixels = {min_pixels}")
self.min_pixels = min_pixels
self.size["min_pixels"] = int(min_pixels)
if max_pixels is not None:
if not isinstance(max_pixels, int):
raise TypeError("max_pixels must be an int")
if max_pixels <= 0:
raise ValueError("max_pixels must be a positive integer")

Comment on lines +41 to +54
def make_batched_images(images) -> List[List[ImageInput]]:
"""
Accepts images in list or nested list format, and makes a list of images for preprocessing.

Args:
images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
The input image.

Returns:
list: A list of images.
"""
if isinstance(images, (list, tuple)) and isinstance(images[0], (list, tuple)) and is_valid_image(images[0][0]):
return [img for img_list in images for img in img_list]


Copilot AI Mar 31, 2026


make_batched_images actually returns a flat List[ImageInput], but its annotation says List[List[ImageInput]]. Align the annotation with the real return type to avoid confusing static checkers and callers.

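To make the suggested fix concrete, here is a minimal, dependency-free sketch of make_batched_images with the corrected flat return annotation. ImageInput is a stand-in alias here, and the real implementation additionally validates entries with is_valid_image, which this sketch omits:

```python
from typing import Any, List

# Stand-in for the real image types (PIL.Image.Image, np.ndarray, paddle.Tensor).
ImageInput = Any


def make_batched_images(images) -> List[ImageInput]:
    """Accept a single image, a list of images, or a nested list, and
    return a flat list of images for preprocessing."""
    if isinstance(images, (list, tuple)) and images and isinstance(images[0], (list, tuple)):
        # Nested-list branch: flatten one level, matching the snippet above.
        return [img for img_list in images for img in img_list]
    if isinstance(images, (list, tuple)):
        return list(images)
    return [images]


assert make_batched_images([["a", "b"], ["c"]]) == ["a", "b", "c"]
```

The point of the annotation change is purely documentary: the flattening behavior is unchanged, but static checkers and callers now see the true shape of the result.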
Comment on lines +315 to +319
max_tokens = max_model_len - len(request["prompt_token_ids"])
if request.get("max_tokens") is None:
request["max_tokens"] = max(1, max_tokens)
else:
request["max_tokens"] = min(max_tokens, request["max_tokens"])

Copilot AI Mar 31, 2026


process_request_dict allows max_model_len=None, but computing max_tokens = max_model_len - len(prompt_token_ids) directly raises a TypeError when max_model_len is None. Suggestion: either make max_model_len a required parameter, or skip the derivation / fall back to a default for max_tokens when it is None.

Suggested change
max_tokens = max_model_len - len(request["prompt_token_ids"])
if request.get("max_tokens") is None:
request["max_tokens"] = max(1, max_tokens)
else:
request["max_tokens"] = min(max_tokens, request["max_tokens"])
if max_model_len is not None:
max_tokens = max_model_len - len(request["prompt_token_ids"])
if request.get("max_tokens") is None:
request["max_tokens"] = max(1, max_tokens)
else:
request["max_tokens"] = min(max_tokens, request["max_tokens"])
else:
# When max_model_len is None, do not derive max_tokens from it.
# If caller did not provide max_tokens, fall back to a minimal safe default.
if request.get("max_tokens") is None:
request["max_tokens"] = 1

Comment on lines +109 to +137
from fastdeploy.input.multimodal_processor import (
ERNIE4_5_VL,
PADDLEOCR_VL,
QWEN3_VL,
QWEN_VL,
MultiModalProcessor,
)

self.processor = Ernie4_5_VLProcessor(
model_name_or_path=self.model_name_or_path,
limit_mm_per_prompt=self.limit_mm_per_prompt,
mm_processor_kwargs=self.mm_processor_kwargs,
reasoning_parser_obj=reasoning_parser_obj,
tool_parser_obj=tool_parser_obj,
enable_processor_cache=self.enable_processor_cache,
)
if ErnieArchitectures.contains_ernie_arch(architecture):
model_type = ERNIE4_5_VL
elif "PaddleOCRVL" in architecture:
if not envs.ENABLE_V1_DATA_PROCESSOR:
from fastdeploy.input.paddleocr_vl_processor import (
PaddleOCRVLProcessor,
)
else:
from fastdeploy.input.v1.paddleocr_vl_processor import (
PaddleOCRVLProcessor,
)

self.processor = PaddleOCRVLProcessor(
config=self.model_config,
model_name_or_path=self.model_name_or_path,
limit_mm_per_prompt=self.limit_mm_per_prompt,
mm_processor_kwargs=self.mm_processor_kwargs,
reasoning_parser_obj=reasoning_parser_obj,
)
model_type = PADDLEOCR_VL
elif "Qwen2_5_VL" in architecture:
if not envs.ENABLE_V1_DATA_PROCESSOR:
from fastdeploy.input.qwen_vl_processor import QwenVLProcessor
else:
from fastdeploy.input.v1.qwen_vl_processor import (
QwenVLProcessor,
)

self.processor = QwenVLProcessor(
config=self.model_config,
model_name_or_path=self.model_name_or_path,
limit_mm_per_prompt=self.limit_mm_per_prompt,
mm_processor_kwargs=self.mm_processor_kwargs,
reasoning_parser_obj=reasoning_parser_obj,
enable_processor_cache=self.enable_processor_cache,
)
model_type = QWEN_VL
elif "Qwen3VL" in architecture:
if not envs.ENABLE_V1_DATA_PROCESSOR:
from fastdeploy.input.qwen3_vl_processor import Qwen3VLProcessor
else:
from fastdeploy.input.v1.qwen3_vl_processor import (
Qwen3VLProcessor,
)

self.processor = Qwen3VLProcessor(
config=self.model_config,
model_name_or_path=self.model_name_or_path,
limit_mm_per_prompt=self.limit_mm_per_prompt,
mm_processor_kwargs=self.mm_processor_kwargs,
reasoning_parser_obj=reasoning_parser_obj,
enable_processor_cache=self.enable_processor_cache,
)
model_type = QWEN3_VL
else:
raise ValueError(f"Unsupported model processor architecture: {architecture}. ")

self.processor = MultiModalProcessor(
model_name_or_path=self.model_name_or_path,
model_type=model_type,
config=self.model_config,
limit_mm_per_prompt=self.limit_mm_per_prompt,
mm_processor_kwargs=self.mm_processor_kwargs,
reasoning_parser_obj=reasoning_parser_obj,
tool_parser_obj=tool_parser_obj,
enable_processor_cache=self.enable_processor_cache,
)

Copilot AI Mar 31, 2026


InputPreprocessor.create_processor() now routes every multimodal branch through MultiModalProcessor, but the existing tests (tests/input/test_preprocess.py) only cover the non-MM branch and the unsupported-architecture error branch, not the MM architecture routing (ERNIE/PaddleOCR/Qwen/Qwen3) or the forwarded arguments (limit_mm_per_prompt, mm_processor_kwargs, etc.). Add corresponding unit tests to guard against routing regressions in future refactors.

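A dependency-free sketch of the routing cases such tests should pin down. The dispatch function and the architecture strings below are illustrative re-creations of the snippet above, not fastdeploy APIs (the real code uses ErnieArchitectures.contains_ernie_arch for the ERNIE check); real tests would patch the processor classes inside fastdeploy.input.preprocess rather than reimplement the dispatch:

```python
# Illustrative model_type constants mirroring fastdeploy.input.multimodal_processor.
ERNIE4_5_VL, PADDLEOCR_VL, QWEN_VL, QWEN3_VL = (
    "ernie4_5_vl", "paddleocr_vl", "qwen_vl", "qwen3_vl",
)


def route_model_type(architecture: str) -> str:
    """Mirror of the architecture dispatch shown in the snippet above."""
    if "Ernie" in architecture:
        return ERNIE4_5_VL
    if "PaddleOCRVL" in architecture:
        return PADDLEOCR_VL
    if "Qwen2_5_VL" in architecture:
        return QWEN_VL
    if "Qwen3VL" in architecture:
        return QWEN3_VL
    raise ValueError(f"Unsupported model processor architecture: {architecture}.")


# Routing cases the suggested unit tests should cover (architecture names are
# illustrative examples, not an exhaustive list):
assert route_model_type("Ernie4_5_VLMoeForConditionalGeneration") == ERNIE4_5_VL
assert route_model_type("PaddleOCRVLForConditionalGeneration") == PADDLEOCR_VL
assert route_model_type("Qwen2_5_VLForConditionalGeneration") == QWEN_VL
assert route_model_type("Qwen3VLForConditionalGeneration") == QWEN3_VL
```

A real test would additionally assert that MultiModalProcessor is constructed with the expected limit_mm_per_prompt and mm_processor_kwargs, and that an unknown architecture raises ValueError.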
@codecov-commenter

Codecov Report

❌ Patch coverage is 63.57243% with 259 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@76cf5e9). Learn more about missing BASE report.

Files with missing lines Patch % Lines
fastdeploy/input/multimodal_processor.py 22.87% 196 Missing and 13 partials ⚠️
...astdeploy/input/image_processors/qwen_processor.py 83.52% 6 Missing and 8 partials ⚠️
...ploy/input/image_processors/paddleocr_processor.py 86.17% 11 Missing and 2 partials ⚠️
...stdeploy/input/image_processors/qwen3_processor.py 84.70% 6 Missing and 7 partials ⚠️
...eploy/input/image_processors/adaptive_processor.py 96.22% 5 Missing and 1 partial ⚠️
fastdeploy/input/preprocess.py 57.14% 3 Missing ⚠️
...essor/image_preprocessor/get_image_preprocessor.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7109   +/-   ##
==========================================
  Coverage           ?   73.07%           
==========================================
  Files              ?      408           
  Lines              ?    56839           
  Branches           ?     9001           
==========================================
  Hits               ?    41535           
  Misses             ?    12368           
  Partials           ?     2936           
Flag Coverage Δ
GPU 73.07% <63.57%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.



3 participants