Pull request overview
This PR unifies and migrates the multimodal (VL) image processor and processor routing logic: the image_processor / image_preprocessor modules previously scattered across the VL subdirectories are moved to fastdeploy/input/image_processors/, and a new unified MultiModalProcessor is added for InputPreprocessor to dispatch to in multimodal scenarios, while compatibility import entry points are kept at the old paths.

Changes:
- Added fastdeploy/input/multimodal_processor.py, which uses model_type to uniformly wrap request-processing dispatch for QwenVL/Qwen3VL/PaddleOCRVL/Ernie4.5VL.
- Added several processor implementations under fastdeploy/input/image_processors/, and turned the image_processor modules in the old VL directories into compatibility shims that forward imports.
- Updated the patch paths in the related unit tests to point at the new image_processors module location.
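The dispatch-by-model_type idea can be sketched as follows. This is a minimal illustration only, not the actual FastDeploy implementation: the constant values and handler bodies are simplified stand-ins.

```python
# Illustrative sketch of dispatch-by-model_type; constants and handler bodies
# are simplified stand-ins, not the real FastDeploy implementation.
ERNIE4_5_VL = "ernie4_5_vl"
QWEN_VL = "qwen_vl"
QWEN3_VL = "qwen3_vl"
PADDLEOCR_VL = "paddleocr_vl"


class MultiModalProcessor:
    """Routes request processing to a model-specific handler chosen by model_type."""

    def __init__(self, model_type: str):
        self._handlers = {
            ERNIE4_5_VL: self._process_ernie4_5_vl,
            QWEN_VL: self._process_qwen_vl,
            QWEN3_VL: self._process_qwen3_vl,
            PADDLEOCR_VL: self._process_paddleocr_vl,
        }
        if model_type not in self._handlers:
            raise ValueError(f"Unsupported model_type: {model_type}")
        self.model_type = model_type

    def process_request_dict(self, request: dict) -> dict:
        # Dispatch to the handler registered for this model type.
        return self._handlers[self.model_type](request)

    # Placeholder handlers: the real ones run the model-specific VL pipeline.
    def _process_ernie4_5_vl(self, request):
        return {**request, "handled_by": ERNIE4_5_VL}

    def _process_qwen_vl(self, request):
        return {**request, "handled_by": QWEN_VL}

    def _process_qwen3_vl(self, request):
        return {**request, "handled_by": QWEN3_VL}

    def _process_paddleocr_vl(self, request):
        return {**request, "handled_by": PADDLEOCR_VL}
```

A table of handlers keyed by model_type keeps the per-model branching in one place, so adding a new VL model means registering one handler rather than editing the caller.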
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/input/test_image_preprocessor_adaptive.py | Updated patch target paths to match the module location after the image preprocessor migration |
| fastdeploy/input/preprocess.py | Multimodal branch now uniformly creates a MultiModalProcessor (replacing the per-architecture processor imports) |
| fastdeploy/input/multimodal_processor.py | New unified multimodal processor that wraps the VL request-processing flow and dispatches by model type |
| fastdeploy/input/image_processors/__init__.py | Re-exports the new image processors for unified importing |
| fastdeploy/input/image_processors/adaptive_processor.py | Added/migrated the AdaptiveImageProcessor implementation (moved from the original Ernie4.5 VL path) |
| fastdeploy/input/image_processors/qwen_processor.py | Added/migrated the QwenVL ImageProcessor implementation |
| fastdeploy/input/image_processors/qwen3_processor.py | Added/migrated the Qwen3VL ImageProcessor implementation |
| fastdeploy/input/image_processors/paddleocr_processor.py | Added/migrated the PaddleOCRVL ImageProcessor implementation |
| fastdeploy/input/qwen_vl_processor/image_processor.py | Legacy-path compatibility shim: forwards imports to the new image_processors.qwen_processor |
| fastdeploy/input/qwen3_vl_processor/image_processor.py | Legacy-path compatibility shim: forwards imports to the new image_processors.qwen3_processor |
| fastdeploy/input/paddleocr_vl_processor/image_processor.py | Legacy-path compatibility shim: forwards imports to the new image_processors.paddleocr_processor |
| fastdeploy/input/ernie4_5_vl_processor/image_preprocessor/image_preprocessor_adaptive.py | Legacy-path compatibility shim: forwards imports to the new image_processors.adaptive_processor |
| fastdeploy/input/ernie4_5_vl_processor/image_preprocessor/get_image_preprocessor.py | Legacy-path compatibility shim: forwards imports to the new image_processors.adaptive_processor |
| fastdeploy/input/ernie4_5_vl_processor/image_preprocessor/__init__.py | Legacy-path compatibility shim: re-exports AdaptiveImageProcessor/get_image_preprocessor from the new module |
```python
from fastdeploy.input.multimodal_processor import (
    ERNIE4_5_VL,
    PADDLEOCR_VL,
    QWEN3_VL,
    QWEN_VL,
    MultiModalProcessor,
)
```
The PR title does not include a tag as required by the template (e.g. [DataProcessor] ...), and the Motivation/Modifications/Usage/Tests sections of the PR description are still empty. Please complete the title and description so that the CI/review process can correctly identify the change category and ease future maintenance.
```python
List["np.ndarray"],
List["paddle.Tensor"],
List[List["PIL.Image.Image"]],
List[List["np.ndarrray"]],
```
The VideoInput type annotation spells np.ndarrray (with an extra r), an obvious typo that can mislead type checkers and readers. It should be corrected to np.ndarray.
Suggested change:
```diff
-            List[List["np.ndarrray"]],
+            List[List["np.ndarray"]],
```
```python
def make_batched_images(images) -> List[List[ImageInput]]:
    """
    Accepts images in list or nested list format, and makes a list of images for preprocessing.

    images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
        The input image.

    Returns:
        list: A list of images.
    """
```
make_batched_images actually returns a flat List[ImageInput] (the nested-list branch, for example, flattens its input), but the signature is annotated as List[List[ImageInput]]. Please fix the return type annotation and docstring so that callers do not mistakenly treat the result as a nested structure, which would make the types and semantics inconsistent.
Suggested change:
```diff
-def make_batched_images(images) -> List[List[ImageInput]]:
-    """
-    Accepts images in list or nested list format, and makes a list of images for preprocessing.
-    images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
-        The input image.
-    Returns:
-        list: A list of images.
+def make_batched_images(images) -> List[ImageInput]:
+    """
+    Accepts images in list or nested list format, and returns a flat list of images for preprocessing.
+    images (`Union[ImageInput, List[ImageInput], List[List[ImageInput]]]`):
+        The input image or images.
+    Returns:
+        List[ImageInput]: A flat list of images.
```
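As a self-contained illustration of the flattening behavior described above (using plain objects as stand-in images, and omitting the is_valid_image check from the original):

```python
def make_batched_images(images) -> list:
    """Return a flat list of images from a single image, a list, or a nested list."""
    if isinstance(images, (list, tuple)) and images and isinstance(images[0], (list, tuple)):
        # Nested list: flatten one level.
        return [img for img_list in images for img in img_list]
    if isinstance(images, (list, tuple)):
        # Already a flat list.
        return list(images)
    # A single image.
    return [images]
```

All three input shapes come out as the same flat list, which is exactly why the flat `List[ImageInput]` annotation is the accurate one.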
```python
# Copied from transformers.models.llava_next_video.image_processing_llava_next_video.make_batched_videos
def make_batched_videos(videos) -> List[VideoInput]:
    """dummy"""
```
This file adds/keeps Chinese comments and placeholder docstrings (e.g. """设定pixels""", """dummy""", and Chinese TODO comments). The repository convention is English-only code comments, and "dummy" does not convey intent; please rewrite these comments/docstrings in English and describe the actual behavior.
Suggested change:
```diff
-    """dummy"""
+    """
+    Normalize different video input formats into a list of batched videos.
+
+    This helper accepts multiple representations of videos (nested lists of images,
+    4D NumPy arrays, or 4D Paddle tensors) and converts them into a common format
+    that downstream image processors can consume.
+
+    Args:
+        videos (`VideoInput` or nested structures of `VideoInput`):
+            The input video data. It can be:
+            - A list of videos, where each video is a list/tuple of frames (PIL images,
+              NumPy arrays, or Paddle tensors).
+            - A single video represented as a list/tuple of frames.
+            - A 4D array or tensor representing a single video, which will be split
+              into a list of frames.
+
+    Returns:
+        List[VideoInput]: A list where each element represents one video, and each
+        video is a list-like collection of frames.
+
+    Raises:
+        ValueError: If the provided input cannot be interpreted as batched videos.
+    """
```
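A runnable sketch of that normalization, using NumPy arrays and plain objects as stand-in frames (the real helper also accepts Paddle tensors and nested array inputs):

```python
import numpy as np


def make_batched_videos(videos) -> list:
    """Normalize video input into a list of videos, each a list of frames."""
    if isinstance(videos, (list, tuple)) and videos and isinstance(videos[0], (list, tuple)):
        # Already a batch: a list of videos, each a list/tuple of frames.
        return [list(v) for v in videos]
    if isinstance(videos, (list, tuple)):
        # A single video given as a list/tuple of frames.
        return [list(videos)]
    if isinstance(videos, np.ndarray) and videos.ndim == 4:
        # A single video as a 4D (frames, H, W, C) array: split along the frame axis.
        return [list(videos)]
    raise ValueError(f"Could not make batched videos from {videos!r}")
```

Every accepted shape ends up as "list of videos, each a list of frames", so downstream processors only have to handle one representation.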
```python
assert isinstance(min_pixels, int) and min_pixels >= 0, "min_pixels must be positive int"
data_processor_logger.info(f"{msg} AdaptiveImageProcessor set min_pixels = {min_pixels}")
self.min_pixels = min_pixels
self.size["min_pixels"] = int(min_pixels)
if max_pixels is not None:
    assert isinstance(max_pixels, int) and max_pixels > 0, "max_pixels must be positive int"
```
set_pixels uses assert for input validation (min/max_pixels). Asserts are stripped when Python runs in optimized mode (-O), so invalid values could be written without any check. Please raise ValueError/TypeError explicitly with a clear error message instead.
Suggested change:
```diff
-        assert isinstance(min_pixels, int) and min_pixels >= 0, "min_pixels must be positive int"
+        if not isinstance(min_pixels, int):
+            raise TypeError("min_pixels must be an int")
+        if min_pixels < 0:
+            raise ValueError("min_pixels must be a non-negative integer")
         data_processor_logger.info(f"{msg} AdaptiveImageProcessor set min_pixels = {min_pixels}")
         self.min_pixels = min_pixels
         self.size["min_pixels"] = int(min_pixels)
     if max_pixels is not None:
-        assert isinstance(max_pixels, int) and max_pixels > 0, "max_pixels must be positive int"
+        if not isinstance(max_pixels, int):
+            raise TypeError("max_pixels must be an int")
+        if max_pixels <= 0:
+            raise ValueError("max_pixels must be a positive integer")
```
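The same checks can be factored into a small helper; a sketch under the assumption that such a helper is acceptable here (the function name is hypothetical, not part of the PR):

```python
def check_pixel_bounds(min_pixels=None, max_pixels=None):
    """Validate pixel bounds with explicit exceptions that survive `python -O`."""
    if min_pixels is not None:
        # Reject bools too, since bool is a subclass of int.
        if not isinstance(min_pixels, int) or isinstance(min_pixels, bool):
            raise TypeError("min_pixels must be an int")
        if min_pixels < 0:
            raise ValueError("min_pixels must be a non-negative integer")
    if max_pixels is not None:
        if not isinstance(max_pixels, int) or isinstance(max_pixels, bool):
            raise TypeError("max_pixels must be an int")
        if max_pixels <= 0:
            raise ValueError("max_pixels must be a positive integer")
    return min_pixels, max_pixels
```

Unlike assert, these raises still fire under `python -O`, and the TypeError/ValueError distinction tells callers whether the type or the range was wrong.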
```python
def make_batched_images(images) -> List[List[ImageInput]]:
    """
    Accepts images in list or nested list format, and makes a list of images for preprocessing.

    Args:
        images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
            The input image.

    Returns:
        list: A list of images.
    """
    if isinstance(images, (list, tuple)) and isinstance(images[0], (list, tuple)) and is_valid_image(images[0][0]):
        return [img for img_list in images for img in img_list]
```
make_batched_images actually returns a flat List[ImageInput], but the annotation says List[List[ImageInput]]. Please align the annotation with the real return type to avoid confusing static checkers and callers.
```python
max_tokens = max_model_len - len(request["prompt_token_ids"])
if request.get("max_tokens") is None:
    request["max_tokens"] = max(1, max_tokens)
else:
    request["max_tokens"] = min(max_tokens, request["max_tokens"])
```
process_request_dict allows max_model_len=None, but computing max_tokens = max_model_len - len(prompt_token_ids) here raises TypeError when max_model_len is None. Either make max_model_len a required parameter, or skip the calculation / use a default for max_tokens when it is None.
Suggested change:
```diff
-        max_tokens = max_model_len - len(request["prompt_token_ids"])
-        if request.get("max_tokens") is None:
-            request["max_tokens"] = max(1, max_tokens)
-        else:
-            request["max_tokens"] = min(max_tokens, request["max_tokens"])
+        if max_model_len is not None:
+            max_tokens = max_model_len - len(request["prompt_token_ids"])
+            if request.get("max_tokens") is None:
+                request["max_tokens"] = max(1, max_tokens)
+            else:
+                request["max_tokens"] = min(max_tokens, request["max_tokens"])
+        else:
+            # When max_model_len is None, do not derive max_tokens from it.
+            # If caller did not provide max_tokens, fall back to a minimal safe default.
+            if request.get("max_tokens") is None:
+                request["max_tokens"] = 1
```
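The suggested clamping behavior can be exercised in isolation; a sketch (the function name and standalone shape are hypothetical, extracted from the in-place logic above):

```python
def resolve_max_tokens(request: dict, max_model_len=None) -> dict:
    """Clamp request['max_tokens'] to the remaining context budget, if known."""
    if max_model_len is None:
        # No model length to derive a budget from; fall back to a minimal default.
        if request.get("max_tokens") is None:
            request["max_tokens"] = 1
        return request
    budget = max_model_len - len(request["prompt_token_ids"])
    if request.get("max_tokens") is None:
        # Caller gave no limit: use the whole remaining budget (at least 1).
        request["max_tokens"] = max(1, budget)
    else:
        # Caller gave a limit: never exceed the remaining budget.
        request["max_tokens"] = min(budget, request["max_tokens"])
    return request
```

With max_model_len=10 and a 3-token prompt, an unset max_tokens becomes 7, while a caller-supplied 100 is clamped down to the budget; with max_model_len=None no subtraction is attempted.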
```python
from fastdeploy.input.multimodal_processor import (
    ERNIE4_5_VL,
    PADDLEOCR_VL,
    QWEN3_VL,
    QWEN_VL,
    MultiModalProcessor,
)

self.processor = Ernie4_5_VLProcessor(
    model_name_or_path=self.model_name_or_path,
    limit_mm_per_prompt=self.limit_mm_per_prompt,
    mm_processor_kwargs=self.mm_processor_kwargs,
    reasoning_parser_obj=reasoning_parser_obj,
    tool_parser_obj=tool_parser_obj,
    enable_processor_cache=self.enable_processor_cache,
)
if ErnieArchitectures.contains_ernie_arch(architecture):
    model_type = ERNIE4_5_VL
elif "PaddleOCRVL" in architecture:
    if not envs.ENABLE_V1_DATA_PROCESSOR:
        from fastdeploy.input.paddleocr_vl_processor import (
            PaddleOCRVLProcessor,
        )
    else:
        from fastdeploy.input.v1.paddleocr_vl_processor import (
            PaddleOCRVLProcessor,
        )

    self.processor = PaddleOCRVLProcessor(
        config=self.model_config,
        model_name_or_path=self.model_name_or_path,
        limit_mm_per_prompt=self.limit_mm_per_prompt,
        mm_processor_kwargs=self.mm_processor_kwargs,
        reasoning_parser_obj=reasoning_parser_obj,
    )
    model_type = PADDLEOCR_VL
elif "Qwen2_5_VL" in architecture:
    if not envs.ENABLE_V1_DATA_PROCESSOR:
        from fastdeploy.input.qwen_vl_processor import QwenVLProcessor
    else:
        from fastdeploy.input.v1.qwen_vl_processor import (
            QwenVLProcessor,
        )

    self.processor = QwenVLProcessor(
        config=self.model_config,
        model_name_or_path=self.model_name_or_path,
        limit_mm_per_prompt=self.limit_mm_per_prompt,
        mm_processor_kwargs=self.mm_processor_kwargs,
        reasoning_parser_obj=reasoning_parser_obj,
        enable_processor_cache=self.enable_processor_cache,
    )
    model_type = QWEN_VL
elif "Qwen3VL" in architecture:
    if not envs.ENABLE_V1_DATA_PROCESSOR:
        from fastdeploy.input.qwen3_vl_processor import Qwen3VLProcessor
    else:
        from fastdeploy.input.v1.qwen3_vl_processor import (
            Qwen3VLProcessor,
        )

    self.processor = Qwen3VLProcessor(
        config=self.model_config,
        model_name_or_path=self.model_name_or_path,
        limit_mm_per_prompt=self.limit_mm_per_prompt,
        mm_processor_kwargs=self.mm_processor_kwargs,
        reasoning_parser_obj=reasoning_parser_obj,
        enable_processor_cache=self.enable_processor_cache,
    )
    model_type = QWEN3_VL
else:
    raise ValueError(f"Unsupported model processor architecture: {architecture}. ")

self.processor = MultiModalProcessor(
    model_name_or_path=self.model_name_or_path,
    model_type=model_type,
    config=self.model_config,
    limit_mm_per_prompt=self.limit_mm_per_prompt,
    mm_processor_kwargs=self.mm_processor_kwargs,
    reasoning_parser_obj=reasoning_parser_obj,
    tool_parser_obj=tool_parser_obj,
    enable_processor_cache=self.enable_processor_cache,
)
```
The multimodal branch of InputPreprocessor.create_processor() now uniformly goes through MultiModalProcessor, but the existing unit tests (tests/input/test_preprocess.py) only cover the non-MM branch and the unsupported-architecture exception branch; they do not cover the MM architecture routing (ERNIE/PaddleOCR/Qwen/Qwen3) or the forwarded arguments (limit_mm_per_prompt, mm_processor_kwargs, etc.). Please add tests for those paths to prevent routing regressions in future refactors.
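The routing itself is easy to cover without loading any model. Below is a sketch of the kind of test this asks for, written against a simplified stand-in for the architecture-to-model_type mapping (the architecture strings are illustrative examples; a real test would patch MultiModalProcessor and call InputPreprocessor.create_processor() directly):

```python
def route_model_type(architecture: str) -> str:
    """Simplified stand-in for the architecture routing in create_processor()."""
    if "Ernie" in architecture:
        return "ernie4_5_vl"
    if "PaddleOCRVL" in architecture:
        return "paddleocr_vl"
    if "Qwen2_5_VL" in architecture:
        return "qwen_vl"
    if "Qwen3VL" in architecture:
        return "qwen3_vl"
    raise ValueError(f"Unsupported model processor architecture: {architecture}. ")


def test_routing():
    # One assertion per supported architecture family, plus the error branch.
    assert route_model_type("Ernie4_5_VLMoeForConditionalGeneration") == "ernie4_5_vl"
    assert route_model_type("PaddleOCRVLForConditionalGeneration") == "paddleocr_vl"
    assert route_model_type("Qwen2_5_VLForConditionalGeneration") == "qwen_vl"
    assert route_model_type("Qwen3VLForConditionalGeneration") == "qwen3_vl"
    try:
        route_model_type("GPT2")
    except ValueError:
        return
    raise AssertionError("expected ValueError for unsupported architecture")
```

Testing the mapping as a pure function keeps the test fast and independent of processor construction; a second test can then patch MultiModalProcessor to verify the forwarded keyword arguments.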
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             develop    #7109   +/-   ##
==========================================
  Coverage           ?   73.07%
==========================================
  Files              ?      408
  Lines              ?    56839
  Branches           ?     9001
==========================================
  Hits               ?    41535
  Misses             ?    12368
  Partials           ?     2936
```

Flags with carried forward coverage won't be shown.
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Add a tag in the PR title, chosen from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the [Cherry-Pick] PR tag.