fix: parse multimodal tool messages by CUHKSZzxy · Pull Request #4680 · InternLM/lmdeploy

CUHKSZzxy · 2026-06-16T04:00:57Z

Summary

Parse multimodal content in tool messages instead of only user messages.
Preserve tool-message metadata such as tool_call_id while converting multimodal parts.
Add regression coverage for a tool-result image payload and multimodal input detection.

Validation

Diff whitespace check passed.
Syntax compilation passed for touched Python files.
Replayed the original multimodal tool-result chat-completions payload against a real Qwen3.5 LMDeploy server and confirmed it completes normally instead of returning a prompt-processing error.

Assistance

Assisted with Codex + GPT-5.5 xHigh Fast

Copilot

Pull request overview

This PR updates LMDeploy’s multimodal message preprocessing so that multimodal tool messages (e.g., tool results containing image parts) are parsed the same way as multimodal user messages, while preserving tool-message metadata needed for tool-call correlation.

Changes:

Extend MultimodalProcessor._parse_multimodal_item to parse multimodal tool messages (not just user) and preserve message metadata when rewriting content.
Refactor multimodal-input detection to use a shared MULTIMODAL_TYPES constant and broaden detection to include tool messages.
Add regression tests for tool-result image parsing and tool-role multimodal detection.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`lmdeploy/serve/processors/multimodal.py`	Parses multimodal parts in `tool` messages and centralizes supported multimodal type detection.
`tests/test_lmdeploy/test_content_merge.py`	Adds regression coverage for tool-result image payload parsing and tool-role multimodal input detection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

CUHKSZzxy · 2026-06-16T04:29:11Z

+    assert parsed[2]['content'][1]['type'] == Modality.IMAGE
+    assert parsed[2]['content'][1]['data'] == f'loaded:{image_data_url}'
+    assert parsed[2]['content'][1]['detail'] == 'auto'


This part does not apply to LMDeploy: Modality overrides __eq__ to compare against strings, so both "image" == Modality.IMAGE and Modality.IMAGE == "image" evaluate true. The parser does store modality.value, but the assertion against Modality.IMAGE is valid in this repo. I only addressed the reasonable test-surface issue by removing the heavyweight VisionModel import/collector assertion.

fix: parse multimodal tool messages

c56414e

Copilot AI review requested due to automatic review settings June 16, 2026 04:00

Copilot started reviewing on behalf of CUHKSZzxy June 16, 2026 04:01 View session

Copilot AI reviewed Jun 16, 2026

View reviewed changes

lvhan028 added the Bug:P1 label Jun 16, 2026

test: avoid heavyweight vision import

0d3bc74

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: parse multimodal tool messages#4680

fix: parse multimodal tool messages#4680
CUHKSZzxy wants to merge 2 commits into
InternLM:mainfrom
CUHKSZzxy:fix/mm_parsing

CUHKSZzxy commented Jun 16, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

CUHKSZzxy Jun 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

CUHKSZzxy commented Jun 16, 2026

Summary

Validation

Assistance

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

CUHKSZzxy Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants