fix: parse multimodal tool messages #4680
Pull request overview
This PR updates LMDeploy’s multimodal message preprocessing so that multimodal tool messages (e.g., tool results containing image parts) are parsed the same way as multimodal user messages, while preserving tool-message metadata needed for tool-call correlation.
Changes:
- Extend `MultimodalProcessor._parse_multimodal_item` to parse multimodal `tool` messages (not just `user`) and preserve message metadata when rewriting content.
- Refactor multimodal-input detection to use a shared `MULTIMODAL_TYPES` constant and broaden detection to include tool messages.
- Add regression tests for tool-result image parsing and tool-role multimodal detection.
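The behavior described in the list above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `ASSUMED_MULTIMODAL_TYPES`, `MULTIMODAL_ROLES`, `has_multimodal_input`, and `rewrite_message` are hypothetical names, and the set of part types is an assumption rather than the actual contents of `MULTIMODAL_TYPES`.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]

# Assumption: illustrative part types only; the real MULTIMODAL_TYPES
# constant in LMDeploy may contain different entries.
ASSUMED_MULTIMODAL_TYPES = {'image_url', 'video_url', 'audio_url'}

# Roles whose list-style content is scanned for multimodal parts.
MULTIMODAL_ROLES = {'user', 'tool'}


def has_multimodal_input(messages: List[Message]) -> bool:
    """Return True if any user or tool message carries a multimodal part."""
    for message in messages:
        if message.get('role') not in MULTIMODAL_ROLES:
            continue
        content = message.get('content')
        if not isinstance(content, list):
            continue  # plain-string content has no parts to inspect
        for part in content:
            if isinstance(part, dict) and part.get('type') in ASSUMED_MULTIMODAL_TYPES:
                return True
    return False


def rewrite_message(message: Message,
                    convert_part: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Message:
    """Convert each content part while keeping every other message key.

    Copying the message first is what keeps metadata such as
    tool_call_id attached to the rewritten tool message.
    """
    content = message.get('content')
    if message.get('role') not in MULTIMODAL_ROLES or not isinstance(content, list):
        return message
    return {**message, 'content': [convert_part(part) for part in content]}
```

Replacing only the `content` key, rather than building a fresh `{'role': ..., 'content': ...}` dict, is the design choice that lets tool-call correlation survive the rewrite.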
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `lmdeploy/serve/processors/multimodal.py` | Parses multimodal parts in tool messages and centralizes supported multimodal type detection. |
| `tests/test_lmdeploy/test_content_merge.py` | Adds regression coverage for tool-result image payload parsing and tool-role multimodal input detection. |
assert parsed[2]['content'][1]['type'] == Modality.IMAGE
assert parsed[2]['content'][1]['data'] == f'loaded:{image_data_url}'
assert parsed[2]['content'][1]['detail'] == 'auto'
This part does not apply to LMDeploy: `Modality` overrides `__eq__` to compare against strings, so both `"image" == Modality.IMAGE` and `Modality.IMAGE == "image"` evaluate to true. The parser does store `modality.value`, but the assertion against `Modality.IMAGE` is valid in this repo. I only addressed the reasonable test-surface issue by removing the heavyweight `VisionModel` import/collector assertion.
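For readers unfamiliar with the pattern, here is a minimal sketch of an enum whose `__eq__` accepts plain strings. `SketchModality` is a hypothetical stand-in, not LMDeploy's `Modality` class, whose actual definition may differ.

```python
from enum import Enum


class SketchModality(Enum):
    """Hypothetical stand-in for an enum that compares equal to its string value."""

    IMAGE = 'image'
    TEXT = 'text'

    def __eq__(self, other):
        # Strings are compared against the member's value.
        if isinstance(other, str):
            return self.value == other
        # Everything else falls back to the default identity comparison.
        return super().__eq__(other)

    def __hash__(self):
        # Overriding __eq__ clears the inherited __hash__, so restore one
        # that is consistent with string equality.
        return hash(self.value)


stored = SketchModality.IMAGE.value   # what a parser might store: the plain string
print(stored == SketchModality.IMAGE)  # str.__eq__ defers to the reflected enum __eq__
print(SketchModality.IMAGE == stored)
```

Because `str.__eq__` returns `NotImplemented` for a non-string operand, Python falls back to the enum's `__eq__` with the operands swapped, which is why the comparison holds in both directions.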
Summary
- Parse multimodal `tool` messages instead of only `user` messages.
- Preserve `tool_call_id` while converting multimodal parts.
Validation
Assistance
Assisted with Codex + GPT-5.5 xHigh Fast