Add Qwen3VL MCore Export support from PR 895 (#1482)
Conversation
No actionable comments were generated in the recent review. 🎉

📝 Walkthrough

Adds bidirectional Megatron Core ↔ Hugging Face weight mappings for Qwen3-VL, registers them as a plugin, includes tests validating import/export symmetry and prefix rules, and updates the changelog and deployment docs for Qwen3-VL support.

Changes: Qwen3-VL Megatron Core Integration

Estimated review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ 5 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.rst`:
- Line 136: Move the changelog entry "Add Megatron Core export/import mapping
for Qwen3-VL (``Qwen3VLForConditionalGeneration``) vision-language models..."
out of the 0.42 (2026-03-10) section and place it under the current
unreleased/0.45 section header in CHANGELOG.rst, preserving the existing
formatting and inline code markup; ensure you remove the duplicate from 0.42 and
verify the entry appears exactly once under the 0.45 (unreleased) section.
ℹ️ Review info

⚙️ Run configuration
Configuration used: .coderabbit.yaml (review profile: CHILL)

📒 Files selected for processing (5)

- CHANGELOG.rst
- docs/source/deployment/3_unified_hf.rst
- modelopt/torch/export/plugins/mcore_common.py
- modelopt/torch/export/plugins/mcore_qwen3vl.py
- tests/gpu_megatron/torch/export/test_mcore_qwen3vl.py
/claude review
Claude Review Summary

Small, additive PR that clones the Qwen3 mcore mapping, adapted for the model.language_model. weight prefix.

Findings
Most impactful
Risk: Low-to-moderate. The code is purely additive, with no changes to existing architecture behavior. The worst case is a broken Qwen3-VL round-trip that only manifests at runtime, which is exactly why the test placement matters.
| "local_experts.linear_fc2": NameRemapping( | ||
| "model.language_model.layers.{}.mlp.experts.{}.down_proj." | ||
| ), | ||
| } No newline at end of file |
[IMPORTANT Compatibility] Only Qwen3VLForConditionalGeneration is registered in mcore_common.py, but this mapping intentionally includes MoE rules (router, local_experts.linear_fc1, local_experts.linear_fc2) "for Qwen3-VL MoE variants like 30B-A3B" (see line 81). Following the Qwen3 precedent (Qwen3ForCausalLM and Qwen3MoeForCausalLM are both registered to the same dict in mcore_common.py:55-56), the MoE arch (likely Qwen3VLMoeForConditionalGeneration) needs its own entry; otherwise the architecture lookup will fail for MoE checkpoints and they won't dispatch here. Either:

- Add `"Qwen3VLMoeForConditionalGeneration": qwen3vl_causal_lm_{import,export}` to mcore_common.py, or
- If MoE support is out of scope for this PR, drop the MoE-only entries (router, local_experts.*) here along with the comment on line 81, to avoid implying support that isn't wired up.

Right now the MoE rules are dead code.
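As a hedged sketch of the first option (the dict and mapping names come from this thread; the import paths and the registration-by-mutation style are assumptions, not the actual patch):

```python
# Hypothetical sketch: register the MoE architecture alongside the dense one,
# mirroring the Qwen3 precedent cited above. Import paths are assumptions.
from modelopt.torch.export.plugins.mcore_common import (
    all_mcore_hf_export_mapping,
    all_mcore_hf_import_mapping,
)
from modelopt.torch.export.plugins.mcore_qwen3vl import (
    qwen3vl_causal_lm_export,
    qwen3vl_causal_lm_import,
)

# Both architectures share one mapping dict, so the MoE-only rules
# (router, local_experts.*) stop being dead code.
all_mcore_hf_export_mapping["Qwen3VLMoeForConditionalGeneration"] = qwen3vl_causal_lm_export
all_mcore_hf_import_mapping["Qwen3VLMoeForConditionalGeneration"] = qwen3vl_causal_lm_import
```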
```python
from modelopt.torch.export.plugins.mcore_custom import (
    COL_TP,
    REPLICATE,
```
[SUGGESTION] This test file only inspects the static dict mappings — no GPU, no Megatron-Core, and no actual model are required to run it. mcore_custom.py and mcore_qwen3vl.py import megatron lazily via import_plugin("megatron") so the imports work without it. Placing this in tests/gpu_megatron/ means the only signal protecting these mappings runs in the most expensive CI lane. Consider moving to tests/unit/torch/export/test_mcore_qwen3vl.py (which is what the PR description originally said) so the registration/symmetry checks run in the fast pre-merge lane.
Note also that what's tested here is essentially "the dict literal we wrote matches the dict literal we wrote" — these checks won't catch the real correctness risk, which is whether the HF key layout (especially lm_head. at root vs. model.language_model.lm_head.) actually matches the published Qwen3-VL checkpoint. A small integration test that loads a tiny config and asserts no missing/unexpected keys against an HF state-dict snapshot would be far more valuable.
```python
    # Final layer norm
    "final_layernorm": NameRemapping("model.language_model.norm.", REPLICATE),
    # Output layer (lm_head is at root level, not under language_model)
    "output_layer": NameRemapping("lm_head.", COL_TP),
```
[IMPORTANT Export] Worth double-checking against the published Qwen3-VL checkpoint: in recent transformers (≥4.45), several *ForConditionalGeneration VLMs (including the Qwen2.5-VL / Qwen3-VL families) moved lm_head into the inner language model — i.e. the safetensors key is model.language_model.lm_head.weight, not lm_head.weight at root. If that's the case for the Qwen3-VL release you're targeting, both output_layer mappings (here and line 98) will silently fail to find the tensor on import and write to the wrong location on export, and tie_word_embeddings interaction will also be off.
The PR description says you've round-tripped Qwen3-VL-8B-Instruct, so this may already be verified — but the Qwen3 mapping (mcore_qwen.py:35) inherited a root-level lm_head. from a different architecture pattern, and copying it without checking is the most likely place this PR could be wrong. Worth grepping the actual safetensors keys (safe_open(...).keys()) once and confirming.
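That one-off check could look like this (the shard filename is a placeholder; point it at any downloaded file of the published checkpoint):

```python
# Standalone check of where lm_head actually lives in the released weights.
from safetensors import safe_open

with safe_open("model-00001-of-00005.safetensors", framework="pt") as f:
    lm_head_keys = [k for k in f.keys() if "lm_head" in k]

print(lm_head_keys)
# ["lm_head.weight"]                      -> the root-level mapping is right
# ["model.language_model.lm_head.weight"] -> both output_layer rules need fixing
# (an empty list would suggest tied embeddings with no separate lm_head tensor)
```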
Codecov Report

✅ All modified and coverable lines are covered by tests.

```diff
@@           Coverage Diff            @@
##             main    #1482    +/-   ##
========================================
  Coverage   76.77%   76.77%
========================================
  Files         473      474       +1
  Lines       51418    51422       +4
========================================
+ Hits        39476    39481       +5
+ Misses      11942    11941       -1
```
Add Megatron Core export/import mapping for Qwen3-VL (Qwen3VLForConditionalGeneration). Handles the model.language_model. weight prefix and supports both dense and MoE variants.

Signed-off-by: Hung-Yueh <hungyuehc@nvidia.com>

mv test_mcore_qwen3vl.py to tests/gpu_megatron/torch/export/

Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
Force-pushed the branch from 2423ae7 to a7d1170.
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
This PR duplicates PR #895. Since the original source branch is no longer available, we created a new branch from which this PR can be updated.
What does this PR do?
New feature:
Overview: Add Qwen3-VL (Vision-Language) model support to the Megatron Core export/import
plugin, enabling HuggingFace-to-mcore weight conversion for PTQ/QAT/QAD workflows
Details

Qwen3-VL has a different weight structure from Qwen3 text-only models: language-model weights live under the model.language_model. prefix instead of model., while lm_head. stays at the root.

This PR adds:

- A bidirectional weight mapping between Qwen3VLForConditionalGeneration and Megatron Core, handling the language_model prefix for all decoder layers, QKV merging/slicing, gated MLP merging/slicing, and Q/K layer norms.
- Registration of the mapping in all_mcore_hf_export_mapping and all_mcore_hf_import_mapping.
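For orientation, a fragment of what the new mapping looks like, reconstructed from the excerpts quoted in the review above (the word_embeddings rule and the NameRemapping import location are assumptions; the last two rules are copied from the review excerpts):

```python
# Reconstructed fragment, not the full mcore_qwen3vl.py.
from modelopt.torch.export.plugins.mcore_custom import (  # import path assumed
    COL_TP,
    REPLICATE,
    NameRemapping,
)

qwen3vl_causal_lm_import = {
    # Assumed example of a decoder-side rule carrying the language_model prefix.
    "word_embeddings": NameRemapping("model.language_model.embed_tokens."),
    # Final layer norm (from the review excerpt)
    "final_layernorm": NameRemapping("model.language_model.norm.", REPLICATE),
    # Output layer: lm_head is at root level, not under language_model
    "output_layer": NameRemapping("lm_head.", COL_TP),
}
```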
Usage
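A hedged sketch of how the mapping would typically be exercised; the entry-point names, signatures, and model ID below are assumptions based on ModelOpt's unified Megatron export utilities, not verified against this PR:

```python
# Hypothetical round-trip: pull HF weights into a Megatron Core model,
# quantize, and export back to a unified HF checkpoint. Function names and
# arguments are assumptions.
from modelopt.torch.export import export_mcore_gpt_to_hf, import_mcore_gpt_from_hf

mcore_model = ...  # a Qwen3-VL-shaped Megatron Core model, built elsewhere

import_mcore_gpt_from_hf(mcore_model, "Qwen/Qwen3-VL-8B-Instruct")
# ... run PTQ/QAT/QAD on mcore_model via modelopt.torch.quantization ...
export_mcore_gpt_to_hf(
    mcore_model,
    "Qwen/Qwen3-VL-8B-Instruct",
    export_dir="qwen3vl-quantized",
)
```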
Testing

Added test_mcore_qwen3vl.py covering:

- Mapping rules: the model.language_model. prefix, lm_head. at root, QKVMerging, GatedMLPMerging, REPLICATE for layernorms, and TP sharding configs via parallel_config.
- Import/export symmetry between the import and export mappings' prefixes.
- Every decoder rule carries the language_model. prefix; lm_head is unchanged at the root.
Before your PR is "Ready for review"

- tests/unit/torch/export/test_mcore_qwen3vl.py
- docs/source/deployment/3_unified_hf.rst
- CHANGELOG.rst

Additional Information
A companion Megatron-LM PR adds Qwen3VLModel, Qwen3VLDataset, and pretrain_qwenvl.py; see NVIDIA/Megatron-LM#3444.
Summary by CodeRabbit

- New Features: Megatron Core ↔ Hugging Face weight mappings for Qwen3-VL, registered as a plugin.
- Documentation: Deployment guide and changelog updated for Qwen3-VL support.
- Tests: New tests validating import/export mapping symmetry and prefix rules.