
Support Qwen3.5-VL (dense + MoE) via Megatron-Bridge #2075

Open
demouo wants to merge 1 commit into THUDM:main from demouo:support_qwen35_all_vlm_megatron_bridge

demouo commented Jun 14, 2026


Summary

Support Qwen3.5-VL (dense + MoE) through NVIDIA Megatron-Bridge, resolving #2073 by reusing the upstream bridges instead of a slime-specific workaround.

Changes:

  • Add slime_plugins/megatron_bridge/qwen3_5_vl.py to register the official Qwen35VLBridge / Qwen35VLMoEBridge by simply importing them so their @MegatronModelBridge.register_bridge decorators run.
  • Wire it into slime_plugins/megatron_bridge/__init__.py.
  • Refresh examples/geo3k_vlm/run_geo3k_qwen35.sh so a single script covers both dense (Qwen3.5-9B / 27B) and MoE (Qwen3.5-35B-A3B, ...) via MODEL_NAME, on top of the official megatron-bridge>=0.4.0 (no fork install needed).
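The first change works because registration is an import-time side effect. A class decorator that writes into a module-level table runs when the defining module is imported, so a dispatch lookup only succeeds after that import. The sketch below illustrates the general pattern with hypothetical names (`BRIDGE_REGISTRY`, `register_bridge`, `resolve_bridge`, `ExampleVLBridge`, `ExampleVLForConditionalGeneration`). It is not megatron-bridge's actual `MegatronModelBridge.register_bridge`, whose internals and signature may differ.

```python
# Minimal sketch of decorator-based dispatch registration.
# All names here are hypothetical stand-ins, not the megatron-bridge API.
from typing import Callable, Dict, Type

BRIDGE_REGISTRY: Dict[str, Type] = {}


def register_bridge(architecture: str) -> Callable[[Type], Type]:
    """Class decorator: record `cls` under an HF architecture name."""
    def decorator(cls: Type) -> Type:
        BRIDGE_REGISTRY[architecture] = cls
        return cls
    return decorator


def resolve_bridge(architecture: str) -> Type:
    """Dispatch lookup; fails if the defining module was never imported."""
    try:
        return BRIDGE_REGISTRY[architecture]
    except KeyError:
        raise ValueError(f"no bridge registered for {architecture!r}") from None


def import_bridge_module() -> None:
    """Simulates importing a bridge module: defining the class runs the decorator."""
    @register_bridge("ExampleVLForConditionalGeneration")
    class ExampleVLBridge:
        pass


try:
    resolve_bridge("ExampleVLForConditionalGeneration")
except ValueError as err:
    print(err)  # lookup fails: nothing has registered yet

import_bridge_module()  # analogous to `import slime_plugins.megatron_bridge`
print(resolve_bridge("ExampleVLForConditionalGeneration").__name__)  # ExampleVLBridge
```

Under this pattern, a plugin that only imports the bridge modules is enough. The import must simply happen before the dispatcher is consulted.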

Motivation

  • Qwen3.5-VL support is missing in slime. The only existing path is the legacy text-only slime_plugins/mbridge/qwen3_5.py, which does not handle the vision encoder, the GDN + Gated-Attention hybrid layers, or M-RoPE.

  • NVIDIA already ships these bridges in megatron-bridge 0.4.0; we just need to import them so their dispatch decorators run before AutoBridge.from_hf_pretrained is called. This reuses NVIDIA's implementation rather than reimplementing it on the slime side.

  • The plugin also patches mapping_registry() to add legacy transformer_layer aliases for every mtp_model_layer mapping (idempotent, no-op on newer Megatron-LM), to absorb the upstream Megatron-LM ↔ megatron-bridge module-naming drift around MTP layers:

import copy

def _add_legacy_mtp_aliases(registry):
    """Duplicate every ``mtp.*.mtp_model_layer.*`` mapping with the legacy
    Megatron-LM name ``transformer_layer``.
    The bridge's ``mapping_registry()`` returns a fresh registry on every call
    and ``MegatronMappingRegistry.__init__`` *pre-compiles* the patterns into
    ``_compiled_patterns`` / ``_reverse_patterns`` — so we cannot just append
    to ``registry.mappings``: the new entries would never be matched at
    lookup time. Instead we build a brand-new registry from the augmented
    mapping list, which lets ``__init__`` re-compile everything.
    """
    if registry is None:
        return registry
    original = list(registry.mappings)
    extra = []
    for mapping in original:
        m_param = getattr(mapping, "megatron_param", None)
        if isinstance(m_param, str) and ".mtp_model_layer." in m_param:
            alias = copy.copy(mapping)
            alias.megatron_param = m_param.replace(".mtp_model_layer.", ".transformer_layer.")
            extra.append(alias)
    if not extra:
        return registry

    cls = registry.__class__
    new_registry = cls(*original, *extra)
    return new_registry
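The docstring's reasoning can be checked with a toy model. The sketch below uses hypothetical `ToyMapping` / `ToyRegistry` classes, not megatron-bridge's `MegatronMappingRegistry`, and mimics only the one behavior described above: patterns are compiled once in `__init__`. The `*`-as-layer-index wildcard syntax is also an assumption made for illustration. The sketch shows that a mapping appended to `.mappings` after construction is never matched, while rebuilding through the constructor (as `_add_legacy_mtp_aliases` does) makes the `transformer_layer` alias resolvable.

```python
import copy
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMapping:
    """Hypothetical stand-in for a megatron-bridge param mapping."""
    megatron_param: str
    hf_param: str


class ToyRegistry:
    """Hypothetical registry that compiles its patterns once, in __init__."""

    def __init__(self, *mappings: ToyMapping) -> None:
        self.mappings = list(mappings)
        # `*` stands for a layer index; patterns are never refreshed later.
        self._compiled = [
            (re.compile(re.escape(m.megatron_param).replace(r"\*", r"(\d+)") + "$"), m)
            for m in self.mappings
        ]

    def lookup(self, name: str) -> Optional[ToyMapping]:
        for pattern, mapping in self._compiled:
            if pattern.match(name):
                return mapping
        return None


def add_legacy_mtp_aliases(registry):
    """Same logic as _add_legacy_mtp_aliases above, applied to the toy registry."""
    if registry is None:
        return None
    extra = []
    for mapping in registry.mappings:
        m_param = getattr(mapping, "megatron_param", None)
        if isinstance(m_param, str) and ".mtp_model_layer." in m_param:
            alias = copy.copy(mapping)
            alias.megatron_param = m_param.replace(".mtp_model_layer.", ".transformer_layer.")
            extra.append(alias)
    if not extra:
        return registry
    return registry.__class__(*registry.mappings, *extra)


base = ToyRegistry(
    ToyMapping("mtp.*.mtp_model_layer.mlp.weight", "mtp.layers.*.mlp.weight"),
    ToyMapping("decoder.layers.*.mlp.weight", "model.layers.*.mlp.weight"),
)
legacy_name = "mtp.0.transformer_layer.mlp.weight"

# Naive fix: appending after __init__ leaves the compiled patterns stale.
naive = ToyRegistry(*base.mappings)
naive.mappings.append(ToyMapping("mtp.*.transformer_layer.mlp.weight", "mtp.layers.*.mlp.weight"))
print(naive.lookup(legacy_name))  # None

# Rebuilding re-runs __init__, so the alias is compiled and matched.
patched = add_legacy_mtp_aliases(base)
print(patched.lookup(legacy_name).hf_param)  # mtp.layers.*.mlp.weight

# With no MTP mappings, the same registry object is returned untouched.
dense_only = ToyRegistry(ToyMapping("decoder.layers.*.mlp.weight", "model.layers.*.mlp.weight"))
print(add_legacy_mtp_aliases(dense_only) is dense_only)  # True
```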

Usage

  • Dense (default)
MODEL_NAME=Qwen3.5-9B  bash examples/geo3k_vlm/run_geo3k_qwen35.sh
MODEL_NAME=Qwen3.5-27B bash examples/geo3k_vlm/run_geo3k_qwen35.sh
  • MoE
MODEL_NAME=Qwen3.5-35B-A3B bash examples/geo3k_vlm/run_geo3k_qwen35.sh
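The PR body does not show how run_geo3k_qwen35.sh decides between the dense and MoE settings for a given MODEL_NAME. One plausible rule, sketched below purely as an assumption, follows Qwen's naming convention, in which an `-A<n>B` suffix marks an MoE checkpoint (35B total parameters with roughly 3B active per token for Qwen3.5-35B-A3B). `parse_model_name` and `ModelSize` are hypothetical helpers; the real script may select its settings differently.

```python
import re
from typing import NamedTuple, Optional


class ModelSize(NamedTuple):
    """Hypothetical parsed size, in billions of parameters."""
    total_b: float
    active_b: Optional[float]  # None for dense models

    @property
    def is_moe(self) -> bool:
        return self.active_b is not None


# "<total>B" optionally followed by "-A<active>B" at the end of the name.
_NAME_RE = re.compile(r"-(\d+(?:\.\d+)?)B(?:-A(\d+(?:\.\d+)?)B)?$")


def parse_model_name(model_name: str) -> ModelSize:
    """Hypothetical helper: read total/active size from a Qwen-style name."""
    match = _NAME_RE.search(model_name)
    if match is None:
        raise ValueError(f"cannot parse size from {model_name!r}")
    total, active = match.groups()
    return ModelSize(float(total), float(active) if active else None)


for name in ("Qwen3.5-9B", "Qwen3.5-27B", "Qwen3.5-35B-A3B"):
    size = parse_model_name(name)
    print(name, "MoE" if size.is_moe else "dense", size)
```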

Verification

  • AutoBridge automatically routes Qwen3.5-VL to the official bridge:
import slime_plugins.megatron_bridge   # triggers registration
from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("/path/to/Qwen3.5-9B", trust_remote_code=True)
print(type(bridge._model_bridge).__name__)              # Qwen35VLBridge
provider = bridge.to_megatron_provider(load_weights=False)
print(type(provider).__name__, provider.position_embedding_type, provider.vision_config is not None)
# Qwen35VLModelProvider mrope True
  • End-to-end: weights load through the bridge (including the vision encoder, GDN layers, and MoE layers), and forward, rollout, and actor training all complete on the Qwen3.5-9B and Qwen3.5-35B-A3B examples.

…se + MoE) with MTP-naming alias and end-to-end geo3k example

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>