
[Feature] Domino EP support and training optimizations for InternS1 Pro VL #1528

Open

tina-wen wants to merge 3 commits into InternLM:main from tina-wen:split_bal_loss

Conversation


@tina-wen tina-wen commented Mar 3, 2026

Description

This PR optimizes InternS1 Pro VL model training with three key changes:

  • Domino EP: add support for the domino_ep parallelism strategy
  • Fewer d2h transfers: remove redundant device-to-host copies (loss/grad_norm are fetched for logging only; no accuracy impact)
  • Layer-wise MoE loss: split the expert balance loss computation per layer to reduce peak memory

Results: performance ↑, memory ↓, accuracy unchanged
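The layer-wise balance loss change can be sketched as follows. All names here (`balance_loss`, `router_probs`, and so on) are illustrative, not XTuner's actual API; the point is the structural idea: instead of stashing every layer's router statistics and computing one global balance loss at the end, compute each layer's loss as soon as that layer runs and accumulate a scalar, so the per-layer router probabilities can be freed immediately.

```python
# Hypothetical sketch of splitting expert-balance loss per layer.
# Names are illustrative, not XTuner's API.

def balance_loss(router_probs, num_experts):
    """Aux load-balancing loss for one MoE layer (Switch-Transformer style):
    num_experts * sum_e (fraction of tokens routed to e) * (mean router
    probability of e). Minimized when routing is uniform across experts.

    router_probs: one list of per-expert probabilities per token.
    """
    num_tokens = len(router_probs)
    # Mean router probability mass assigned to each expert.
    mean_prob = [
        sum(tok[e] for tok in router_probs) / num_tokens
        for e in range(num_experts)
    ]
    # Fraction of tokens whose top-1 expert is e.
    top1 = [max(range(num_experts), key=lambda e: tok[e]) for tok in router_probs]
    frac_tokens = [top1.count(e) / num_tokens for e in range(num_experts)]
    return num_experts * sum(f * p for f, p in zip(frac_tokens, mean_prob))

def total_balance_loss(per_layer_router_probs, num_experts):
    """Layer-wise accumulation: each layer's loss is computed and added
    right away, so its router statistics need not be kept until the end."""
    total = 0.0
    for probs in per_layer_router_probs:
        total += balance_loss(probs, num_experts)
    return total / len(per_layer_router_probs)
```

With perfectly balanced routing the loss sits at its minimum of 1.0; skewed routing pushes it above 1.0, which is what the optimizer penalizes.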

```python
# Derive the EP compile config from the non-EP one: copy the dict and
# drop the MoE decoder layer's forward so it is not compiled under EP.
MOE_EP_COMPILE_CFG = MOE_NON_EP_COMPILE_CFG.copy()
MOE_EP_COMPILE_CFG.pop("xtuner.v1.module.decoder_layer.moe_decoder_layer.MoEDecoderLayer.forward")
```
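The copy-then-pop pattern above keeps the non-EP config untouched while the EP variant excludes one entry. A minimal self-contained illustration (keys shortened for readability; the real keys are fully qualified function paths in xtuner.v1):

```python
# Minimal illustration of deriving one compile config from another.
MOE_NON_EP_COMPILE_CFG = {
    "MoEDecoderLayer.forward": {"fullgraph": True},
    "MoEGate.forward": {"fullgraph": False},
}

# Shallow copy, then remove the entry that must run eagerly under EP.
# Popping from the copy does not touch MOE_NON_EP_COMPILE_CFG.
MOE_EP_COMPILE_CFG = MOE_NON_EP_COMPILE_CFG.copy()
MOE_EP_COMPILE_CFG.pop("MoEDecoderLayer.forward")
```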

```python
class _AllReduce(torch.autograd.Function):
```
