feat: Support aux loss normalization in RL SFT #2194
What does this PR do?
Remove the MoE aux loss assertion that blocked aux_loss usage with calculate_per_token_loss=True. Add moe_grad_scale_func to properly normalize MoE auxiliary loss gradients: it sets the scale to 1/global_valid_toks before the forward-backward pass and clears it afterward, so that after the DDP SUM reduction the aux loss gradient is correctly averaged.
Also adds the sft_nanov3.yaml config for nano-v3 SFT training with MoE seq_aux_loss enabled.
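For clarity, here is a minimal sketch of the scaling mechanism described above. The class and method names (MoEAuxLossGradScaler, set_scale, clear) are illustrative assumptions, not the exact symbols this PR adds; Megatron-style aux-loss scaling is typically implemented as a custom autograd function that is the identity in forward and scales the incoming gradient in backward:

```python
from typing import Optional

import torch


class _ScaleGradient(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming
    gradient by `scale` in the backward pass."""

    @staticmethod
    def forward(ctx, activation: torch.Tensor, scale: float) -> torch.Tensor:
        ctx.scale = scale
        return activation

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Scale the aux-loss gradient; `None` for the non-tensor `scale` arg.
        return grad_output * ctx.scale, None


class MoEAuxLossGradScaler:
    """Holds a per-step scale for MoE aux-loss gradients.
    Illustrative sketch only, not the exact API from this PR."""

    def __init__(self) -> None:
        self._scale: Optional[float] = None

    def set_scale(self, global_valid_toks: int) -> None:
        # 1 / global_valid_toks: after DDP SUM-reduces gradients across
        # ranks, the aux-loss contribution becomes a per-token average.
        self._scale = 1.0 / float(global_valid_toks)

    def clear(self) -> None:
        self._scale = None

    def __call__(self, aux_loss: torch.Tensor) -> torch.Tensor:
        # Applied where the MoE layer adds its aux loss to the main loss.
        if self._scale is None:
            return aux_loss
        return _ScaleGradient.apply(aux_loss, self._scale)
```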
Issues
List issues that this PR closes (syntax):
Usage
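A hedged sketch of the per-step flow, with the normalization applied around forward-backward. The function shape, `moe_grad_scaler`, and the `token_mask` key are assumptions for illustration, not the exact training-loop code:

```python
import torch
import torch.distributed as dist


def train_step(model, batch, moe_grad_scaler):
    # Sum valid (non-padded) token counts across data-parallel ranks
    # to obtain global_valid_toks.
    global_valid_toks = batch["token_mask"].sum()
    dist.all_reduce(global_valid_toks, op=dist.ReduceOp.SUM)

    # Set the scale to 1/global_valid_toks before forward-backward...
    moe_grad_scaler.set_scale(int(global_valid_toks.item()))
    loss = model(batch)   # forward: MoE layers attach the (seq_)aux loss
    loss.backward()       # backward: DDP SUM-reduces gradients
    # ...and clear it afterward so the scale cannot leak into the next step.
    moe_grad_scaler.clear()
```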
Before your PR is "Ready for review"
Pre checks:
Additional Information