feat: Nemotron Nano-v3 pipeline parallelism by prestonfu · Pull Request #1298 · NVIDIA-NeMo/Automodel

prestonfu · 2026-02-16T21:27:56Z

What does this PR do?

Adds single-node pipeline parallelism for Nemotron Nano-v3 30B.

Changelog

parallelizer.py: Unpack ModuleList/ModuleDict in layer extraction
functional.py:
- Support backbone.* model structure (vs model.*).
- Add stage_model.to_empty(device=device) so that buffers such as e_score_correction_bias in MoE get device storage; otherwise they stay on CPU.
hf_utils.py: Support backbone and backbone.embeddings (vs embed_tokens)
flops_utils.py: An (incorrect) attempt to calibrate Mamba2 SSM FLOPs
train_ft.py:
- Pass trust_remote_code to AutoConfig
- Add checkpoint.enabled
- Add MFU and nsys support
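
For readers unfamiliar with the MFU figure mentioned above: model FLOPs utilization is usually defined as achieved FLOPs per second per GPU divided by the hardware peak. The sketch below shows that arithmetic only. The function name, its parameters, and the numbers in the usage line are illustrative assumptions, not values or code from this PR, and the result is only as good as the FLOPs estimate fed into it (the changelog itself flags the Mamba2 SSM FLOPs calibration as incorrect).

```python
def model_flops_utilization(flops_per_step, step_time_s, num_gpus, peak_flops_per_gpu):
    """Return MFU as a fraction in [0, 1] under the usual definition.

    flops_per_step:     estimated model FLOPs for one global training step
    step_time_s:        measured wall-clock time of that step, in seconds
    num_gpus:           number of GPUs sharing the step
    peak_flops_per_gpu: vendor peak FLOPs/s for the dtype in use
    """
    if step_time_s <= 0 or num_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("step time, GPU count, and peak FLOPs must be positive")
    achieved_per_gpu = flops_per_step / step_time_s / num_gpus
    return achieved_per_gpu / peak_flops_per_gpu


# Placeholder inputs chosen for easy arithmetic, not measurements.
mfu = model_flops_utilization(
    flops_per_step=4.0e15,
    step_time_s=2.0,
    num_gpus=8,
    peak_flops_per_gpu=1.0e15,
)
print(f"MFU = {mfu:.2%}")
```

An overestimated FLOPs count inflates MFU by the same factor, so the SSM term matters when comparing runs across architectures.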

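The backbone.* and backbone.embeddings entries in the changelog describe a model whose top-level attribute names differ from the more common model.embed_tokens layout. A minimal sketch of one way to handle both layouts is a lookup that tries each candidate name in order. Everything here is hypothetical: the helper names, the candidate tuples, and the plain-object stand-ins are assumptions for illustration, and the PR's actual changes in functional.py and hf_utils.py may look quite different.

```python
from types import SimpleNamespace

# Candidate attribute names, checked in order. Both tuples are assumptions
# taken from the changelog wording, not from the PR's code.
ROOT_NAMES = ("model", "backbone")
EMBEDDING_NAMES = ("embed_tokens", "embeddings")


def first_present(obj, names):
    """Return (name, attribute) for the first name that exists on obj."""
    for name in names:
        value = getattr(obj, name, None)
        if value is not None:
            return name, value
    raise AttributeError(f"none of {names} found on {type(obj).__name__}")


def locate_embeddings(model):
    """Return the dotted path to the token embedding and the object itself."""
    root_name, root = first_present(model, ROOT_NAMES)
    emb_name, emb = first_present(root, EMBEDDING_NAMES)
    return f"{root_name}.{emb_name}", emb


# Plain-object stand-ins for the two layouts (not real model classes).
llama_like = SimpleNamespace(model=SimpleNamespace(embed_tokens="E0"))
nemotron_like = SimpleNamespace(backbone=SimpleNamespace(embeddings="E1"))

print(locate_embeddings(llama_like)[0])     # model.embed_tokens
print(locate_embeddings(nemotron_like)[0])  # backbone.embeddings
```

Trying names in a fixed order keeps the common layout on the fast path and fails loudly with AttributeError when neither name is present.
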
Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

Related to # (issue)

copy-pr-bot · 2026-02-16T21:28:00Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

akoumpa · 2026-02-19T14:59:49Z

@ZhiyuLi-Nvidia can you take a look? Thank you

ZhiyuLi-Nvidia · 2026-02-19T16:01:37Z

Hi @prestonfu, thanks a lot for the contribution. I'm just curious why you want to merge into the dev branch NVIDIA-NeMo:zhiyul/llm-optimization-workshop, which is for UCB homework only.
Are you interested in contributing to the main branch instead?

prestonfu added 3 commits February 6, 2026 01:00

baseline

fd5ffe6

add

88b43ef

cleanup

60dcad1

prestonfu requested review from HuiyingLi, adil-a, akoumpa, hemildesai, shan-nvidia and ybabakhin as code owners February 16, 2026 21:27

github-actions Bot added the community-request label Feb 16, 2026

prestonfu changed the title ~~Prestonfu/a1~~ Nemotron pipeline parallelism Feb 16, 2026

prestonfu changed the title ~~Nemotron pipeline parallelism~~ Nemotron Nano-v3 pipeline parallelism Feb 16, 2026

add trainft support

d1bc2e4

chtruong814 added the needs-follow-up (Issue needs follow-up) label Feb 18, 2026

akoumpa removed the needs-follow-up (Issue needs follow-up) label Feb 19, 2026

akoumpa changed the title ~~Nemotron Nano-v3 pipeline parallelism~~ feat: Nemotron Nano-v3 pipeline parallelism Feb 26, 2026

chtruong814 added the waiting-for-customer label, then added the waiting-on-customer (Waiting on the original author to respond) label and removed the waiting-for-customer label Apr 14, 2026

prestonfu wants to merge 4 commits into NVIDIA-NeMo:zhiyul/llm-optimization-workshop from prestonfu:prestonfu/a1

4 participants