feat: Nemotron Nano-v3 pipeline parallelism#1298

Open
prestonfu wants to merge 4 commits into NVIDIA-NeMo:zhiyul/llm-optimization-workshop from prestonfu:prestonfu/a1

@prestonfu prestonfu commented Feb 16, 2026

What does this PR do?

Single-node pipeline parallelism for Nemotron NanoV3 30B.

Changelog

  • parallelizer.py: Unpack ModuleList/ModuleDict in layer extraction
  • functional.py:
    • Support backbone.* model structure (vs model.*).
    • Add stage_model.to_empty(device=device) to allocate device storage for buffers such as e_score_correction_bias in MoE, which otherwise stay on CPU.
  • hf_utils.py: Support backbone and backbone.embeddings (vs embed_tokens)
  • flops_utils.py: An (incorrect) attempt to calibrate Mamba2 SSM FLOPs
  • train_ft.py:
    • Pass trust_remote_code to AutoConfig
    • Add checkpoint.enabled
    • Add MFU and nsys support
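
The backbone.* and backbone.embeddings items above amount to trying more than one attribute name when locating the inner model and its token embedding. A minimal sketch of that lookup follows. It is not the code from this PR: the helper names (first_attr, find_embedding) are hypothetical, and plain namespaces stand in for real torch modules so the example runs without any dependencies.

```python
from types import SimpleNamespace

# Attribute names come from the changelog (model vs backbone,
# embed_tokens vs embeddings). The lookup order is an assumption.
INNER_NAMES = ("model", "backbone")
EMBED_NAMES = ("embed_tokens", "embeddings")


def first_attr(obj, names):
    """Return (name, value) for the first attribute in `names` that `obj` has."""
    for name in names:
        value = getattr(obj, name, None)
        if value is not None:
            return name, value
    raise AttributeError(f"none of {names} found on {type(obj).__name__}")


def find_embedding(model):
    """Return the dotted path and the object holding the token embedding."""
    inner_name, inner = first_attr(model, INNER_NAMES)
    embed_name, embed = first_attr(inner, EMBED_NAMES)
    return f"{inner_name}.{embed_name}", embed


# Stand-ins for real models: namespaces instead of torch modules.
llama_like = SimpleNamespace(model=SimpleNamespace(embed_tokens="E0"))
nemotron_like = SimpleNamespace(backbone=SimpleNamespace(embeddings="E1"))

print(find_embedding(llama_like))
print(find_embedding(nemotron_like))
```

Keeping the candidate names in tuples means a further layout can be supported by adding one entry rather than another branch.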

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

copy-pr-bot Bot commented Feb 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@prestonfu prestonfu changed the title Prestonfu/a1 Nemotron pipeline parallelism Feb 16, 2026
@prestonfu prestonfu changed the title Nemotron pipeline parallelism Nemotron Nano-v3 pipeline parallelism Feb 16, 2026
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Feb 18, 2026
@akoumpa
Contributor

akoumpa commented Feb 19, 2026

@ZhiyuLi-Nvidia can you take a look? Thank you

@akoumpa akoumpa removed the needs-follow-up Issue needs follow-up label Feb 19, 2026
@ZhiyuLi-Nvidia
Contributor

Hi @prestonfu, thanks a lot for the contribution. I am just curious why you want to merge into this dev branch, NVIDIA-NeMo:zhiyul/llm-optimization-workshop, which is for UCB homework only.
Are you interested in contributing to the main branch instead?

@akoumpa akoumpa changed the title Nemotron Nano-v3 pipeline parallelism feat: Nemotron Nano-v3 pipeline parallelism Feb 26, 2026
@chtruong814 chtruong814 added waiting-for-customer waiting-on-customer Waiting on the original author to respond and removed waiting-for-customer labels Apr 14, 2026

Labels

community-request waiting-on-customer Waiting on the original author to respond

4 participants