feat: Add secondary Nemotron perf/diagnostic patches (Liger, TP CE, FLOPs, benchmark)#1307

Open
jasont314 wants to merge 1 commit into NVIDIA-NeMo:main from jasont314:pr2-nemotron-secondary-separate

@jasont314

Adds auxiliary optimization and reporting updates separated from the core PP/EP/SQuAD correctness path.

What does this PR do?

Introduces secondary Nemotron performance/diagnostic improvements (Liger integration path, TP CE handling, FLOPs accounting updates, and benchmark instrumentation) without changing the core PP/EP/SQuAD integration PR scope.

Changelog

  • _transformers/kernel_patches.py
    • Add NemotronH-specific Liger patch path with optional RMSNorm and CE patch toggles.
  • components/loss/masked_ce.py
    • Improve TP-aware masked CE behavior used in Nemotron runs.
  • components/utils/flops_utils.py
    • Update FLOPs computation/reporting for better diagnostic fidelity.
  • recipes/llm/benchmark.py
    • Add benchmark-side instrumentation/reporting improvements for run analysis.
  • components/distributed/init_utils.py
    • Minor distributed init/reporting update used by benchmark diagnostics.
  • NEMOTRON_SECONDARY_PATCH_NOTES.md
    • Add implementation notes and usage context for this secondary patch set.
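
For readers unfamiliar with the "TP-aware masked CE" item above, here is a minimal sketch of one common way such a loss can be computed when the vocabulary dimension is sharded across tensor-parallel ranks. It is not the code in components/loss/masked_ce.py: the function names, the `IGNORE_INDEX = -100` convention, and the in-process simulation of ranks (plain Python lists standing in for all-reduce calls) are all assumptions made for illustration.

```python
import math

IGNORE_INDEX = -100  # assumed label value for tokens excluded from the loss


def vocab_parallel_token_loss(shards, target):
    """Cross-entropy for one token whose logits are split across ranks.

    `shards` is a list of per-rank logit slices; each loop over `shards`
    stands in for a collective (all-reduce) in a real distributed run.
    """
    # Stand-in for all-reduce(MAX): global max for numerical stability.
    global_max = max(max(s) for s in shards)
    # Stand-in for all-reduce(SUM): each rank contributes its local sum of exps.
    sum_exp = sum(sum(math.exp(x - global_max) for x in s) for s in shards)
    # Stand-in for all-reduce(SUM): only the rank owning `target` contributes.
    target_logit = 0.0
    offset = 0
    for s in shards:
        if offset <= target < offset + len(s):
            target_logit += s[target - offset] - global_max
        offset += len(s)
    return math.log(sum_exp) - target_logit


def masked_ce(shards_per_token, targets):
    """Mean loss over tokens whose target is not IGNORE_INDEX."""
    losses = [
        vocab_parallel_token_loss(shards, t)
        for shards, t in zip(shards_per_token, targets)
        if t != IGNORE_INDEX
    ]
    # Guard against an all-masked batch to avoid dividing by zero.
    return sum(losses) / max(len(losses), 1)


def reference_ce(logits, target):
    """Unsharded cross-entropy for one token, used only as a cross-check."""
    m = max(logits)
    return math.log(sum(math.exp(x - m) for x in logits)) - (logits[target] - m)
```

The point of the sketch is that the sharded result should match the unsharded one, and that masked tokens must be dropped from both the numerator and the token count; comparing `masked_ce` against `reference_ce` on a small hand-made batch is a quick way to check both properties.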

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)
  • Validation run:
    • python -m pytest -q tests/unit_tests/_transformers/test_auto_model.py tests/unit_tests/loss/test_masked_ce.py tests/unit_tests/utils/test_flops_utils.py tests/unit_tests/recipes/llm/test_benchmark.py
    • Result: 83 passed, 7 warnings

@copy-pr-bot

copy-pr-bot Bot commented Feb 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

chtruong814 added the needs-follow-up label on Feb 19, 2026
akoumpa changed the title from "Add secondary Nemotron perf/diagnostic patches (Liger, TP CE, FLOPs, benchmark)" to "feat: Add secondary Nemotron perf/diagnostic patches (Liger, TP CE, FLOPs, benchmark)" on Feb 19, 2026
@akoumpa
Contributor

akoumpa commented Feb 19, 2026

/ok to test 032a69a

akoumpa removed the needs-follow-up label on Feb 19, 2026
svcnvidia-nemo-ci added the waiting-on-maintainers label and removed the needs-follow-up label on Apr 21, 2026

Labels

community-request, waiting-on-maintainers
