refactor: avoid requiring Megatron for model_utils import #2540

Draft
alexbowe wants to merge 2 commits into NVIDIA-NeMo:main from alexbowe:abowe/lazy-megatron-utils

@alexbowe alexbowe commented May 21, 2026

What does this PR do ?

Defers Megatron imports in nemo_rl.distributed.model_utils, so non-Megatron import paths can load the module without requiring the Megatron stack.

This is useful because model_utils is a broad utility module: importing it should not force every environment, test, or non-Megatron workflow to have Megatron available.
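A minimal sketch of the deferral pattern, for illustration only. It is not the code in this PR: `hypothetical_heavy_backend`, `BigModel`, `layer_count`, `count_layers`, and `backend_available` are made-up names standing in for the Megatron stack and the helpers in model_utils.

```python
import importlib.util
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; this branch never runs at runtime.
    # `hypothetical_heavy_backend` is an assumed stand-in for an optional stack.
    from hypothetical_heavy_backend.models import BigModel


def count_layers(model: "BigModel") -> int:
    """Hypothetical helper that needs the heavy dependency only when called."""
    # Deferred import: resolved on the first call, not when this module loads.
    from hypothetical_heavy_backend.models import layer_count  # assumed name

    return layer_count(model)


def backend_available() -> bool:
    """Report whether the optional dependency can be found on this system."""
    return importlib.util.find_spec("hypothetical_heavy_backend") is not None
```

With this layout the module imports cleanly in an environment that lacks the optional package, and an `ImportError` surfaces only if a caller actually invokes `count_layers`. The string annotation keeps the type visible to type checkers without a runtime import.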

Changes:

  • Move Megatron imports out of model_utils module scope.
  • Keep the GPTModel type annotation available to type checkers without importing Megatron at runtime.
  • Add a subprocess regression test that blocks Megatron imports while importing model_utils.
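One way a subprocess import-blocking check like the one in the last bullet could be written is sketched below. This is an assumption about the general technique, not the test added in this PR: the helper name `imports_without` and the child script are hypothetical, and standard-library modules stand in for Megatron and model_utils.

```python
import subprocess
import sys
import textwrap

# Script run in a fresh interpreter: block one package, then import the target.
_CHILD_SCRIPT = textwrap.dedent(
    """
    import importlib
    import importlib.abc
    import sys

    blocked, target = sys.argv[1], sys.argv[2]

    class Blocker(importlib.abc.MetaPathFinder):
        # Raise for the blocked package and its submodules; defer otherwise.
        def find_spec(self, fullname, path=None, target=None):
            if fullname == blocked or fullname.startswith(blocked + "."):
                raise ImportError("blocked import: " + fullname)
            return None

    sys.meta_path.insert(0, Blocker())
    importlib.import_module(target)
    """
)


def imports_without(blocked: str, target: str) -> bool:
    """Return True if `target` imports in a subprocess where `blocked` is barred."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD_SCRIPT, blocked, target],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Running the check in a subprocess keeps the blocker and any partially imported modules out of the test runner's own `sys.modules`. A regression test would then assert something like `imports_without("megatron", "nemo_rl.distributed.model_utils")`, assuming the package is importable from the test environment.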

Issues

N/A

Usage

No user-facing API change.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? No documentation update needed; this is an internal import-boundary cleanup.

Additional Information

Validation:

  • PYTHONPYCACHEPREFIX=/tmp/codex-pycache python3 -m py_compile nemo_rl/distributed/model_utils.py tests/unit/distributed/test_model_utils_imports.py
  • git diff --check
  • On AIHub: PYTHONPATH=. python -m pytest --confcutdir=tests/unit/distributed tests/unit/distributed/test_model_utils_imports.py -q (result: 1 passed in 1.52s)

Claude adversarial review found no blockers.

copy-pr-bot Bot commented May 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@alexbowe alexbowe force-pushed the abowe/lazy-megatron-utils branch from 6d3ea98 to a282964 Compare May 21, 2026 09:34
@alexbowe alexbowe changed the title Avoid requiring Megatron for model_utils import refactor: avoid requiring Megatron for model_utils import May 21, 2026
Signed-off-by: alexbowe <abowe@nvidia.com>
@alexbowe alexbowe force-pushed the abowe/lazy-megatron-utils branch from a282964 to 060555d Compare May 22, 2026 00:09