refactor(loss): migrate DPOLossConfig, DistillationLossConfig, DraftC… by NolenLiang · Pull Request #2520 · NVIDIA-NeMo/RL

NolenLiang · 2026-05-18T14:30:15Z

…rossEntropyLossConfig to BaseModel

Convert 3 TypedDict loss config classes to pydantic BaseModel with extra="allow". Update dict-style access (cfg["key"]) to attribute access (cfg.key) in their __init__ methods.

- DPOLossConfig: 5 required fields, used by DPOLossFn
- DistillationLossConfig: 3 required fields, used by DistillationLossFn
- DraftCrossEntropyLossConfig: 1 optional field (arbitrary_types_allowed for ProcessGroup)
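A minimal sketch of the migration pattern described above, assuming pydantic v2. The class names `ToyLossConfigDict`, `ToyLossConfig`, and `ToyLossFn` and the fields `kl_penalty` and `loss_weight` are hypothetical stand-ins, not the actual definitions in this PR.

```python
from typing import TypedDict

from pydantic import BaseModel, ValidationError


# Hypothetical "before" shape: a TypedDict is only a static typing hint.
class ToyLossConfigDict(TypedDict):
    kl_penalty: float
    loss_weight: float


# Hypothetical "after" shape: validated at construction time.
# extra="allow" keeps keys that are not declared as fields.
class ToyLossConfig(BaseModel, extra="allow"):
    kl_penalty: float
    loss_weight: float


class ToyLossFn:
    def __init__(self, cfg: ToyLossConfig):
        # Attribute access replaces cfg["kl_penalty"] and cfg["loss_weight"].
        self.kl_penalty = cfg.kl_penalty
        self.loss_weight = cfg.loss_weight


cfg = ToyLossConfig(kl_penalty=0.05, loss_weight=1.0, some_extra_key="kept")
loss_fn = ToyLossFn(cfg)

# Unlike a TypedDict, a missing required field is rejected when the
# config object is built.
try:
    ToyLossConfig(kl_penalty=0.05)
    missing_field_rejected = False
except ValidationError:
    missing_field_rejected = True
```

With extra="allow", undeclared keys such as `some_extra_key` remain readable as attributes, which is presumably why the option is used while other configs are still being migrated.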

…rossEntropyLossConfig to BaseModel

Convert 3 TypedDict loss config classes to pydantic BaseModel with extra="allow". Update dict-style access (cfg["key"]) to attribute access (cfg.key) in their __init__ methods.

- DPOLossConfig: 5 required fields, used by DPOLossFn
- DistillationLossConfig: 3 required fields, used by DistillationLossFn
- DraftCrossEntropyLossConfig: 1 optional field (arbitrary_types_allowed for ProcessGroup)

Signed-off-by: nliang <nliang@nvidia.com>

copy-pr-bot · 2026-05-18T14:30:18Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

NolenLiang · 2026-05-18T14:31:03Z

/ok to test c340a3b

Callers (dpo.py, tests) pass DPOConfig dict or plain dict to DPOLossFn, not DPOLossConfig BaseModel. Add isinstance check to auto-convert dict to BaseModel, maintaining backward compatibility. Same fix for DistillationLossFn.

Signed-off-by: nliang <nliang@nvidia.com>

NolenLiang · 2026-05-18T15:53:20Z

/ok to test b5427ff

NolenLiang · 2026-05-18T15:57:00Z

/ok to test b5427ff

Add defaults to DPOLossConfig and DistillationLossConfig fields matching the reference configs (dpo.yaml, distillation_math.yaml).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nliang <nliang@nvidia.com>
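As a rough illustration of what field defaults on a pydantic config change for callers, assuming pydantic v2: the class `ToyDistillationLossConfig`, its field names, and its default values are hypothetical and are not taken from dpo.yaml or distillation_math.yaml.

```python
from pydantic import BaseModel


# Hypothetical config; field names and default values are illustrative only.
class ToyDistillationLossConfig(BaseModel, extra="allow"):
    mode: str = "forward"
    mix_weight: float = 0.5
    mask_outside_topk: bool = False


# With defaults on every field, an empty config is valid.
defaults = ToyDistillationLossConfig()

# A partial dict (for example, loaded from YAML) overrides only what it names.
overridden = ToyDistillationLossConfig(**{"mode": "reverse"})
```

Without defaults, each of these fields would be required and a partial dict would fail validation.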

NolenLiang · 2026-05-19T06:44:23Z

/ok to test 857f01e

NolenLiang · 2026-05-19T06:56:57Z

/ok to test 5178546

yuki-97 · 2026-05-19T10:01:57Z

+    def __init__(self, cfg: DPOLossConfig | dict, use_linear_ce_fusion: bool = False):
+        if isinstance(cfg, dict):
+            cfg = DPOLossConfig(**cfg)


Let's not use this workaround; instead, fix the call sites that fail because of it.

Suggested change

-    def __init__(self, cfg: DPOLossConfig | dict, use_linear_ce_fusion: bool = False):
-        if isinstance(cfg, dict):
-            cfg = DPOLossConfig(**cfg)
+    def __init__(self, cfg: DPOLossConfig, use_linear_ce_fusion: bool = False):

On this branch dpo.py is not modified — DPOLossFn(master_config.dpo) at dpo.py:270 still passes a plain dict (DPOConfig is still a TypedDict here). Removing the guard now would break the L1 functional test. Will remove it once DPO PR #2524 merges.
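A minimal sketch of the temporary guard discussed in this thread, assuming pydantic v2. `ToyDPOLossConfig`, `ToyDPOLossFn`, and the `kl_penalty` field are hypothetical stand-ins for the real classes, shown only to illustrate why callers that are not yet migrated keep working.

```python
from typing import Union

from pydantic import BaseModel


# Hypothetical stand-in for the migrated loss config.
class ToyDPOLossConfig(BaseModel, extra="allow"):
    kl_penalty: float = 0.05


class ToyDPOLossFn:
    def __init__(self, cfg: Union[ToyDPOLossConfig, dict]):
        # Temporary guard: coerce a plain dict from callers that have not
        # been migrated yet into the validated config object.
        if isinstance(cfg, dict):
            cfg = ToyDPOLossConfig(**cfg)
        self.kl_penalty = cfg.kl_penalty


# A legacy caller passing a dict and a migrated caller passing the model
# end up with the same state.
legacy = ToyDPOLossFn({"kl_penalty": 0.1})
migrated = ToyDPOLossFn(ToyDPOLossConfig(kl_penalty=0.1))
```

The trade-off raised in review is that the guard hides unmigrated call sites, so it is meant to be removed once every caller passes the model.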

yuki-97 · 2026-05-19T10:02:07Z

+    def __init__(self, cfg: DistillationLossConfig | dict):
+        if isinstance(cfg, dict):
+            cfg = DistillationLossConfig(**cfg)


same as https://github.com/NVIDIA-NeMo/RL/pull/2520/changes#r3265414804

Done. Removed the guard and updated all callers in test_loss_functions.py and test_distillation.py to pass DistillationLossConfig(...) directly.
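To illustrate why the callers had to change once the guard was removed, here is a small sketch assuming pydantic v2. `ToyDistillationLossConfig`, `ToyDistillationLossFn`, and the `mode` field are hypothetical names, not the code in test_loss_functions.py or test_distillation.py.

```python
from pydantic import BaseModel


# Hypothetical stand-in for the migrated config.
class ToyDistillationLossConfig(BaseModel, extra="allow"):
    mode: str = "forward"


class ToyDistillationLossFn:
    def __init__(self, cfg: ToyDistillationLossConfig):
        # No dict guard: attribute access requires the model object.
        self.mode = cfg.mode


raw = {"mode": "reverse"}

# Updated call site: build the config explicitly before passing it in.
loss_fn = ToyDistillationLossFn(ToyDistillationLossConfig(**raw))

# Old call site: a plain dict has no .mode attribute, so it now fails.
try:
    ToyDistillationLossFn(raw)
    dict_rejected = False
except AttributeError:
    dict_rejected = True
```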

yuki-97 · 2026-05-19T10:10:46Z

+class DraftCrossEntropyLossConfig(BaseModel, extra="allow"):
+    model_config = {"arbitrary_types_allowed": True}
+    vocab_parallel_group: Optional[torch.distributed.ProcessGroup] = None


I think we should leave this unchanged for now; DraftCrossEntropyLossConfig is not actually used and is not set from the config.

Suggested change

-class DraftCrossEntropyLossConfig(BaseModel, extra="allow"):
-    model_config = {"arbitrary_types_allowed": True}
-    vocab_parallel_group: Optional[torch.distributed.ProcessGroup] = None
+class DraftCrossEntropyLossConfig(TypedDict):
+    vocab_parallel_group: Optional[torch.distributed.ProcessGroup]

Done. Reverted to TypedDict since it is unused and not loaded from configuration files.
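For context on the two options weighed here, a minimal sketch assuming pydantic v2. `FakeProcessGroup` is a hypothetical stand-in for torch.distributed.ProcessGroup so the example runs without torch, and both config classes are illustrative rather than the repository's definitions.

```python
from typing import Optional, TypedDict

from pydantic import BaseModel, ValidationError


# Hypothetical stand-in for torch.distributed.ProcessGroup.
class FakeProcessGroup:
    pass


# TypedDict form: a static hint only, nothing is checked at runtime.
class ToyDraftConfigDict(TypedDict):
    vocab_parallel_group: Optional[FakeProcessGroup]


# BaseModel form: a non-pydantic type needs arbitrary_types_allowed,
# and values are then validated with an isinstance check.
class ToyDraftConfigModel(BaseModel, extra="allow"):
    model_config = {"arbitrary_types_allowed": True}
    vocab_parallel_group: Optional[FakeProcessGroup] = None


group = FakeProcessGroup()
as_dict = ToyDraftConfigDict(vocab_parallel_group=group)
as_model = ToyDraftConfigModel(vocab_parallel_group=group)

# The model rejects a value of the wrong type; the TypedDict would not.
try:
    ToyDraftConfigModel(vocab_parallel_group="not a group")
    wrong_type_rejected = False
except ValidationError:
    wrong_type_rejected = True
```

Since the config in question is not loaded from configuration files, runtime validation adds little, which is consistent with keeping the TypedDict for now.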

… DistillationLossFn guard

1. Revert DraftCrossEntropyLossConfig to TypedDict (unused, not loaded from config)
2. Remove isinstance(cfg, dict) guard from DistillationLossFn.__init__ and update all callers to pass DistillationLossConfig directly
3. Keep DPOLossFn guard for now (dpo.py still passes dict on this branch; will remove after DPO PR #2524 merges)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nliang <nliang@nvidia.com>

NolenLiang · 2026-05-20T05:48:08Z

/ok to test fd0af2d

NolenLiang requested a review from a team as a code owner May 18, 2026 14:30

NolenLiang added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label May 18, 2026

NolenLiang added CI:L1 Run doctests, unit tests, and functional tests and removed CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) labels May 19, 2026

yuki-97 reviewed May 19, 2026

NolenLiang force-pushed the nliang/typeddict-to-basemodel-loss branch from 5178546 to fd0af2d Compare May 20, 2026 05:14

NolenLiang requested a review from a team as a code owner May 20, 2026 05:14
