refactor(dpo): migrate DPOConfig, DPOSaveState, DPOValMetrics to Base… by NolenLiang · Pull Request #2524 · NVIDIA-NeMo/RL

NolenLiang · 2026-05-19T02:03:21Z

…Model

Convert 3 TypedDict classes to pydantic BaseModel with extra="allow":

- DPOConfig: 14 required fields, used by dpo_train and setup
- DPOSaveState: 5 fields with defaults, checkpoint serialization
- DPOValMetrics: 9 required fields, validation metrics

Update all dict-style access to attribute access in dpo.py. Wrap the dpo dict in test_dpo.py with DPOConfig.model_construct(). Fix BaseModel-incompatible patterns: .items() → .model_dump().items(), "key in obj" → hasattr(obj, key).
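
The access-pattern changes listed above can be sketched as follows. This is a minimal illustration assuming pydantic v2; `ExampleConfig` and its fields (`beta`, `max_steps`, `custom_flag`) are hypothetical stand-ins and not the actual DPOConfig fields.

```python
from pydantic import BaseModel, ConfigDict


class ExampleConfig(BaseModel):
    """Hypothetical stand-in for a migrated config class (not the real DPOConfig)."""

    # extra="allow" keeps keys that are not declared as fields instead of rejecting them.
    model_config = ConfigDict(extra="allow")

    beta: float
    max_steps: int


# model_construct() builds an instance without running validation,
# which is convenient for test fixtures that only set a few values.
cfg = ExampleConfig.model_construct(beta=0.1, max_steps=100, custom_flag=True)

# Dict-style access (cfg["beta"]) becomes attribute access.
beta = cfg.beta

# A BaseModel has no .items(); dump it to a dict first.
as_pairs = dict(cfg.model_dump().items())

# Membership tests become hasattr(); under extra="allow" this also sees extra keys.
has_flag = hasattr(cfg, "custom_flag")
has_missing = hasattr(cfg, "not_a_key")
```

Because `model_construct()` skips validation, it is best kept to tests and trusted inputs; regular construction (`ExampleConfig(beta=..., max_steps=...)`) still validates types.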

copy-pr-bot · 2026-05-19T02:03:25Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

NolenLiang · 2026-05-19T02:04:22Z

/ok to test 3ae126b

NolenLiang · 2026-05-19T02:53:03Z

/ok to test 97a03ea

…Model

Convert 3 TypedDict classes to pydantic BaseModel with extra="allow":

- DPOConfig: 14 required fields, used by dpo_train and setup
- DPOSaveState: 5 fields with defaults, checkpoint serialization
- DPOValMetrics: 9 required fields, validation metrics

Update all dict-style access to attribute access in dpo.py. Wrap model_construct dpo dict in test_dpo.py with DPOConfig.model_construct(). Fix BaseModel-incompatible patterns: .items() → .model_dump().items(), "key in obj" → hasattr(obj, key).

Signed-off-by: nliang <nliang@nvidia.com>

- DPOLossFn on main expects cfg["key"] (TypedDict). Pass master_config.dpo.model_dump() to maintain compatibility until PR2 (loss BaseModel migration) is merged.
- Convert checkpoint dict to DPOSaveState on load (same pattern as GRPO checkpoint fix).

Signed-off-by: nliang <nliang@nvidia.com>
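
The two compatibility steps in this commit can be sketched roughly as below, assuming pydantic v2. `ExampleDPOConfig`, `ExampleSaveState`, `LegacyLossFn`, `restore_save_state`, and all field names are hypothetical; the real DPOLossFn and DPOSaveState are not shown on this page and may differ.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, ConfigDict


class ExampleDPOConfig(BaseModel):
    model_config = ConfigDict(extra="allow")
    beta: float = 0.1  # hypothetical field


class ExampleSaveState(BaseModel):
    model_config = ConfigDict(extra="allow")
    step: int = 0  # hypothetical fields with defaults
    consumed_samples: int = 0


class LegacyLossFn:
    """Stand-in for a consumer that still indexes its config like a dict."""

    def __init__(self, cfg: Dict[str, Any]) -> None:
        self.beta = cfg["beta"]


def restore_save_state(raw: Optional[Dict[str, Any]]) -> ExampleSaveState:
    """Turn a loaded checkpoint dict back into a model; use defaults if absent."""
    if raw is None:
        return ExampleSaveState()
    return ExampleSaveState(**raw)


# 1) Hand the legacy consumer a plain dict until it is migrated too.
loss_fn = LegacyLossFn(ExampleDPOConfig(beta=0.25).model_dump())

# 2) The loaded checkpoint is a plain dict here, so rebuild the model on load.
state = restore_save_state({"step": 42})
```

Keys missing from the loaded dict fall back to the field defaults, which is one reason a save-state model with defaults is convenient for older checkpoints.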

logger.log_metrics() calls metrics.items() internally. DPOValMetrics is now a BaseModel, which lacks .items(). Use .model_dump() to convert.

Signed-off-by: nliang <nliang@nvidia.com>
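
A small sketch of the failure and the call-site fix, assuming pydantic v2. The `log_metrics` function below is a stand-in that only mimics the `.items()` iteration described in the commit; it is not the project's logger, and `ExampleValMetrics` with its fields is hypothetical.

```python
from typing import Any, Dict, List, Tuple

from pydantic import BaseModel


class ExampleValMetrics(BaseModel):
    loss: float  # hypothetical fields
    accuracy: float


logged: List[Tuple[str, Any]] = []


def log_metrics(metrics: Dict[str, Any]) -> None:
    """Stand-in logger that iterates metrics.items(), as the commit describes."""
    for name, value in metrics.items():
        logged.append((name, value))


metrics = ExampleValMetrics(loss=0.5, accuracy=0.75)

# Passing the model directly fails inside the logger with AttributeError,
# because BaseModel does not define .items().
try:
    log_metrics(metrics)  # type: ignore[arg-type]
    direct_call_failed = False
except AttributeError:
    direct_call_failed = True

# Converting at the call site keeps the logger unchanged.
log_metrics(metrics.model_dump())
```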

NolenLiang · 2026-05-19T06:06:40Z

/ok to test 0e50ce6

NolenLiang · 2026-05-19T06:31:50Z

/ok to test 0e50ce6

NolenLiang · 2026-05-19T09:07:18Z

/ok to test e57928a

val_metrics returned by validate() is a plain dict, not RMValMetrics BaseModel, so .model_dump() fails. Same fix as DPO (PR #2524).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nliang <nliang@nvidia.com>

… DistillationLossFn guard

1. Revert DraftCrossEntropyLossConfig to TypedDict (unused, not loaded from config)
2. Remove isinstance(cfg, dict) guard from DistillationLossFn.__init__ and update all callers to pass DistillationLossConfig directly
3. Keep DPOLossFn guard for now (dpo.py still passes dict on this branch; will remove after DPO PR #2524 merges)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nliang <nliang@nvidia.com>
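
The commit does not show the guard itself, so the following is only one plausible shape of a transitional `isinstance(cfg, dict)` guard and of the constructor after its removal, assuming pydantic v2. `ExampleLossConfig`, `GuardedLossFn`, `StrictLossFn`, and the field names are hypothetical.

```python
from typing import Any, Dict, Union

from pydantic import BaseModel


class ExampleLossConfig(BaseModel):
    beta: float = 0.1  # hypothetical fields
    label_smoothing: float = 0.0


class GuardedLossFn:
    """Transitional constructor: accepts a legacy dict or the new model."""

    def __init__(self, cfg: Union[Dict[str, Any], ExampleLossConfig]) -> None:
        if isinstance(cfg, dict):
            # Legacy callers still pass a dict; normalize it to the model.
            cfg = ExampleLossConfig(**cfg)
        self.beta = cfg.beta


class StrictLossFn:
    """After all callers are updated: no guard, only the model is accepted."""

    def __init__(self, cfg: ExampleLossConfig) -> None:
        self.beta = cfg.beta


from_dict = GuardedLossFn({"beta": 0.3})
from_model = GuardedLossFn(ExampleLossConfig(beta=0.3))
strict = StrictLossFn(ExampleLossConfig(beta=0.3))
```

Keeping the guard only while some callers still pass dicts, and dropping it once they are migrated, lets the constructor rely on attribute access alone.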

- Convert DPOLossConfig from TypedDict to BaseModel with defaults
- DPOLossFn.__init__ uses attribute access directly (no isinstance guard)
- dpo.py passes master_config.dpo directly (no .model_dump() roundtrip)
- Update test_loss_functions.py to pass DPOLossConfig(...) instead of dict

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nliang <nliang@nvidia.com>

NolenLiang · 2026-05-21T09:54:32Z

/ok to test 71a72f1

Missed indirect caller of DPOLossFn in tests/unit/models/policy/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nliang <nliang@nvidia.com>

NolenLiang · 2026-05-21T12:51:10Z

/ok to test 69658f2

NolenLiang requested review from a team as code owners May 19, 2026 02:03

NolenLiang added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label May 19, 2026

NolenLiang added 3 commits May 18, 2026 23:03

fix: pass dict to logger.log_metrics for DPOValMetrics

0e50ce6

NolenLiang force-pushed the nliang/typeddict-to-basemodel-dpo branch from 8b20cce to 0e50ce6 Compare May 19, 2026 06:03

NolenLiang added CI:L1 Run doctests, unit tests, and functional tests and removed CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) labels May 19, 2026

NolenLiang mentioned this pull request May 19, 2026

refactor(loss): migrate DPOLossConfig, DistillationLossConfig, DraftC… #2520

Open

4 tasks

NolenLiang requested a review from a team as a code owner May 21, 2026 09:42

fix: update test_megatron_worker to pass DPOLossConfig instead of dict

69658f2

NolenLiang requested review from jinglinglingling and yuki-97 May 22, 2026 09:47

NolenLiang wants to merge 7 commits into main from nliang/typeddict-to-basemodel-dpo
