Skip to content

refactor(dpo): migrate DPOConfig, DPOSaveState, DPOValMetrics to Base…#2524

Open
NolenLiang wants to merge 7 commits into
mainfrom
nliang/typeddict-to-basemodel-dpo
Open

refactor(dpo): migrate DPOConfig, DPOSaveState, DPOValMetrics to Base…#2524
NolenLiang wants to merge 7 commits into
mainfrom
nliang/typeddict-to-basemodel-dpo

Conversation

@NolenLiang
Copy link
Copy Markdown
Contributor

@NolenLiang NolenLiang commented May 19, 2026

…Model

Convert 3 TypedDict classes to pydantic BaseModel with extra="allow":

  • DPOConfig: 14 required fields, used by dpo_train and setup
  • DPOSaveState: 5 fields with defaults, checkpoint serialization
  • DPOValMetrics: 9 required fields, validation metrics

Update all dict-style access to attribute access in dpo.py. Wrap model_construct dpo dict in test_dpo.py with DPOConfig.model_construct(). Fix BaseModel-incompatible patterns: .items() → .model_dump().items(), "key in obj" → hasattr(obj, key).

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Results before / after the changes

image

@NolenLiang NolenLiang requested review from a team as code owners May 19, 2026 02:03
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@NolenLiang NolenLiang added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label May 19, 2026
@NolenLiang
Copy link
Copy Markdown
Contributor Author

/ok to test 3ae126b

@NolenLiang
Copy link
Copy Markdown
Contributor Author

/ok to test 97a03ea

…Model

Convert 3 TypedDict classes to pydantic BaseModel with extra="allow":
- DPOConfig: 14 required fields, used by dpo_train and setup
- DPOSaveState: 5 fields with defaults, checkpoint serialization
- DPOValMetrics: 9 required fields, validation metrics

Update all dict-style access to attribute access in dpo.py.
Wrap model_construct dpo dict in test_dpo.py with DPOConfig.model_construct().
Fix BaseModel-incompatible patterns: .items() → .model_dump().items(),
"key in obj" → hasattr(obj, key).

Signed-off-by: nliang <nliang@nvidia.com>
- DPOLossFn on main expects cfg["key"] (TypedDict). Pass
  master_config.dpo.model_dump() to maintain compatibility until
  PR2 (loss BaseModel migration) is merged.
- Convert checkpoint dict to DPOSaveState on load (same pattern
  as GRPO checkpoint fix).

Signed-off-by: nliang <nliang@nvidia.com>
logger.log_metrics() calls metrics.items() internally. DPOValMetrics
is now BaseModel which lacks .items(). Use .model_dump() to convert.

Signed-off-by: nliang <nliang@nvidia.com>
@NolenLiang NolenLiang force-pushed the nliang/typeddict-to-basemodel-dpo branch from 8b20cce to 0e50ce6 Compare May 19, 2026 06:03
@NolenLiang
Copy link
Copy Markdown
Contributor Author

/ok to test 0e50ce6

@NolenLiang
Copy link
Copy Markdown
Contributor Author

/ok to test 0e50ce6

@NolenLiang NolenLiang added CI:L1 Run doctests, unit tests, and functional tests and removed CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) labels May 19, 2026
@NolenLiang
Copy link
Copy Markdown
Contributor Author

/ok to test e57928a

NolenLiang added a commit that referenced this pull request May 19, 2026
val_metrics returned by validate() is a plain dict, not RMValMetrics
BaseModel, so .model_dump() fails. Same fix as DPO (PR #2524).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nliang <nliang@nvidia.com>
NolenLiang added a commit that referenced this pull request May 20, 2026
… DistillationLossFn guard

1. Revert DraftCrossEntropyLossConfig to TypedDict (unused, not loaded from config)
2. Remove isinstance(cfg, dict) guard from DistillationLossFn.__init__
   and update all callers to pass DistillationLossConfig directly
3. Keep DPOLossFn guard for now (dpo.py still passes dict on this branch;
   will remove after DPO PR #2524 merges)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nliang <nliang@nvidia.com>
- Convert DPOLossConfig from TypedDict to BaseModel with defaults
- DPOLossFn.__init__ uses attribute access directly (no isinstance guard)
- dpo.py passes master_config.dpo directly (no .model_dump() roundtrip)
- Update test_loss_functions.py to pass DPOLossConfig(...) instead of dict

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nliang <nliang@nvidia.com>
@NolenLiang NolenLiang requested a review from a team as a code owner May 21, 2026 09:42
@NolenLiang
Copy link
Copy Markdown
Contributor Author

/ok to test 71a72f1

Missed indirect caller of DPOLossFn in tests/unit/models/policy/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nliang <nliang@nvidia.com>
@NolenLiang
Copy link
Copy Markdown
Contributor Author

/ok to test 69658f2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant