Implement DPPO Algorithm by jonahsamost · Pull Request #1037 · NovaSky-AI/SkyRL

jonahsamost · 2026-02-06T17:07:25Z

Referencing #1028

Implements DPPO, which replaces PPO's ratio-based clipping with the paper's divergence-based binary masking.
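The following minimal sketch illustrates the per-token masking idea. It is written in plain Python so it runs without PyTorch, and it is not the PR's code. It rests on three assumptions that are not stated in this PR:

- Divergence is estimated with a binary approximation. The vocabulary is collapsed into {sampled token, everything else}, so total variation reduces to |p_new - p_old| for the sampled token.
- A token is masked only when this estimate exceeds a threshold `delta` and its advantage would push the policy further in the direction it has already moved.
- Kept tokens use the unclipped surrogate -ratio * advantage.

The names `sampled_token_tv`, `dppo_binary_mask`, `dppo_loss` and the `delta` default are hypothetical.

```python
import math
from typing import List


def sampled_token_tv(p_new: float, p_old: float) -> float:
    # Binary approximation (assumed): treat the vocabulary as two outcomes,
    # the sampled token and "everything else". TV over those two outcomes is
    # 0.5 * (|p_new - p_old| + |(1 - p_new) - (1 - p_old)|) = |p_new - p_old|.
    return abs(p_new - p_old)


def dppo_binary_mask(logp_new: List[float], logp_old: List[float],
                     advantages: List[float], delta: float) -> List[float]:
    # Hypothetical helper: returns 1.0 to keep a token and 0.0 to mask it.
    mask = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        p_new, p_old = math.exp(ln), math.exp(lo)
        too_far = sampled_token_tv(p_new, p_old) > delta
        # Assumption: only mask when the advantage would push the token's
        # probability further in the direction it has already moved.
        moving_away = (adv > 0 and p_new > p_old) or (adv < 0 and p_new < p_old)
        mask.append(0.0 if (too_far and moving_away) else 1.0)
    return mask


def dppo_loss(logp_new: List[float], logp_old: List[float],
              advantages: List[float], delta: float = 0.2) -> float:
    # Token-mean of -mask * ratio * advantage: a PPO-style surrogate in which
    # the ratio clip is replaced by the binary divergence mask above.
    mask = dppo_binary_mask(logp_new, logp_old, advantages, delta)
    terms = [-m * math.exp(ln - lo) * a
             for m, ln, lo, a in zip(mask, logp_new, logp_old, advantages)]
    return sum(terms) / len(terms)


# Usage: token 0 moved 0.5 -> 0.9 with a positive advantage (masked), token 1
# did not move (kept), token 2 moved 0.6 -> 0.3 with a negative advantage (masked).
logp_old = [math.log(0.5), math.log(0.5), math.log(0.6)]
logp_new = [math.log(0.9), math.log(0.5), math.log(0.3)]
advantages = [1.0, 1.0, -1.0]
print(dppo_binary_mask(logp_new, logp_old, advantages, delta=0.2))  # [0.0, 1.0, 0.0]
print(dppo_loss(logp_new, logp_old, advantages, delta=0.2))
```

In a tensor implementation, the mask would be computed without gradient and multiplied into the per-token loss. Masked tokens then contribute no gradient, which is the role the clipped branch plays in PPO.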

gemini-code-assist

Code Review

This pull request implements the Divergence Proximal Policy Optimization (DPPO) algorithm, including the configuration, the loss function implementation, and corresponding tests. The changes are well-structured and the implementation appears correct. I've only found a minor typo in the reference link to the DPPO paper in both the configuration file and the docstring, for which I've suggested corrections.

gemini-code-assist · 2026-02-07T15:57:22Z

skyrl-train/skyrl_train/config/ppo_base_config.yaml

      cispo_eps_clip_high: 5 # offset for upper bound of importance sampling ratio clipping (as opposed to PPO token update clipping)
+
+    # DPPO parameters (only used when policy_loss_type: "dppo")
+    # See: https://arxiv.org/abs/2602.04879


There appears to be a typo in the arXiv link. The ID 2602.04879 seems to be incorrect. The correct ID for 'Divergence Proximal Policy Optimization' is likely 2402.04879.

# See: https://arxiv.org/abs/2402.04879

gemini-code-assist · 2026-02-07T15:57:22Z

skyrl-train/skyrl_train/utils/ppo_utils.py

+    Section G.2 the authors find Top-K masking provides no significant benefit
+    over the simpler binary approximation, so we only implement binary here.
+
+    See: https://arxiv.org/abs/2602.04879


There appears to be a typo in the arXiv link. The ID 2602.04879 seems to be incorrect. The correct ID for 'Divergence Proximal Policy Optimization' is likely 2402.04879.

Suggested change

See: https://arxiv.org/abs/2602.04879

See: https://arxiv.org/abs/2402.04879
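The docstring above contrasts the binary approximation with Top-K masking. The small sketch below shows what the two divergence estimates compute. It assumes both are total-variation estimates over a coarsened vocabulary. The binary estimate keeps only the sampled token as a separate outcome. Top-K, as assumed here, additionally keeps the K most likely tokens under the old policy. All function names are hypothetical, and the sketch does not reproduce the PR's code.

```python
from typing import List, Set


def partition_tv(p_new: List[float], p_old: List[float], keep: Set[int]) -> float:
    # TV distance after grouping the vocabulary: every token index in `keep`
    # is its own outcome and all remaining tokens share one "rest" bucket.
    total = 0.0
    rest_new, rest_old = 1.0, 1.0
    for t in keep:
        total += abs(p_new[t] - p_old[t])
        rest_new -= p_new[t]
        rest_old -= p_old[t]
    total += abs(rest_new - rest_old)
    return 0.5 * total


def binary_tv(p_new: List[float], p_old: List[float], sampled: int) -> float:
    # Binary approximation: {sampled token, everything else}.
    return partition_tv(p_new, p_old, {sampled})


def topk_tv(p_new: List[float], p_old: List[float], sampled: int, k: int) -> float:
    # Assumption: Top-K keeps the K most likely tokens under the old policy,
    # plus the sampled token, as separate outcomes.
    top = sorted(range(len(p_old)), key=lambda t: p_old[t], reverse=True)[:k]
    return partition_tv(p_new, p_old, set(top) | {sampled})


def full_tv(p_new: List[float], p_old: List[float]) -> float:
    # Exact TV over the whole (toy) vocabulary, for comparison.
    return 0.5 * sum(abs(a - b) for a, b in zip(p_new, p_old))


# Toy 3-token vocabulary: mass moves between tokens 1 and 2; token 0 was sampled.
p_old = [0.5, 0.3, 0.2]
p_new = [0.5, 0.2, 0.3]
print(binary_tv(p_new, p_old, sampled=0))
print(topk_tv(p_new, p_old, sampled=0, k=2))
print(full_tv(p_new, p_old))
```

Merging outcomes can only shrink total variation. Therefore the binary estimate is never larger than the Top-K one, and the Top-K estimate never exceeds the full TV. In the toy example, the binary estimate misses mass that moved between two unsampled tokens. The docstring nonetheless reports that, per Section G.2 of the paper, Top-K masking gives no significant benefit over the binary approximation.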

jonahsamost added 4 commits February 6, 2026 09:02

initial (0969fa9)
precommit (590e9b0)
test dppo (63e627a)
precommit (b572b20)

jonahsamost marked this pull request as ready for review February 7, 2026 15:56

gemini-code-assist bot reviewed Feb 7, 2026

jonahsamost wants to merge 4 commits into NovaSky-AI:main from jonahsamost:jonah_dppo
