
[docs] Fix OPD reverse KL formula in docs #2039

Open
zihaocheng-buaa wants to merge 2 commits into THUDM:main from zihaocheng-buaa:fix-opd-reverse-kl-docs


@zihaocheng-buaa


Summary

Fix the OPD reverse KL formula in both English and Chinese documentation.

The implementation computes the OPD penalty as `student_log_probs - teacher_log_probs`, so the documented reverse KL direction should be `D_KL(P_student || P_teacher)` instead of `D_KL(P_teacher || P_student)`.
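To illustrate why the sign of the penalty fixes the KL direction, here is a minimal sketch that is not taken from the repository. It assumes tokens are sampled from the student (on-policy), and the two 3-token distributions and the names `student`, `teacher`, `kl`, and `mc_estimate` are hypothetical. Averaging `student_log_prob - teacher_log_prob` over student samples should approach `D_KL(P_student || P_teacher)`, not the opposite direction.

```python
import math
import random

# Hypothetical next-token distributions over a 3-token vocabulary.
student = [0.8, 0.15, 0.05]
teacher = [0.4, 0.4, 0.2]


def kl(p, q):
    """Exact D_KL(p || q) for two categorical distributions."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


# Sample tokens from the student, as on-policy rollouts would.
random.seed(0)
num_samples = 100_000
tokens = random.choices(range(len(student)), weights=student, k=num_samples)

# Per-token penalty in the same form as the PR description:
# student_log_prob - teacher_log_prob, averaged over the student's samples.
mc_estimate = sum(
    math.log(student[t]) - math.log(teacher[t]) for t in tokens
) / num_samples

print("sampled penalty mean       :", mc_estimate)
print("D_KL(student || teacher)   :", kl(student, teacher))
print("D_KL(teacher || student)   :", kl(teacher, student))
```

In this toy setup the sampled mean tracks `kl(student, teacher)` because the expectation is taken under the student's own distribution, which is what makes the student the first argument of the divergence.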

Validation

  • Ran git diff --check
  • Verified the change only affects OPD documentation

zihaocheng-buaa changed the title from "Fix OPD reverse KL formula in docs" to "[docs] Fix OPD reverse KL formula in docs" on Jun 9, 2026
