Skip to content

Fix rollout_is_seq metrics by excluding padded tokens in packed sequences#1788

Merged
hhaAndroid merged 2 commits into
InternLM:mainfrom
Hyperion-shuo:ss/fix_is_metrics_pack_pad_seq
May 15, 2026
Merged

Fix rollout_is_seq metrics by excluding padded tokens in packed sequences#1788
hhaAndroid merged 2 commits into
InternLM:mainfrom
Hyperion-shuo:ss/fix_is_metrics_pack_pad_seq

Conversation

@Hyperion-shuo
Copy link
Copy Markdown
Contributor

image

Summary

This PR fixes the rollout_is_seq metric computation for packed sequences.

Previously, padded positions in packed sequences were included in the rollout_is_seq statistics. As a result, some metrics were dominated by padding artifacts:

  • rollout_is/mismatch/rollout_is_seq_min stayed close to 0.
  • rollout_is/mismatch/rollout_is_seq_mean was biased downward.
  • rollout_is/mismatch/rollout_is_seq_max_deviation stayed close to 1.

This PR masks out padded positions so that these metrics are computed only over valid sequence tokens.

Validation

I compared TensorBoard metrics before and after the fix.

  • Green: before the fix
  • Red: after the fix

After the fix, rollout_is_seq_min and rollout_is_seq_max_deviation are no longer affected by padded tokens.

@hhaAndroid
Copy link
Copy Markdown
Collaborator

@claude review

Comment thread xtuner/v1/rl/base/rollout_is.py Outdated
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 14, 2026

Claude: ## Summary

Clean, well-motivated fix. Excluding padded positions from per-sequence IS metrics is the right approach — the sentinel values (-inf/inf) for distributed max/min aggregation are correct.

Issues

Nit

  • rollout_is.py L550-562: count_value all_reduce is computed per-key inside the loop, but the validity mask is independent of key. Could hoist to avoid ~15-20 redundant NCCL all_reduce calls per step (negligible wall-clock impact but cleaner).

Verdict

APPROVE

@hhaAndroid
Copy link
Copy Markdown
Collaborator

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 15, 2026

Claude: ## Summary

Second commit properly addresses the hoisted count_value all_reduce nit from the prior review. Verified the final state:

  • Shape compatibility: has_catastrophic is (batch, 1) (from keepdim=True), matching seq_valid shape — no broadcasting issue.
  • Sentinel values (-inf/inf) correctly don't affect ReduceOp.MAX/ReduceOp.MIN.
  • All metric keys containing "max"/"min" (e.g., rollout_is_seq_max_deviation, mismatch_log_ppl_diff_min) are semantically max/min values — dispatch logic is sound.
  • count == 0 guard prevents NaN from division by zero and prevents sentinels from leaking.

Verdict

APPROVE

@hhaAndroid hhaAndroid merged commit 91e30dd into InternLM:main May 15, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants