
[XPU] Add TP broadcast after sampling in XPU model runner. #7096

Open

Jiajun-Ji wants to merge 1 commit into PaddlePaddle:develop from Jiajun-Ji:broadcast-tp

Conversation

Contributor

@Jiajun-Ji Jiajun-Ji commented Mar 31, 2026

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


In TP, each rank may produce different sampling results due to independent random sampling. This PR adds a broadcast operation after sampling in the XPU model runner to synchronize the sampled tokens from rank 0.
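To illustrate the failure mode, here is a toy, framework-free sketch (not the FastDeploy code itself; the ranks and vocabulary are made up): each simulated TP rank samples with its own RNG state, so the chosen tokens can diverge, and a rank-0 broadcast restores agreement.

```python
import random

VOCAB = ("the", "cat", "sat")

def tp_sample(num_ranks):
    # Each TP rank samples with an independent RNG state,
    # so the chosen tokens can differ across ranks.
    return [random.Random(rank).choice(VOCAB) for rank in range(num_ranks)]

def broadcast_from_rank0(sampled):
    # Mimics a TP-group broadcast with src = rank 0:
    # every rank overwrites its token with rank 0's token.
    return [sampled[0]] * len(sampled)

tokens = tp_sample(4)
synced = broadcast_from_rank0(tokens)
assert len(set(synced)) == 1  # all ranks now agree on rank 0's token
```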

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests, or explain in this PR why none are added.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings March 31, 2026 02:18
@paddle-bot

paddle-bot bot commented Mar 31, 2026

Thanks for your contribution!

@EmmonsCurse
Collaborator

/skip-ci ci_iluvatar
/skip-ci ci_hpu
/skip-ci build_gpu

Contributor

Copilot AI left a comment


Pull request overview

This PR addresses inconsistent generation results in the XPU + Tensor Parallel (TP) scenario, where each rank samples independently. It adds a TP-group broadcast after sampling so that every TP rank adopts rank 0's sampling results, guaranteeing cross-rank consistency.

Changes:

  • In the non-speculative-decoding path, broadcast sampled_token_ids within the TP group after sampling.
  • In the speculative-decoding path, broadcast accept_tokens/accept_num/step_idx/stop_flags within the TP group after sampling.

Comment on lines +1588 to +1593
if self.parallel_config.tensor_parallel_size > 1:
    paddle.distributed.broadcast(
        sampler_output.sampled_token_ids,
        self.parallel_config.data_parallel_rank * self.parallel_config.tensor_parallel_size,
        group=self.parallel_config.tp_group,
    )

Copilot AI Mar 31, 2026


The src (root rank) expression here is duplicated in several places, including the speculative branch below. Consider storing it in a local variable first (e.g. tp_src_rank = data_parallel_rank * tensor_parallel_size) and passing that to broadcast, to avoid the maintenance risk of copy-paste.
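A minimal sketch of the suggested refactor (the helper name tp_src_rank comes from the review comment; the paddle.distributed.broadcast call is shown only as a comment so the sketch stays framework-free):

```python
def tp_src_rank(data_parallel_rank, tensor_parallel_size):
    # Global rank of the TP-group root: the first rank of this
    # data-parallel replica. With tensor_parallel_size = 4, DP
    # replica 1 spans global ranks 4..7 and its root is rank 4.
    return data_parallel_rank * tensor_parallel_size

# Compute once, then reuse at every call site instead of
# repeating the multiplication inline:
#   src = tp_src_rank(cfg.data_parallel_rank, cfg.tensor_parallel_size)
#   paddle.distributed.broadcast(tensor, src, group=cfg.tp_group)
```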

Comment on lines +1601 to +1606
if self.parallel_config.tensor_parallel_size > 1:
    paddle.distributed.broadcast(
        self.share_inputs["accept_tokens"],
        self.parallel_config.data_parallel_rank * self.parallel_config.tensor_parallel_size,
        group=self.parallel_config.tp_group,
    )

Copilot AI Mar 31, 2026


The speculative branch calls broadcast several times in a row, repeating the same src computation each time. Consider reusing a single tp_src_rank variable and looping over a list of keys (accept_tokens/accept_num/step_idx/stop_flags) to broadcast these tensors, reducing the chance of missing one when fields are added or changed later.
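A hedged sketch of the key-list approach (SPEC_SYNC_KEYS and broadcast_spec_outputs are hypothetical names; the broadcast function is injected so the sketch runs without paddle, whereas in the runner it would be paddle.distributed.broadcast):

```python
# Tensors the speculative path must keep in sync across TP ranks.
SPEC_SYNC_KEYS = ("accept_tokens", "accept_num", "step_idx", "stop_flags")

def broadcast_spec_outputs(share_inputs, src, broadcast_fn, group=None):
    # One loop instead of four near-identical calls: adding a new
    # field later only means extending SPEC_SYNC_KEYS.
    for key in SPEC_SYNC_KEYS:
        broadcast_fn(share_inputs[key], src, group=group)

# Recording stub standing in for paddle.distributed.broadcast:
calls = []
share = {k: f"tensor_{k}" for k in SPEC_SYNC_KEYS}
broadcast_spec_outputs(share, 4, lambda t, s, group=None: calls.append((t, s)))
assert [c[0] for c in calls] == [f"tensor_{k}" for k in SPEC_SYNC_KEYS]
```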
