[XPU] Add TP broadcast after sampling in XPU model runner. #7096
Open
Jiajun-Ji wants to merge 1 commit into PaddlePaddle:develop from
Conversation
…onsistent results across ranks.
Thanks for your contribution!
Collaborator
/skip-ci ci_iluvatar
Contributor
Pull request overview
This PR addresses inconsistent generation results in the XPU + Tensor Parallel (TP) setting, where each rank samples independently. It adds a broadcast within the TP group after sampling, so every TP rank adopts rank 0's sampling result, guaranteeing consistency across ranks.
Changes:
- In the non-speculative decoding path, broadcast sampled_token_ids within the TP group after sampling.
- In the speculative decoding path, broadcast accept_tokens/accept_num/step_idx/stop_flags within the TP group after sampling.
Comment on lines +1588 to +1593
if self.parallel_config.tensor_parallel_size > 1:
    paddle.distributed.broadcast(
        sampler_output.sampled_token_ids,
        self.parallel_config.data_parallel_rank * self.parallel_config.tensor_parallel_size,
        group=self.parallel_config.tp_group,
    )
The expression computing src (the root rank) is repeated in several places here, and the speculative branch below repeats it again. Consider saving it in a local variable first (e.g. tp_src_rank = data_parallel_rank * tensor_parallel_size) and passing that to broadcast, to avoid the maintenance risk of copy-paste.
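The refactor suggested above might look like the following minimal sketch; the helper name `tp_src_rank` is illustrative and not part of the PR:

```python
# Sketch of the reviewer's suggestion: compute the TP broadcast source rank
# once and reuse it at every broadcast call site. The rank layout assumed
# here (one contiguous TP group per data-parallel rank) matches the
# expression used in the diff.

def tp_src_rank(data_parallel_rank: int, tensor_parallel_size: int) -> int:
    """Global rank of the root (first) rank in this DP rank's TP group."""
    return data_parallel_rank * tensor_parallel_size

# Example: with tensor_parallel_size=4, DP rank 2 owns global ranks 8..11,
# so the broadcast root of its TP group is global rank 8.
```

The value could then be passed as the `src` argument of each `paddle.distributed.broadcast` call in both the normal and speculative branches.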
Comment on lines +1601 to +1606
if self.parallel_config.tensor_parallel_size > 1:
    paddle.distributed.broadcast(
        self.share_inputs["accept_tokens"],
        self.parallel_config.data_parallel_rank * self.parallel_config.tensor_parallel_size,
        group=self.parallel_config.tp_group,
    )
The speculative branch calls broadcast several times in a row, each time repeating the same src computation. Consider reusing a single tp_src_rank variable, and looping over a list of keys (accept_tokens/accept_num/step_idx/stop_flags) to broadcast these tensors, reducing the chance of missing one when fields are added or changed later.
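A hedged sketch of the loop-over-keys idea; the broadcast function is passed in as a parameter so the sketch stays runtime-agnostic (in the PR it would be `paddle.distributed.broadcast`), and the helper name is illustrative:

```python
# Sketch: broadcast all speculative-decoding tensors in one loop, so a newly
# added field only needs to be appended to the key list. `broadcast` stands
# in for paddle.distributed.broadcast; `share_inputs` is the model runner's
# dict of tensors.

SPEC_SYNC_KEYS = ("accept_tokens", "accept_num", "step_idx", "stop_flags")

def sync_speculative_outputs(share_inputs, broadcast, src, group=None):
    for key in SPEC_SYNC_KEYS:
        broadcast(share_inputs[key], src, group=group)
```

This keeps the four broadcasts in one place and makes the source-rank expression appear only once at the call site.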
Motivation
In TP, each rank may produce different sampling results due to independent random sampling.
This PR adds a broadcast operation after sampling in the XPU model runner to synchronize the sampled tokens from rank 0.
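The motivation can be illustrated with a toy, single-process sketch (no real distributed runtime; per-rank RNG states stand in for each rank's independent sampler, and the vocabulary size is made up):

```python
import random

# Each TP rank samples with its own RNG state, so sampled token ids can
# diverge across ranks; adopting rank 0's result restores consistency.

def sample_token(rank: int) -> int:
    rng = random.Random(rank)    # per-rank RNG -> potentially divergent draws
    return rng.randrange(32000)  # illustrative vocabulary size

ranks = range(4)
before = [sample_token(r) for r in ranks]  # may differ from rank to rank
after = [before[0]] * len(before)          # "broadcast" rank 0's sample
```

After the broadcast step, every rank holds rank 0's token, which is what the added `paddle.distributed.broadcast` call achieves in the TP group.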
Modifications
Usage or Command
Accuracy Tests
Checklist
- Add a tag to the PR title, chosen from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.