[Speculative Decoding]【Hackathon 10th Spring No.49】Adapt ngram_match and hybrid_mtp_ngram gpu kernels #7103

Open
NKNaN wants to merge 2 commits into PaddlePaddle:develop from NKNaN:ngram

NKNaN commented Mar 31, 2026

Motivation

rfc: PaddlePaddle/community#1213

Modifications

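  • Implementation: two kernels.
    • Stage one: count_and_find_candidate_kernel, launched as <<<max_batch_size+1, 1024>>>.
      • Block 0 uses BlockReduce to count the global unprocessed_batch_size.
      • Blocks 1..N each handle one batch and run the candidate search in parallel (input_ids / pre_ids).
    • Stage two: truncate_candidate, launched as <<<1, 1024>>>, applies the threshold truncation and write-back for all batches in one place.
      • This stage uses CUB BlockScan to compute prefix sums (processed_batch_size / sum_token_num), which are used to derive the upper limit of tokens each batch may be allocated and to complete the truncation.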
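
The role of the prefix sum in stage two can be illustrated with a small host-side C++ reference. This is only a sketch under simplifying assumptions, not the kernel code from this PR: the function name `truncate_by_threshold`, the `wanted` array, and the single shared `threshold` budget are hypothetical, and the bookkeeping for processed_batch_size and unprocessed_batch_size is left out. It shows why an exclusive prefix sum lets each batch compute its own cap independently, which is the property a block-wide scan can exploit on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, NOT the PR's kernel code.
// wanted[i]  : tokens batch i would like to emit (assumed stage-one output).
// threshold  : assumed global token budget shared by all batches.
// Returns the number of tokens each batch is actually granted.
std::vector<int> truncate_by_threshold(const std::vector<int>& wanted,
                                       int threshold) {
    assert(threshold >= 0);

    // Exclusive prefix sum: tokens requested by all batches before batch i.
    // On the GPU this is the step a block-wide scan would perform.
    std::vector<int> before(wanted.size());
    int running = 0;
    for (std::size_t i = 0; i < wanted.size(); ++i) {
        before[i] = running;
        running += wanted[i];
    }

    // Given its prefix, each batch computes its cap without looking at
    // any other batch, so this loop has no cross-iteration dependency.
    std::vector<int> granted(wanted.size());
    for (std::size_t i = 0; i < wanted.size(); ++i) {
        int remaining = std::max(0, threshold - before[i]);
        granted[i] = std::min(wanted[i], remaining);
    }
    return granted;
}
```

With `wanted = {3, 4, 5}` and `threshold = 8`, the prefixes are 0, 3, and 7, so the third batch is cut down to the one token that still fits in the budget and the earlier batches are left unchanged. The result matches a sequential greedy pass, since a batch is only truncated once the requests before it have used up most or all of the budget.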

Usage or Command

Accuracy Tests

https://github.com/NKNaN/FastDeploy_ngram_match_kernel

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are no unit tests, explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Mar 31, 2026

Thanks for your contribution!
