Add GLM4-MOE Mode w/Disaggregated Prefill and Decode Support by vbaddi · Pull Request #988 · quic/efficient-transformers

vbaddi · 2026-05-14T14:41:21Z

Summary

Adds GLM4-MOE support for disaggregated serving with chunked prefill.

Supported

- GLM4-MOE decode path
- Chunked prefill MoE path with packed expert dispatch
- KV-blocked attention path
- Disaggregated prefill/decode serving example
- ONNX subfunction export for decode and prefill
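
As a rough illustration of the disaggregated flow listed above, the sketch below separates a prefill stage, which consumes the prompt in fixed-size chunks and fills a KV cache, from a decode stage that continues from that cache. Everything here (`prefill_stage`, `decode_stage`, the list-based cache, the toy next-token rule) is hypothetical and stands in for the real model sessions; it is not the API used in this PR.

```python
from typing import List, Tuple

KVCache = List[int]  # toy stand-in: one cache entry per processed token


def prefill_stage(prompt: List[int], prefill_seq_len: int) -> Tuple[KVCache, int]:
    """Process the prompt chunk by chunk; return the filled cache and the first token."""
    cache: KVCache = []
    for start in range(0, len(prompt), prefill_seq_len):
        chunk = prompt[start:start + prefill_seq_len]
        cache.extend(chunk)  # a real model would write keys/values here
    first_token = sum(cache) % 100  # toy "next token" rule, not a real model
    return cache, first_token


def decode_stage(cache: KVCache, first_token: int, max_new_tokens: int) -> List[int]:
    """Generate tokens one at a time, reusing the cache handed over by prefill."""
    cache = list(cache)  # decode works on its own copy after the hand-off
    generated = [first_token]
    while len(generated) < max_new_tokens:
        cache.append(generated[-1])
        generated.append(sum(cache) % 100)
    return generated


prompt = list(range(1, 11))  # 10 prompt tokens
cache, tok = prefill_stage(prompt, prefill_seq_len=4)  # chunks of 4, 4 and 2 tokens
tokens = decode_stage(cache, tok, max_new_tokens=3)
```

The point of the split is that the two stages share only the cache and the first token, so they can in principle run as separate exported models or on separate devices.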

Tested

- Added GLM4-MOE prefill/blocked export tests.
- Verified packed MoE custom-op counts for prefill_seq_len=512 with a packed chunk size of 256.
- Ran the GLM4-MOE disaggregated example end-to-end with a tiny config.

 pytest -q tests/transformers/models/test_moe_prefill_blocked.py
 python examples/disagg_serving/glm4_moe_disagg_mode_with_chunking.py
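
The tested configuration (prefill_seq_len=512, packed chunk size 256) suggests two packed chunk slices, under the assumption that the packed buffer spans prefill_seq_len positions and is cut into contiguous chunks. A minimal sketch of that arithmetic follows; `packed_chunk_slices` is a hypothetical helper, not a function from the repository.

```python
from typing import List, Tuple


def packed_chunk_slices(prefill_seq_len: int, packed_chunk_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) bounds of each packed chunk covering prefill_seq_len positions."""
    if prefill_seq_len <= 0 or packed_chunk_size <= 0:
        raise ValueError("lengths must be positive")
    return [
        (start, min(start + packed_chunk_size, prefill_seq_len))
        for start in range(0, prefill_seq_len, packed_chunk_size)
    ]


slices = packed_chunk_slices(512, 256)
print(slices)  # [(0, 256), (256, 512)]
```

If the export were traced with a shorter sequence length than the one requested, a loop like this would unroll fewer chunks, which is why the traced length matters for the op counts being verified.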

Enable GLM4-MOE chunked prefill MoE, KV-blocked attention, and disaggregated serving export with subfunctions.

- GLM4-MOE decode path
- Chunked prefill MoE path with packed expert dispatch
- KV-blocked attention path
- Disaggregated prefill/decode serving example
- ONNX subfunction export for decode and prefill

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

Use the headpar_offline KV-blocking path by default for GQA-compatible KV blocking, falling back to the previous online implementation for unsupported masking/bias cases. Revert to the previous commit if this fails. WIP. Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
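
For readers unfamiliar with KV blocking in general, the sketch below shows the generic idea for a single query vector: the key/value sequence is visited one block at a time, and a running maximum and running sum keep the softmax numerically equal to the unblocked result. This is a plain-Python illustration of the general technique only. It does not reproduce the `headpar_offline` path or the earlier implementation from this PR, and all function names are made up.

```python
import math
from typing import List

Vector = List[float]


def full_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: softmax(q.K^T / sqrt(d)) applied to V over the whole sequence at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def kv_blocked_attention(q: Vector, keys: List[Vector], values: List[Vector],
                         block_size: int) -> Vector:
    """Same result as full_attention, visiting the KV sequence block by block."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new maximum (0.0 on the first block).
        correction = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(weights)
        acc = [a * correction + sum(w * v[j] for w, v in zip(weights, v_blk))
               for j, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0], [0.3, 0.7]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
blocked = kv_blocked_attention(q, keys, values, block_size=2)
reference = full_attention(q, keys, values)
```

With grouped-query attention, several query heads share one KV head, so a block of keys and values loaded once can serve all query heads in the group.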

Trace chunked prefill exports with the requested prefill_seq_len so that packed MoE dispatch unrolls all packed chunks, restore the torch.full_like index init, and add ONNX coverage for the second packed chunk slice. Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
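
One common way to realise packed dispatch, sketched here under the assumption of plain top-k routing, is to group token indices by the expert they are routed to, so each expert sees a contiguous slice of a packed buffer. `pack_by_expert` and its inputs are hypothetical; the custom op used in this PR may pack differently.

```python
from typing import List, Tuple


def pack_by_expert(topk_experts: List[List[int]], num_experts: int) -> Tuple[List[int], List[int]]:
    """Group token ids by expert.

    Returns (packed, offsets) where packed[offsets[e]:offsets[e + 1]] holds the
    ids of the tokens routed to expert e, in their original order.
    """
    buckets: List[List[int]] = [[] for _ in range(num_experts)]
    for token_id, experts in enumerate(topk_experts):
        for expert in experts:
            buckets[expert].append(token_id)
    packed: List[int] = []
    offsets = [0]
    for bucket in buckets:
        packed.extend(bucket)
        offsets.append(len(packed))
    return packed, offsets


# Four tokens, each routed to two of three experts (made-up routing).
routing = [[0, 2], [1, 2], [0, 1], [2, 0]]
packed, offsets = pack_by_expert(routing, num_experts=3)
print(packed)   # [0, 2, 3, 1, 2, 0, 1, 3]
print(offsets)  # [0, 3, 5, 8]
```

A fixed-size chunk loop over `packed` is then a static shape that an exporter can unroll, as long as the trace uses the full requested length.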

vbaddi assigned vbaddi and tchawada May 14, 2026

vbaddi marked this pull request as draft May 14, 2026 14:41

quic-rishinr mentioned this pull request May 15, 2026

Add Glm4MoeForCausalLM Support #619

Closed

quic-rishinr requested review from ochougul and quic-rishinr and removed request for ochougul May 15, 2026 10:30

vbaddi force-pushed the feat/enable_glm4_moe branch from 6e91468 to 77e65e9 Compare May 15, 2026 14:41

vbaddi marked this pull request as ready for review May 18, 2026 21:36

vbaddi mentioned this pull request May 27, 2026

Feat/enable glm4 moe #991

Open

quic-rishinr changed the base branch from main to release/v1.22.0_tmp May 27, 2026 05:14

vbaddi added 8 commits May 27, 2026 10:55

feat(0514): Use head-parallel KV path for combined blocking

59095ea

Route KV-containing combined blocking modes through the headpar_offline path when supported, and pass user-tiled compile flags explicitly in the GLM4 MoE disagg example. Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit: fix the license header in the example file

4953a7c

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

fix(0415): fix: avoid unsupported prefill MoE reductions in subfuncti…

df6d647

…on export and update example Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

fix(0526): Align MoE prefill blocking with bench path, same like gpto…

bce4b73

…ss/qwen3/pr935 Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

fix(0527): align GLM4-MoE with transformers 5.5 cache and expert APIs

5c632b7

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

vbaddi force-pushed the feat/enable_glm4_moe branch from 8452e31 to 5c632b7 Compare May 27, 2026 05:52

vbaddi wants to merge 8 commits into quic:release/v1.22.0_tmp from vbaddi:feat/enable_glm4_moe
