
Add XPU MoE decode kernel (FP16/BF16 + INT4 sym/asym) #1813

Draft

Copilot wants to merge 5 commits into main from copilot/add-xpu-moe-decode-implementation


Conversation

Contributor

Copilot AI commented May 14, 2026

  • Add INT8 (sym/asym) decode GEMV kernel in sycl_tla_moe_decode.hpp
  • Add INT2 (sym/asym) decode GEMV kernel (four values packed per byte, sign-extended via a shift trick)
  • Add FP8 (E4M3 / E5M2) decode GEMV kernel with inline bit-pattern decode (verified to match the torch FP8 cast for all 256 byte values)
  • Extend the moe_gemm_decode Python wrapper to accept weight_bits=8, weight_bits=2, and FP8 weight dtypes (torch.float8_e4m3fn, torch.float8_e5m2)
  • Add unit tests in test_moe.py for INT8 sym/asym, INT2 sym/asym, and FP8 E4M3/E5M2 (parametrized over fp16/bf16 activations and group_size where applicable), plus an FP8+asym validation-error case
  • Verified that the packing/dequant helpers and a kernel-equivalent decode produce bit-exact reference output (INT2 sym/asym diff = 0; FP8 formulas match the torch cast for all 256 byte values)
  • Parallel validation: CodeQL reported 0 alerts; a code-review nit on a docstring was addressed
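
The INT2 sign extension mentioned above can be modeled outside the kernel. This is a hedged NumPy sketch; the actual SYCL kernel and its packing layout are not shown in this PR description, so the low-bits-first byte layout here is an assumption:

```python
import numpy as np

def unpack_int2_sym(packed: np.ndarray) -> np.ndarray:
    """Unpack four signed 2-bit values from every byte of `packed` (uint8).

    Assumes the low two bits hold the first value (layout is an assumption,
    not stated in the PR). Sign extension uses the shift trick: move the
    2-bit field to the top of an int8, then arithmetic-shift it back down.
    """
    fields = []
    for shift in (0, 2, 4, 6):
        bits = (packed >> shift) & 0x3                    # uint8 field in 0..3
        fields.append((bits << 6).astype(np.int8) >> 6)   # sign-extend to -2..1
    # interleave so each byte expands to its four values in order
    return np.stack(fields, axis=-1).reshape(-1)
```

For example, the byte 78 (0b01001110, i.e. fields 2, 3, 0, 1 from low to high) unpacks to -2, -1, 0, 1.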
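
The inline FP8 bit-pattern decode can be modeled in scalar Python. The formulas below follow the standard E4M3 "fn" layout (bias 7, no infinities) and the IEEE-style E5M2 layout (bias 15), which is what "match torch FP8 cast for all 256 byte values" implies; the kernel's actual code is not shown here:

```python
import math

def decode_e4m3fn(b: int) -> float:
    """Decode one float8_e4m3fn byte: 1 sign, 4 exponent (bias 7), 3 mantissa.
    The 'fn' variant has no infinities; only exponent=15, mantissa=7 is NaN."""
    s = -1.0 if b & 0x80 else 1.0
    e = (b >> 3) & 0xF
    m = b & 0x7
    if e == 0xF and m == 0x7:
        return math.nan
    if e == 0:                                  # subnormal
        return s * (m / 8.0) * 2.0 ** -6
    return s * (1.0 + m / 8.0) * 2.0 ** (e - 7)

def decode_e5m2(b: int) -> float:
    """Decode one float8_e5m2 byte: 1 sign, 5 exponent (bias 15), 2 mantissa.
    IEEE-style: exponent=31 encodes inf (mantissa 0) or NaN."""
    s = -1.0 if b & 0x80 else 1.0
    e = (b >> 2) & 0x1F
    m = b & 0x3
    if e == 0x1F:
        return s * math.inf if m == 0 else math.nan
    if e == 0:                                  # subnormal
        return s * (m / 4.0) * 2.0 ** -14
    return s * (1.0 + m / 4.0) * 2.0 ** (e - 15)
```

As a sanity check, decode_e4m3fn(0x7E) gives 448.0, the E4M3 maximum finite value, and decode_e4m3fn(0x38) gives 1.0.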
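
A groupwise asymmetric dequant plus decode-style GEMV (batch size 1) can be sketched as follows. The shapes and the uint8-codes-plus-zero-point convention are assumptions for illustration; the PR only states that the INT8 kernel lives in sycl_tla_moe_decode.hpp:

```python
import numpy as np

def dequant_int8_asym(q, scales, zeros, group_size):
    """q: (rows, cols) uint8 codes; scales/zeros: (rows, cols // group_size).
    Each group dequantizes as (code - zero_point) * scale."""
    rows, cols = q.shape
    g = q.reshape(rows, cols // group_size, group_size).astype(np.float32)
    w = (g - zeros[..., None]) * scales[..., None]
    return w.reshape(rows, cols)

def moe_gemv_int8_asym(x, q, scales, zeros, group_size):
    """Decode-phase GEMV: y = W @ x with W dequantized on the fly."""
    return dequant_int8_asym(q, scales, zeros, group_size) @ x
```

A real decode kernel would fuse the dequant into the dot product rather than materializing W; this reference form is what the bit-exactness checks above compare against.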
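
The wrapper's dispatch and the FP8+asym validation error can be schematized as below. The function name and the string dtype tags are hypothetical stand-ins; the real moe_gemm_decode wrapper dispatches on torch dtypes such as torch.float8_e4m3fn:

```python
# Hypothetical stand-in for the moe_gemm_decode argument validation;
# the real wrapper keys on torch dtypes, modeled here as strings.
FP8_DTYPES = {"float8_e4m3fn", "float8_e5m2"}

def select_decode_kernel(weight_dtype, weight_bits=None, asymmetric=False):
    if weight_dtype in FP8_DTYPES:
        if asymmetric:
            # mirrors the FP8+asym validation-error test case in test_moe.py
            raise ValueError("asymmetric quantization is not supported for FP8 weights")
        return "fp8_decode"
    if weight_bits not in (2, 4, 8):
        raise ValueError(f"unsupported weight_bits: {weight_bits}")
    return f"int{weight_bits}_decode"
```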

Copilot AI and others added 2 commits May 14, 2026 04:03
Copilot AI and others added 2 commits May 14, 2026 07:16
