
fix: Mixtral compatibility with transformers >=5.0 fused MoE API #451

Open
Medhatt21 wants to merge 1 commit into ModelTC:main from Medhatt21:fix/mixtral-transformers-5x-compat

Conversation

@Medhatt21

Summary

Fixes Mixtral quantization, which is broken by the transformers 5.x API changes. Loading a Mixtral model currently crashes with:

'MixtralDecoderLayer' object has no attribute 'block_sparse_moe'

Problem

In transformers >=5.0, the Mixtral architecture changed significantly:

| Component | transformers <5.0 | transformers >=5.0 |
| --- | --- | --- |
| MoE container | block.block_sparse_moe | block.mlp |
| Experts | ModuleList of MixtralBLockSparseTop2MLP (each with w1, w2, w3 as nn.Linear) | MixtralExperts with fused 3D nn.Parameter tensors (gate_up_proj, down_proj) |
| Gate | nn.Linear | MixtralTopKRouter |
| Expert access | experts[i].w1 | Not subscriptable |
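The structural difference in the table can be sketched with minimal stand-in classes. These are hypothetical mocks for illustration only; the real classes (MixtralDecoderLayer, MixtralExperts, MixtralTopKRouter) live in transformers, and the real attributes are nn.Module/nn.Parameter objects rather than strings.

```python
class LegacyExpert:
    """transformers <5.0: one module per expert with w1/w2/w3 layers."""
    def __init__(self):
        self.w1 = "w1-linear"  # nn.Linear in the real model
        self.w2 = "w2-linear"
        self.w3 = "w3-linear"

class LegacyMoE:
    """transformers <5.0: subscriptable list of per-expert modules."""
    def __init__(self, n_experts=8):
        self.gate = "gate-linear"  # nn.Linear
        self.experts = [LegacyExpert() for _ in range(n_experts)]

class FusedExperts:
    """transformers >=5.0: fused 3D parameter tensors, not subscriptable."""
    def __init__(self):
        self.gate_up_proj = "3d-parameter"  # nn.Parameter [n_experts, ...]
        self.down_proj = "3d-parameter"

class FusedMoE:
    """transformers >=5.0: router gate plus fused experts."""
    def __init__(self):
        self.gate = "topk-router"  # MixtralTopKRouter
        self.experts = FusedExperts()

legacy, fused = LegacyMoE(), FusedMoE()
print(legacy.experts[0].w1)                    # per-expert access works pre-5.0
print(hasattr(fused.experts, "__getitem__"))   # False: fused experts cannot be indexed
```

This is why code doing `experts[i].w1` cannot work unchanged on the fused API.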

Fix

  • Detect old vs new API via hasattr(block, 'block_sparse_moe')
  • get_extra_modules(): returns the MoE container from the appropriate attribute
  • get_moe_gate(): new method to access the gate regardless of API version
  • get_subsets_in_block(): dispatches to _get_subsets_legacy() (old per-expert w1/w2/w3) or _get_subsets_fused() (new fused experts)
  • For the fused API, attention layers are quantized per-subset; the MoE block is passed via get_extra_modules for activation-aware hooks
  • Legacy path preserved unchanged for older transformers versions

Test plan

  • Verify mistralai/Mixtral-8x7B-v0.1 loads without error on transformers 5.x
  • Verify attention layers are quantized correctly (GPTQ/RTN/SmoothQuant)
  • Verify MoE block is passed through as extra_modules for SmoothQuant-style calibration
  • Verify backward compatibility on transformers <5.0 (legacy block_sparse_moe path)
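As a quick pre-flight for the test plan, one could map the installed transformers version to the expected MoE attribute before loading the model. This helper is a sketch, not part of the PR; the 5.0 threshold comes from the PR description, and the naive major-version parse assumes a standard version string.

```python
def moe_attr_for(version: str) -> str:
    """Which attribute holds the Mixtral MoE container for a given
    transformers version string (API change at 5.0 per the PR)."""
    major = int(version.split(".")[0])
    return "block_sparse_moe" if major < 5 else "mlp"

print(moe_attr_for("4.46.0"))  # block_sparse_moe
print(moe_attr_for("5.0.0"))   # mlp
```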

Made with Cursor

In transformers 5.x, MixtralDecoderLayer renamed block_sparse_moe to
mlp, replaced the ModuleList of individual expert modules with a fused
MixtralExperts class (3D nn.Parameter tensors), and changed the gate
from nn.Linear to MixtralTopKRouter.

This broke all Mixtral quantization with:
  'MixtralDecoderLayer' object has no attribute 'block_sparse_moe'

Fix:
- Add _has_legacy_moe() to detect old vs new API via hasattr
- get_extra_modules: returns block_sparse_moe (old) or mlp (new)
- get_moe_gate: returns the gate from the appropriate MoE container
- get_subsets_in_block: dispatches to _get_subsets_legacy (old per-expert
  w1/w2/w3 Linear modules) or _get_subsets_fused (new fused experts)
- For the fused API, attention layers are quantized per-subset; the MoE
  block is passed as extra_modules for activation-aware hooks

The legacy path is preserved unchanged for older transformers versions.
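The subset dispatch the commit describes might look like the sketch below. The subset dicts and layer-name strings are illustrative guesses at the shape of such data; only the dispatch logic and the old-vs-new expert layout are taken from the commit message, and llmc's real subset format may differ.

```python
def get_subsets_in_block(block, legacy: bool):
    # Dispatch on API version, as the commit describes.
    return _get_subsets_legacy(block) if legacy else _get_subsets_fused(block)

def _get_subsets_legacy(block, n_experts=8):
    # Old API: each expert is its own module with w1/w2/w3 nn.Linear
    # layers, so every expert contributes quantizable linear subsets.
    subsets = [
        {"layers": ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"]},
        {"layers": ["self_attn.o_proj"]},
    ]
    for i in range(n_experts):  # 8 experts in Mixtral-8x7B
        subsets.append({"layers": [f"block_sparse_moe.experts.{i}.w1",
                                   f"block_sparse_moe.experts.{i}.w3"]})
        subsets.append({"layers": [f"block_sparse_moe.experts.{i}.w2"]})
    return subsets

def _get_subsets_fused(block):
    # New API: experts are fused 3D parameters, so only attention layers
    # are quantized per-subset; the MoE block is instead exposed via
    # get_extra_modules for activation-aware hooks.
    return [
        {"layers": ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"]},
        {"layers": ["self_attn.o_proj"]},
    ]

print(len(get_subsets_in_block(None, legacy=True)))   # 18
print(len(get_subsets_in_block(None, legacy=False)))  # 2
```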

Made-with: Cursor