fix: Mixtral compatibility with transformers >=5.0 fused MoE API #451
Open

Medhatt21 wants to merge 1 commit into ModelTC:main from
Conversation
In transformers 5.x, MixtralDecoderLayer renamed block_sparse_moe to mlp, replaced the ModuleList of individual expert modules with a fused MixtralExperts class (3D nn.Parameter tensors), and changed the gate from nn.Linear to MixtralTopKRouter. This broke all Mixtral quantization with:

'MixtralDecoderLayer' object has no attribute 'block_sparse_moe'

Fix:
- Add _has_legacy_moe() to detect old vs new API via hasattr
- get_extra_modules: returns block_sparse_moe (old) or mlp (new)
- get_moe_gate: returns the gate from the appropriate MoE container
- get_subsets_in_block: dispatches to _get_subsets_legacy (old per-expert w1/w2/w3 Linear modules) or _get_subsets_fused (new fused experts)
- For the fused API, attention layers are quantized per-subset; the MoE block is passed as extra_modules for activation-aware hooks

The legacy path is preserved unchanged for older transformers versions.

Made-with: Cursor
Summary
Fixes Mixtral quantization broken by transformers 5.x API changes. Currently crashes with:

'MixtralDecoderLayer' object has no attribute 'block_sparse_moe'
Problem
In transformers>=5.0, the Mixtral architecture changed significantly:

- block.block_sparse_moe was renamed to block.mlp
- the ModuleList of MixtralBLockSparseTop2MLP experts (each with w1, w2, w3 as nn.Linear) was replaced by MixtralExperts with fused 3D nn.Parameter tensors (gate_up_proj, down_proj)
- the nn.Linear gate was replaced by MixtralTopKRouter
- per-expert access such as experts[i].w1 no longer exists

Fix
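The structural change above can be illustrated with hypothetical stand-ins for the two layouts (plain namespaces, not the real transformers classes), showing why code written against the old API crashes on the new one:

```python
from types import SimpleNamespace

# transformers < 5.0: per-expert nn.Linear modules under block_sparse_moe
legacy_block = SimpleNamespace(
    block_sparse_moe=SimpleNamespace(
        gate="nn.Linear",
        experts=[SimpleNamespace(w1="Linear", w2="Linear", w3="Linear")
                 for _ in range(8)],
    )
)

# transformers >= 5.0: fused experts and a router under mlp
fused_block = SimpleNamespace(
    mlp=SimpleNamespace(
        gate="MixtralTopKRouter",
        experts=SimpleNamespace(gate_up_proj="3D nn.Parameter",
                                down_proj="3D nn.Parameter"),
    )
)

# Old quantization code accessed block.block_sparse_moe unconditionally,
# which raises AttributeError on the new layout:
assert hasattr(legacy_block, "block_sparse_moe")
assert not hasattr(fused_block, "block_sparse_moe")
```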
- _has_legacy_moe(): detects the old API via hasattr(block, 'block_sparse_moe')
- get_extra_modules(): returns the MoE container from the appropriate attribute
- get_moe_gate(): new method to access the gate regardless of API version
- get_subsets_in_block(): dispatches to _get_subsets_legacy() (old per-expert w1/w2/w3) or _get_subsets_fused() (new fused experts)
- For the fused API, the MoE block is passed via get_extra_modules for activation-aware hooks

Test plan
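A minimal sketch of the version dispatch described above, using the method names from this PR (_has_legacy_moe, get_extra_modules, get_moe_gate) on hypothetical mock blocks rather than real MixtralDecoderLayer modules; the exact return shapes in the actual patch may differ:

```python
from types import SimpleNamespace


def _has_legacy_moe(block):
    # transformers < 5.0 keeps the MoE container under `block_sparse_moe`
    return hasattr(block, "block_sparse_moe")


def get_extra_modules(block):
    # Return the MoE container from whichever attribute this version uses.
    return block.block_sparse_moe if _has_legacy_moe(block) else block.mlp


def get_moe_gate(block):
    # The gate lives on the MoE container in both API versions.
    return get_extra_modules(block).gate


# Mock blocks standing in for MixtralDecoderLayer under each API version
legacy = SimpleNamespace(block_sparse_moe=SimpleNamespace(gate="nn.Linear"))
fused = SimpleNamespace(mlp=SimpleNamespace(gate="MixtralTopKRouter"))

assert get_moe_gate(legacy) == "nn.Linear"
assert get_moe_gate(fused) == "MixtralTopKRouter"
```

Keeping the hasattr probe in one helper means every other accessor stays a one-liner, and the legacy branch is byte-for-byte the old behavior.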
- mistralai/Mixtral-8x7B-v0.1 loads without error on transformers 5.x
- legacy path preserved on older transformers (block_sparse_moe path)

Made with Cursor