Add Gemma4 MaxText→vLLM Weight Converter (torchax)#3794
Merged
copybara-service[bot] merged 1 commit into main, May 6, 2026
Conversation
Force-pushed from c1807e9 to 1859f35.
🤖 Hi @aireenmei, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
This Pull Request introduces a weight converter for the Gemma4 model to facilitate MaxText to vLLM conversion, specifically for the gemma4-26b MoE variant. The implementation is well-structured and follows the established patterns for weight conversion in the project, including proper JIT usage and memory management.
🔍 General Feedback
- Naming Consistency: There is a critical naming discrepancy in the MoE weight keys (`moe.per_expert_scale` vs `router.per_expert_scale`) that should be resolved to ensure compatibility with vLLM's expectations.
- Code Cleanup: A few minor items, such as unused arguments in JIT functions and commented-out debugging code in the validator, should be addressed to maintain code quality.
- Performance: The use of `jax.jit` for batch weight processing is a good practice for minimizing conversion overhead.
aireenmei approved these changes, May 1, 2026.
Force-pushed from 1859f35 to 721a9ab.
Codecov Report ❌ Patch coverage is
Force-pushed from d55189f to 434727b.
Force-pushed from a939844 to bcfa1b1.
khatwanimohit approved these changes, May 5, 2026.
Description
Original author: @khatwanimohit @aireenmei #3677
Refactor/test: @hengtaoguo
Extracts the Gemma4-specific weight conversion logic from `bench_weight_sync.py` into a proper converter class, following the same base/model-specific split established for Qwen3.

New:
- `gemma4_moe.py`: `Gemma4MaxTextToVLLMConverter`, inheriting from `BaseMaxTextToVLLMConverter`
- Supports `gemma4-26b` (MoE: 128 routed + 1 shared expert)
- Overrides `convert()` to add the `_convert_norms` step and dispatch MoE vs. dense MLP

Updated:
- `validate_converter.py`: imports `Gemma4MaxTextToVLLMConverter` and dispatches on `gemma4-*` model names
- Adds a `gemma4-26b` entry to `vllm_model_name_mapping`

Notes:
- Set `MODEL_IMPL_TYPE=vllm` to force the torchax-backed vLLM model for Gemma4 (default `"auto"` resolves to `"flax_nnx"` in newer tpu-inference, which uses a nested Flax state incompatible with the flat-key converter output)
- `<bos>`, example: `prompt="<box>Paris is"`

Tests
Tested with `validate_converter`, full logs:

Tuned checkpoint with chat template (`use_chat_template=True`):

Checklist
Before submitting this PR, please make sure (put X in square brackets):
- `gemini-review` label.