Add Qwen3-Next to checkpoint util & update test scripts #2973
base: main
Conversation
```python
is_full_attention_layer = (layer_idx + 1) % cycle_interval == 0

if is_full_attention_layer:
  # Full Attention Block
```
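To make the cycle concrete: with an assumed `cycle_interval` of 4, every fourth layer (0-based indices 3, 7, 11, ...) is a full-attention layer and the rest are linear-attention layers. A minimal sketch, where the interval value is an assumption rather than something read from this PR:

```python
# Illustration only; cycle_interval = 4 is an assumed value, not read from the config.
cycle_interval = 4
layer_kinds = [
    "full" if (layer_idx + 1) % cycle_interval == 0 else "linear"
    for layer_idx in range(8)
]
print(layer_kinds)  # ['linear', 'linear', 'linear', 'full', 'linear', 'linear', 'linear', 'full']
```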
nit: Adding comments explaining how these numbers relate to the config parameters (e.g., hidden_size, num_attention_heads * head_dim, etc.), or whether they are fixed architectural dimensions, would greatly enhance maintainability. For example, it seems 4096 = config["num_attention_heads"] * config["head_dim"].
Yes, I will add how the hard-coded numbers are calculated. The Gated DeltaNet in particular has a bunch of these calculations.
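A sketch of the kind of derivation comment the reviewer is asking for, assuming the `config` dict exposes the usual HF config keys; the concrete values below are assumptions based on the review comment, not read from the PR:

```python
# Derive hard-coded dimensions from the HF config instead of using magic
# numbers. Field names are the usual HF config keys; the values are
# assumptions based on the review comment above.
config = {"num_attention_heads": 16, "head_dim": 256, "hidden_size": 2048}

attn_proj_dim = config["num_attention_heads"] * config["head_dim"]  # 16 * 256 = 4096
assert attn_proj_dim == 4096
```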
parambole left a comment:
I have left a couple of comments. PTAL.
Thanks for adding the model to the conversion tool, along with the careful logit checks! Left a minor comment.
- For future reference, could you also add the conversion commands to the PR description? It would be nice to also include the conversion time in the description, if you have it. Thank you!

For test script 2_test_qwen3_next_80b_a3b.sh:
- Maybe also add pre-training and finetuning (example; see the sketch after this list). Training was omitted from DS3 since it is covered by ubench.
- Could you test this script and attach the log to the description?
- Thanks for updating the description. Maybe update the PR title as well to accurately reflect the change: e.g., add "update test scripts".
- Will this be added to XLML in the other repo?
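For reference, a pre-training smoke test along the lines suggested above might look like the following. This is a sketch only: the flags follow MaxText's usual config-override style, and the model name, paths, and values are assumptions, not taken from this PR.

```sh
# Hypothetical pre-training smoke test; flag names and values are
# assumptions, not copied from the PR's scripts.
python3 -m MaxText.train src/MaxText/configs/base.yml \
  model_name=qwen3-next-80b-a3b \
  load_parameters_path=${CONVERTED_CKPT} \
  base_output_directory=${OUTPUT_DIR} \
  dataset_type=synthetic \
  per_device_batch_size=1 \
  steps=5
```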
Description
[Ckpt Conversion] Support Qwen3-Next in Unified Checkpoint Conversion Utility
This PR migrates Qwen3-Next (`qwen3-next-80b-a3b`) from standalone conversion scripts to the centralized `MaxText.utils.ckpt_conversion` library. Previously, Qwen3-Next relied on ad-hoc scripts for checkpointing; moving it to the unified utility consolidates conversion under the shared tooling.
Changes
- `src/MaxText/utils/ckpt_conversion/utils/hf_model_configs.py`: Added `qwen3_next_80b_a3b_config` using `transformers.Qwen3NextConfig` and registered it in `HF_MODEL_CONFIGS`.
- `src/MaxText/utils/ckpt_conversion/utils/param_mapping.py`:
  - `QWEN3_NEXT_MAXTEXT_TO_HF_PARAM_MAPPING`: Handles the inhomogeneous layer cycle (mapping Full Attention vs. Linear/Hybrid Attention blocks based on layer index) and MoE components (Shared vs. Routed experts).
  - `QWEN3_NEXT_MAXTEXT_TO_HF_PARAM_HOOK_FN` with robust tensor handling:
    - Identity hooks for 1D parameters like `A_log` (shape `[1]`). This ensures the scan axis is correctly handled during conversion (e.g., transforming to `[1, 12]` where appropriate) rather than incorrectly collapsing to `[1,]`.
    - `permute_conv` to correctly handle `conv1d` kernels (HF: `[C, 1, K]` <-> MT: `[K, 1, C]`). This prevents dimensions with value `1` from being incorrectly squeezed or flattened during the permutation process.
- `src/MaxText/utils/ckpt_conversion/utils/hf_shape.py`: Added `QWEN3_NEXT_HF_WEIGHTS_TO_SHAPE` to calculate expected HF tensor shapes for validation.
- `end_to_end/tpu/qwen/next/...`:
  - Updated `1_test_qwen3_next_80b_a3b.sh` to use `python3 -m MaxText.utils.ckpt_conversion.to_maxtext` instead of the legacy script.
  - Added `2_test_qwen3_next_80b_a3b.sh` for XLML tests to consume for forward_pass & decode verification.

If the change fixes a bug or a GitHub issue, please include a link, e.g.:
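To illustrate the `permute_conv` layout change described above, here is a minimal sketch using NumPy; the actual hook implementation in the PR may differ, and the example sizes are assumptions:

```python
import numpy as np

def permute_conv_hf_to_mt(kernel: np.ndarray) -> np.ndarray:
  """Permute an HF conv1d kernel [C, 1, K] to the MaxText layout [K, 1, C].

  An explicit transpose (rather than a reshape) preserves the middle
  dimension of size 1 instead of squeezing or flattening it away.
  """
  assert kernel.ndim == 3 and kernel.shape[1] == 1
  return np.transpose(kernel, (2, 1, 0))

# Dummy kernel with example (assumed) sizes:
hf_kernel = np.zeros((8192, 1, 4))            # HF layout [C, 1, K]
mt_kernel = permute_conv_hf_to_mt(hf_kernel)  # -> shape (4, 1, 8192)
```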
FIXES: b/469445683
Tests
The commands used to generate the checkpoints themselves: https://paste.googleplex.com/4921565475110912
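Since the paste link above is internal, here is a hedged sketch of what a `to_maxtext` invocation might look like; the flag names follow MaxText's config-override convention and are assumptions, not the PR's actual commands:

```sh
# Hypothetical conversion command; exact flags may differ from the PR.
python3 -m MaxText.utils.ckpt_conversion.to_maxtext src/MaxText/configs/base.yml \
  model_name=qwen3-next-80b-a3b \
  hf_access_token=${HF_TOKEN} \
  base_output_directory=gs://<bucket>/qwen3-next \
  scan_layers=true
```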
Will run the forward-pass logit checker on checkpoints converted MaxText -> HF -> MaxText (scanned and unscanned) and post the results here:
Current status:
to_maxtext tests:
hf -> maxtext (scanned): https://paste.googleplex.com/5151438898593792
hf -> maxtext (unscanned): https://paste.googleplex.com/4721564912320512
to_huggingface tests:
Convert the scanned & unscanned MaxText checkpoints from the previous tests to HF format, then run the forward_pass check against the new HF checkpoints and the existing MaxText checkpoints (a sketch of this comparison follows the links below).
Maxtext (scanned) -> HF: https://paste.googleplex.com/4787924765900800
Maxtext (unscanned) -> HF: https://paste.googleplex.com/5256341314732032
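The forward_pass checks above compare logits from the converted and reference checkpoints. A minimal sketch of such a comparison, in NumPy only; the tolerance is a placeholder, not the PR's actual threshold:

```python
import numpy as np

def check_logits(ref_logits: np.ndarray, test_logits: np.ndarray, atol: float = 1e-2) -> None:
  """Fail if the converted model's logits drift from the reference logits."""
  max_diff = float(np.max(np.abs(ref_logits - test_logits)))
  assert max_diff <= atol, f"max logit diff {max_diff:.4f} exceeds tolerance {atol}"

# Dummy usage with random logits standing in for real model outputs:
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 8, 32000)).astype(np.float32)
check_logits(logits, logits + 1e-4)
```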
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.