feat: migrate pipeline to nnx by mesakhcienet · Pull Request #2885 · AI-Hypercomputer/maxtext

mesakhcienet · 2025-12-24T06:35:02Z

Description

Implement an NNX-based pipeline.

This PR extends PR #2831.

Main changes:

nnx_decoders.py: implement the missing pipeline logic.
pipeline.py: add a new class, NNXPipeline, which is an NNX-based pipeline class.

Tests

We ran the pipeline with the command below:

MODEL_NAME=llama2-7b
python -m MaxText.train src/maxtext/configs/base.yml \
    run_name=pipeline_test_${MODEL_NAME}_nnx \
    base_output_directory=/dev/shm/pipeline_test_nnx \
    model_name=${MODEL_NAME} \
    dataset_type=synthetic \
    steps=15 \
    debug_sharding=true \
    per_device_batch_size=2 \
    max_target_length=32 \
    ici_pipeline_parallelism=2 \
    num_pipeline_microbatches=4 \
    num_layers_per_pipeline_stage=2 \
    enable_checkpointing=false \
    enable_nnx=true \
    pure_nnx_decoder=true \
    scan_layers_per_stage=false \
    async_checkpointing=false > nnx-porting-log/pipeline/custom_${MODEL_NAME}.log 2>&1
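As a rough illustration of how the pipeline flags in this command relate to each other, the sketch below computes how many times the stage stack would have to be repeated to cover the whole model. It assumes that llama2-7b has 32 decoder layers and that the number of repeats is the layer count divided by (stages × layers per stage); the function name `pipeline_layout` is hypothetical and is not part of MaxText.

```python
def pipeline_layout(num_decoder_layers, num_stages, layers_per_stage):
  """Returns (layers_per_repeat, num_repeats) for a staged pipeline.

  Assumption: every repeat runs all stages once, so the decoder layer
  count must be divisible by num_stages * layers_per_stage.
  """
  layers_per_repeat = num_stages * layers_per_stage
  if num_decoder_layers % layers_per_repeat != 0:
    raise ValueError(
        f"{num_decoder_layers} layers cannot be split evenly into "
        f"repeats of {layers_per_repeat} layers"
    )
  return layers_per_repeat, num_decoder_layers // layers_per_repeat


# Values from the command above: ici_pipeline_parallelism=2 and
# num_layers_per_pipeline_stage=2. The layer count of 32 is an assumption.
layers_per_repeat, num_repeats = pipeline_layout(
    num_decoder_layers=32, num_stages=2, layers_per_stage=2
)
print(layers_per_repeat, num_repeats)
```

Under these assumptions each repeat covers 4 layers, so the run would need 8 repeats of the 2-stage pipeline.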

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-01-19T07:41:42Z

Codecov Report

❌ Patch coverage is 27.04225% with 518 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/layers/pipeline.py	37.02%	264 Missing and 3 partials ⚠️
src/maxtext/layers/nnx_decoders.py	12.23%	232 Missing and 19 partials ⚠️

bvandermoon

@gobbleturk, what testing do you recommend for migrating pipeline parallelism to NNX? I'll send over an internal doc that @hsuan-lun-chiang, @mesakhcienet, and others put together, which shows the tests they have already run.

bvandermoon · 2026-03-25T21:41:31Z

@NuojCheng any thoughts here?

NuojCheng

Some additional train compile tests for the pipeline NNX migration:

Train compile test 1: https://paste.googleplex.com/5960957017849856
Train compile test 2: https://paste.googleplex.com/5749974483730432
Train compile test 3: https://paste.googleplex.com/5201745681711104
If the train compile tests above pass without OOM and the current tests in pipeline_parallelism_test.py all pass, then I think it is good to go! Please ping me when the PR is ready for review.

NuojCheng · 2026-03-25T22:16:56Z

There is also some Linen usage in pipeline_utils.py, e.g.

maxtext/src/maxtext/utils/pipeline_utils.py, line 330 in 77f5334:

return nn.scan(

maxtext/src/maxtext/utils/pipeline_utils.py, line 307 in 77f5334:

return nn.remat(

I don't see them updated in this PR, but I think they probably should be.
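For context on what these two lifted transforms do, here is a minimal sketch in plain JAX, not the Linen or NNX code from this PR. It assumes a toy `layer` function and stacked weights (both hypothetical) and uses `jax.lax.scan` to loop over layers and `jax.checkpoint` to rematerialize each layer's activations in the backward pass, which are the underlying JAX primitives that scan and remat wrappers build on.

```python
import jax
import jax.numpy as jnp


def layer(x, w):
  """Toy stand-in for one decoder layer (hypothetical)."""
  return jnp.tanh(x @ w)


def run_layers(x, stacked_w):
  """Applies all layers with scan; each layer is rematerialized."""

  def body(carry, w):
    # jax.checkpoint recomputes the layer's activations during the
    # backward pass instead of storing them.
    return jax.checkpoint(layer)(carry, w), None

  out, _ = jax.lax.scan(body, x, stacked_w)
  return out


x = jnp.ones((2, 4))
stacked_w = jnp.stack([jnp.eye(4) * 0.5] * 3)  # 3 layers, leading axis = layer

out = run_layers(x, stacked_w)

# Reference: the same computation as an unrolled Python loop.
ref = x
for i in range(stacked_w.shape[0]):
  ref = layer(ref, stacked_w[i])

grads = jax.grad(lambda w: run_layers(x, w).sum())(stacked_w)
```

An NNX port would presumably express the same structure with the NNX equivalents of these transforms; the exact call sites are up to the PR author.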

Another thing is the use of this function in maxtext/src/maxtext/utils/pipeline_utils.py (lines 151 to 162 in 77f5334):

# TODO(chengnuojin) Remove this function and its usage after pipeline nnx migration
def remove_logically_partition(weights):
  """Removes LogicallyPartitioned wrapper from weights."""

  def _remove_logically_partition_leaf(v):
    return getattr(v, "value") if isinstance(v, LogicallyPartitioned) else v

  return jax.tree.map(
      _remove_logically_partition_leaf,
      weights,
      is_leaf=lambda v: isinstance(v, LogicallyPartitioned),
  )

I suspect the NNX migration can help us get rid of this function, since it mostly deals with Linen wrapper troubles. Please take a look at this if you can. Thank you for the hard work!
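To make the role of `is_leaf` in the quoted function concrete, here is a small self-contained sketch. It uses a hypothetical `FakePartitioned` class as a stand-in for the real Linen wrapper (it is not the flax type), registered as a pytree node on the assumption that the real wrapper is one too. Without `is_leaf`, `jax.tree.map` descends into the wrapper and rebuilds it; with `is_leaf`, the wrapper itself is handed to the mapping function and can be replaced by its inner value.

```python
import dataclasses

import jax


@dataclasses.dataclass
class FakePartitioned:
  """Hypothetical stand-in for a wrapper such as LogicallyPartitioned."""

  value: object
  names: tuple


# Register the wrapper as a pytree node so tree_map descends into it by default.
jax.tree_util.register_pytree_node(
    FakePartitioned,
    lambda p: ((p.value,), p.names),  # children, static aux data
    lambda names, children: FakePartitioned(children[0], names),
)


def unwrap(weights):
  """Replaces every FakePartitioned node with the value it wraps."""
  return jax.tree.map(
      lambda v: v.value if isinstance(v, FakePartitioned) else v,
      weights,
      is_leaf=lambda v: isinstance(v, FakePartitioned),
  )


weights = {"kernel": FakePartitioned(1.0, ("embed", "mlp")), "bias": 2.0}

# Without is_leaf the wrapper survives an identity map.
still_wrapped = jax.tree.map(lambda v: v, weights)

# With is_leaf the wrapper is stripped.
unwrapped = unwrap(weights)
```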

mesakhcienet changed the title ~~core: migrate pipeline to nnx~~ feat: migrate pipeline to nnx Dec 24, 2025

mesakhcienet force-pushed the test/pipeline-scan-nnx branch 8 times, most recently from 6875da8 to f34b1a3 Compare January 15, 2026 23:43

mesakhcienet force-pushed the test/pipeline-scan-nnx branch 4 times, most recently from 12a3907 to 2c16599 Compare January 28, 2026 08:04

mesakhcienet force-pushed the test/pipeline-scan-nnx branch 2 times, most recently from 64dc147 to 9e4518e Compare February 2, 2026 01:58

mesakhcienet force-pushed the test/pipeline-scan-nnx branch from 631a73e to ac97a1d Compare March 2, 2026 08:48

mesakhcienet changed the base branch from main to xibin/nnx_all March 2, 2026 08:48

ecnal-cienet force-pushed the xibin/nnx_all branch 12 times, most recently from 1849f0b to 669dc01 Compare March 3, 2026 19:59

ecnal-cienet force-pushed the xibin/nnx_all branch 11 times, most recently from 2d742f9 to fc3fe0b Compare March 7, 2026 04:00

mesakhcienet force-pushed the test/pipeline-scan-nnx branch from 2e46721 to b732cb3 Compare March 9, 2026 01:24

mesakhcienet changed the base branch from xibin/nnx_all to main March 9, 2026 01:25

mesakhcienet force-pushed the test/pipeline-scan-nnx branch 8 times, most recently from 618de58 to e7656b2 Compare March 11, 2026 06:29

feat: implement nnx-based pipeline

34c002e

bvandermoon reviewed Mar 25, 2026

NuojCheng reviewed Mar 25, 2026

mesakhcienet added 4 commits March 27, 2026 15:37

Update pipeline.py

a7a60b9

test

fd1250b

test

fb744b0

fix: update

0e4ace0

mesakhcienet wants to merge 5 commits into AI-Hypercomputer:main from CIeNET-International:test/pipeline-scan-nnx

4 participants
