Add 2d fsdp custom mesh and deprecate previous usages #3894

Merged: copybara-service[bot] merged 2 commits into main from chengnuojin-2dfsdp on May 14, 2026

NuojCheng (Collaborator) commented May 13, 2026

Description

This PR introduces a 2D FSDP custom mesh and rule. Unlike deepseek3-671b-2dfsdp, the new rule does not use fsdp_transpose and expert as TP in the attention part.

This PR deprecates the config file deepseek3-671b-2dfsdp.yml and the flag use_2dfsdp_sharding. Going forward, use custom_mesh_and_rule=2d-fsdp for the same functionality. Sharding dump tests are added to guard against regressions.
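As a rough illustration of the migration described above, the sketch below rewrites a list of command-line overrides so that the deprecated flag is replaced by the new setting. The helper `migrate_overrides` is hypothetical and not part of this PR, the `key=value` override form is an assumption about how the flags are passed, and `steps=10` is only a placeholder override.

```python
# Hypothetical migration helper, not part of this PR.
# Assumes overrides are passed as "key=value" strings.
DEPRECATED_FLAG = "use_2dfsdp_sharding"
REPLACEMENT = "custom_mesh_and_rule=2d-fsdp"


def migrate_overrides(overrides):
    """Return a copy of `overrides` with the deprecated flag replaced."""
    migrated = []
    needs_replacement = False
    for item in overrides:
        key, _, value = item.partition("=")
        if key == DEPRECATED_FLAG:
            # Drop the deprecated flag; remember whether it was enabled.
            if value.strip().lower() == "true":
                needs_replacement = True
            continue
        migrated.append(item)
    # Add the new setting once, unless the caller already supplied it.
    if needs_replacement and REPLACEMENT not in migrated:
        migrated.append(REPLACEMENT)
    return migrated


old_args = ["steps=10", "use_2dfsdp_sharding=true"]  # placeholder overrides
new_args = migrate_overrides(old_args)
print(new_args)
```

Overrides that set the deprecated flag to false are simply dropped in this sketch, since nothing needs to replace them.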

Tests

Comparing before and after the change, the sharding debugging info is almost the same, and real performance improves with the new mesh and rule. HBM usage increases slightly, mainly because attention weights are now replicated along fsdp_transpose.
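The HBM effect can be sketched with simple arithmetic: a tensor replicated along a mesh axis is no longer divided by that axis's size, so each device holds a larger slice. The mesh sizes and the 1 GiB weight below are made-up numbers for illustration, not measurements from this PR. The sketch also assumes the attention weights remain sharded along fsdp and ignores the expert axis.

```python
# Illustrative arithmetic only; all sizes are hypothetical.
def per_device_bytes(total_bytes, mesh_shape, sharded_axes):
    """Bytes held per device when a tensor is split across `sharded_axes`
    and replicated across every other mesh axis."""
    shards = 1
    for axis in sharded_axes:
        shards *= mesh_shape[axis]
    # Ceiling division covers sizes that do not divide evenly.
    return -(-total_bytes // shards)


mesh = {"fsdp": 8, "fsdp_transpose": 4}  # hypothetical mesh sizes
weight_bytes = 2**30                     # hypothetical 1 GiB attention weight

# Sharded along both axes versus replicated along fsdp_transpose.
sharded_both = per_device_bytes(weight_bytes, mesh, ("fsdp", "fsdp_transpose"))
replicated_t = per_device_bytes(weight_bytes, mesh, ("fsdp",))

print(sharded_both // 2**20, "MiB per device when sharded on both axes")
print(replicated_t // 2**20, "MiB per device when replicated on fsdp_transpose")
```

Under these assumptions the per-device footprint of such a weight grows by a factor equal to the fsdp_transpose axis size. Attention weights are only part of the model, which is consistent with the overall HBM increase being small.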

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...t/integration/vllm/maxtext_vllm_adapter/adapter.py | 0.00% | 1 Missing ⚠️ |

@NuojCheng NuojCheng force-pushed the chengnuojin-2dfsdp branch from 8569784 to f28f38c Compare May 13, 2026 19:03
@copybara-service copybara-service Bot merged commit fb0fdce into main May 14, 2026
47 of 48 checks passed
@copybara-service copybara-service Bot deleted the chengnuojin-2dfsdp branch May 14, 2026 18:57

4 participants