
[DeepSeek-V4] Implement MoE routing primitives (HashRouter, TopKRouter, RoutedMoE)#3871

Open
parambole wants to merge 1 commit into deepseek_v4_core_primitives from dsv4-moe-routing-primitives

Conversation


@parambole parambole commented May 11, 2026

Description

Implement Mixture of Experts (MoE) routing gates and execution layers required for DeepSeek-V4 integration into MaxText:

  • HashRouter: Token routing mechanism utilizing MD5 hash projections for deterministic expert assignment without auxiliary loss.
  • TopKRouter: Gated top-k router implementing sigmoid scaling and score normalization across selected experts.
  • RoutedMoE & RoutedAndSharedMoE: Execution layers supporting layer_idx routing, gate clamping, and FP32 expert summation parity.
  • Unit test suite (tests/unit/deepseek_v4_vs_reference_test.py) validating MoE routing parity against PyTorch reference implementations at atol=1e-5, rtol=1e-5.
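The hash-based routing idea in the first bullet can be illustrated with a minimal sketch. This is not the PR's actual `HashRouter` code; the function name, signature, and the exact bytes hashed are assumptions. It only shows the core property: an MD5-derived mapping gives each token id a fixed expert, so no learned gate or auxiliary load-balancing loss is needed.

```python
# Hypothetical sketch of deterministic hash routing (illustrative only; the
# PR's real HashRouter lives in src/maxtext/layers/moe.py and may differ).
import hashlib
import numpy as np

def hash_route(token_ids: np.ndarray, num_experts: int) -> np.ndarray:
  """Map each token id to a fixed expert index via an MD5-based hash."""
  def one(tid: int) -> int:
    # Hash the token id bytes and fold the digest into an expert index.
    digest = hashlib.md5(int(tid).to_bytes(4, "little")).digest()
    return int.from_bytes(digest[:8], "little") % num_experts
  return np.array([one(t) for t in token_ids.tolist()])

tokens = np.array([17, 42, 42, 101])
experts = hash_route(tokens, num_experts=8)
# Identical token ids always land on the same expert (deterministic).
assert experts[1] == experts[2]
```

Because the assignment depends only on the token id, routing is reproducible across runs and needs no auxiliary loss to stay stable, at the cost of not adapting to the data.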

Tests

Tested on CPU:

pytest tests/unit/deepseek_v4_vs_reference_test.py

tests/unit/deepseek_v4_vs_reference_test.py ......                       [100%]
======================== 6 passed, 8 warnings in 3.99s =========================

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 17.64706% with 84 lines in your changes missing coverage. Please review.

| Files with missing lines  | Patch % | Lines                        |
| ------------------------- | ------- | ---------------------------- |
| src/maxtext/layers/moe.py | 16.83%  | 72 Missing and 12 partials ⚠️ |


@parambole parambole force-pushed the dsv4-moe-routing-primitives branch from 37ee811 to 31329c5 Compare May 11, 2026 20:38
@parambole parambole force-pushed the deepseek_v4_core_primitives branch from 1ab79e5 to c025463 Compare May 12, 2026 17:22
@parambole parambole force-pushed the dsv4-moe-routing-primitives branch from 31329c5 to 22a57ff Compare May 12, 2026 17:23
@parambole parambole force-pushed the deepseek_v4_core_primitives branch from c025463 to 68c44a6 Compare May 12, 2026 21:12
@parambole parambole force-pushed the dsv4-moe-routing-primitives branch from 22a57ff to 32869e5 Compare May 12, 2026 21:12
@parambole parambole force-pushed the deepseek_v4_core_primitives branch 2 times, most recently from 72a92a7 to e81f52d Compare May 14, 2026 17:45
…outer, RoutedMoE)

Implement Mixture of Experts routing gates and execution layers for DeepSeek-V4 integration into MaxText:

- HashRouter: Token routing mechanism utilizing MD5 hash projections for deterministic expert assignment.
- TopKRouter: Gated top-k router implementing sigmoid scaling and score normalization.
- RoutedMoE & RoutedAndSharedMoE: Execution layers supporting layer_idx routing and FP32 expert summation parity.
- Parity verification: Extended unit test suite (deepseek_v4_vs_reference_test.py) validating routing parity against PyTorch reference implementations at atol=1e-5, rtol=1e-5.
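The sigmoid-gated top-k routing described above can be sketched in a few lines. This is a minimal NumPy illustration under assumptions: `topk_route`, its shapes, and its return convention are invented here and are not the PR's actual `TopKRouter` API. It shows the two steps the description names: sigmoid scaling of router logits, then normalization of scores across only the selected experts.

```python
# Hypothetical sketch of sigmoid top-k gating (not the PR's real code).
import numpy as np

def topk_route(logits: np.ndarray, k: int):
  """Pick k experts per token from sigmoid scores and renormalize weights."""
  scores = 1.0 / (1.0 + np.exp(-logits))           # sigmoid scaling
  idx = np.argsort(-scores, axis=-1)[..., :k]      # indices of the top-k experts
  top = np.take_along_axis(scores, idx, axis=-1)
  weights = top / top.sum(axis=-1, keepdims=True)  # normalize over selected k only
  return idx, weights

logits = np.array([[2.0, -1.0, 0.5, 3.0]])  # one token, four experts
idx, w = topk_route(logits, k=2)
# Experts 3 and 0 are selected; their weights sum to 1 after normalization.
```

Normalizing over only the selected experts (rather than all experts, as a full softmax would) keeps the combined expert output at unit total weight regardless of how many experts exist.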
@parambole parambole force-pushed the dsv4-moe-routing-primitives branch from 32869e5 to c92f2e0 Compare May 14, 2026 17:51
@parambole parambole changed the title Implement custom MoE HashRouter, TopKRouter, and sqrtsoftplus Implement DeepSeek-V4 MoE routing primitives (HashRouter, TopKRouter, RoutedMoE) May 14, 2026
@parambole parambole changed the title Implement DeepSeek-V4 MoE routing primitives (HashRouter, TopKRouter, RoutedMoE) [DeepSeek-V4] Implement MoE routing primitives (HashRouter, TopKRouter, RoutedMoE) May 14, 2026
@github-actions

🤖 Hi @parambole, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
