[DeepSeek-V4] Implement model integration, decoders, and configuration stack #3867
Open
parambole wants to merge 1 commit into
Conversation
[DeepSeek-V4] Implement model integration, decoders, and configuration stack

Implement full model architecture, decoder integration layers, and execution configurations for DeepSeek-V4 integration into MaxText:
- deepseek_v4.py: Model architecture definition supporting cyclical layer stacking and hyper-connections.
- decoders.py & nnx_decoders.py: Integration of DeepSeekV4DecoderLayer, supporting get_attention_type routing and scanned vs unrolled compilation parity.
- mhc.py & engram.py: Integration of multi-head hyper-connections (mHC) and engram memory management.
- Configuration: Register model configs (deepseek_v4-flash.yml, deepseek_v4-tiny.yml) and hyperparameter definitions in base.yml and types.py.
- Parity verification: Comprehensive unit test suite (deepseek_v4_vs_reference_test.py) validating end-to-end decoder block parity against PyTorch reference implementations at atol=1e-5, rtol=1e-5.
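The commit message mentions "scanned vs unrolled compilation parity" for the decoder layers. As a minimal sketch of what that property means (hypothetical stand-in code, not MaxText's actual decoder): applying the same per-layer function through `jax.lax.scan` and through an explicit Python loop must produce matching outputs.

```python
# Hypothetical sketch of scanned-vs-unrolled parity (not the PR's code).
# A stack of identical layers can be compiled either as an unrolled Python
# loop or as a single jax.lax.scan over stacked per-layer weights; both
# must compute the same result.
import jax
import jax.numpy as jnp

def layer(x, w):
    # Stand-in for a decoder layer: one weight matrix per layer.
    return jnp.tanh(x @ w)

def run_unrolled(x, weights):
    # weights: [num_layers, d, d]; iterate over the leading axis.
    for w in weights:
        x = layer(x, w)
    return x

def run_scanned(x, weights):
    # lax.scan carries the activations and consumes one weight slice per step.
    def body(carry, w):
        return layer(carry, w), None
    y, _ = jax.lax.scan(body, x, weights)
    return y

x = jax.random.normal(jax.random.PRNGKey(0), (4, 8))
weights = jax.random.normal(jax.random.PRNGKey(1), (3, 8, 8)) * 0.1

y_unrolled = run_unrolled(x, weights)
y_scanned = run_scanned(x, weights)
```

Because both paths apply the same operations in the same order, the outputs should agree to numerical precision; the real parity tests in this PR compare far larger blocks against a PyTorch reference.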
Description
Implement full model architecture, decoder integration layers, and execution configurations required for DeepSeek-V4 integration into MaxText:
- deepseek_v4.py: Model architecture definition supporting cyclical layer stacking, multi-head hyper-connections (mHC), and engram memory management.
- decoders.py & nnx_decoders.py: Integration of DeepSeekV4DecoderLayer, supporting get_attention_type routing and scanned vs unrolled compilation parity.
- mhc.py & engram.py: Execution layers for additive hyper-connections and engram state projection.
- Configuration: Register model configs (deepseek_v4-flash.yml, deepseek_v4-tiny.yml) and hyperparameter definitions in base.yml and types.py.
- Parity verification: Unit test suite (tests/unit/deepseek_v4_vs_reference_test.py) validating end-to-end decoder block parity against PyTorch reference implementations at atol=1e-5, rtol=1e-5.

Tests
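The description refers to "additive hyper-connections" in mhc.py. As a toy illustration of the hyper-connection idea (hypothetical code, not the PR's mhc.py): instead of a single residual stream, the block keeps several parallel streams, reads the layer input as a learned mix of them, and writes the layer output back into each stream with learned per-stream weights.

```python
# Toy sketch of additive hyper-connections (hypothetical; names and shapes
# are illustrative, not taken from this PR). n parallel residual streams
# replace the single residual; identity mixing with unit write weights
# degenerates to an ordinary residual connection broadcast to every stream.
import numpy as np

rng = np.random.default_rng(0)
n_streams, d = 4, 8

def layer_fn(x):
    # Stand-in for a decoder layer body.
    return np.tanh(x)

def hyper_connection(streams, read_w, write_w, mix_w):
    # streams: [n, d]; read_w: [n]; write_w: [n]; mix_w: [n, n]
    layer_in = read_w @ streams            # weighted read over streams -> [d]
    layer_out = layer_fn(layer_in)         # [d]
    # Mix streams with each other, then add the layer output into each stream.
    return mix_w @ streams + np.outer(write_w, layer_out)

streams = rng.normal(size=(n_streams, d))
read_w = np.full(n_streams, 1.0 / n_streams)   # average the streams on read
write_w = np.ones(n_streams)                   # write equally to all streams
mix_w = np.eye(n_streams)                      # identity mixing
out = hyper_connection(streams, read_w, write_w, mix_w)
```

In the actual model the read, write, and mixing weights would be learned parameters per layer; the sketch only shows the data flow.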
Tested on CPU
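The parity suite described above checks the decoder block against a PyTorch reference at atol=1e-5, rtol=1e-5. A minimal sketch of that style of check (hypothetical stand-in functions; the real test compares the MaxText decoder block against the PyTorch implementation):

```python
# Sketch of a numerical parity check in the style the PR describes.
# reference_block / candidate_block are hypothetical stand-ins: two
# algebraically equivalent forms of SiLU, since sigmoid(x) = 0.5*(1 + tanh(x/2)).
import numpy as np

def reference_block(x):
    return x / (1.0 + np.exp(-x))                  # x * sigmoid(x)

def candidate_block(x):
    return x * 0.5 * (1.0 + np.tanh(x / 2.0))      # equivalent reformulation

x = np.linspace(-4.0, 4.0, 64).reshape(8, 8)
# Raises AssertionError if any element differs beyond the tolerances.
np.testing.assert_allclose(candidate_block(x), reference_block(x),
                           atol=1e-5, rtol=1e-5)
```

The same `assert_allclose` tolerances (1e-5/1e-5) are what the PR's deepseek_v4_vs_reference_test.py reportedly uses for end-to-end decoder block parity.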
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [ ] `gemini-review` label.