[DeepSeek-V4] Implement Compressed Attention Layers#3866
Open
parambole wants to merge 1 commit into
Codecov Report: ❌ Patch coverage is
…ghtningIndexer) Implement compressed attention mechanisms and indexer modules for DeepSeek-V4 integration into MaxText:
- CSACompressor & HCACompressor: long-range attention compressors supporting causal block bias and YaRN frequency scaling decoupling.
- LightningIndexer: memory-efficient indexer module implementing sentinel masking and dynamic RoPE scaling.
- Configuration: register attention compression hyperparameters (compress_ratios, index_head_dim, sliding_window) in types.py and base.yml.
- Parity verification: extended unit test suite (deepseek_v4_vs_reference_test.py) validating attention compression parity against PyTorch reference implementations at atol=1e-5, rtol=1e-5.
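To make the `compress_ratios` hyperparameter concrete, here is a minimal sketch of block-wise KV compression: mean-pooling a sequence over fixed-size windows. This is an illustrative assumption, not the PR's actual CSACompressor/HCACompressor implementation (which additionally handles causal block bias and YaRN decoupling); it only shows the shape contract a compression ratio implies.

```python
import numpy as np

def compress_blocks(kv, compress_ratio):
    """Mean-pool a [seq, dim] tensor over windows of `compress_ratio`
    positions, yielding ceil(seq / compress_ratio) compressed slots."""
    seq, dim = kv.shape
    pad = (-seq) % compress_ratio
    padded = np.concatenate([kv, np.zeros((pad, dim))], axis=0)
    blocks = padded.reshape(-1, compress_ratio, dim)
    # Per-block element counts correct the average for zero padding
    # in the final, possibly partial block.
    starts = np.arange(len(blocks)) * compress_ratio
    counts = np.minimum(starts + compress_ratio, seq) - starts
    return blocks.sum(axis=1) / counts[:, None]
```

With `compress_ratio=2`, a 5-token sequence compresses to 3 slots, the last slot averaging a single token.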
Description
Implement compressed attention mechanisms and indexer modules required for DeepSeek-V4 integration into MaxText:
- CSACompressor & HCACompressor: long-range attention compressors supporting causal block bias and YaRN frequency scaling decoupling.
- LightningIndexer: memory-efficient indexer module implementing sentinel masking and dynamic RoPE scaling.
- Configuration: register attention compression hyperparameters (`compress_ratios`, `index_head_dim`, `sliding_window`) in `types.py` and `base.yml`.
- Parity verification: extended unit test suite (`tests/unit/deepseek_v4_vs_reference_test.py`) validating attention compression parity against PyTorch reference implementations at `atol=1e-5`, `rtol=1e-5`.
Tests
Tested on CPU
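The parity-verification pattern described above can be sketched as follows. The toy scaled dot-product attention and the float32-vs-float64 comparison are assumptions for illustration; the real suite compares the MaxText modules against PyTorch reference implementations, but the tolerance check is the same `atol=1e-5, rtol=1e-5` pattern.

```python
import numpy as np

def _softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def reference_attention(q, k, v):
    # Toy scaled dot-product attention standing in for the real modules.
    return _softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
# Lower-precision forward pass vs. float64 reference, at the PR's tolerances.
out32 = reference_attention(*(t.astype(np.float32) for t in (q, k, v)))
np.testing.assert_allclose(out32, reference_attention(q, k, v),
                           atol=1e-5, rtol=1e-5)
```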
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.