_A polished **PyTorch implementation** of the current **state-of-the-art (SOTA) Transformer**. Designed for clarity, reproducibility, and interoperability with **HuggingFace Transformers**, this **fully configurable** repository provides a robust baseline for **research** and **engineering**. The codebase emphasizes **readable, well-documented components** so you can iterate on **Feed-Forward**, **Attention**, and **Normalization** blocks and other **architectural variants** with minimal friction._
## Features
- **Fully configurable** architecture (layers, heads, model dimensions, dropout, etc.)
- **HuggingFace-compatible** API alignment.
- **Compact and easily extensible** design for rapid prototyping and research experiments.
- **Clear, well-documented modules** to facilitate experimentation with attention, FFNs, etc.
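To illustrate the kind of knobs "fully configurable" implies, here is a minimal, self-contained sketch of a config object. The dataclass below is illustrative only, not the repository's actual `TransformerConfig`; field names mirror the configuration example later in this README.

```python
from dataclasses import dataclass


@dataclass
class TransformerConfigSketch:
    # Illustrative stand-in; the repository's TransformerConfig is the
    # real source of truth for field names and defaults.
    n_layers: int = 12
    n_heads: int = 32
    d_model: int = 1536
    dropout: float = 0.0
    attn_qk_norm: bool = False
    tied_weights: bool = False
    seq_len: int = 1024
    max_seq_len: int = 4096

    def __post_init__(self):
        # Each attention head gets an equal slice of d_model.
        assert self.d_model % self.n_heads == 0, "d_model must divide evenly across heads"


cfg = TransformerConfigSketch(n_layers=6)
print(cfg.d_model // cfg.n_heads)  # per-head dimension: 48
```

Validating invariants such as `d_model % n_heads == 0` at construction time surfaces misconfiguration immediately rather than deep inside a forward pass.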
## Download the code
```bash
# Substitute the actual repository URL (not shown in this README).
git clone <repository-url>
cd <repository-directory>
```

Then configure the model through `TransformerConfig`:

```python
from transformer import TransformerConfig

config = TransformerConfig(
    n_layers=12,
    n_heads=32,
    d_model=1536,
    attn_qk_norm=False,
    tied_weights=False,
    seq_len=1024,
    max_seq_len=4096,
)
```
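Since the configuration exposes an `attn_qk_norm` flag, a brief sketch of what QK normalization typically does may help: queries and keys are normalized (here with L2 normalization; RMSNorm is another common choice) before the attention dot product, which bounds the attention logits regardless of activation scale. This is an illustrative NumPy sketch, not the repository's implementation:

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, eps=1e-6):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def attention(q, k, v, qk_norm=False):
    # q, k, v: (seq_len, d_head)
    if qk_norm:
        # Normalizing queries and keys makes each logit a scaled cosine
        # similarity, so logits stay bounded as activations grow.
        q, k = l2_normalize(q), l2_normalize(k)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(q, k, v, qk_norm=True)
print(out.shape)  # (4, 8)
```

With `qk_norm=True`, each logit is a cosine similarity divided by `sqrt(d_head)`, so its magnitude cannot exceed `1 / sqrt(d_head)`, which is the stabilization the flag is meant to provide.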