This project is an implementation study and relies heavily on the brilliant theoretical work cited below:

* **Mixture of Experts:** Shazeer, N., et al. (2017). *Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer*. [arXiv:1701.06538](https://arxiv.org/abs/1701.06538)
* **Inspiration:** Jamba (AI21 Labs) and OpenMoE.

## 🧠 Model Weights & Checkpoints

All pre-trained checkpoints are hosted on the [Hugging Face Hub](https://huggingface.co/Pomilon).

| Model Artifact | Step | Description | Download |
| :--- | :--- | :--- | :--- |
| **Aetheris-Base** | 10k | Early convergence checkpoint (loss ~3.66). Good for analyzing router behavior. | [🤗 Hugging Face](https://huggingface.co/Pomilon/Aetheris) |
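
If you prefer to fetch the checkpoint programmatically, here is a minimal sketch using the `huggingface_hub` package. The repo id comes from the table above; the filenames inside the snapshot are not documented here, so inspect the downloaded folder before use:

```python
# Minimal sketch: download the Aetheris checkpoint folder from the Hub.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

# snapshot_download fetches the full repo snapshot and returns the local path.
local_dir = snapshot_download(repo_id="Pomilon/Aetheris")
print(f"Checkpoint files downloaded to: {local_dir}")
```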

> **⚠️ Important:** Aetheris uses a custom Hybrid Mamba-MoE architecture. You **cannot** load it directly with `transformers.AutoModel`. You must use the interface provided in this repository.

### 🐍 How to Load

```bash
# Rename the checkpoint file inside the folder to checkpoint_current.pth first.
python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir path/to/checkpoints_folder
```
> **Note:** Better inference tooling will be added later down the line; for now, use this scuffed-but-working CLI. :D
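
For poking at the weights themselves (e.g. the router-behavior analysis mentioned in the table above), a minimal sketch with plain PyTorch is below. It assumes the checkpoint is an ordinary `torch.save` artifact and that MoE parameter names contain `router` or `expert`; both are guesses about this repo's internals, not a documented API:

```python
# Minimal sketch: inspect an Aetheris checkpoint without transformers.AutoModel.
import torch

# Assumption: a regular torch.save file, renamed to checkpoint_current.pth
# as described in the How to Load section above.
ckpt = torch.load("path/to/checkpoints_folder/checkpoint_current.pth", map_location="cpu")

# Training checkpoints often wrap weights alongside optimizer state;
# "model_state_dict" is an assumed key, so fall back to the raw dict.
state_dict = ckpt.get("model_state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

# Assumption: router/expert parameters are identifiable by name.
for name, value in state_dict.items():
    if isinstance(value, torch.Tensor) and ("router" in name or "expert" in name):
        print(f"{name}: shape={tuple(value.shape)}, dtype={value.dtype}")
```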
> **Note:** These weights are from an experimental run. While they demonstrate the architecture's capabilities, do not expect GPT-5 or even Google Bard levels of coherence. :D