Commit b18eef8 — Update README.md
1 parent 0d80d09

1 file changed: README.md (18 additions & 3 deletions)
@@ -76,11 +76,26 @@ This project is an implementation study and relies heavily on the brilliant theo
 * **Mixture of Experts:** Shazeer, N., et al. (2017). *Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer*. [arXiv:1701.06538](https://arxiv.org/abs/1701.06538)
 * **Inspiration:** Jamba (AI21 Labs) and OpenMoE.
 
-## 🧠 Model Weights
+## 🧠 Model Weights & Checkpoints
 
-Currently working on training a small model. Once complete, I will publish the final checkpoints on my [Hugging Face profile](https://huggingface.co/Pomilon).
+All pre-trained checkpoints are hosted on the [Hugging Face Hub](https://huggingface.co/Pomilon).
 
-> **Note:** Don't expect something that rivals state-of-the-art models :D! This is a proof-of-concept for the architecture.
+| Model Artifact | Step | Description | Download |
+| :--- | :--- | :--- | :--- |
+| **Aetheris-Base** | 10k | Early convergence checkpoint (loss ~3.66). Good for analyzing router behavior. | [🤗 Hugging Face](https://huggingface.co/Pomilon/Aetheris) |
+| **Aetheris-Chat** | -- | *Coming soon (post-SFT)* | -- |
+
+> **⚠️ Important:** Aetheris uses a custom Hybrid Mamba-MoE architecture. You **cannot** load it directly with `transformers.AutoModel`. You must use the interface provided in this repository.
+
+### 🐍 How to Load
+
+```bash
+python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir path/to/checkpoints_folder  # rename the checkpoint inside to checkpoint_current.pth
+```
+
+> **Note:** Better inference tooling will come later; for now, this rough CLI is the way to run the model. :D
+
+> **Note:** These weights are from an experimental run. They demonstrate the architecture's capabilities, but don't expect GPT-5-level (or even Google Bard-level) coherence. :D
+> This project was made for learning and fun!
 
 ## License
 
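The inline comment in the added "How to Load" command says to rename the checkpoint inside the folder to `checkpoint_current.pth`. A minimal shell sketch of that step, where the downloaded file name (`model_step10000.pth`) is a hypothetical placeholder, not the actual artifact name on the Hub:

```shell
# Rename the downloaded checkpoint so the CLI finds it as checkpoint_current.pth.
# "model_step10000.pth" is a stand-in for whatever file the Hub actually serves.
mkdir -p path/to/checkpoints_folder
touch path/to/checkpoints_folder/model_step10000.pth   # placeholder for the real download
mv path/to/checkpoints_folder/model_step10000.pth path/to/checkpoints_folder/checkpoint_current.pth
```

After this, `--checkpoint_dir path/to/checkpoints_folder` should resolve the expected file.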
