Conversation

ajkv-google commented Jan 14, 2026

Summary

Added training scripts for HSTU using both the Keras and JAX trainers. These scripts demonstrate how the HSTU model in this library can be trained on TPU with either trainer. The hyperparameters (e.g. vocab_size) are set for the Amazon Books dataset, but they can be adjusted for other datasets and training setups.
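
To make the configurability concrete, here is a minimal, hypothetical sketch of the kind of hyperparameter configuration such a training script might expose. The class name, field names, and default values below are illustrative placeholders, not the library's actual API or the checked-in defaults.

```python
# Hypothetical sketch only: field names and values are placeholders, not the
# actual defaults shipped in the new training scripts.
from dataclasses import dataclass


@dataclass
class HSTUTrainingConfig:
    # The scripts' defaults target the Amazon Books dataset; the numbers
    # below are illustrative placeholders.
    vocab_size: int = 200_000      # item vocabulary size (placeholder)
    max_seq_len: int = 200         # maximum user history length (placeholder)
    embedding_dim: int = 256       # item embedding width (placeholder)
    num_layers: int = 4            # number of HSTU blocks (placeholder)
    num_heads: int = 8             # attention heads per block (placeholder)
    learning_rate: float = 1e-3    # optimizer learning rate (placeholder)
    global_batch_size: int = 1024  # batch size across all TPU hosts (placeholder)


# Switching datasets only requires overriding the data-dependent fields,
# e.g. for a smaller item catalogue:
small_catalogue_config = HSTUTrainingConfig(vocab_size=50_000, max_seq_len=100)
```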

Verified training on a Trillium TPU chip; training ran successfully with both trainers.
