Added training scripts for HSTU using keras and jax trainers #140

ajkv-google · 2026-01-14T00:34:22Z

Summary

Added training scripts for HSTU using both the Keras and JAX trainers. The scripts show how the HSTU implementation in this library can be trained on TPU with each trainer. The hyperparameters (e.g., vocab_size) are set for the Amazon Books dataset, but they can be changed to suit the dataset in use and other factors.
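
As a rough illustration of changing the hyperparameters for another dataset, here is a minimal sketch. It is not the code in this PR: `HSTUTrainConfig`, every field name other than `vocab_size`, and all numeric values are hypothetical placeholders (they are not the actual Amazon Books settings).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HSTUTrainConfig:
    """Hypothetical hyperparameter container, used only for illustration."""

    vocab_size: int           # number of distinct item IDs in the dataset
    max_sequence_length: int  # longest user history passed to the model
    embedding_dim: int        # width of the item embeddings
    num_layers: int           # number of stacked HSTU blocks
    learning_rate: float      # optimizer step size

    def __post_init__(self):
        # Catch obviously invalid settings before any TPU time is spent.
        for name in ("vocab_size", "max_sequence_length",
                     "embedding_dim", "num_layers"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        if self.learning_rate <= 0.0:
            raise ValueError("learning_rate must be positive")


# Placeholder values standing in for the defaults of a training script.
base_config = HSTUTrainConfig(
    vocab_size=100_000,
    max_sequence_length=50,
    embedding_dim=64,
    num_layers=2,
    learning_rate=1e-3,
)

# For a different dataset, override only the fields that depend on the data.
other_config = replace(base_config, vocab_size=250_000, max_sequence_length=200)
```

Keeping the dataset-dependent values in one immutable object means the same settings can be passed to either trainer without editing the training loop itself.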

Verified on a Trillium chip: training ran successfully with both trainers.

ajkv-google added 3 commits January 14, 2026 00:30

Added training scripts for HSTU using keras and jax trainers

172ee51

Updated comment

f59d184

Added dockerfile for future use

0e3a892
