GPT-OSS-20B Pretraining #862

Open
suachong wants to merge 36 commits into mlcommons:master from suachong:master
@suachong
Contributor

This PR provides the reference code for GPT-OSS-20B pretraining using the Primus framework. It can be run on both AMD and NVIDIA hardware.

@github-actions

github-actions bot commented Jan 19, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@suachong marked this pull request as ready for review January 23, 2026 17:50
@suachong requested a review from a team as a code owner January 23, 2026 17:50
@ShriyaRishab
Contributor

@mmarcinkiewicz can you please review this?

@mmarcinkiewicz
Contributor

It seems the datadir needs to be writable (presumably to store the index). Can we put the index in a different directory so the datadir stays read-only?
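
One possible shape for this, as a minimal sketch: resolve the index location from a separate writable directory instead of the dataset directory. The helper `resolve_index_path`, the variable `INDEX_CACHE_DIR`, and the `.index` suffix are all hypothetical names used for illustration, not options of Primus or of the dataset loader in this PR.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def resolve_index_path(data_file: str, cache_dir: Optional[str] = None) -> Path:
    """Return where the index for `data_file` should live.

    The index goes into a writable cache directory so the directory that
    holds `data_file` can stay mounted read-only.
    INDEX_CACHE_DIR is an assumed (hypothetical) environment variable.
    """
    cache_root = Path(
        cache_dir
        or os.getenv("INDEX_CACHE_DIR")
        or os.path.join(tempfile.gettempdir(), "dataset_index_cache")
    )
    # Only the cache directory is created or written to, never the datadir.
    cache_root.mkdir(parents=True, exist_ok=True)
    return cache_root / (Path(data_file).name + ".index")
```

With something like this, the datadir can be mounted read-only in the container while the cache directory is a separate writable mount.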

fp8: null # Disabled - using bf16 instead

# hyper parameters
train_iters: ${PRIMUS_TRAIN_ITERS:20000}

We need to talk about that.

@pbaumstarck left a comment

Looking good overall, and I got the code running. One more minor comment: we don't have any binary .whl files in the repo, so it would be ideal if we could retrieve and install that wheel dynamically.
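
A rough sketch of what dynamic installation could look like, assuming the wheel is published at some stable URL. The helper name `build_wheel_install_command` is hypothetical, and the wheel's actual location is not stated in this thread, so the caller has to supply it.

```python
import sys
from typing import List


def build_wheel_install_command(wheel_url: str) -> List[str]:
    """Build a pip command that installs a wheel straight from a URL.

    Hypothetical helper: `wheel_url` must be supplied by the caller; the
    real location of the wheel is an open question in this PR.
    """
    if not wheel_url.endswith(".whl"):
        raise ValueError(f"expected a .whl URL, got: {wheel_url}")
    # Use the current interpreter's pip so the wheel lands in the same
    # environment that runs the training code.
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", wheel_url]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` during environment setup, which would let the binary be dropped from the repo.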

rank = int(os.getenv("RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
master_addr = os.getenv("MASTER_ADDR", "127.0.0.1")
master_port = int(os.getenv("MASTER_PORT", "29500"))

This conflicts with the port being set to 29501 in the shell commands. Should these all be the same?
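
One way to avoid the mismatch, sketched under the assumption that the shell commands export `MASTER_PORT`: keep a single default in one place and read the environment everywhere else. `DEFAULT_MASTER_PORT` and `get_master_port` are hypothetical names; which of 29500 or 29501 should be the shared value is for the PR author to decide.

```python
import os
from typing import Mapping, Optional

# Assumed single source of truth for the fallback port. The value here
# mirrors the Python snippet above; the shell scripts would need to match.
DEFAULT_MASTER_PORT = 29500


def get_master_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Read MASTER_PORT from the environment, falling back to one default."""
    env = os.environ if env is None else env
    port = int(env.get("MASTER_PORT", str(DEFAULT_MASTER_PORT)))
    if not 0 < port < 65536:
        raise ValueError(f"MASTER_PORT out of range: {port}")
    return port
```

If the launch scripts always export `MASTER_PORT`, the Python default is only a fallback and the two can no longer silently disagree.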

# Report result
result=$(( end - start ))
result_name="GPT_OSS_20B"
echo "RESULT,$result_name,,$result,AMD,$start_fmt"

The "AMD" string is hardcoded, but this code is shared between vendors.
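
A minimal sketch of parameterizing the vendor field, shown in Python for illustration even though the snippet above is shell. The `VENDOR` environment variable and the helper `format_result_line` are hypothetical; the field layout is copied from the `echo` line above.

```python
import os
from typing import Optional


def format_result_line(
    result_name: str,
    elapsed_seconds: int,
    start_fmt: str,
    vendor: Optional[str] = None,
) -> str:
    """Format the RESULT line with the vendor passed in rather than hardcoded.

    VENDOR is an assumed environment variable that each vendor's launch
    script would set; "UNKNOWN" is the fallback when it is missing.
    """
    vendor = vendor or os.getenv("VENDOR", "UNKNOWN")
    return f"RESULT,{result_name},,{elapsed_seconds},{vendor},{start_fmt}"
```

The same idea carries over to the shell script directly by replacing the literal with a variable that has a default.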


5 participants