Conversation
address review from previous PR
Credit co-authors for prior squash
Co-authored-by: ZixianWangAMD <zixiwang@amd.com>
Co-authored-by: Michal Marcinkiewicz <michalm@nvidia.com>
Co-authored-by: Lukasz Pierscieniewski <l.pierscieniewski@gmail.com>
disable async save and save intermediate checkpoint
…arget log perplexity to be 3.3 for consistency purposes
This reverts commit fed1bb4.
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
@mmarcinkiewicz can you please review this?
It seems the datadir needs to be writable (presumably to store the index). Can we put the index into a different dir so the datadir stays read-only?
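For illustration, a minimal sketch of that idea, not an existing Primus/Megatron option: DATA_CACHE_DIR and index_path_for are hypothetical names. The index files go into a separate writable cache directory keyed by the source path, so the data directory itself can stay read-only.

    # Sketch only: map each read-only data file to an index file in a
    # writable cache directory. DATA_CACHE_DIR is a hypothetical variable.
    import hashlib
    import os
    from pathlib import Path

    CACHE_DIR = Path(os.getenv("DATA_CACHE_DIR", "/tmp/data_cache"))

    def index_path_for(data_file: str) -> Path:
        """Return the index path for a data file inside the writable cache dir."""
        digest = hashlib.sha256(str(Path(data_file).resolve()).encode()).hexdigest()[:16]
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        return CACHE_DIR / f"{Path(data_file).name}.{digest}.idx"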
    fp8: null # Disabled - using bf16 instead

    # hyper parameters
    train_iters: ${PRIMUS_TRAIN_ITERS:20000}
we need to talk about that
Add option to run with SLURM
pbaumstarck left a comment:
Looking good overall, and I got the code running. One more minor comment: we don't keep any binary .whl files in the repo, so it would be ideal to retrieve and install that wheel dynamically.
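For example, something along these lines could fetch it at setup time (a sketch only; PRIMUS_WHEEL_URL is a placeholder for wherever the artifact would actually be hosted):

    # Sketch: install the wheel from a configurable location instead of
    # committing the binary. PRIMUS_WHEEL_URL is a hypothetical placeholder.
    import os
    import subprocess
    import sys

    wheel_url = os.environ.get("PRIMUS_WHEEL_URL", "")
    if not wheel_url:
        raise SystemExit("Set PRIMUS_WHEEL_URL to the wheel's download location")
    subprocess.check_call([sys.executable, "-m", "pip", "install", wheel_url])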
    rank = int(os.getenv("RANK", "0"))
    world_size = int(os.getenv("WORLD_SIZE", "1"))
    master_addr = os.getenv("MASTER_ADDR", "127.0.0.1")
    master_port = int(os.getenv("MASTER_PORT", "29500"))
This conflicts with the port being set to 29501 in the shell commands. Should these all be the same?
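One way to keep them consistent (just a sketch; the agreed default could equally be 29500): define the default in a single place and have both the shell launcher and the Python side read MASTER_PORT from the environment.

    # Sketch: a single source of truth for the rendezvous port. 29501 mirrors
    # the shell commands in this PR; keep it in sync with the launch script.
    import os

    DEFAULT_MASTER_PORT = 29501
    master_port = int(os.getenv("MASTER_PORT", str(DEFAULT_MASTER_PORT)))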
    # Report result
    result=$(( end - start ))
    result_name="GPT_OSS_20B"
    echo "RESULT,$result_name,,$result,AMD,$start_fmt"
The "AMD" string is hardcoded here, but this code is shared between vendors.
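As a sketch of one alternative (shown in Python for illustration, since the actual script is shell; SUBMITTER_NAME is a hypothetical environment variable, not something the shared scripts define today), the submitter field could be read from the environment instead:

    # Sketch: emit the same RESULT line with the submitter made configurable.
    import os
    import time

    submitter = os.getenv("SUBMITTER_NAME", "reference")
    start = time.time()
    start_fmt = time.strftime("%Y-%m-%d %I:%M:%S %p")
    # ... training run ...
    end = time.time()
    print(f"RESULT,GPT_OSS_20B,,{int(end - start)},{submitter},{start_fmt}")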
This PR provides the reference code for GPT-OSS-20B using the Primus framework, which can run on both AMD and NVIDIA hardware.