7572 commits
c6096d9
Update PR template (#4904)
Phlip79 May 21, 2026
0588bf7
ci: Update perf test to output logs for tests to pass (#4906)
chtruong814 May 21, 2026
6576040
Also persist asymmetrical units for the MXFP8 transpose weight buffer…
cspades May 21, 2026
80a2d39
fix no_shard training convergency and add unittest for no_shard (#3754)
wplf May 21, 2026
e9a3184
Move policy epoch stats to the message object (#4533)
ArEsKay3 May 21, 2026
9a7cd17
Add a knob to throttle the max allowed inflight offload in fine grain…
nanz-nv May 21, 2026
5e4fc93
refactor(data): consolidate get_batch and enable PP for SFT THD (#4103)
asolergi-nv May 21, 2026
0d198cd
Allow YAML MoE configs to use model specs (#4822)
chawkins-nvidia May 21, 2026
547fb17
Move bert and t5 pretrain files (#4820)
Phlip79 May 21, 2026
e27607a
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] May 22, 2026
f007db7
Paged Stashing (#4247)
nanz-nv May 22, 2026
4c63602
make FP4 param gather work with the mixed precisions in NVFP4 recipe …
xrennvidia May 22, 2026
4db6fa4
fix: Fix multi-node functional test phase sync (#4924)
chtruong814 May 22, 2026
686aa8c
Perf tests (#4917)
shanmugamr1992 May 22, 2026
0beaa53
fix(cuda_graphs): handle TE 2.15 removal of FP8GlobalStateManager.set…
balasaajay May 22, 2026
6c1bd6e
Fix paged stashing test submodules lookup (#4925)
Phlip79 May 22, 2026
fa7a23b
Add TEFusedDenseMLP for Dense+Grouped GEMM fusion on SM100+ (#4318) (…
sraman-rgb May 22, 2026
5f79118
Fix mxfp8 param gather numerical issue when DP overlap is off (#4800)
WanZzzzzz May 22, 2026
08bad7a
[MXFP8/FP4-param-gather] Post processing after forced param AG in eva…
WanZzzzzz May 22, 2026
3fb34c6
ci: Update training script paths in BERT and T5 (#4939)
balasaajay May 22, 2026
34560c4
Various training utils (#4872)
maanug-nv May 22, 2026
f7f584d
ci: restore perf test torchrun logs (#4951)
chtruong814 May 23, 2026
4bd8bb3
Fix `get_batch` return order to ignore BlendedDataset provenance fiel…
deepakn94 May 23, 2026
be2b2cd
test(release): add release goldens for deepseekv3/nemotron3 and set t…
ko3n1g May 25, 2026
3b2b6e7
chore(beep boop 🤖): Bump (main) (2026-05-25)
github-actions[bot] May 25, 2026
2f754f4
test: enable NVTE_CUTEDSL_FUSED_GROUPED_MLP via pytest fixture (#4931)
ko3n1g May 25, 2026
2bd9fd5
Avoid offsetting functional test master port (#4973)
chtruong814 May 25, 2026
4415119
Fix elastification unwrap_model import (#4972)
Devil1716 May 25, 2026
432d76b
test: re-enable paged stashing MoE tests (#4978)
ko3n1g May 26, 2026
ff64743
test(ci): re-enable 8experts2parallel_multi_dist_optimizer_instances_…
ko3n1g May 26, 2026
0e5cd0e
ci: Add support for MBridge job gating based on PR labels (#4926)
balasaajay May 26, 2026
08c368a
test: re-enable test_pp2_create_cudagraphs_first_stage on TE 2.15+ (#…
ko3n1g May 26, 2026
6ce6fac
fix(tests): initialize num_microbatches calculator in vision cudagrap…
ko3n1g May 26, 2026
859b719
ci: Add allow_failure flag to gpt and moe recipes that are failing in…
balasaajay May 26, 2026
88e7ab0
Drain predecessor reduce-scatter at dispatch time (#4940)
deepakn94 May 27, 2026
e6b2bd8
nightly(ci): Update golden values for functional t5 tests (#4995)
balasaajay May 27, 2026
7521ecb
chore: rotate oncall schedule
github-actions[bot] May 27, 2026
873678a
[main] Refactor and Improve MoE Logginginit commit (#3431)
yanring May 27, 2026
4e52a9e
ci: validate release branch-rules (#4929)
ko3n1g May 27, 2026
67b2f38
[Megatron-FSDP] Add conditional param.grad dereferencing logic to sup…
cspades May 27, 2026
71223d5
test: restrict iter-time comparison to steady-state window (#5010)
ko3n1g May 27, 2026
a6d61fb
fix(test): pin eval-global-batch-size on 15b gb200 release configs (#…
ko3n1g May 27, 2026
286445c
[fix] Release MTP assertion when EP overlap with PP=1 (#4796)
Wohox May 27, 2026
0cb4034
fix(test): widen iter-time steady-state window for short tests (#5023)
ko3n1g May 27, 2026
805e24e
Perf fix (#4996)
shanmugamr1992 May 27, 2026
457e3f7
Add dev-feature preservation gate and change schedule (#4773)
Phlip79 May 27, 2026
146d171
chore(test): remove orphan nemotron3_super_release_g200 dir (#5024)
ko3n1g May 27, 2026
7be1748
Ignore Claude worktree directory (#5020)
Phlip79 May 27, 2026
9d23a73
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] May 28, 2026
3e00820
ci: update CI workflow conditions for integration tests (#4658)
balasaajay May 28, 2026
0010683
Add NVSkills CI request workflow (#5033)
Phlip79 May 28, 2026
2a4f820
DDP wrap pg size fixes (#5006)
maanug-nv May 28, 2026
e0afe89
fix(layer_wise): tag MTP-stage word_embeddings as is_embedding_or_out…
Wohox May 28, 2026
4f2e7ce
Move LTS dependencies from pyproject.toml to Dockerfile.ci.lts (#4877)
balasaajay May 28, 2026
88efdf7
Use shared ModelOpt calibration loop on 0.45+ with 0.44 fallback fix …
kevalmorabia97 May 28, 2026
a5c2d1b
test(release): skip golden comparison on intermediate resume windows …
ko3n1g May 28, 2026
2ebca1e
[mimo] Thread position_ids through MimoModel for multimodal RoPE (#4938)
liding-nv May 28, 2026
e6ff4c3
build: Switch DSv3 on H100 to HybridEP (#5039)
ko3n1g May 28, 2026
ba9e0ea
Fix: Import unwrap_model from megatron.core.utils in modelopt example…
kevalmorabia97 May 28, 2026
3c39d98
Simple and stable Inference APIs (#4697)
YangFei1990 May 29, 2026
f63a46e
ci: Add notification step for MBridge downstream test results (#5028)
balasaajay May 29, 2026
f1b5516
Delete output tensor early (#4742)
Phlip79 May 29, 2026
f8e3885
Support ScaledSReLU in TE grouped MLP fuser (#4859)
sraman-rgb May 29, 2026
1801316
Skip gradient updates when grad norm exceeds threshold (#3460)
yfw May 29, 2026
6e0d14a
Add 9 user skills (#5066)
Phlip79 May 29, 2026
54dc530
test(nemotron): align nemotron3 super GB200 goldens with exit-interva…
ko3n1g May 30, 2026
6e091e1
chore: Update transformer-engine dependency to version 2.16.0 (#4992)
balasaajay May 30, 2026
52d1d68
Update energon version requirement (#4572)
maanug-nv May 30, 2026
791a45f
Fix test failures for new inference APIs (#5068)
YangFei1990 May 30, 2026
f33a51f
fix(ci): set PYTHONUNBUFFERED=1 in JET workload env (#5072)
ko3n1g May 30, 2026
378d81f
Preserve non-FSDP-unit buckets across AllGatherPipeline reset (#4717)
wujingyue May 30, 2026
3f6a2ed
Add opt-in MXFP8 LM-head output projection (#4825)
gdengk May 31, 2026
24cc2a8
chore(beep boop 🤖): Bump (main) (2026-06-01)
github-actions[bot] Jun 1, 2026
0a1726a
fix(ci): bound JET pipeline polling with a watchdog to prevent indefi…
ko3n1g Jun 1, 2026
595f697
ci: prune old artifacts on cluster lustre during weekly/release runs …
ko3n1g Jun 1, 2026
46f1af7
ci(test): isolate ckpt-resume tensorboard per phase (#5074)
ko3n1g Jun 1, 2026
33da12c
test: unmark EP A2A activation offload test flaky (#5009)
lhb8125 Jun 1, 2026
e656940
Change ownership groups (#5021)
Phlip79 Jun 1, 2026
496e1ff
test: skip mfsdp_fully_shard cases when world_size < mesh size (#4487)
wujingyue Jun 1, 2026
80cf756
fix mimo optimizer checkpoint metadata restore (#4791)
liding-nv Jun 1, 2026
de030fc
[mimo] Support bridge fan-out for variable modality tokens (#5062)
liding-nv Jun 1, 2026
b8dcaab
Add separate mtp_grad_scale_func for MTP loss scaling (#3459)
yfw Jun 1, 2026
5f88438
[training migration] Migrate GPT builder (#4741)
maanug-nv Jun 1, 2026
35992ba
Make Mamba conv params direct mixer params (#4899)
wujingyue Jun 1, 2026
907ebcd
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Jun 2, 2026
f5da5ea
Update oncall reviewer assignment (#5093)
Phlip79 Jun 2, 2026
bb192ff
Pass explicit process groups to hybrid logging (#4781)
yashaswikarnati Jun 2, 2026
7d9f259
Clean up top-level repository files (#5097)
Phlip79 Jun 2, 2026
b0eb914
[main] fix(moe): Fix several bugs for DSA rope and spec. (#3026)
yuzhongw-nvidia Jun 2, 2026
c69697d
Move MIMO unit tests into models/mimo (#5063)
yashaswikarnati Jun 2, 2026
54e1996
test: update DeepSeek FSDP2 GB200 memory golden (#5094)
wujingyue Jun 2, 2026
a1ac2f6
Remove DeepEP hardware limit check (#4846)
janEbert Jun 2, 2026
591605e
Update transformer-engine dependency to revision 4220403 (#5112)
balasaajay Jun 2, 2026
44f254a
ci: make CI resilient to pip/uv network timeouts (#5118)
ko3n1g Jun 2, 2026
d94604e
ci: treat docker container-removal conflict as flaky (#5120)
ko3n1g Jun 2, 2026
655a064
Fix GDN DTensor splitting for FSDP checkpointing (#4843)
conver334 Jun 2, 2026
9ef8a2a
Fix MoE aux_loss / z_loss gradient scaling with TP > 1 (#5047)
deepakn94 Jun 2, 2026
540ff5d
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Jun 3, 2026
eac5ca7
Update Claude copy workflow to enforce user restrictions and improve …
balasaajay Jun 3, 2026
a823ea0
Add advisory process group guidance to Claude reviews (#5111)
yashaswikarnati Jun 3, 2026
3b6bc31
build: cap pydantic<2.14 in transformer-engine dependency metadata (#…
cuichenx Jun 3, 2026
20f5290
fix(test): skip scalar-less tensorboard event files in resume checks …
ko3n1g Jun 3, 2026
64c3fb8
chore: rotate oncall schedule
github-actions[bot] Jun 3, 2026
c2a3bec
docs: fix contributor guide typo (#4858)
LeSingh1 Jun 3, 2026
4d104fd
ci(unit-tests): split slow unit-test buckets over 15min SLA (#5133)
ko3n1g Jun 3, 2026
ec97c85
Fix DSA indexer loss not averaged across micro-batches (#4070)
kaimo455 Jun 3, 2026
7a7f7b4
Update MINOR version to 19 (#5096)
balasaajay Jun 3, 2026
947d6ae
Fix Muon QKV split for gated attention (#4728)
Moozy23232 Jun 3, 2026
ffd66a3
Roll input IDs for MTP labels (#3457)
yfw Jun 3, 2026
b8fef11
Refactor: Move paged stashing Triton kernels (#5003)
Phlip79 Jun 3, 2026
a377dee
Adding blackwell tests (#5113)
shanmugamr1992 Jun 3, 2026
d9bd342
Relax atol for test_router_gating_linear router_dtype=torch.float32 (…
adityasingh2400 Jun 3, 2026
e513ec4
Fix incorrect inference metadata tensor dtypes (#4855)
santhnm2 Jun 3, 2026
168cb15
Disable TE cross entropy loss fusion (#5115)
mchrzanowski Jun 3, 2026
b80a854
fix(optimizer): gate ChainedOptimizer MXFP8 defer-sync on DDP-level o…
ko3n1g Jun 3, 2026
669b747
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Jun 4, 2026
4a47b16
Pass TP group to unfused cross entropy (#5128)
yashaswikarnati Jun 3, 2026
471e3e3
test(elastification): quarantine flaky test_gumbel_determinism as fla…
ko3n1g Jun 4, 2026
bdcaf26
ci(notify): mention mcore-oncall and philipp on critical CI events (#…
ko3n1g Jun 4, 2026
16b7194
Change the cudagraph distribution from linearly to exponentially-decr…
mathemakitten Jun 4, 2026
4108d12
ci: Disable a few gb200 test cases to support 2 branches. (#5151)
balasaajay Jun 4, 2026
53e3748
build: Switch DSv3 on H100 to HybridEP (#5164)
balasaajay Jun 4, 2026
d041544
Add MTP acceptance rate metrics (#3458)
yfw Jun 4, 2026
6f14bf9
Nemotron Ultra config for ModelOpt examples (#5159)
jenchen13 Jun 4, 2026
53b5e6e
Make MTP / prefix cache stats persist for engine lifetime (#4101)
santhnm2 Jun 4, 2026
2944537
Restore Greptile configuration (#5166)
Phlip79 Jun 4, 2026
96a894e
chore: bump `_code_freeze` workflow to `v1.4.2` (#5132)
ko3n1g Jun 4, 2026
5a2b598
ci: Remove docs build test in favor of release test (#5182)
chtruong814 Jun 5, 2026
b574499
Move TE cross entropy guard to training args (#5162)
yaoyu-33 Jun 5, 2026
5a2c6fa
Fix error in deepseek parser (#5136)
tdene Jun 5, 2026
9d33299
Fix logprob slicing for 0 generated token case (#5167)
santhnm2 Jun 5, 2026
0f891a1
[Perf] Fold frozen linear dgrad matmul (#5092)
cuichenx Jun 5, 2026
53d9ba0
Clamp `max_new_tokens` in MInf to mirror vllm (#5181)
tdene Jun 5, 2026
dbf5228
build: add managed = true to [tool.uv] (#5190)
kajalj22 Jun 5, 2026
6204b92
Stabilize GB200 inference perf tests against cold-start noise (#5171)
shanmugamr1992 Jun 5, 2026
947c0b3
nvidia style guide audit for getting started folder (#5168)
megnvidia Jun 8, 2026
05c76af
AI aided audit for Nvidia Style guidance (#5141)
megnvidia Jun 8, 2026
ff5264c
Enable selective recompute for `norm_out` in GDN layers (#4715)
xuantengh Jun 8, 2026
adcdf16
fix(elastification): align with get_batch + utils refactors (#5194)
balasaajay Jun 8, 2026
5da641c
fix(combined-1f1b): release loss-node input storage after combined ba…
Wohox Jun 8, 2026
dbf719b
chore(beep boop 🤖): Bump (main) (2026-06-08)
github-actions[bot] Jun 8, 2026
17a5a80
Avoid stat syscall in rerun result validation (#5107)
dimapihtar Jun 8, 2026
71e2246
docs: Update Latest News in README.md (#3790)
sbhavani Jun 8, 2026
3e6e32b
Fix bug with Megatron-FSDP zero counter not working with decoupled gr…
cspades Jun 8, 2026
1829fd9
ci: add smoke tests (#5143)
balasaajay Jun 8, 2026
71e418e
Add mtp_detach_heads config to detach MTP head inputs (#3456)
yfw Jun 8, 2026
a95d866
docs: fix install guide NGC container anchor (#5224)
Connor-XY Jun 9, 2026
48032d7
Fuse per-sequence AlltoAll into a unified one in GDN forward (#4913)
xuantengh Jun 9, 2026
3920476
Apply MIMO SP/CP sharding with explicit groups and enable THD in non-…
yashaswikarnati Jun 9, 2026
d199bb9
Fix CUDA IMA in fsdp_double_buffer when an FSDP unit's bucket doesn't…
wujingyue Jun 9, 2026
ba71ec2
Add named layouts to HyperCommGrid for heterogeneous parallelism (#5148)
yashaswikarnati Jun 9, 2026
55638bc
Fix wgrad race condition when using double buffers. (#5222)
cspades Jun 10, 2026
504b02f
Move uneven DTensor distributed fixture to conftest (#5237)
wujingyue Jun 10, 2026
b180d2c
Route bridge communicator cross-grid P2P through a dedicated process …
yashaswikarnati Jun 10, 2026
e03878b
Fix test_split_tensor_along_last_dim to actually assert correctness (…
lichenlu Jun 10, 2026
2065b5a
chore: rotate oncall schedule
github-actions[bot] Jun 10, 2026
c6af841
Add optional group= to common_utils model/data-parallel reduction hel…
yashaswikarnati Jun 10, 2026
8baac4d
ci: Allow DCO check in merge queue and add DCO requirement to Contrib…
chtruong814 Jun 10, 2026
84d8bfe
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Jun 11, 2026
aa10571
Add MIMO hetero topology + distributed bootstrap (examples/mimo train…
yashaswikarnati Jun 10, 2026
3a183e2
Remove checkpoint-time GPU cache reclaim workaround (#5170)
shurkat-nvidia Jun 11, 2026
1af933d
Remove duplicate nccl_allocator import (#5057)
returnL Jun 11, 2026
930bb6f
fix(ci): resolve t5 dataloader stall + GRPO cudagraph-memory regressi…
Connor-XY Jun 11, 2026
d81b638
Fix Dockerfile warnings (#4856)
janEbert Jun 11, 2026
76d26e2
Fix fused MLA delayed weight grad hooks (#5273)
sraman-rgb Jun 11, 2026
5321c66
ci: limit retries on unsuccessful test launches (#5275)
balasaajay Jun 11, 2026
75e382c
Thread pg_collection into get_model DDP bucket sizing (#5250)
yashaswikarnati Jun 11, 2026
5ccd936
Enable non-deterministic results in model configuration for nemotron …
balasaajay Jun 11, 2026
bd1f0dd
Stabilize hybrid nanov3 gb200 perf (#5295)
shanmugamr1992 Jun 11, 2026
5b16c99
Clip mtp grads separately when mtp_detach_heads=True (#4116)
yfw Jun 11, 2026
de6305c
Thread pg_collection into train_step reductions (#5259)
yashaswikarnati Jun 11, 2026
df9141e
Allow for pre-bound socket to be passed in server (#5301)
tdene Jun 12, 2026
277c4f8
Offline Logits-Based Knowledge Distillation (#5019)
AAnoosheh Jun 12, 2026
1f537e8
Handle None values in sampling parameters (#5300)
tdene Jun 12, 2026
c0c1f91
Add moe loss normalization for RL SFT (#3956)
pthombre Jun 12, 2026
18a2f55
Add code owners for optimizer-related files (#5297)
janEbert Jun 12, 2026
b45ae73
Fix EP=1 inference by allocating buffers anyway (#5233)
mathemakitten Jun 12, 2026
806022f
Fix crash due to tool call at sequence length (#5302)
tdene Jun 13, 2026
ef549a6
Inference: Cudagraph-aware admission gating in prefill scheduler (#4870)
mathemakitten Jun 13, 2026
0022550
Account for reasoning token stripping (#5313)
tdene Jun 13, 2026
eb1c677
Thread pg_collection through wrap_model_chunks_with_ddp (#5328)
yashaswikarnati Jun 13, 2026
59bb1c1
chore(beep boop 🤖): Bump (main) (2026-06-15)
github-actions[bot] Jun 15, 2026
1bcb3b9
Fix LatentMoE theoretical memory estimate (#5145)
Wong4j Jun 15, 2026
133cf60
Add zstandard package to Docker LTS requirements. Fix nightly failure…
balasaajay Jun 15, 2026
addc601
Thread MIMO support through the stock training loop (schedule + optim…
yashaswikarnati Jun 15, 2026
4165673
ci: default functional test time limit to 4h for release/weekly scope…
ko3n1g Jun 16, 2026
72171c0
Fix memory leak with log_max_attention_logit (#4699) (#5067)
asolergi-nv Jun 16, 2026
b60de39
Clean up pretrain_gpt.py and pretrain_hybrid.py formatting and remove…
ilml Jun 16, 2026
1cfa834
Add full model cuda graph support for MTP inference (#4950)
santhnm2 Jun 16, 2026
a83f408
Expand the Mamba prefix caching memory safety check to include scratc…
santhnm2 Jun 16, 2026
b00cad1
Make Megatron RL only materialize last token logit (#4551)
tdene Jun 16, 2026
a12484b
Profiling (#3110)
jalbericiola Jun 16, 2026
2b90b3f
Support fused MLA QKV checkpoint reload (#5310)
sraman-rgb Jun 16, 2026
000dc1c
Add minimal DBuffer implementation (#4835)
wujingyue Jun 16, 2026
d30e165
[split 1/5] Fix packed THD RoPE under CP (#5243)
HollowMan6 Jun 17, 2026
49737fd
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Jun 17, 2026
2e1183a
Document agent PR commit sign-off and signing (#5381)
wujingyue Jun 16, 2026
2463dbe
Remove unused distributed pytest markers (#5380)
wujingyue Jun 17, 2026
5c660c3
[feat] Support fine-grained activation offloading in fused group mlp …
lhb8125 Jun 17, 2026
bd381ac
Thread tensor-parallel group into the RADIO patch embedder (#5371)
yashaswikarnati Jun 17, 2026
a00c0de
Add MimoModel.zero_grad_buffer delegating to active DDP submodules (#…
yashaswikarnati Jun 17, 2026
7604f28
[split 3/5] Refactor absorbed MLA projection handling (#5245)
HollowMan6 Jun 17, 2026
5182aa6
chore: rotate oncall schedule
github-actions[bot] Jun 17, 2026
41dbab4
ci: Remove sync skills workflow (#5091)
chtruong814 Jun 17, 2026
ad5a93b
Add flaky marker to fine-grained activation offloading test (#5350) (…
balasaajay Jun 17, 2026
be82829
Revert "Remove checkpoint-time GPU cache reclaim workaround (#5170)" …
balasaajay Jun 18, 2026
a50252b
Update goldens for weekly tests after pytorch and TE bumps. (#5399)
balasaajay Jun 18, 2026
d1410e1
Add MIMO runtime setup: per-role RNG seeding and DDP wrapping (#5285)
yashaswikarnati Jun 18, 2026
6142ee4
Add --mamba-training-ssm-states-dtype argument (#5309)
tdene Jun 21, 2026
2cb1b80
chore(beep boop 🤖): Bump (main) (2026-06-22)
github-actions[bot] Jun 22, 2026
57c484e
Fix Mamba prefix match for chunked prefill (#4758)
lmcafee-nvidia Jun 22, 2026
ae2efd5
Disag MR2: Refit into multiple destination pools and tied-embedding +…
wdykas Jun 22, 2026
fc4597c
Disag MR1: Add inference shard specs and pg-collection building (#5186)
wdykas Jun 22, 2026
e1c4495
Support the MIMO cross-grid path in training loop (#5373)
yashaswikarnati Jun 22, 2026
6bd392f
Stabilize hybrid_2b GB200 perf test against run-to-run noise (#5364)
shanmugamr1992 Jun 22, 2026
b6b44a7
Consistent oncall schedule (#5404)
Phlip79 Jun 22, 2026
93a7642
Disag MR3: Add heterogeneous KV/Mamba reshard planners (#5188)
wdykas Jun 22, 2026
2a46893
Add RADIO vision encoder wrapper for MIMO example (#5397)
yashaswikarnati Jun 22, 2026
76f6ccc
Clean up MTP inference control flow (#5418)
santhnm2 Jun 22, 2026
f8170b4
Add MIMO dual gradient finalization (colocated + non-colocated) (#5286)
yashaswikarnati Jun 23, 2026
a58373f
Add RL rollout submission and consumption granularity controls (#5306)
lauradang Jun 23, 2026
f66c28f
Add --functional-test-name to trigger_internal_ci (#5449)
ko3n1g Jun 23, 2026
8fa1831
Rename CP batch helpers to describe balancing granularity (#5403)
deepakn94 Jun 23, 2026
06ae6a9
build: point flash_mla at the nv_dev branch (#5448)
ko3n1g Jun 23, 2026
a2bb5e5
Add logprobs_mode (raw/processed) to inference config (#5419)
tdene Jun 23, 2026
fcbb6ed
Support SWA and sink attention in dynamic inference (#5249)
cuichenx Jun 23, 2026
b1884d1
Add hetero grid args and MoE process groups for MIMO example (#5375)
yashaswikarnati Jun 24, 2026
b549290
ci: Set test_save_verify_integrity_manifest_directly as flaky (#5468)
chtruong814 Jun 24, 2026
47cb413
Remove DBuffer mesh axis validation (#5441)
wujingyue Jun 23, 2026
a27b040
feat(inference): default use_coordinator to True in high-level APIs (…
shanmugamr1992 Jun 24, 2026
811bd29
Support HybridModel feature specs in ModelOpt (#5354)
Phlip79 Jun 24, 2026
e7af860
Add experimental Megatron-FSDP fully_shard implementation (#5387)
wujingyue Jun 24, 2026
cc0c960
chore: rotate oncall schedule
github-actions[bot] Jun 24, 2026
4d44e37
Add inference functions to support MCore-/MBridge- training refactor …
shanmugamr1992 Jun 24, 2026
5a256f3
ci: launch GB200 unit tests via launch_on_gb200 marker (#5477)
ko3n1g Jun 24, 2026
82de1b8
build: install flash_mla from source in the CI image (#5481)
ko3n1g Jun 24, 2026
0b0d985
[split 2/4] Scale DSA indexer loss in pipeline schedules (#5244)
HollowMan6 Jun 24, 2026
0938eb7
ci: check megatron.training imports in installation test (#5458)
ko3n1g Jun 24, 2026
239959b
Fix merges_file kwarg name in HuggingFaceTokenizer (#5406)
muyihao Jun 24, 2026
3330d12
Automated community request assignment (#5147)
Phlip79 Jun 24, 2026
311416f
Clean up training.py module header (dedupe + reorganize imports/globa…
ilml Jun 24, 2026
9038381
Thread process groups through training checkpoint paths (#5486)
yashaswikarnati Jun 24, 2026
5863721
Narrow oncall responsibilities (#5490)
Phlip79 Jun 25, 2026
1c1d6b5
Add MIMO forward step and per-token loss for hetero training (#5376)
yashaswikarnati Jun 25, 2026
ea967a7
Add Nemotron6-MoE VLM model provider for MIMO example (#5374)
yashaswikarnati Jun 25, 2026
7168714
ci: auto-retry test-data download in container-build job (#5498)
ko3n1g Jun 25, 2026
3bfd87b
Force RL inference to CP=1 (#5423)
tdene Jun 25, 2026
2a43e0d
Merge cu_seqlens across micro-batch for THD attention (#5454)
deepakn94 Jun 25, 2026
da482cf
[split 4/4] Enable DSA CP and THD hooks (#5246)
HollowMan6 Jun 25, 2026
e1b8454
Fix fused MLA down projection with tensor parallelism (#5383)
sraman-rgb Jun 25, 2026
The diff you're trying to view is too large. We only load the first 3000 changed files.
1 change: 1 addition & 0 deletions .agents/skills
14 changes: 14 additions & 0 deletions .claude/settings.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "printf '{\"hookSpecificOutput\":{\"hookEventName\":\"UserPromptSubmit\",\"additionalContext\":\"MANDATORY WORKFLOW — never skip or reorder: (1) Read the artifact first (commit, file, error, PR). (2) Identify and invoke the relevant skill via the Skill tool BEFORE forming any answer or plan — even when the answer seems obvious. (3) Only then answer using the skill context. Skipping step 2 is not allowed.\"}}'"
          }
        ]
      }
    ]
  }
}
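The hook above is a shell `printf` whose single-quoted argument is itself a JSON document, so a stray quote breaks it silently. The sketch below is one way to check that the embedded payload still parses. The `SETTINGS` text is a shortened stand-in for `.claude/settings.json` (the `additionalContext` string is abbreviated), and `hook_payloads` is a hypothetical helper, not part of this PR. It also assumes the format string contains no `%` or backslash sequences that `printf` would interpret.

```python
import json
import shlex

# Shortened stand-in for .claude/settings.json (additionalContext abbreviated).
SETTINGS = r'''
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "printf '{\"hookSpecificOutput\":{\"hookEventName\":\"UserPromptSubmit\",\"additionalContext\":\"(abbreviated)\"}}'"
          }
        ]
      }
    ]
  }
}
'''


def hook_payloads(settings_text):
    """Yield the JSON object each printf-based UserPromptSubmit hook would print."""
    settings = json.loads(settings_text)
    for group in settings["hooks"]["UserPromptSubmit"]:
        for hook in group["hooks"]:
            # shlex applies POSIX quoting rules: ['printf', '<format string>'].
            argv = shlex.split(hook["command"])
            if len(argv) != 2 or argv[0] != "printf":
                raise ValueError(f"unexpected hook command: {hook['command']!r}")
            # Raises json.JSONDecodeError if the embedded payload is malformed.
            yield json.loads(argv[1])


payloads = list(hook_payloads(SETTINGS))
print(payloads[0]["hookSpecificOutput"]["hookEventName"])
```

Running the same check against the real file would only require reading its text in place of `SETTINGS`.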
1 change: 1 addition & 0 deletions .claude/skills
1 change: 1 addition & 0 deletions .cursorrules
@@ -0,0 +1 @@
See CLAUDE.md for all repository guidelines.
80 changes: 80 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,80 @@
megatron/core/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/models/common/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/gpt

megatron/core/models/gpt/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/gpt

megatron/core/models/multimodal/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/multi-modal

megatron/core/models/mamba/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-model
megatron/core/ssm/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-model

megatron/core/models/hybrid/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-model

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/tokenizers/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/tokenizers

megatron/core/distributed/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/data-parallelism
megatron/core/distributed/fsdp/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/transformer/fsdp_dtensor_checkpoint.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/dist_checkpointing/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-checkpointing

megatron/core/optimizer/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mcore-optimizer

megatron/core/optimizer/distrib_optimizer.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer
megatron/core/optimizer/layer_wise_optimizer.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer
megatron/core/optimizer/param_layout.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer

megatron/core/optimizer/emerging_optimizers.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mcore-emerging-optimizers
megatron/core/optimizer/muon.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mcore-emerging-optimizers
megatron/core/optimizer/qk_clip.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mcore-emerging-optimizers @NVIDIA/transformer

megatron/core/inference/modelopt_support @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/post-training

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/pipeline_parallel/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/pipeline-parallelism

megatron/core/transformer/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/transformer

megatron/core/transformer/moe/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mixture-of-experts-adlr @NVIDIA/mixture-of-experts-devtech

megatron/core/inference/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/inference

megatron/inference/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/inference-interface

megatron/core/parallel_state.py @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/post_training/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/post-training

megatron/post_training/ @NVIDIA/post-training

megatron/core/transformer/cuda_graphs.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/cuda-graphs

megatron/training/ @NVIDIA/training-adlr @NVIDIA/training-nemo
megatron/training/arguments.py

.gitlab/ @NVIDIA/ci
.github/ @NVIDIA/ci
.github/oncall_schedule.json @NVIDIA/mcore-oncall-rotation
.gitlab-ci.yml @NVIDIA/ci
docker/ @NVIDIA/ci
tests/functional_tests/python_test_utils/ @NVIDIA/ci
tests/functional_tests/shell_test_utils/ @NVIDIA/ci
tests/test_utils/recipes/ @NVIDIA/ci
tests/unit_tests/run_ci_test.sh @NVIDIA/ci

# API Backwards Compatibility Check
scripts/check_api_backwards_compatibility.py @NVIDIA/ci
scripts/README_API_COMPAT.md @NVIDIA/ci
.github/workflows/check_api_backwards_compatibility_workflow.yml @NVIDIA/ci
docs/api-backwards-compatibility-check.md @NVIDIA/ci
tests/unit_tests/test_api_backwards_compat_setup.py @NVIDIA/ci

megatron/rl/ @NVIDIA/reinforcement-learning
examples/rl/ @NVIDIA/reinforcement-learning
test/unit_tests/test_rl_utils.py @NVIDIA/reinforcement-learning
train_rl.py @NVIDIA/reinforcement-learning
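In a CODEOWNERS file the last matching pattern takes precedence, which is why the specific entries (for example `megatron/core/optimizer/muon.py`) come after the broad `megatron/core/` entry, and why `megatron/training/arguments.py` with no owners listed ends up unowned. The sketch below illustrates that ordering with a simplified prefix matcher over a handful of the rules above. `owners_for` is a hypothetical helper, and real GitHub matching follows gitignore-style pattern rules that this prefix check does not reproduce.

```python
def owners_for(path, rules):
    """Return the owners from the last rule whose pattern matches `path`.

    Simplification: a pattern matches the exact path, or anything below it
    when treated as a directory. Globs and anchoring are not handled.
    """
    matched = []
    for pattern, owners in rules:
        prefix = pattern if pattern.endswith("/") else pattern + "/"
        if path == pattern or path.startswith(prefix):
            matched = owners  # a later match overrides an earlier one
    return matched


# A subset of the rules from the CODEOWNERS file, in file order.
RULES = [
    ("megatron/core/", ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo"]),
    ("megatron/core/optimizer/",
     ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo", "@NVIDIA/mcore-optimizer"]),
    ("megatron/core/optimizer/muon.py",
     ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo", "@NVIDIA/mcore-emerging-optimizers"]),
    ("megatron/training/", ["@NVIDIA/training-adlr", "@NVIDIA/training-nemo"]),
    ("megatron/training/arguments.py", []),  # no owners: the file is unowned
]

muon_owners = owners_for("megatron/core/optimizer/muon.py", RULES)
args_owners = owners_for("megatron/training/arguments.py", RULES)
print(muon_owners)
print(args_owners)
```

Because only the last match counts, the broad `megatron/core/` owners have to be repeated on every more specific line for them to remain reviewers there.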
29 changes: 29 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,29 @@
---
name: Bug report
about: Create a report to help us improve the repository or project
title: ""
labels: bug
assignees: ''

---

**Describe the bug**

A clear and concise description of what the bug is. Tag @NVIDIA/mcore-oncall
to get oncall's attention to this issue.

**Steps/Code to reproduce bug**

Please list the *minimal* steps or a code snippet needed to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.


**Expected behavior**

A clear and concise description of what you expected to happen.


**Additional context**

Add any other context about the problem here.
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,2 @@
blank_issues_enabled: false

23 changes: 23 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,23 @@
---
name: Feature request
about: Suggest an idea for this project
title: ""
labels: enhancement
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Tag @NVIDIA/mcore-oncall
to get oncall's attention to this issue.

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
13 changes: 13 additions & 0 deletions .github/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,13 @@
---
name: QUESTION
about: Ask a question about Megatron-LM that is not a bug, regression or enhancement
  request
title: "[QUESTION]"
labels: ''
assignees: ''

---

**Your question**
Ask a clear and concise question about Megatron-LM. Tag @NVIDIA/mcore-oncall
to get oncall's attention to this issue.
40 changes: 40 additions & 0 deletions .github/ISSUE_TEMPLATE/regression.md
@@ -0,0 +1,40 @@
---
name: REGRESSION
about: Report a regression in speed or accuracy due to a Megatron-LM update
title: "[REGRESSION]"
labels: ''
assignees: ''

---

**Describe the regression**
A clear and concise description of what the regression is. Tag @NVIDIA/mcore-oncall
to get oncall's attention to this issue.

**To Reproduce**
Steps to reproduce the behavior. The easier it is to reproduce, the faster it will get maintainer attention.

**Previous performance**
What speed or accuracy did you previously see?

**New performance**
What speed or accuracy do you see after the update?

**Stack trace/logs**
If applicable, add the stack trace or logs related to the regression.

**Environment (please complete the following information):**
- Previous Megatron-LM commit ID
- New Megatron-LM commit ID
- Previous PyTorch version
- New PyTorch version
- Previous CUDA version
- New CUDA version
- Previous NCCL version
- New NCCL version

**Proposed fix**
If you have a proposal for how to fix the issue, state it here or link to a PR.

**Additional context**
Add any other context about the problem here.
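Each issue template above opens with a `---` delimited front matter block of `key: value` lines. The sketch below is a minimal checker for that block, using only the standard library. It assumes that `name` and `about` are the keys a template needs in order to be listed in the chooser, and it handles only flat single-line `key: value` pairs, not general YAML. `parse_front_matter` and `REQUIRED_KEYS` are hypothetical names.

```python
REQUIRED_KEYS = ("name", "about")  # assumption: keys needed for the template chooser

# Front matter copied from the regression template.
REGRESSION_TEMPLATE = """\
---
name: REGRESSION
about: Report a regression in speed or accuracy due to a Megatron-LM update
title: "[REGRESSION]"
labels: ''
assignees: ''

---

**Describe the regression**
"""


def parse_front_matter(text):
    """Parse flat 'key: value' front matter between the first two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if not line.strip():
            continue  # blank lines inside the block are ignored
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        # Drop surrounding whitespace and simple quoting around the value.
        fields[key.strip()] = value.strip().strip("'\"")
    raise ValueError("missing closing '---'")


fields = parse_front_matter(REGRESSION_TEMPLATE)
missing = [key for key in REQUIRED_KEYS if not fields.get(key)]
print(fields["name"], missing)
```

A line without a colon raises an error here, which also catches a wrapped value whose continuation line was not recognized as part of the previous key.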