Pull requests: NVIDIA/TransformerEngine
Pull requests list
Refactor tensor class in C++ unit tests
refactor
#2962
opened May 6, 2026 by
timmoon10
Collaborator
8 of 13 tasks
[JAX][Common] Enable cuDNN fused attn backend for NO_MASK + bidirectional SWA
#2961
opened May 5, 2026 by
KshitijLakhani
Collaborator
Draft
1 of 12 tasks
[PyTorch/Common] Remove legacy FP8DS implementation
2.16.0
#2959
opened May 5, 2026 by
cyanguwa
Collaborator
8 of 13 tasks
[Common] Use specialized unfused MXFP8 cast kernels by default
#2958
opened May 5, 2026 by
Oleg-Goncharov
Collaborator
5 of 13 tasks
CPU overhead optimizations for te autocast
#2957
opened May 4, 2026 by
vthumbe1503
Collaborator
13 tasks
MXFP8 + FSDP2 checkpoint resume crashes in reset_sharded_param - add mxfp8 recpipe to fully shard
#2951
opened May 1, 2026 by
savitha-eng
[Common, PyTorch] Add Triton MLA attention kernels for SM80
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#2950
opened Apr 30, 2026 by
bzantium
[All] Remove legacy max512 backend
2.16.0
#2949
opened Apr 30, 2026 by
cyanguwa
Collaborator
8 of 13 tasks
Add NVFP4 1x64 Local Encode Recipe
#2941
opened Apr 29, 2026 by
cael-ling
Contributor
1 of 13 tasks
[Common/PyTorch/JAX] make offset of ClampedSwiGLU configurable
#2938
opened Apr 28, 2026 by
hxbai
Contributor
13 tasks
[PyTorch] Enable head dim 256 for FA4
#2932
opened Apr 27, 2026 by
yaox12
Member
1 of 13 tasks
Implement per-token NVFP4 fprop recipe
community-contribution
#2931
opened Apr 27, 2026 by
zianglih
Contributor
8 of 13 tasks
[Common/PyTorch] Add MXFP8 cast-and-transpose op
community-contribution
#2930
opened Apr 26, 2026 by
jeweldave
Fix WHEEL Tag mismatch in transformer-engine-cu12 wheels
#2928
opened Apr 25, 2026 by
eyupcanakman
7 of 13 tasks
[PyTorch] Fix stale columnwise data usage
#2925
opened Apr 25, 2026 by
ksivaman
Member
7 of 13 tasks
[PyTorch] Add distributed Muon optimizer
2.16.0
#2920
opened Apr 23, 2026 by
vcherepanov-nv
Collaborator
5 of 13 tasks
guard fuser grad checks on non-leaf nodes
#2919
opened Apr 23, 2026 by
CarlosGomes98
Contributor
1 of 13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916
opened Apr 22, 2026 by
sudhakarsingh27
Collaborator
Draft
1 of 3 tasks
[JAX] Add an MoE Block (Layer) that compound router, permutation, groupedGEMM and communication
#2912
opened Apr 21, 2026 by
tdophung
Collaborator
13 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing
community-contribution
#2911
opened Apr 21, 2026 by
NoonePauseferg