Pull requests: NVIDIA/TransformerEngine

Pull requests list

Add L2 score mod distributed attention shape 2.17
#3147 opened Jun 25, 2026 by vcherepanov-nv Collaborator
3 of 13 tasks
[PyTorch] Make quantized-tensor __repr__ safe
#3146 opened Jun 25, 2026 by pggPL Collaborator
7 of 13 tasks
Add MXFP8 support with cuBLASMp (label: community-contribution; PRs from external contributors outside the core maintainers, representing community-driven work)
#3145 opened Jun 25, 2026 by almogsegal Contributor
13 tasks
Add multi_tensor_raw_moments kernel (label: community-contribution)
#3144 opened Jun 25, 2026 by philipcmonk Draft
6 of 13 tasks
Graph Safe Current Scaling Support for GroupedLinear Module/Ops
#3143 opened Jun 25, 2026 by vthumbe1503 Collaborator
13 tasks
docs: document attention backend selection (label: documentation; improvements or additions to documentation)
#3142 opened Jun 24, 2026 by sbhavani Collaborator
4 of 13 tasks
[PyTorch] Preserve fprop operands for dequantized backward override
#3141 opened Jun 23, 2026 by negvet Collaborator
13 tasks
[Common] Fix Build: Remove nproc from parallel make for NCCL EP build
#3138 opened Jun 22, 2026 by phu0ngng Collaborator
7 of 13 tasks
[Common] Experimental CuTeDSL MXFP8 backends in C++ via TVM-FFI
#3137 opened Jun 21, 2026 by kainzhong Collaborator Draft
13 tasks
[Common/PyTorch] Grouped-quantize kernels for 1D and 2D FP8 block-scaling (labels: FP8, MoE, performance)
#3135 opened Jun 17, 2026 by denera Collaborator
8 of 13 tasks
Single-launch CUTLASS grouped GEMM for per-tensor NVFP4 (label: community-contribution)
#3134 opened Jun 17, 2026 by cael-ling Contributor
9 of 13 tasks
Enable NVFP4 RHT amax for grouped SReLU MLP (label: community-contribution)
#3133 opened Jun 16, 2026 by sraman-rgb Contributor
13 tasks
[Common] Support scaled & clamped swiglu, srelu for BF16 (label: community-contribution)
#3132 opened Jun 16, 2026 by zhongbozhu Collaborator
13 tasks
[torch.compile] Bunch of small changes needed for enabling torch.compile
#3130 opened Jun 15, 2026 by pggPL Collaborator
8 of 13 tasks
feat: add SM_121 (GB10 consumer Blackwell) support for FA4 (label: community-contribution)
#3125 opened Jun 12, 2026 by TyGu1
Avoid unpickling the extra state when not needed
#3123 opened Jun 12, 2026 by ptrendx Member
2 of 6 tasks
docs(readme): update latest news (label: documentation)
#3121 opened Jun 11, 2026 by sbhavani Collaborator
6 of 13 tasks
TE EP integration to MoEBlock
#3116 opened Jun 10, 2026 by tdophung Collaborator
6 of 13 tasks
[JAX] Collective Gemm test fixes
#3115 opened Jun 10, 2026 by jberchtold-nvidia Collaborator
13 tasks
Abstract CUDA hardcodes into configurable te_device_type / te_platform (label: community-contribution)
#3113 opened Jun 10, 2026 by lxd-cumt
Add entrypoint for flagos multi-backend plugin system (label: community-contribution)
#3107 opened Jun 9, 2026 by lxd-cumt
[PyTorch][torch.compile] Decouple amax reduction group from the quantizer
#3104 opened Jun 8, 2026 by pggPL Collaborator
4 of 13 tasks
Introduce Mega-C++ to reduce CPU overhead (label: community-contribution)
#3099 opened Jun 6, 2026 by zhongbozhu Collaborator
3 of 17 tasks