docs(readme): update FP8 convergence table and add MXFP8/NVFP4 support info
- Add MXFP8 and NVFP4 format support to highlights and description
- Update FP8 convergence table with MXFP8 results from arXiv paper
- Remove outdated JAX-Toolbox links and "available on request" entries
- Update Docker container versions to 26.01
- Fix DeepSpeed and Lightning integration links
- Add Nemotron 3 paper to Latest News
- Add quickstart notebook link after PyTorch example
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
* [11/2025] `Scale Biology Transformer Models with PyTorch and NVIDIA BioNeMo Recipes <https://developer.nvidia.com/blog/scale-biology-transformer-models-with-pytorch-and-nvidia-bionemo-recipes/>`_
* [11/2025] `FP8 Training of Large-Scale RL Models <https://lmsys.org/blog/2025-11-25-fp8-rl/>`_
@@ -30,7 +31,8 @@ What is Transformer Engine?
 Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including
 using 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, to provide better
-performance with lower memory utilization in both training and inference. TE provides a collection
+performance with lower memory utilization in both training and inference. On Blackwell GPUs, TE also
+supports MXFP8 (Microscaling FP8) and NVFP4 formats for even greater efficiency. TE provides a collection
 of highly optimized building blocks for popular Transformer architectures and an automatic mixed
 precision-like API that can be used seamlessly with your framework-specific code. TE also includes a
 framework agnostic C++ API that can be integrated with other deep learning libraries to enable FP8
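The "automatic mixed precision-like API" described above is, in the PyTorch integration, a context manager wrapped around the forward pass. A minimal sketch, assuming the documented ``te.fp8_autocast`` and ``recipe.DelayedScaling`` interfaces (layer sizes and recipe arguments here are illustrative, not prescribed by the README):

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # Illustrative sizes; dimensions are kept multiples of 16 to satisfy FP8 GEMM requirements.
    model = te.Linear(768, 3072, bias=True)
    inp = torch.randn(1024, 768, device="cuda")

    # Delayed-scaling FP8 recipe: E4M3 in the forward pass, E5M2 for gradients.
    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

    # Selected GEMMs inside the context run in FP8; everything else stays in higher precision.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = model(inp)

    out.sum().backward()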
@@ -58,6 +60,7 @@ Highlights
 * Easy-to-use modules for building Transformer layers with FP8 support
 * Optimizations (e.g. fused kernels) for Transformer models
 * Support for FP8 on NVIDIA Hopper, Ada, and Blackwell GPUs
+* Support for MXFP8 and NVFP4 on NVIDIA Blackwell GPUs
 * Support for optimizations across all precisions (FP16, BF16) on NVIDIA Ampere GPU architecture generations and later

 Examples
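The MXFP8/NVFP4 support highlighted above plugs into the same autocast API by swapping the recipe object. A sketch under explicit assumptions: it requires Blackwell hardware, and it assumes a recipe class named ``MXFP8BlockScaling`` in ``transformer_engine.common.recipe`` (the exact recipe class names for MXFP8 and NVFP4, and their defaults, depend on the installed Transformer Engine version, so verify against your release):

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # Assumption: MXFP8 block-scaling recipe (Blackwell GPUs only); the class name and
    # defaults may differ between Transformer Engine releases. An NVFP4 recipe, where
    # available, would be swapped in the same way.
    mxfp8_recipe = recipe.MXFP8BlockScaling()

    model = te.Linear(1024, 1024, bias=True)
    inp = torch.randn(128, 1024, device="cuda")

    # Same context manager as for FP8; only the recipe changes.
    with te.fp8_autocast(enabled=True, fp8_recipe=mxfp8_recipe):
        out = model(inp)

    out.sum().backward()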
@@ -91,6 +94,7 @@ PyTorch
     loss = out.sum()
     loss.backward()

+For a tutorial with more details, see the `Quickstart Notebook <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/quickstart.ipynb>`_.

 JAX
 ^^^
@@ -175,15 +179,15 @@ For example to use the NGC PyTorch container interactively,

 .. code-block:: bash

-    docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.08-py3
+    docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:26.01-py3

 For example to use the NGC JAX container interactively,

 .. code-block:: bash

-    docker run --gpus all -it --rm nvcr.io/nvidia/jax:25.08-py3
+    docker run --gpus all -it --rm nvcr.io/nvidia/jax:26.01-py3

-Where 25.08 (corresponding to August 2025 release) is the container version.
+Where 26.01 (corresponding to January 2026 release) is the container version.


 **Benefits of using NGC containers:**
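Once inside the PyTorch container, a quick sanity check that the bundled Transformer Engine imports and sees a GPU (a generic check, not part of the README; the ``__version__`` attribute is assumed to be exposed by the package):

.. code-block:: python

    # Run inside the NGC PyTorch container: confirm the bundled Transformer Engine
    # version and that a CUDA device (Hopper/Ada/Blackwell for FP8) is visible.
    import torch
    import transformer_engine

    print(transformer_engine.__version__)
    print(torch.cuda.get_device_name(0))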
@@ -343,46 +347,35 @@ FP8 has been tested extensively across different model architectures and configu