
Commit 98726c5

docs(readme): update FP8 convergence table and add MXFP8/NVFP4 support info
- Add MXFP8 and NVFP4 format support to highlights and description
- Update FP8 convergence table with MXFP8 results from arxiv paper
- Remove outdated JAX-Toolbox links and "available on request" entries
- Update Docker container versions to 26.01
- Fix DeepSpeed and Lightning integration links
- Add Nemotron 3 paper to Latest News
- Add quickstart notebook link after PyTorch example

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
1 parent c3769cb commit 98726c5

1 file changed: README.rst

Lines changed: 13 additions & 20 deletions
@@ -13,6 +13,7 @@ Transformer Engine
 Latest News
 ===========
 
+* [12/2025] `NVIDIA Nemotron 3: Efficient and Open Intelligence <https://arxiv.org/abs/2512.20856>`_ - trained with NVFP4 on Transformer Engine
 * [11/2025] `NVIDIA Blackwell Architecture Sweeps MLPerf Training v5.1 Benchmarks <https://developer.nvidia.com/blog/nvidia-blackwell-architecture-sweeps-mlperf-training-v5-1-benchmarks/>`_
 * [11/2025] `Scale Biology Transformer Models with PyTorch and NVIDIA BioNeMo Recipes <https://developer.nvidia.com/blog/scale-biology-transformer-models-with-pytorch-and-nvidia-bionemo-recipes/>`_
 * [11/2025] `FP8 Training of Large-Scale RL Models <https://lmsys.org/blog/2025-11-25-fp8-rl/>`_
@@ -30,7 +31,8 @@ What is Transformer Engine?
 
 Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including
 using 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, to provide better
-performance with lower memory utilization in both training and inference. TE provides a collection
+performance with lower memory utilization in both training and inference. On Blackwell GPUs, TE also
+supports MXFP8 (Microscaling FP8) and NVFP4 formats for even greater efficiency. TE provides a collection
 of highly optimized building blocks for popular Transformer architectures and an automatic mixed
 precision-like API that can be used seamlessly with your framework-specific code. TE also includes a
 framework agnostic C++ API that can be integrated with other deep learning libraries to enable FP8
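The revised description refers to TE's "automatic mixed precision-like API" for FP8 without showing it in use. As a minimal sketch (not part of this commit's diff), an FP8 recipe is passed to the ``fp8_autocast`` context manager, which wraps only the forward pass; the module and recipe names below follow TE's public PyTorch API.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # TE modules are drop-in replacements for their torch.nn counterparts.
    model = te.Linear(768, 3072, bias=True)
    inp = torch.randn(2048, 768, device="cuda")

    # Classic delayed-scaling FP8 recipe; all arguments are optional.
    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

    # Only the forward pass runs under the autocast context; the backward
    # pass reuses the FP8 state captured here.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = model(inp)

    out.sum().backward()

On Blackwell GPUs, the MXFP8/NVFP4 support mentioned in the new sentence is exposed the same way, by handing a different recipe object (for example an MXFP8 block-scaling recipe in recent TE releases; exact class names vary by version) to ``fp8_autocast``.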
@@ -58,6 +60,7 @@ Highlights
 * Easy-to-use modules for building Transformer layers with FP8 support
 * Optimizations (e.g. fused kernels) for Transformer models
 * Support for FP8 on NVIDIA Hopper, Ada, and Blackwell GPUs
+* Support for MXFP8 and NVFP4 on NVIDIA Blackwell GPUs
 * Support for optimizations across all precisions (FP16, BF16) on NVIDIA Ampere GPU architecture generations and later
 
 Examples
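The "easy-to-use modules" highlight covers more than single linear layers. As an illustrative sketch (not taken from this diff), a whole Transformer block can be instantiated with TE's ``TransformerLayer`` module and run under the same FP8 autocast; the constructor arguments below match the documented PyTorch API, but check them against your installed TE version.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te

    # A fused Transformer block (self-attention + MLP) as a single TE module.
    layer = te.TransformerLayer(
        hidden_size=1024,
        ffn_hidden_size=4096,
        num_attention_heads=16,
    )

    # Default input layout for this module is [sequence, batch, hidden].
    x = torch.randn(128, 4, 1024, device="cuda")

    with te.fp8_autocast(enabled=True):  # default recipe is used when none is given
        y = layer(x)

    y.sum().backward()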
@@ -91,6 +94,7 @@ PyTorch
     loss = out.sum()
     loss.backward()
 
+For a tutorial with more details, see the `Quickstart Notebook <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/quickstart.ipynb>`_.
 
 JAX
 ^^^
@@ -175,15 +179,15 @@ For example to use the NGC PyTorch container interactively,
 
 .. code-block:: bash
 
-    docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.08-py3
+    docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:26.01-py3
 
 For example to use the NGC JAX container interactively,
 
 .. code-block:: bash
 
-    docker run --gpus all -it --rm nvcr.io/nvidia/jax:25.08-py3
+    docker run --gpus all -it --rm nvcr.io/nvidia/jax:26.01-py3
 
-Where 25.08 (corresponding to August 2025 release) is the container version.
+Where 26.01 (corresponding to January 2026 release) is the container version.
 
 **Benefits of using NGC containers:**
 
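After pulling either container, a quick sanity check that the bundled Transformer Engine can see an FP8-capable GPU takes only a few lines of Python. This is an illustrative snippet rather than part of the commit; the compute-capability thresholds (8.9 for Ada, 9.0 for Hopper, 10.x and up for Blackwell-class MXFP8/NVFP4 support) are stated here as hardware assumptions, not TE API guarantees.

.. code-block:: python

    import torch
    import transformer_engine  # raises ImportError if TE is missing from the image

    print("Transformer Engine:", getattr(transformer_engine, "__version__", "unknown"))

    major, minor = torch.cuda.get_device_capability()
    print(f"GPU compute capability: {major}.{minor}")

    # FP8 tensor cores are available on Ada (8.9), Hopper (9.0), and Blackwell;
    # MXFP8/NVFP4 additionally require Blackwell-class hardware (10.x or newer).
    print("FP8-capable GPU:        ", (major, minor) >= (8, 9))
    print("MXFP8/NVFP4-capable GPU:", major >= 10)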
@@ -343,46 +347,35 @@ FP8 has been tested extensively across different model architectures and configu
 +------------+------------------+---------------------------------------------------------------------------------------------------------+
 | Model | Framework | Source |
 +============+==================+=========================================================================================================+
-| T5-770M | JAX/T5x | https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x#convergence-and-performance|
-+------------+------------------+---------------------------------------------------------------------------------------------------------+
 | MPT-1.3B | Mosaic Composer | https://www.mosaicml.com/blog/coreweave-nvidia-h100-part-1 |
 +------------+------------------+---------------------------------------------------------------------------------------------------------+
-| GPT-5B | JAX/Paxml | https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/pax#h100-results |
-+------------+------------------+---------------------------------------------------------------------------------------------------------+
-| GPT-5B | NeMo Framework | Available on request |
-+------------+------------------+---------------------------------------------------------------------------------------------------------+
 | LLama2-7B | Alibaba Pai | https://mp.weixin.qq.com/s/NQT0uKXLbXyh5031zBdeBQ |
 +------------+------------------+---------------------------------------------------------------------------------------------------------+
-| T5-11B | JAX/T5x | Available on request |
+| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
 +------------+------------------+---------------------------------------------------------------------------------------------------------+
 | MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
 +------------+------------------+---------------------------------------------------------------------------------------------------------+
-| GPT-22B | NeMo Framework | Available on request |
+| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
 +------------+------------------+---------------------------------------------------------------------------------------------------------+
 | LLama2-70B | Alibaba Pai | https://mp.weixin.qq.com/s/NQT0uKXLbXyh5031zBdeBQ |
 +------------+------------------+---------------------------------------------------------------------------------------------------------+
-| GPT-175B | JAX/Paxml | https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/pax#h100-results |
-+------------+------------------+---------------------------------------------------------------------------------------------------------+
 
 Integrations
 ============
 
 Transformer Engine has been integrated with popular LLM frameworks such as:
 
-* `DeepSpeed <https://github.com/deepspeedai/DeepSpeed/blob/master/tests/unit/runtime/half_precision/test_fp8.py>`_
+* `DeepSpeed <https://github.com/deepspeedai/DeepSpeed>`_
 * `Hugging Face Accelerate <https://huggingface.co/docs/accelerate/main/en/usage_guides/low_precision_training#configuring-transformersengine>`_
-* `Lightning <https://github.com/Lightning-AI/lightning/issues/17172>`_
+* `Lightning <https://lightning.ai/docs/pytorch/stable/common/precision.html>`_
 * `MosaicML Composer <https://github.com/mosaicml/composer/releases/tag/v0.13.1>`_
 * `NVIDIA JAX Toolbox <https://github.com/NVIDIA/JAX-Toolbox>`_
 * `NVIDIA Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_
 * `NVIDIA NeMo Framework <https://github.com/NVIDIA/NeMo-Megatron-Launcher>`_
 * `Amazon SageMaker Model Parallel Library <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-core-features-v2-tensor-parallelism.html>`_
 * `Levanter <https://github.com/stanford-crfm/levanter>`_
 * `GPT-NeoX <https://github.com/EleutherAI/gpt-neox>`_
-* `Hugging Face Nanotron <https://github.com/huggingface/nanotron>`_ - Coming soon!
-* `Colossal-AI <https://github.com/hpcaitech/ColossalAI>`_ - Coming soon!
-* `PeriFlow <https://github.com/friendliai/periflow-python-sdk>`_ - Coming soon!
-
+* `Hugging Face Nanotron <https://github.com/huggingface/nanotron>`_
 
 Contributing
 ============
