docs(readme): update FP8 convergence table and add MXFP8/NVFP4 support info
- Add MXFP8 and NVFP4 format support to highlights and description
- Update FP8 convergence table with MXFP8 results from arXiv paper
- Remove outdated JAX-Toolbox links and "available on request" entries
- Update Docker container versions to 26.01
- Fix DeepSpeed and Lightning integration links
- Add Nemotron 3 paper to Latest News
- Add quickstart notebook link after PyTorch example
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
* [11/2025] `Scale Biology Transformer Models with PyTorch and NVIDIA BioNeMo Recipes <https://developer.nvidia.com/blog/scale-biology-transformer-models-with-pytorch-and-nvidia-bionemo-recipes/>`_
* [11/2025] `FP8 Training of Large-Scale RL Models <https://lmsys.org/blog/2025-11-25-fp8-rl/>`_
@@ -30,7 +31,8 @@ What is Transformer Engine?
 Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including
 using 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, to provide better
-performance with lower memory utilization in both training and inference. TE provides a collection
+performance with lower memory utilization in both training and inference. On Blackwell GPUs, TE also
+supports MXFP8 (Microscaling FP8) and NVFP4 formats for even greater efficiency. TE provides a collection
 of highly optimized building blocks for popular Transformer architectures and an automatic mixed
 precision-like API that can be used seamlessly with your framework-specific code. TE also includes a
 framework agnostic C++ API that can be integrated with other deep learning libraries to enable FP8
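The "automatic mixed precision-like API" described above is, in the PyTorch integration, a context manager wrapped around the forward pass. A minimal sketch, assuming the documented ``te.fp8_autocast`` and ``recipe.DelayedScaling`` interfaces (layer sizes and recipe arguments here are illustrative, not prescribed by the README):

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # Illustrative sizes; dimensions are kept multiples of 16 to satisfy FP8 GEMM requirements.
    model = te.Linear(768, 3072, bias=True)
    inp = torch.randn(1024, 768, device="cuda")

    # Delayed-scaling FP8 recipe: E4M3 in the forward pass, E5M2 for gradients.
    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

    # Selected GEMMs inside the context run in FP8; everything else stays in higher precision.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = model(inp)

    out.sum().backward()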
@@ -58,6 +60,7 @@ Highlights
 * Easy-to-use modules for building Transformer layers with FP8 support
 * Optimizations (e.g. fused kernels) for Transformer models
 * Support for FP8 on NVIDIA Hopper, Ada, and Blackwell GPUs
+* Support for MXFP8 and NVFP4 on NVIDIA Blackwell GPUs
 * Support for optimizations across all precisions (FP16, BF16) on NVIDIA Ampere GPU architecture generations and later

 Examples
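The MXFP8/NVFP4 support highlighted above plugs into the same autocast API by swapping the recipe object. A sketch under explicit assumptions: it requires Blackwell hardware, and it assumes a recipe class named ``MXFP8BlockScaling`` in ``transformer_engine.common.recipe`` (the exact recipe class names for MXFP8 and NVFP4, and their defaults, depend on the installed Transformer Engine version, so verify against your release):

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # Assumption: MXFP8 block-scaling recipe (Blackwell GPUs only); the class name and
    # defaults may differ between Transformer Engine releases. An NVFP4 recipe, where
    # available, would be swapped in the same way.
    mxfp8_recipe = recipe.MXFP8BlockScaling()

    model = te.Linear(1024, 1024, bias=True)
    inp = torch.randn(128, 1024, device="cuda")

    # Same context manager as for FP8; only the recipe changes.
    with te.fp8_autocast(enabled=True, fp8_recipe=mxfp8_recipe):
        out = model(inp)

    out.sum().backward()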
@@ -91,6 +94,7 @@ PyTorch
     loss = out.sum()
     loss.backward()

+For a tutorial with more details, see the `Quickstart Notebook <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/quickstart.ipynb>`_.

 JAX
 ^^^
@@ -175,15 +179,15 @@ For example to use the NGC PyTorch container interactively,

 .. code-block:: bash

-    docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.08-py3
+    docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:26.01-py3

 For example to use the NGC JAX container interactively,

 .. code-block:: bash

-    docker run --gpus all -it --rm nvcr.io/nvidia/jax:25.08-py3
+    docker run --gpus all -it --rm nvcr.io/nvidia/jax:26.01-py3

-Where 25.08 (corresponding to August 2025 release) is the container version.
+Where 26.01 (corresponding to January 2026 release) is the container version.


 **Benefits of using NGC containers:**
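Once inside the PyTorch container, a quick sanity check that the bundled Transformer Engine imports and sees a GPU (a generic check, not part of the README; the ``__version__`` attribute is assumed to be exposed by the package):

.. code-block:: python

    # Run inside the NGC PyTorch container: confirm the bundled Transformer Engine
    # version and that a CUDA device (Hopper/Ada/Blackwell for FP8) is visible.
    import torch
    import transformer_engine

    print(transformer_engine.__version__)
    print(torch.cuda.get_device_name(0))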
@@ -343,46 +347,35 @@ FP8 has been tested extensively across different model architectures and configu