20 changes: 11 additions & 9 deletions docs/tutorials/post_training_index.md
@@ -1,7 +1,7 @@
# Post-training

```{note}
Post-training workflows on TPU require specific dependencies. Please ensure you have installed MaxText with `maxtext[tpu-post-train]` as described in the [official documentation](https://maxtext.readthedocs.io/en/latest/install_maxtext.html).
Post-training workflows on TPU require specific dependencies. Please ensure you have installed MaxText with `maxtext[tpu-post-train]` as described in the [official documentation](../install_maxtext.md).
```

## What is MaxText post-training?
@@ -14,7 +14,7 @@ We’re investing in performance, scale, algorithms, models, reliability, and ea

MaxText was co-designed with key Google-led innovations to provide a unified post-training experience:

- [MaxText model library](https://maxtext.readthedocs.io/en/latest/reference/models/supported_models_and_architectures.html#supported-model-families) for JAX LLMs highly optimized for TPUs
- [MaxText model library](../reference/models/supported_models_and_architectures.md#supported-model-families) for JAX LLMs highly optimized for TPUs
- [Tunix](https://github.com/google/tunix) for the latest algorithms and post-training techniques
- [vLLM on TPU](https://github.com/vllm-project/tpu-inference) for high performance sampling (inference) for Reinforcement Learning (RL)
- [Pathways](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro) for multi-host inference (sampling) and highly efficient weight transfer
@@ -24,15 +24,16 @@ MaxText was co-designed with key Google led innovations to provide a unified pos
## Supported techniques & models

- **SFT (Supervised Fine-Tuning)**
- [SFT on Single-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft.html)
- [SFT on Multi-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html)
- [SFT on Single-Host TPUs](../tutorials/posttraining/sft.md)
- [SFT on Multi-Host TPUs](../tutorials/posttraining/sft_on_multi_host.md)
- **LoRA (Low-Rank Adaptation)**
- [LoRA on Single-Host TPUs](posttraining/lora.md)
- [LoRA on Single-Host TPUs](../tutorials/posttraining/lora.md)
- **Multimodal SFT**
- [Multimodal Support](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/multimodal.html)
- [Multimodal Support](../tutorials/posttraining/multimodal.md)
- **Reinforcement Learning (RL)**
- [RL on Single-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl.html)
- [RL on Multi-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html)
- [RL on Single-Host TPUs](../tutorials/posttraining/rl.md)
- [RL on Multi-Host TPUs](../tutorials/posttraining/rl_on_multi_host.md)
- [RL with Qwen3-30b-a3b](../tutorials/posttraining/rl_qwen3_30b.md)

## Step by step RL

@@ -57,7 +58,7 @@ Pathways supercharges RL with:

## Getting started

Start your Post-Training journey through quick experimentation with [Python Notebooks](https://maxtext.readthedocs.io/en/latest/guides/run_python_notebook.html) or our Production level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html) and [RL](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html).
Start your post-training journey through quick experimentation with [Python Notebooks](../guides/run_python_notebook.md) or our production-level tutorials for [SFT](../tutorials/posttraining/sft_on_multi_host.md) and [RL](../tutorials/posttraining/rl_on_multi_host.md).

## More tutorials

@@ -69,6 +70,7 @@ posttraining/sft.md
posttraining/sft_on_multi_host.md
posttraining/rl.md
posttraining/rl_on_multi_host.md
posttraining/rl_qwen3_30b.md
posttraining/knowledge_distillation.md
posttraining/lora.md
posttraining/multimodal.md
160 changes: 160 additions & 0 deletions docs/tutorials/posttraining/rl_qwen3_30b.md
@@ -0,0 +1,160 @@
<!--
Copyright 2023-2026 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Reinforcement Learning with Qwen3-30b-a3b on Multi-Host TPUs

This tutorial provides step-by-step instructions for setting up the environment
and training the Qwen3-30b-a3b model on the [OpenMathInstruct-2 dataset](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) on an Ironwood GKE cluster with `tpu7x-128` nodes.

## Prerequisites

Before starting, ensure you have:

- Access to a Google Cloud Project with TPU quotas.
- A Hugging Face account with an access token for downloading models.
- Permissions for Google Artifact Registry (Artifact Registry Writer role).
- Prerequisites for XPK installed (follow [official documentation](https://github.com/AI-Hypercomputer/xpk/blob/main/docs/installation.md#1-prerequisites)).
- A Pathways-ready GKE cluster (see [create GKE cluster](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster)).
- **Docker** installed and configured for sudoless use. Follow the steps to [configure sudoless Docker](https://docs.docker.com/engine/install/linux-postinstall/); a quick check is shown below.
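
A quick way to confirm sudoless Docker works (a standard Docker smoke test, nothing MaxText-specific):

```bash
# Should print "Hello from Docker!" without needing sudo
docker run --rm hello-world
```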

## Build and Upload MaxText Docker Image

For instructions on building and uploading the MaxText Docker image with post-training dependencies, please refer to the [official documentation](../../build_maxtext.md).
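
The linked guide has the authoritative steps; as a rough sketch of the general flow (the Dockerfile and tag below are illustrative placeholders, not the guide's exact commands), you authenticate Docker against your registry, then build and push an image under the name you will later export as `DOCKER_IMAGE`:

```bash
# Allow Docker to push to Container Registry / Artifact Registry
gcloud auth configure-docker

# Build and push the image (illustrative; follow the linked guide for the real build)
docker build -t gcr.io/<PROJECT_ID>/<IMAGE_NAME> .
docker push gcr.io/<PROJECT_ID>/<IMAGE_NAME>
```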

## Set Up Environment Variables

Set up the following environment variables to configure your training run. Replace
placeholders with your actual values.

```bash
# Your GCP project ID.
# If you've already set it in your local config, you can retrieve it via:
# gcloud config get-value project
export PROJECT_ID=<PROJECT_ID>

# The name of your GKE cluster.
export CLUSTER_NAME=<CLUSTER_NAME>

# The GCP location of your GKE cluster.
export ZONE=<ZONE> # e.g., 'us-central1' or 'us-central1-a'

# Use a GCS bucket you own to store logs and checkpoints.
export BASE_OUTPUT_DIRECTORY=<GCS_BUCKET> # e.g., gs://my-bucket/maxtext-runs

# The Docker image you pushed in the previous step
export CLOUD_IMAGE_NAME=<IMAGE_NAME>
export DOCKER_IMAGE="gcr.io/${PROJECT_ID?}/${CLOUD_IMAGE_NAME?}"
```
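
The `kubectl` commands used later for monitoring assume your local kubeconfig points at the cluster. If it does not yet, you can fetch credentials with the standard gcloud command (not part of the original steps; use `--region` instead of `--zone` if your cluster is regional):

```bash
# Point kubectl at the GKE cluster defined above
gcloud container clusters get-credentials "${CLUSTER_NAME}" \
  --zone "${ZONE}" --project "${PROJECT_ID}"
```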

## Clone MaxText Repository

If you haven't already, clone the MaxText repository to your local machine:

```bash
git clone https://github.com/AI-Hypercomputer/maxtext.git
cd maxtext
```

## Get Your MaxText-Compatible Model Checkpoint

### Option 1: Using an existing MaxText checkpoint

If you already have a MaxText-compatible model checkpoint, simply set the
following environment variable and move on to the next section.

```bash
export MAXTEXT_CKPT_PATH=<CKPT_PATH> # e.g., gs://my-bucket/my-model-checkpoint/0/items
```

### Option 2: Converting from a Hugging Face checkpoint

> **Note:** Converting the 30B model requires approximately 62 GB of free disk space to download its safetensors. Please verify you have sufficient space before running the conversion script.
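
A quick way to check how much space is available where the Hugging Face cache will live (a plain `df` check, nothing MaxText-specific):

```bash
# Show free space on the filesystem holding the HF cache
# (defaults to ~/.cache/huggingface unless HF_HOME is set)
df -h "${HF_HOME:-$HOME}"
```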

```bash
# Optional: If you run out of disk space when downloading Hugging Face safetensors,
# customize your "HF_HOME" to redirect the cache to a larger or mounted disk (e.g., on a TPU VM).
# export HF_HOME="/dev/shm/huggingface_tmp"

# Create and activate a virtual environment
uv venv --python 3.12 --seed tpu_venv
source tpu_venv/bin/activate
uv pip install -e .[tpu] --resolution=lowest

# Authenticate with Hugging Face
hf auth login

# Run the conversion script to convert the Hugging Face checkpoint to MaxText format
bash scripts/run_qwen3_30b_hf_to_maxtext.sh

# Deactivate the virtual environment
deactivate
rm -rf tpu_venv
```
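
Before moving on, you can confirm the converted checkpoint landed in GCS. The path below mirrors what `scripts/run_qwen3_30b_hf_to_maxtext.sh` writes and exports as `MAXTEXT_CKPT_PATH`:

```bash
# The conversion script writes to ${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-converted
gsutil ls "${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-converted/0/items"
```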

## Run RL Workload

### Submit your workload

```bash
# Create and activate a virtual environment
uv venv --python 3.12 --seed runner_venv
source runner_venv/bin/activate
uv pip install -e .[runner] --resolution=lowest

# Run the RL training script on your cluster
bash scripts/run_qwen3_30b_rl.sh

# Deactivate the virtual environment
deactivate
rm -rf runner_venv
```

### Monitor your workload

To monitor your job's progress, you can use `kubectl` to check the `JobSet` status and stream logs directly from the pods.

```bash
# Check the JobSet status for your workload
kubectl get jobset -n default ${WORKLOAD_NAME}

# List pods to find the specific name
kubectl get pods | grep ${WORKLOAD_NAME}

# Stream the logs from the running pod (replace <POD_NAME> with the name you found)
kubectl logs -f <POD_NAME>
```

Alternatively, after running the bash script you will also get a link to the Google Cloud Console; follow it to view logs and monitor your workload's progress there.
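
If pods stay pending or keep restarting, `kubectl describe` usually surfaces the reason (generic Kubernetes debugging, not specific to this workload):

```bash
# Inspect the JobSet and a problematic pod for events and status conditions
kubectl describe jobset ${WORKLOAD_NAME} -n default
kubectl describe pod <POD_NAME>
```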

## Convert Checkpoint to Hugging Face Format

After training, you may want to convert your MaxText checkpoint back to Hugging Face format. Use the following script to perform the conversion:

```bash
# Create and activate a virtual environment
uv venv --python 3.12 --seed tpu_venv
source tpu_venv/bin/activate
uv pip install -e .[tpu] --resolution=lowest

# Authenticate with Hugging Face
hf auth login

# Run the conversion script to convert the MaxText checkpoint back to Hugging Face format
bash scripts/run_qwen3_30b_maxtext_to_hf.sh

# Deactivate the virtual environment
deactivate
rm -rf tpu_venv
```
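
As with the earlier conversion, you can list the output location to confirm the Hugging Face files (config, tokenizer, and safetensors shards) were written. The path mirrors what `scripts/run_qwen3_30b_maxtext_to_hf.sh` uses:

```bash
# The conversion script writes to ${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-hf-converted
gsutil ls "${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-hf-converted"
```
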
45 changes: 45 additions & 0 deletions scripts/run_qwen3_30b_hf_to_maxtext.sh
@@ -0,0 +1,45 @@
#!/bin/bash

# This script converts a Qwen3-30B-A3B model checkpoint from the Hugging Face
# format to the MaxText format. It requires the BASE_OUTPUT_DIRECTORY environment
# variable to be set to a GCS path where the converted checkpoint will be stored.

set -e

# --- Environment Setup ---
if ! pip show maxtext &> /dev/null; then
echo "maxtext not found in the environment. Please install it by running:"
echo "uv pip install -e .[tpu] --resolution=lowest"
exit 1
fi

# --- Environment Variables ---
export BASE_OUTPUT_DIRECTORY="${BASE_OUTPUT_DIRECTORY:-}" # GCS bucket path for outputs (e.g., gs://my-bucket/outputs)

# --- Variable Validation ---
if [ -z "$BASE_OUTPUT_DIRECTORY" ]; then
echo "Error: BASE_OUTPUT_DIRECTORY is not set. Please set it in the script or as an environment variable."
exit 1
fi

# Install torch for conversion
echo "Installing torch for checkpoint conversion..."
python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu

# Define the output path for the converted checkpoint
CONVERTED_CKPT_BASE_DIR="${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-converted"
echo "Converted checkpoint will be saved to: ${CONVERTED_CKPT_BASE_DIR}"

# Run the conversion script
python3 -m maxtext.checkpoint_conversion.to_maxtext \
  src/maxtext/configs/base.yml \
  model_name=qwen3-30b-a3b \
  base_output_directory="${CONVERTED_CKPT_BASE_DIR}" \
  scan_layers=True \
  weight_dtype=bfloat16 hardware=cpu skip_jax_distributed_system=True \
  checkpoint_storage_use_ocdbt=False checkpoint_storage_use_zarr3=False \
  --eager_load_method=safetensors

# Set MAXTEXT_CKPT_PATH to the newly created checkpoint path
export MAXTEXT_CKPT_PATH="${CONVERTED_CKPT_BASE_DIR}/0/items"
echo "Conversion complete. Using checkpoint at: ${MAXTEXT_CKPT_PATH}"
48 changes: 48 additions & 0 deletions scripts/run_qwen3_30b_maxtext_to_hf.sh
@@ -0,0 +1,48 @@
#!/bin/bash

# This script converts a Qwen3-30B-A3B model checkpoint from the MaxText
# format to the Hugging Face format. It requires the MAXTEXT_CKPT_PATH and
# BASE_OUTPUT_DIRECTORY environment variables to be set.

set -e

# --- Environment Setup ---
if ! pip show maxtext &> /dev/null; then
echo "maxtext not found in the environment. Please install it by running:"
echo "uv pip install -e .[tpu] --resolution=lowest"
exit 1
fi

# --- Environment Variables ---
export MAXTEXT_CKPT_PATH="${MAXTEXT_CKPT_PATH:-}" # GCS path to the MaxText checkpoint to convert
export BASE_OUTPUT_DIRECTORY="${BASE_OUTPUT_DIRECTORY:-}" # GCS bucket path for storing the converted HF checkpoint

# --- Variable Validation ---
if [ -z "$BASE_OUTPUT_DIRECTORY" ]; then
echo "Error: BASE_OUTPUT_DIRECTORY is not set. Please set it in the script or as an environment variable."
exit 1
fi

if [ -z "$MAXTEXT_CKPT_PATH" ]; then
echo "Error: MAXTEXT_CKPT_PATH is not set. Please set it in the script or as an environment variable."
exit 1
fi

# Install torch for conversion
echo "Installing torch for checkpoint conversion..."
python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu

# Define the output path for the converted checkpoint
HF_CKPT_OUTPUT_DIR="${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-hf-converted"
echo "Converted Hugging Face checkpoint will be saved to: ${HF_CKPT_OUTPUT_DIR}"

# Run the conversion script
python3 -m maxtext.checkpoint_conversion.to_huggingface \
  src/maxtext/configs/base.yml \
  model_name=qwen3-30b-a3b \
  load_parameters_path="${MAXTEXT_CKPT_PATH}" \
  base_output_directory="${HF_CKPT_OUTPUT_DIR}" \
  scan_layers=True \
  weight_dtype=bfloat16 hardware=cpu skip_jax_distributed_system=True

echo "Conversion to Hugging Face format complete. Checkpoint saved to: ${HF_CKPT_OUTPUT_DIR}"