20 changes: 11 additions & 9 deletions docs/tutorials/post_training_index.md
@@ -1,7 +1,7 @@
# Post-training

```{note}
Post-training workflows on TPU require specific dependencies. Please ensure you have installed MaxText with `maxtext[tpu-post-train]` as described in the [official documentation](https://maxtext.readthedocs.io/en/latest/install_maxtext.html).
Post-training workflows on TPU require specific dependencies. Please ensure you have installed MaxText with `maxtext[tpu-post-train]` as described in the [official documentation](../install_maxtext.md).
```

## What is MaxText post-training?
@@ -14,7 +14,7 @@ We’re investing in performance, scale, algorithms, models, reliability, and ea

MaxText was co-designed with key Google-led innovations to provide a unified post-training experience:

- [MaxText model library](https://maxtext.readthedocs.io/en/latest/reference/models/supported_models_and_architectures.html#supported-model-families) for JAX LLMs highly optimized for TPUs
- [MaxText model library](../reference/models/supported_models_and_architectures.md#supported-model-families) for JAX LLMs highly optimized for TPUs
- [Tunix](https://github.com/google/tunix) for the latest algorithms and post-training techniques
- [vLLM on TPU](https://github.com/vllm-project/tpu-inference) for high performance sampling (inference) for Reinforcement Learning (RL)
- [Pathways](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro) for multi-host inference (sampling) and highly efficient weight transfer
@@ -24,15 +24,16 @@ MaxText was co-designed with key Google led innovations to provide a unified pos
## Supported techniques & models

- **SFT (Supervised Fine-Tuning)**
- [SFT on Single-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft.html)
- [SFT on Multi-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html)
- [SFT on Single-Host TPUs](../tutorials/posttraining/sft.md)
- [SFT on Multi-Host TPUs](../tutorials/posttraining/sft_on_multi_host.md)
- **LoRA (Low-Rank Adaptation)**
- [LoRA on Single-Host TPUs](posttraining/lora.md)
- [LoRA on Single-Host TPUs](../tutorials/posttraining/lora.md)
- **Multimodal SFT**
- [Multimodal Support](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/multimodal.html)
- [Multimodal Support](../tutorials/posttraining/multimodal.md)
- **Reinforcement Learning (RL)**
- [RL on Single-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl.html)
- [RL on Multi-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html)
- [RL on Single-Host TPUs](../tutorials/posttraining/rl.md)
- [RL on Multi-Host TPUs](../tutorials/posttraining/rl_on_multi_host.md)
- [RL with Qwen3-30b-a3b](../tutorials/posttraining/rl_qwen3_30b.md)

## Step by step RL

@@ -57,7 +58,7 @@ Pathways supercharges RL with:

## Getting started

Start your Post-Training journey through quick experimentation with [Python Notebooks](https://maxtext.readthedocs.io/en/latest/guides/run_python_notebook.html) or our Production level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html) and [RL](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html).
Start your post-training journey through quick experimentation with [Python Notebooks](../guides/run_python_notebook.md) or our production-level tutorials for [SFT](../tutorials/posttraining/sft_on_multi_host.md) and [RL](../tutorials/posttraining/rl_on_multi_host.md).

## More tutorials

@@ -69,6 +70,7 @@ posttraining/sft.md
posttraining/sft_on_multi_host.md
posttraining/rl.md
posttraining/rl_on_multi_host.md
posttraining/rl_qwen3_30b.md
posttraining/knowledge_distillation.md
posttraining/lora.md
posttraining/multimodal.md
160 changes: 160 additions & 0 deletions docs/tutorials/posttraining/rl_qwen3_30b.md
@@ -0,0 +1,160 @@
<!--
Copyright 2023-2026 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Reinforcement Learning with Qwen3-30b-a3b on Multi-Host TPUs

This tutorial provides step-by-step instructions for setting up the environment
and training the Qwen3-30b-a3b model on the [OpenMathInstruct-2 dataset](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) on an Ironwood GKE cluster with `tpu7x-128` nodes.

## Prerequisites

Before starting, ensure you have:

- Access to a Google Cloud Project with TPU quotas.
- A Hugging Face account with an access token for downloading models.
- Permissions for Google Artifact Registry (Artifact Registry Writer role).
- Prerequisites for XPK installed (follow [official documentation](https://github.com/AI-Hypercomputer/xpk/blob/main/docs/installation.md#1-prerequisites)).
- A Pathways-ready GKE cluster (see [create GKE cluster](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster)).
- **Docker** installed and configured for sudoless use. Follow the steps to [configure sudoless Docker](https://docs.docker.com/engine/install/linux-postinstall/); a quick check is shown below.
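
A quick way to confirm sudoless Docker works (a standard Docker smoke test, nothing MaxText-specific):

```bash
# Should print "Hello from Docker!" without needing sudo
docker run --rm hello-world
```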

## Build and Upload MaxText Docker Image

For instructions on building and uploading the MaxText Docker image with post-training dependencies, please refer to the [official documentation](../../build_maxtext.md).
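
The linked guide has the authoritative steps; as a rough sketch of the general flow (the Dockerfile and tag below are illustrative placeholders, not the guide's exact commands), you authenticate Docker against your registry, then build and push an image under the name you will later export as `DOCKER_IMAGE`:

```bash
# Allow Docker to push to Container Registry / Artifact Registry
gcloud auth configure-docker

# Build and push the image (illustrative; follow the linked guide for the real build)
docker build -t gcr.io/<PROJECT_ID>/<IMAGE_NAME> .
docker push gcr.io/<PROJECT_ID>/<IMAGE_NAME>
```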

## Set Up Environment Variables

Set up the following environment variables to configure your training run. Replace
placeholders with your actual values.

```bash
# Your GCP project ID.
# If you've already set it in your local config, you can retrieve it via:
# gcloud config get-value project
export PROJECT_ID=<PROJECT_ID>

# The name of your GKE cluster.
export CLUSTER_NAME=<CLUSTER_NAME>

# The GCP location of your GKE cluster.
export ZONE=<ZONE> # e.g., 'us-central1' or 'us-central1-a'

# Use a GCS bucket you own to store logs and checkpoints.
export BASE_OUTPUT_DIRECTORY=<GCS_BUCKET> # e.g., gs://my-bucket/maxtext-runs

# The Docker image you pushed in the previous step
export CLOUD_IMAGE_NAME=<IMAGE_NAME>
export DOCKER_IMAGE="gcr.io/${PROJECT_ID?}/${CLOUD_IMAGE_NAME?}"
```
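
The `kubectl` commands used later for monitoring assume your local kubeconfig points at the cluster. If it does not yet, you can fetch credentials with the standard gcloud command (not part of the original steps; use `--region` instead of `--zone` if your cluster is regional):

```bash
# Point kubectl at the GKE cluster defined above
gcloud container clusters get-credentials "${CLUSTER_NAME}" \
  --zone "${ZONE}" --project "${PROJECT_ID}"
```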

## Clone MaxText Repository

If you haven't already, clone the MaxText repository to your local machine:

```bash
git clone https://github.com/AI-Hypercomputer/maxtext.git
cd maxtext
```

## Get Your MaxText-Compatible Model Checkpoint

### Option 1: Using an existing MaxText checkpoint

If you already have a MaxText-compatible model checkpoint, simply set the
following environment variable and move on to the next section.

```bash
export MAXTEXT_CKPT_PATH=<CKPT_PATH> # e.g., gs://my-bucket/my-model-checkpoint/0/items
```

### Option 2: Converting from a Hugging Face checkpoint

> **Note:** Converting the 30B model requires approximately 62 GB of free disk space to download its safetensors. Please verify you have sufficient space before running the conversion script.
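
A quick way to check how much space is available where the Hugging Face cache will live (a plain `df` check, nothing MaxText-specific):

```bash
# Show free space on the filesystem holding the HF cache
# (defaults to ~/.cache/huggingface unless HF_HOME is set)
df -h "${HF_HOME:-$HOME}"
```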

```bash
# Optional: If you run out of disk space when downloading Hugging Face safetensors,
# customize your "HF_HOME" to redirect the cache to a larger or mounted disk (e.g., on a TPU VM).
# export HF_HOME="/dev/shm/huggingface_tmp"

# Create and activate a virtual environment
uv venv --python 3.12 --seed tpu_venv
source tpu_venv/bin/activate
uv pip install -e .[tpu] --resolution=lowest

# Authenticate with Hugging Face
hf auth login

# Run the conversion script to convert the Hugging Face checkpoint to MaxText format
bash scripts/run_qwen3_30b_hf_to_maxtext.sh

# Deactivate the virtual environment
deactivate
rm -rf tpu_venv
```
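
Before moving on, you can confirm the converted checkpoint landed in GCS. The path below mirrors what `scripts/run_qwen3_30b_hf_to_maxtext.sh` writes and exports as `MAXTEXT_CKPT_PATH`:

```bash
# The conversion script writes to ${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-converted
gsutil ls "${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-converted/0/items"
```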

## Run RL Workload

### Submit your workload

```bash
# Create and activate a virtual environment
uv venv --python 3.12 --seed runner_venv
source runner_venv/bin/activate
uv pip install -e .[runner] --resolution=lowest

# Run the RL training script on your cluster
bash scripts/run_qwen3_30b_rl.sh

# Deactivate the virtual environment
deactivate
rm -rf runner_venv
```

### Monitor your workload

To monitor your job's progress, you can use `kubectl` to check the `JobSet` status and stream logs directly from the pods.

```bash
# Check the JobSet status for your workload
kubectl get jobset -n default ${WORKLOAD_NAME}

# List pods to find the specific name
kubectl get pods | grep ${WORKLOAD_NAME}

# Stream the logs from the running pod (replace <POD_NAME> with the name you found)
kubectl logs -f <POD_NAME>
```

Alternatively, after running the bash script you will also get a link to the Google Cloud Console; follow it to view logs and monitor your workload's progress there.
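
If pods stay pending or keep restarting, `kubectl describe` usually surfaces the reason (generic Kubernetes debugging, not specific to this workload):

```bash
# Inspect the JobSet and a problematic pod for events and status conditions
kubectl describe jobset ${WORKLOAD_NAME} -n default
kubectl describe pod <POD_NAME>
```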

## Convert Checkpoint to Hugging Face Format

After training, you may want to convert your MaxText checkpoint back to Hugging Face format. Use the following script to perform the conversion:

```bash
# Create and activate a virtual environment
uv venv --python 3.12 --seed tpu_venv
source tpu_venv/bin/activate
uv pip install -e .[tpu] --resolution=lowest

# Authenticate with Hugging Face
hf auth login

# Run the conversion script to convert the MaxText checkpoint back to Hugging Face format
bash scripts/run_qwen3_30b_maxtext_to_hf.sh

# Deactivate the virtual environment
deactivate
rm -rf tpu_venv
```
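
As with the earlier conversion, you can list the output location to confirm the Hugging Face files (config, tokenizer, and safetensors shards) were written. The path mirrors what `scripts/run_qwen3_30b_maxtext_to_hf.sh` uses:

```bash
# The conversion script writes to ${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-hf-converted
gsutil ls "${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-hf-converted"
```
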
45 changes: 45 additions & 0 deletions scripts/run_qwen3_30b_hf_to_maxtext.sh
@@ -0,0 +1,45 @@
#!/bin/bash

# This script converts a Qwen3-30B-A3B model checkpoint from the Hugging Face
# format to the MaxText format. It requires the BASE_OUTPUT_DIRECTORY environment
# variable to be set to a GCS path where the converted checkpoint will be stored.

set -e

# --- Environment Setup ---
if ! pip show maxtext &> /dev/null; then
echo "maxtext not found in the environment. Please install it by running:"
echo "uv pip install -e .[tpu] --resolution=lowest"
exit 1
fi

# --- Environment Variables ---
export BASE_OUTPUT_DIRECTORY="${BASE_OUTPUT_DIRECTORY:-}" # GCS bucket path for outputs (e.g., gs://my-bucket/outputs)

# --- Variable Validation ---
if [ -z "$BASE_OUTPUT_DIRECTORY" ]; then
echo "Error: BASE_OUTPUT_DIRECTORY is not set. Please set it in the script or as an environment variable."
exit 1
fi

# Install torch for conversion
echo "Installing torch for checkpoint conversion..."
python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu

# Define the output path for the converted checkpoint
CONVERTED_CKPT_BASE_DIR="${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-converted"
echo "Converted checkpoint will be saved to: ${CONVERTED_CKPT_BASE_DIR}"

# Run the conversion script
python3 -m maxtext.checkpoint_conversion.to_maxtext \
  src/maxtext/configs/base.yml \
  model_name=qwen3-30b-a3b \
  base_output_directory="${CONVERTED_CKPT_BASE_DIR}" \
  scan_layers=True \
  weight_dtype=bfloat16 hardware=cpu skip_jax_distributed_system=True \
  checkpoint_storage_use_ocdbt=False checkpoint_storage_use_zarr3=False \
  --eager_load_method=safetensors

# Set MAXTEXT_CKPT_PATH to the newly created checkpoint path
export MAXTEXT_CKPT_PATH="${CONVERTED_CKPT_BASE_DIR}/0/items"
echo "Conversion complete. Using checkpoint at: ${MAXTEXT_CKPT_PATH}"
48 changes: 48 additions & 0 deletions scripts/run_qwen3_30b_maxtext_to_hf.sh
@@ -0,0 +1,48 @@
#!/bin/bash

# This script converts a Qwen3-30B-A3B model checkpoint from the MaxText
# format to the Hugging Face format. It requires the MAXTEXT_CKPT_PATH and
# BASE_OUTPUT_DIRECTORY environment variables to be set.

set -e

# --- Environment Setup ---
if ! pip show maxtext &> /dev/null; then
echo "maxtext not found in the environment. Please install it by running:"
echo "uv pip install -e .[tpu] --resolution=lowest"
exit 1
fi

# --- Environment Variables ---
export MAXTEXT_CKPT_PATH="${MAXTEXT_CKPT_PATH:-}" # GCS path to the MaxText checkpoint to convert
export BASE_OUTPUT_DIRECTORY="${BASE_OUTPUT_DIRECTORY:-}" # GCS bucket path for storing the converted HF checkpoint

# --- Variable Validation ---
if [ -z "$BASE_OUTPUT_DIRECTORY" ]; then
echo "Error: BASE_OUTPUT_DIRECTORY is not set. Please set it in the script or as an environment variable."
exit 1
fi

if [ -z "$MAXTEXT_CKPT_PATH" ]; then
echo "Error: MAXTEXT_CKPT_PATH is not set. Please set it in the script or as an environment variable."
exit 1
fi

# Install torch for conversion
echo "Installing torch for checkpoint conversion..."
python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu

# Define the output path for the converted checkpoint
HF_CKPT_OUTPUT_DIR="${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-hf-converted"
echo "Converted Hugging Face checkpoint will be saved to: ${HF_CKPT_OUTPUT_DIR}"

# Run the conversion script
python3 -m maxtext.checkpoint_conversion.to_huggingface \
  src/maxtext/configs/base.yml \
  model_name=qwen3-30b-a3b \
  load_parameters_path="${MAXTEXT_CKPT_PATH}" \
  base_output_directory="${HF_CKPT_OUTPUT_DIR}" \
  scan_layers=True \
  weight_dtype=bfloat16 hardware=cpu skip_jax_distributed_system=True

echo "Conversion to Hugging Face format complete. Checkpoint saved to: ${HF_CKPT_OUTPUT_DIR}"