diff --git a/docs/tutorials/post_training_index.md b/docs/tutorials/post_training_index.md index d277cfb4be..89a4065867 100644 --- a/docs/tutorials/post_training_index.md +++ b/docs/tutorials/post_training_index.md @@ -1,7 +1,7 @@ # Post-training ```{note} -Post-training workflows on TPU require specific dependencies. Please ensure you have installed MaxText with `maxtext[tpu-post-train]` as described in the [official documentation](https://maxtext.readthedocs.io/en/latest/install_maxtext.html). +Post-training workflows on TPU require specific dependencies. Please ensure you have installed MaxText with `maxtext[tpu-post-train]` as described in the [official documentation](../install_maxtext.md). ``` ## What is MaxText post-training? @@ -14,7 +14,7 @@ We’re investing in performance, scale, algorithms, models, reliability, and ea MaxText was co-designed with key Google led innovations to provide a unified post training experience: -- [MaxText model library](https://maxtext.readthedocs.io/en/latest/reference/models/supported_models_and_architectures.html#supported-model-families) for JAX LLMs highly optimized for TPUs +- [MaxText model library](../reference/models/supported_models_and_architectures.md#supported-model-families) for JAX LLMs highly optimized for TPUs - [Tunix](https://github.com/google/tunix) for the latest algorithms and post-training techniques - [vLLM on TPU](https://github.com/vllm-project/tpu-inference) for high performance sampling (inference) for Reinforcement Learning (RL) - [Pathways](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro) for multi-host inference (sampling) and highly efficient weight transfer @@ -24,15 +24,16 @@ MaxText was co-designed with key Google led innovations to provide a unified pos ## Supported techniques & models - **SFT (Supervised Fine-Tuning)** - - [SFT on Single-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft.html) - - [SFT on Multi-Host 
TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html) + - [SFT on Single-Host TPUs](../tutorials/posttraining/sft.md) + - [SFT on Multi-Host TPUs](../tutorials/posttraining/sft_on_multi_host.md) - **LoRA (Low-Rank Adaptation)** - - [LoRA on Single-Host TPUs](posttraining/lora.md) + - [LoRA on Single-Host TPUs](../tutorials/posttraining/lora.md) - **Multimodal SFT** - - [Multimodal Support](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/multimodal.html) + - [Multimodal Support](../tutorials/posttraining/multimodal.md) - **Reinforcement Learning (RL)** - - [RL on Single-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl.html) - - [RL on Multi-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html) + - [RL on Single-Host TPUs](../tutorials/posttraining/rl.md) + - [RL on Multi-Host TPUs](../tutorials/posttraining/rl_on_multi_host.md) + - [RL with Qwen3-30b-a3b](../tutorials/posttraining/rl_qwen3_30b.md) ## Step by step RL @@ -57,7 +58,7 @@ Pathways supercharges RL with: ## Getting started -Start your Post-Training journey through quick experimentation with [Python Notebooks](https://maxtext.readthedocs.io/en/latest/guides/run_python_notebook.html) or our Production level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html) and [RL](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html). +Start your Post-Training journey through quick experimentation with [Python Notebooks](../guides/run_python_notebook.md) or our Production level tutorials for [SFT](../tutorials/posttraining/sft_on_multi_host.md) and [RL](../tutorials/posttraining/rl_on_multi_host.md). 
## More tutorials

@@ -69,6 +70,7 @@
posttraining/sft.md
posttraining/sft_on_multi_host.md
posttraining/rl.md
posttraining/rl_on_multi_host.md
+posttraining/rl_qwen3_30b.md
posttraining/knowledge_distillation.md
posttraining/lora.md
posttraining/multimodal.md
diff --git a/docs/tutorials/posttraining/rl_qwen3_30b.md b/docs/tutorials/posttraining/rl_qwen3_30b.md
new file mode 100644
index 0000000000..28358d1152
--- /dev/null
+++ b/docs/tutorials/posttraining/rl_qwen3_30b.md
@@ -0,0 +1,160 @@
+
+
+# Reinforcement Learning with Qwen3-30b-a3b on Multi-Host TPUs
+
+This tutorial provides step-by-step instructions for setting up the environment
+and training the Qwen3-30b-a3b model on the [OpenMathInstruct-2 dataset](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) on an Ironwood GKE cluster with `tpu7x-128` nodes.
+
+## Prerequisites
+
+Before starting, ensure you have:
+
+- Access to a Google Cloud Project with TPU quotas.
+- A Hugging Face account with an access token for downloading models.
+- Permissions for Google Artifact Registry (Artifact Registry Writer role).
+- Prerequisites for XPK installed (follow [official documentation](https://github.com/AI-Hypercomputer/xpk/blob/main/docs/installation.md#1-prerequisites)).
+- A Pathways-ready GKE cluster (see [create GKE cluster](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster)).
+- **Docker** installed and configured for sudoless use. Follow the steps to [configure sudoless Docker](https://docs.docker.com/engine/install/linux-postinstall/).
+
+## Build and Upload MaxText Docker Image
+
+For instructions on building and uploading the MaxText Docker image with post-training dependencies, please refer to the [official documentation](../../build_maxtext.md).
+
+## Setup Environment Variables
+
+Set up the following environment variables to configure your training run. Replace
+placeholders with your actual values.
+
+```bash
+# Your GCP project ID.
+# If you've already set it in your local config, you can retrieve it via:
+# gcloud config get-value project
+export PROJECT_ID=
+
+# The name of your GKE cluster.
+export CLUSTER_NAME=
+
+# The GCP location of your GKE cluster.
+export ZONE= # e.g., 'us-central1' or 'us-central1-a'
+
+# Use a GCS bucket you own to store logs and checkpoints.
+export BASE_OUTPUT_DIRECTORY= # e.g., gs://my-bucket/maxtext-runs
+
+# The Docker image you pushed in the previous step.
+export CLOUD_IMAGE_NAME=
+export DOCKER_IMAGE="gcr.io/${PROJECT_ID?}/${CLOUD_IMAGE_NAME?}"
+```
+
+## Clone MaxText Repository
+
+If you haven't already, clone the MaxText repository to your local machine:
+
+```bash
+git clone https://github.com/AI-Hypercomputer/maxtext.git
+cd maxtext
+```
+
+## Get Your MaxText Compatible Model Checkpoint
+
+### Option 1: Using an existing MaxText checkpoint
+
+If you already have a MaxText-compatible model checkpoint, simply set the
+following environment variable and move on to the next section.
+
+```bash
+export MAXTEXT_CKPT_PATH= # e.g., gs://my-bucket/my-model-checkpoint/0/items
+```
+
+### Option 2: Converting from a Hugging Face checkpoint
+
+> **Note:** Converting the 30B model requires approximately 62 GB of free disk space to download its safetensors. Please verify you have sufficient space before running the conversion script.
+
+```bash
+# Optional: If you run out of disk space when downloading Hugging Face safetensors,
+# customize your "HF_HOME" to redirect the cache to a larger or mounted disk (e.g., on a TPU VM).
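+#
+# A quick way to check free space on the cache volume first (a sketch; when
+# HF_HOME is unset, the default Hugging Face cache lives under ~/.cache/huggingface):
+# df -h "${HF_HOME:-$HOME/.cache/huggingface}"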
+# export HF_HOME="/dev/shm/huggingface_tmp"
+
+# Create and activate a virtual environment
+uv venv --python 3.12 --seed tpu_venv
+source tpu_venv/bin/activate
+uv pip install -e .[tpu] --resolution=lowest
+
+# Authenticate with Hugging Face
+hf auth login
+
+# Run the conversion script to convert the Hugging Face checkpoint to MaxText format
+bash scripts/run_qwen3_30b_hf_to_maxtext.sh
+
+# Deactivate the virtual environment
+deactivate
+rm -rf tpu_venv
+```
+
+The conversion script echoes the converted checkpoint path when it finishes. Because an `export` inside the script does not persist in your shell, copy that path and set `MAXTEXT_CKPT_PATH` yourself before submitting the RL workload.
+
+## Run RL Workload
+
+### Submit your workload
+
+```bash
+# Create and activate a virtual environment
+uv venv --python 3.12 --seed runner_venv
+source runner_venv/bin/activate
+uv pip install -e .[runner] --resolution=lowest
+
+# Run the RL training script on your cluster
+bash scripts/run_qwen3_30b_rl.sh
+
+# Deactivate the virtual environment
+deactivate
+rm -rf runner_venv
+```
+
+### Monitor your workload
+
+To monitor your job's progress, you can use `kubectl` to check the `JobSet` status and stream logs directly from the pods.
+
+```bash
+# Check the JobSet status (WORKLOAD_NAME is generated by the launch script as rl-<timestamp>)
+kubectl get jobset -n default ${WORKLOAD_NAME}
+
+# List pods to find the specific name
+kubectl get pods | grep ${WORKLOAD_NAME}
+
+# Stream the logs from the running pod (replace <pod-name> with the name you found)
+kubectl logs -f <pod-name>
+```
+
+Alternatively, the launch script prints a link to the Google Cloud Console; follow it to view logs and monitor your workload's progress there.
+
+## Convert Checkpoint to Hugging Face Format
+
+After training, you may want to convert your MaxText checkpoint back to Hugging Face format. Use the following script to perform the conversion:
+
+```bash
+# Create and activate a virtual environment
+uv venv --python 3.12 --seed tpu_venv
+source tpu_venv/bin/activate
+uv pip install -e .[tpu] --resolution=lowest
+
+# Authenticate with Hugging Face
+hf auth login
+
+# Run the conversion script to convert the MaxText checkpoint back to Hugging Face format
+bash scripts/run_qwen3_30b_maxtext_to_hf.sh
+
+# Deactivate the virtual environment
+deactivate
+rm -rf tpu_venv
+```
diff --git a/scripts/run_qwen3_30b_hf_to_maxtext.sh b/scripts/run_qwen3_30b_hf_to_maxtext.sh
new file mode 100644
index 0000000000..5631b35123
--- /dev/null
+++ b/scripts/run_qwen3_30b_hf_to_maxtext.sh
@@ -0,0 +1,45 @@
+#!/bin/bash
+
+# This script converts a Qwen3-30B-A3B model checkpoint from the Hugging Face
+# format to the MaxText format. It requires the BASE_OUTPUT_DIRECTORY environment
+# variable to be set to a GCS path where the converted checkpoint will be stored.
+
+set -e
+
+# --- Environment Setup ---
+if ! pip show maxtext &> /dev/null; then
+  echo "maxtext not found in the environment. Please install it by running:"
+  echo "uv pip install -e .[tpu] --resolution=lowest"
+  exit 1
+fi
+
+# --- Environment Variables ---
+export BASE_OUTPUT_DIRECTORY="${BASE_OUTPUT_DIRECTORY:-}" # GCS bucket path for outputs (e.g., gs://my-bucket/outputs)
+
+# --- Variable Validation ---
+if [ -z "$BASE_OUTPUT_DIRECTORY" ]; then
+  echo "Error: BASE_OUTPUT_DIRECTORY is not set. Please set it in the script or as an environment variable."
+  exit 1
+fi
+
+# Install torch for conversion
+echo "Installing torch for checkpoint conversion..."
+python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu + +# Define the output path for the converted checkpoint +CONVERTED_CKPT_BASE_DIR="${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-converted" +echo "Converted checkpoint will be saved to: ${CONVERTED_CKPT_BASE_DIR}" + +# Run the conversion script +python3 -m maxtext.checkpoint_conversion.to_maxtext \ + src/maxtext/configs/base.yml \ + model_name=qwen3-30b-a3b \ + base_output_directory="${CONVERTED_CKPT_BASE_DIR}" \ + scan_layers=True \ + weight_dtype=bfloat16 hardware=cpu skip_jax_distributed_system=True \ + checkpoint_storage_use_ocdbt=False checkpoint_storage_use_zarr3=False \ + --eager_load_method=safetensors + +# Set MAXTEXT_CKPT_PATH to the newly created checkpoint path +export MAXTEXT_CKPT_PATH="${CONVERTED_CKPT_BASE_DIR}/0/items" +echo "Conversion complete. Using checkpoint at: ${MAXTEXT_CKPT_PATH}" diff --git a/scripts/run_qwen3_30b_maxtext_to_hf.sh b/scripts/run_qwen3_30b_maxtext_to_hf.sh new file mode 100644 index 0000000000..35b529efbb --- /dev/null +++ b/scripts/run_qwen3_30b_maxtext_to_hf.sh @@ -0,0 +1,48 @@ +#!/bin/bash + +# This script converts a Qwen3-30B-A3B model checkpoint from the MaxText +# format to the Hugging Face format. It requires the MAXTEXT_CKPT_PATH and +# BASE_OUTPUT_DIRECTORY environment variables to be set. + +set -e + +# --- Environment Setup --- +if ! pip show maxtext &> /dev/null; then + echo "maxtext not found in the environment. Please install it by running:" + echo "uv pip install -e .[tpu] --resolution=lowest" + exit 1 +fi + +# --- Environment Variables --- +export MAXTEXT_CKPT_PATH="${MAXTEXT_CKPT_PATH:-}" # GCS path to the MaxText checkpoint to convert +export BASE_OUTPUT_DIRECTORY="${BASE_OUTPUT_DIRECTORY:-}" # GCS bucket path for storing the converted HF checkpoint + +# --- Variable Validation --- +if [ -z "$BASE_OUTPUT_DIRECTORY" ]; then + echo "Error: BASE_OUTPUT_DIRECTORY is not set. 
Please set it in the script or as an environment variable." + exit 1 +fi + +if [ -z "$MAXTEXT_CKPT_PATH" ]; then + echo "Error: MAXTEXT_CKPT_PATH is not set. Please set it in the script or as an environment variable." + exit 1 +fi + +# Install torch for conversion +echo "Installing torch for checkpoint conversion..." +python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu + +# Define the output path for the converted checkpoint +HF_CKPT_OUTPUT_DIR="${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-hf-converted" +echo "Converted Hugging Face checkpoint will be saved to: ${HF_CKPT_OUTPUT_DIR}" + +# Run the conversion script +python -m maxtext.checkpoint_conversion.to_huggingface \ + src/maxtext/configs/base.yml \ + model_name=qwen3-30b-a3b \ + load_parameters_path="${MAXTEXT_CKPT_PATH}" \ + base_output_directory="${HF_CKPT_OUTPUT_DIR}" \ + scan_layers=True \ + weight_dtype=bfloat16 hardware=cpu skip_jax_distributed_system=True + +echo "Conversion to Hugging Face format complete. Checkpoint saved to: ${HF_CKPT_OUTPUT_DIR}" diff --git a/scripts/run_qwen3_30b_rl.sh b/scripts/run_qwen3_30b_rl.sh new file mode 100644 index 0000000000..c86d7ff1c8 --- /dev/null +++ b/scripts/run_qwen3_30b_rl.sh @@ -0,0 +1,147 @@ +#!/bin/bash + +# This script launches a Reinforcement Learning (RL) training workload for the +# Qwen3-30B-A3B model on a GKE cluster using XPK. + +set -e + +# --- Environment Setup --- +if ! pip show xpk &> /dev/null; then + echo "xpk not found in the environment. 
Please install it by running:" + echo "uv pip install -e .[runner] --resolution=lowest" + exit 1 +fi + +# --- Environment Variables --- +export PROJECT_ID="${PROJECT_ID:-}" # GCP project ID where the Ironwood cluster is deployed +export CLUSTER_NAME="${CLUSTER_NAME:-}" # Name of your Ironwood cluster +export ZONE="${ZONE:-}" # Zone where your Ironwood cluster is deployed +export BASE_OUTPUT_DIRECTORY="${BASE_OUTPUT_DIRECTORY:-}" # GCS bucket path for outputs (e.g., gs://my-bucket/outputs) +export DOCKER_IMAGE="${DOCKER_IMAGE:-}" # Full path to the Docker image you pushed (e.g., gcr.io/my-project/my-image:tag) +export MAXTEXT_CKPT_PATH="${MAXTEXT_CKPT_PATH:-}" # GCS path of the MaxText checkpoint you want to fine-tune from (e.g., gs://my-bucket/checkpoints/maxtext-ckpt) +export TPU_TYPE="tpu7x-128" +export WORKLOAD_NAME="rl-$(date +%Y%m%d-%H%M)" + +# --- Variable Validation --- +if [ -z "$PROJECT_ID" ]; then + echo "Error: PROJECT_ID is not set. Please set it in the script or as an environment variable." + exit 1 +fi +if [ -z "$CLUSTER_NAME" ]; then + echo "Error: CLUSTER_NAME is not set. Please set it in the script or as an environment variable." + exit 1 +fi +if [ -z "$ZONE" ]; then + echo "Error: ZONE is not set. Please set it in the script or as an environment variable." + exit 1 +fi +if [ -z "$BASE_OUTPUT_DIRECTORY" ]; then + echo "Error: BASE_OUTPUT_DIRECTORY is not set. Please set it in the script or as an environment variable." + exit 1 +fi +if [ -z "$DOCKER_IMAGE" ]; then + echo "Error: DOCKER_IMAGE is not set. Please set it in the script or as an environment variable." + exit 1 +fi + +if [ -z "$MAXTEXT_CKPT_PATH" ]; then + echo "MAXTEXT_CKPT_PATH is not set. Please set it in the script or as an environment variable." 
+ exit 1 +fi + +# XLA Flags +XLA_FLAGS="--xla_tpu_dvfs_p_state=7 \ +--xla_tpu_scoped_vmem_limit_kib=65536 \ +--xla_tpu_num_sparse_cores_for_gather_offloading=1 \ +--xla_tpu_bf16_emission_mode=NATIVE_EMISSION \ +--xla_tpu_enable_sparse_core_reduce_scatter_v2=true \ +--xla_tpu_enable_sparse_core_collective_offload_all_gather=true \ +--xla_tpu_enable_sparse_core_collective_offload_2d_all_gather=true \ +--xla_tpu_use_tc_device_shape_on_sc=True \ +--xla_sc_disable_megacore_partitioning=True \ +--xla_tpu_enable_async_collective_fusion_fuse_all_gather=false \ +--xla_enable_async_all_gather=true \ +--xla_tpu_prefer_async_allgather_to_allreduce=true \ +--xla_tpu_enable_sparse_core_collective_offload_all_reduce=true \ +--xla_tpu_enable_sparse_core_collective_offload_reduce_scatter=true \ +--xla_tpu_enable_sparse_core_collective_offload_3d_all_gather=true \ +--xla_tpu_use_single_sparse_core_for_all_gather_offload=true \ +--xla_tpu_enable_concurrent_sparse_core_offloading=true \ +--xla_tpu_enable_offloading_gather_to_sparsecore=true \ +--xla_tpu_sparse_core_all_gather_latency_multiplier=1 \ +--xla_tpu_sparse_core_reduce_scatter_latency_multiplier=3 \ +--xla_tpu_enable_sparse_core_collective_aggregator=true \ +--xla_tpu_enable_latency_hiding_layer_scheduler=true \ +--xla_tpu_scheduler_percent_shared_memory_limit=150 \ +--xla_tpu_enable_layer_scheduler_for_dependent_collectives=true \ +--xla_tpu_enable_sparse_core_collective_offload_nd_reduce_scatter=true \ +--xla_tpu_pcie_bandwidth_multiplier=0.03 \ +--xla_tpu_enable_sparse_core_offload_queuing_in_lhs=true \ +--xla_tpu_enable_multi_compute_overlap_in_layer_scheduler=false \ +--xla_tpu_enable_3d_reduce_scatter_decomposer=false" + +# MaxText command +MAXTEXT_COMMAND="JAX_RANDOM_WEIGHTS=1 \ +VLLM_ENABLE_V1_MULTIPROCESSING=0 \ +NEW_MODEL_DESIGN=1 \ +TPU_MIN_LOG_LEVEL=0 \ +TF_CPP_MIN_LOG_LEVEL=0 \ +TPU_STDERR_LOG_LEVEL=0 \ +JAX_PLATFORMS=proxy,cpu \ +JAX_BACKEND_TARGET=grpc://127.0.0.1:29000 \ +ENABLE_PATHWAYS_PERSISTENCE=1 \ 
+python3 -m maxtext.trainers.post_train.rl.train_rl \
+model_name=qwen3-30b-a3b \
+tokenizer_path=Qwen/Qwen3-30B-A3B-Base \
+run_name=$WORKLOAD_NAME \
+async_scheduling=True \
+base_output_directory=$BASE_OUTPUT_DIRECTORY \
+chips_per_vm=8 \
+num_batches=500 \
+num_test_batches=10 \
+rl.num_generations=8 \
+rl.grpo_beta=0.05 \
+rl.grpo_epsilon=0.2 \
+gradient_clipping_threshold=1.0 \
+decode_sampling_temperature=0.8 \
+decode_sampling_top_k=50 \
+decode_sampling_nucleus_p=0.95 \
+dataset_name=nvidia/OpenMathInstruct-2 \
+hf_train_files=hf://datasets/nvidia/OpenMathInstruct-2/data/train_1M-*.parquet \
+train_split=train_1M \
+eval_dataset_name=nvidia/OpenMathInstruct-2 \
+eval_mode=pass_at_1 \
+num_eval_passes=4 \
+max_target_length=8192 \
+max_prefill_predict_length=512 \
+learning_rate=1e-6 \
+batch_size=128 \
+train_micro_batch_size=16 \
+rollout_micro_batch_size=128 \
+rollout_data_parallelism=16 \
+rollout_tensor_parallelism=4 \
+enable_dp_attention=True \
+hbm_utilization_vllm=0.75 \
+max_num_seqs=256 \
+max_num_batched_tokens=8192 \
+scan_layers=True \
+allow_split_physical_axes=True \
+enable_tunix_perf_metrics=True \
+checkpoint_period=2 \
+max_num_checkpoints_to_keep=1000 \
+enable_checkpointing=true \
+load_parameters_path=$MAXTEXT_CKPT_PATH"
+
+# Workload Creation
+xpk workload create-pathways \
+  --cluster=$CLUSTER_NAME \
+  --project=$PROJECT_ID \
+  --zone=$ZONE \
+  --priority=medium \
+  --max-restarts=0 \
+  --tpu-type=$TPU_TYPE \
+  --num-slices=1 \
+  --docker-image="${DOCKER_IMAGE}" \
+  --workload="${WORKLOAD_NAME}" \
+  --custom-pathways-proxy-server-args="${XLA_FLAGS}" \
+  --command="${MAXTEXT_COMMAND}"
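+
+# Optional follow-up (a sketch, not part of the original launch flow): once the
+# workload is submitted, xpk can also list it or tear it down, e.g.:
+#   xpk workload list --cluster=$CLUSTER_NAME --project=$PROJECT_ID --zone=$ZONE
+#   xpk workload delete --workload=$WORKLOAD_NAME --cluster=$CLUSTER_NAME --project=$PROJECT_ID --zone=$ZONE
+echo "Submitted workload: ${WORKLOAD_NAME}"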