diff --git a/docs/tutorials/post_training_index.md b/docs/tutorials/post_training_index.md index d277cfb4be..89a4065867 100644 --- a/docs/tutorials/post_training_index.md +++ b/docs/tutorials/post_training_index.md @@ -1,7 +1,7 @@ # Post-training ```{note} -Post-training workflows on TPU require specific dependencies. Please ensure you have installed MaxText with `maxtext[tpu-post-train]` as described in the [official documentation](https://maxtext.readthedocs.io/en/latest/install_maxtext.html). +Post-training workflows on TPU require specific dependencies. Please ensure you have installed MaxText with `maxtext[tpu-post-train]` as described in the [official documentation](../install_maxtext.md). ``` ## What is MaxText post-training? @@ -14,7 +14,7 @@ We’re investing in performance, scale, algorithms, models, reliability, and ea MaxText was co-designed with key Google led innovations to provide a unified post training experience: -- [MaxText model library](https://maxtext.readthedocs.io/en/latest/reference/models/supported_models_and_architectures.html#supported-model-families) for JAX LLMs highly optimized for TPUs +- [MaxText model library](../reference/models/supported_models_and_architectures.md#supported-model-families) for JAX LLMs highly optimized for TPUs - [Tunix](https://github.com/google/tunix) for the latest algorithms and post-training techniques - [vLLM on TPU](https://github.com/vllm-project/tpu-inference) for high performance sampling (inference) for Reinforcement Learning (RL) - [Pathways](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro) for multi-host inference (sampling) and highly efficient weight transfer @@ -24,15 +24,16 @@ MaxText was co-designed with key Google led innovations to provide a unified pos ## Supported techniques & models - **SFT (Supervised Fine-Tuning)** - - [SFT on Single-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft.html) - - [SFT on Multi-Host 
TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html) + - [SFT on Single-Host TPUs](../tutorials/posttraining/sft.md) + - [SFT on Multi-Host TPUs](../tutorials/posttraining/sft_on_multi_host.md) - **LoRA (Low-Rank Adaptation)** - - [LoRA on Single-Host TPUs](posttraining/lora.md) + - [LoRA on Single-Host TPUs](../tutorials/posttraining/lora.md) - **Multimodal SFT** - - [Multimodal Support](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/multimodal.html) + - [Multimodal Support](../tutorials/posttraining/multimodal.md) - **Reinforcement Learning (RL)** - - [RL on Single-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl.html) - - [RL on Multi-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html) + - [RL on Single-Host TPUs](../tutorials/posttraining/rl.md) + - [RL on Multi-Host TPUs](../tutorials/posttraining/rl_on_multi_host.md) + - [RL with Qwen3-30b-a3b](../tutorials/posttraining/rl_qwen3_30b.md) ## Step by step RL @@ -57,7 +58,7 @@ Pathways supercharges RL with: ## Getting started -Start your Post-Training journey through quick experimentation with [Python Notebooks](https://maxtext.readthedocs.io/en/latest/guides/run_python_notebook.html) or our Production level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html) and [RL](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html). +Start your Post-Training journey through quick experimentation with [Python Notebooks](../guides/run_python_notebook.md) or our Production level tutorials for [SFT](../tutorials/posttraining/sft_on_multi_host.md) and [RL](../tutorials/posttraining/rl_on_multi_host.md). 
## More tutorials

@@ -69,6 +70,7 @@
posttraining/sft.md
posttraining/sft_on_multi_host.md
posttraining/rl.md
posttraining/rl_on_multi_host.md
+posttraining/rl_qwen3_30b.md
posttraining/knowledge_distillation.md
posttraining/lora.md
posttraining/multimodal.md
diff --git a/docs/tutorials/posttraining/rl_qwen3_30b.md b/docs/tutorials/posttraining/rl_qwen3_30b.md
new file mode 100644
index 0000000000..28358d1152
--- /dev/null
+++ b/docs/tutorials/posttraining/rl_qwen3_30b.md
@@ -0,0 +1,160 @@
+
+
+# Reinforcement Learning with Qwen3-30b-a3b on Multi-Host TPUs
+
+This tutorial provides step-by-step instructions for setting up the environment
+and training the Qwen3-30b-a3b model on the [OpenMathInstruct-2 dataset](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) on an Ironwood GKE cluster with `tpu7x-128` nodes.
+
+## Prerequisites
+
+Before starting, ensure you have:
+
+- Access to a Google Cloud Project with TPU quotas.
+- A Hugging Face account with an access token for downloading models.
+- Permissions for Google Artifact Registry (Artifact Registry Writer role).
+- Prerequisites for XPK installed (follow [official documentation](https://github.com/AI-Hypercomputer/xpk/blob/main/docs/installation.md#1-prerequisites)).
+- A Pathways-ready GKE cluster (see [create GKE cluster](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster)).
+- **Docker** installed and configured for sudoless use. Follow the steps to [configure sudoless Docker](https://docs.docker.com/engine/install/linux-postinstall/).
+
+## Build and Upload MaxText Docker Image
+
+For instructions on building and uploading the MaxText Docker image with post-training dependencies, please refer to the [official documentation](../../build_maxtext.md).
+
+## Setup Environment Variables
+
+Set up the following environment variables to configure your training run. Replace
+placeholders with your actual values.
+
+```bash
+# Your GCP project ID.
+# If you've already set it in your local config, you can retrieve it via:
+# gcloud config get-value project
+export PROJECT_ID=
+
+# The name of your GKE cluster.
+export CLUSTER_NAME=
+
+# The GCP location of your GKE cluster.
+export ZONE= # e.g., 'us-central1' or 'us-central1-a'
+
+# Use a GCS bucket you own to store logs and checkpoints.
+export BASE_OUTPUT_DIRECTORY= # e.g., gs://my-bucket/maxtext-runs
+
+# The Docker image you pushed in the previous step.
+export CLOUD_IMAGE_NAME=
+export DOCKER_IMAGE="gcr.io/${PROJECT_ID?}/${CLOUD_IMAGE_NAME?}"
+```
+
+## Clone MaxText Repository
+
+If you haven't already, clone the MaxText repository to your local machine:
+
+```bash
+git clone https://github.com/AI-Hypercomputer/maxtext.git
+cd maxtext
+```
+
+## Get Your MaxText Compatible Model Checkpoint
+
+### Option 1: Using an existing MaxText checkpoint
+
+If you already have a MaxText-compatible model checkpoint, simply set the
+following environment variable and move on to the next section.
+
+```bash
+export MAXTEXT_CKPT_PATH= # e.g., gs://my-bucket/my-model-checkpoint/0/items
+```
+
+### Option 2: Converting from a Hugging Face checkpoint
+
+> **Note:** Converting the 30B model requires approximately 62 GB of free disk space to download its safetensors. Please verify you have sufficient space before running the conversion script.
+
+```bash
+# Optional: If you run out of disk space when downloading Hugging Face safetensors,
+# customize your "HF_HOME" to redirect the cache to a larger or mounted disk (e.g., on a TPU VM).
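+#
+# A quick way to check free space on the cache volume first (a sketch; when
+# HF_HOME is unset, the default Hugging Face cache lives under ~/.cache/huggingface):
+# df -h "${HF_HOME:-$HOME/.cache/huggingface}"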
+# export HF_HOME="/dev/shm/huggingface_tmp"
+
+# Create and activate a virtual environment
+uv venv --python 3.12 --seed tpu_venv
+source tpu_venv/bin/activate
+uv pip install -e .[tpu] --resolution=lowest
+
+# Authenticate with Hugging Face
+hf auth login
+
+# Run the conversion script to convert the Hugging Face checkpoint to MaxText format
+bash scripts/run_qwen3_30b_hf_to_maxtext.sh
+
+# Deactivate the virtual environment
+deactivate
+rm -rf tpu_venv
+```
+
+The conversion script echoes the converted checkpoint path when it finishes. Because an `export` inside the script does not persist in your shell, copy that path and set `MAXTEXT_CKPT_PATH` yourself before submitting the RL workload.
+
+## Run RL Workload
+
+### Submit your workload
+
+```bash
+# Create and activate a virtual environment
+uv venv --python 3.12 --seed runner_venv
+source runner_venv/bin/activate
+uv pip install -e .[runner] --resolution=lowest
+
+# Run the RL training script on your cluster
+bash scripts/run_qwen3_30b_rl.sh
+
+# Deactivate the virtual environment
+deactivate
+rm -rf runner_venv
+```
+
+### Monitor your workload
+
+To monitor your job's progress, you can use `kubectl` to check the `JobSet` status and stream logs directly from the pods.
+
+```bash
+# Check the JobSet status (WORKLOAD_NAME is generated by the launch script as rl-<timestamp>)
+kubectl get jobset -n default ${WORKLOAD_NAME}
+
+# List pods to find the specific name
+kubectl get pods | grep ${WORKLOAD_NAME}
+
+# Stream the logs from the running pod (replace <pod-name> with the name you found)
+kubectl logs -f <pod-name>
+```
+
+Alternatively, the launch script prints a link to the Google Cloud Console; follow it to view logs and monitor your workload's progress there.
+
+## Convert Checkpoint to Hugging Face Format
+
+After training, you may want to convert your MaxText checkpoint back to Hugging Face format. Use the following script to perform the conversion:
+
+```bash
+# Create and activate a virtual environment
+uv venv --python 3.12 --seed tpu_venv
+source tpu_venv/bin/activate
+uv pip install -e .[tpu] --resolution=lowest
+
+# Authenticate with Hugging Face
+hf auth login
+
+# Run the conversion script to convert the MaxText checkpoint back to Hugging Face format
+bash scripts/run_qwen3_30b_maxtext_to_hf.sh
+
+# Deactivate the virtual environment
+deactivate
+rm -rf tpu_venv
+```
diff --git a/scripts/run_qwen3_30b_hf_to_maxtext.sh b/scripts/run_qwen3_30b_hf_to_maxtext.sh
new file mode 100644
index 0000000000..5631b35123
--- /dev/null
+++ b/scripts/run_qwen3_30b_hf_to_maxtext.sh
@@ -0,0 +1,45 @@
+#!/bin/bash
+
+# This script converts a Qwen3-30B-A3B model checkpoint from the Hugging Face
+# format to the MaxText format. It requires the BASE_OUTPUT_DIRECTORY environment
+# variable to be set to a GCS path where the converted checkpoint will be stored.
+
+set -e
+
+# --- Environment Setup ---
+if ! pip show maxtext &> /dev/null; then
+  echo "maxtext not found in the environment. Please install it by running:"
+  echo "uv pip install -e .[tpu] --resolution=lowest"
+  exit 1
+fi
+
+# --- Environment Variables ---
+export BASE_OUTPUT_DIRECTORY="${BASE_OUTPUT_DIRECTORY:-}" # GCS bucket path for outputs (e.g., gs://my-bucket/outputs)
+
+# --- Variable Validation ---
+if [ -z "$BASE_OUTPUT_DIRECTORY" ]; then
+  echo "Error: BASE_OUTPUT_DIRECTORY is not set. Please set it in the script or as an environment variable."
+  exit 1
+fi
+
+# Install torch for conversion
+echo "Installing torch for checkpoint conversion..."
+python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu + +# Define the output path for the converted checkpoint +CONVERTED_CKPT_BASE_DIR="${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-converted" +echo "Converted checkpoint will be saved to: ${CONVERTED_CKPT_BASE_DIR}" + +# Run the conversion script +python3 -m maxtext.checkpoint_conversion.to_maxtext \ + src/maxtext/configs/base.yml \ + model_name=qwen3-30b-a3b \ + base_output_directory="${CONVERTED_CKPT_BASE_DIR}" \ + scan_layers=True \ + weight_dtype=bfloat16 hardware=cpu skip_jax_distributed_system=True \ + checkpoint_storage_use_ocdbt=False checkpoint_storage_use_zarr3=False \ + --eager_load_method=safetensors + +# Set MAXTEXT_CKPT_PATH to the newly created checkpoint path +export MAXTEXT_CKPT_PATH="${CONVERTED_CKPT_BASE_DIR}/0/items" +echo "Conversion complete. Using checkpoint at: ${MAXTEXT_CKPT_PATH}" diff --git a/scripts/run_qwen3_30b_maxtext_to_hf.sh b/scripts/run_qwen3_30b_maxtext_to_hf.sh new file mode 100644 index 0000000000..35b529efbb --- /dev/null +++ b/scripts/run_qwen3_30b_maxtext_to_hf.sh @@ -0,0 +1,48 @@ +#!/bin/bash + +# This script converts a Qwen3-30B-A3B model checkpoint from the MaxText +# format to the Hugging Face format. It requires the MAXTEXT_CKPT_PATH and +# BASE_OUTPUT_DIRECTORY environment variables to be set. + +set -e + +# --- Environment Setup --- +if ! pip show maxtext &> /dev/null; then + echo "maxtext not found in the environment. Please install it by running:" + echo "uv pip install -e .[tpu] --resolution=lowest" + exit 1 +fi + +# --- Environment Variables --- +export MAXTEXT_CKPT_PATH="${MAXTEXT_CKPT_PATH:-}" # GCS path to the MaxText checkpoint to convert +export BASE_OUTPUT_DIRECTORY="${BASE_OUTPUT_DIRECTORY:-}" # GCS bucket path for storing the converted HF checkpoint + +# --- Variable Validation --- +if [ -z "$BASE_OUTPUT_DIRECTORY" ]; then + echo "Error: BASE_OUTPUT_DIRECTORY is not set. 
Please set it in the script or as an environment variable." + exit 1 +fi + +if [ -z "$MAXTEXT_CKPT_PATH" ]; then + echo "Error: MAXTEXT_CKPT_PATH is not set. Please set it in the script or as an environment variable." + exit 1 +fi + +# Install torch for conversion +echo "Installing torch for checkpoint conversion..." +python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu + +# Define the output path for the converted checkpoint +HF_CKPT_OUTPUT_DIR="${BASE_OUTPUT_DIRECTORY}/checkpoints/qwen3-30b-a3b-hf-converted" +echo "Converted Hugging Face checkpoint will be saved to: ${HF_CKPT_OUTPUT_DIR}" + +# Run the conversion script +python -m maxtext.checkpoint_conversion.to_huggingface \ + src/maxtext/configs/base.yml \ + model_name=qwen3-30b-a3b \ + load_parameters_path="${MAXTEXT_CKPT_PATH}" \ + base_output_directory="${HF_CKPT_OUTPUT_DIR}" \ + scan_layers=True \ + weight_dtype=bfloat16 hardware=cpu skip_jax_distributed_system=True + +echo "Conversion to Hugging Face format complete. Checkpoint saved to: ${HF_CKPT_OUTPUT_DIR}" diff --git a/scripts/run_qwen3_30b_rl.sh b/scripts/run_qwen3_30b_rl.sh new file mode 100644 index 0000000000..c86d7ff1c8 --- /dev/null +++ b/scripts/run_qwen3_30b_rl.sh @@ -0,0 +1,147 @@ +#!/bin/bash + +# This script launches a Reinforcement Learning (RL) training workload for the +# Qwen3-30B-A3B model on a GKE cluster using XPK. + +set -e + +# --- Environment Setup --- +if ! pip show xpk &> /dev/null; then + echo "xpk not found in the environment. 
Please install it by running:" + echo "uv pip install -e .[runner] --resolution=lowest" + exit 1 +fi + +# --- Environment Variables --- +export PROJECT_ID="${PROJECT_ID:-}" # GCP project ID where the Ironwood cluster is deployed +export CLUSTER_NAME="${CLUSTER_NAME:-}" # Name of your Ironwood cluster +export ZONE="${ZONE:-}" # Zone where your Ironwood cluster is deployed +export BASE_OUTPUT_DIRECTORY="${BASE_OUTPUT_DIRECTORY:-}" # GCS bucket path for outputs (e.g., gs://my-bucket/outputs) +export DOCKER_IMAGE="${DOCKER_IMAGE:-}" # Full path to the Docker image you pushed (e.g., gcr.io/my-project/my-image:tag) +export MAXTEXT_CKPT_PATH="${MAXTEXT_CKPT_PATH:-}" # GCS path of the MaxText checkpoint you want to fine-tune from (e.g., gs://my-bucket/checkpoints/maxtext-ckpt) +export TPU_TYPE="tpu7x-128" +export WORKLOAD_NAME="rl-$(date +%Y%m%d-%H%M)" + +# --- Variable Validation --- +if [ -z "$PROJECT_ID" ]; then + echo "Error: PROJECT_ID is not set. Please set it in the script or as an environment variable." + exit 1 +fi +if [ -z "$CLUSTER_NAME" ]; then + echo "Error: CLUSTER_NAME is not set. Please set it in the script or as an environment variable." + exit 1 +fi +if [ -z "$ZONE" ]; then + echo "Error: ZONE is not set. Please set it in the script or as an environment variable." + exit 1 +fi +if [ -z "$BASE_OUTPUT_DIRECTORY" ]; then + echo "Error: BASE_OUTPUT_DIRECTORY is not set. Please set it in the script or as an environment variable." + exit 1 +fi +if [ -z "$DOCKER_IMAGE" ]; then + echo "Error: DOCKER_IMAGE is not set. Please set it in the script or as an environment variable." + exit 1 +fi + +if [ -z "$MAXTEXT_CKPT_PATH" ]; then + echo "MAXTEXT_CKPT_PATH is not set. Please set it in the script or as an environment variable." 
+ exit 1 +fi + +# XLA Flags +XLA_FLAGS="--xla_tpu_dvfs_p_state=7 \ +--xla_tpu_scoped_vmem_limit_kib=65536 \ +--xla_tpu_num_sparse_cores_for_gather_offloading=1 \ +--xla_tpu_bf16_emission_mode=NATIVE_EMISSION \ +--xla_tpu_enable_sparse_core_reduce_scatter_v2=true \ +--xla_tpu_enable_sparse_core_collective_offload_all_gather=true \ +--xla_tpu_enable_sparse_core_collective_offload_2d_all_gather=true \ +--xla_tpu_use_tc_device_shape_on_sc=True \ +--xla_sc_disable_megacore_partitioning=True \ +--xla_tpu_enable_async_collective_fusion_fuse_all_gather=false \ +--xla_enable_async_all_gather=true \ +--xla_tpu_prefer_async_allgather_to_allreduce=true \ +--xla_tpu_enable_sparse_core_collective_offload_all_reduce=true \ +--xla_tpu_enable_sparse_core_collective_offload_reduce_scatter=true \ +--xla_tpu_enable_sparse_core_collective_offload_3d_all_gather=true \ +--xla_tpu_use_single_sparse_core_for_all_gather_offload=true \ +--xla_tpu_enable_concurrent_sparse_core_offloading=true \ +--xla_tpu_enable_offloading_gather_to_sparsecore=true \ +--xla_tpu_sparse_core_all_gather_latency_multiplier=1 \ +--xla_tpu_sparse_core_reduce_scatter_latency_multiplier=3 \ +--xla_tpu_enable_sparse_core_collective_aggregator=true \ +--xla_tpu_enable_latency_hiding_layer_scheduler=true \ +--xla_tpu_scheduler_percent_shared_memory_limit=150 \ +--xla_tpu_enable_layer_scheduler_for_dependent_collectives=true \ +--xla_tpu_enable_sparse_core_collective_offload_nd_reduce_scatter=true \ +--xla_tpu_pcie_bandwidth_multiplier=0.03 \ +--xla_tpu_enable_sparse_core_offload_queuing_in_lhs=true \ +--xla_tpu_enable_multi_compute_overlap_in_layer_scheduler=false \ +--xla_tpu_enable_3d_reduce_scatter_decomposer=false" + +# MaxText command +MAXTEXT_COMMAND="JAX_RANDOM_WEIGHTS=1 \ +VLLM_ENABLE_V1_MULTIPROCESSING=0 \ +NEW_MODEL_DESIGN=1 \ +TPU_MIN_LOG_LEVEL=0 \ +TF_CPP_MIN_LOG_LEVEL=0 \ +TPU_STDERR_LOG_LEVEL=0 \ +JAX_PLATFORMS=proxy,cpu \ +JAX_BACKEND_TARGET=grpc://127.0.0.1:29000 \ +ENABLE_PATHWAYS_PERSISTENCE=1 \ 
+python3 -m maxtext.trainers.post_train.rl.train_rl \
+model_name=qwen3-30b-a3b \
+tokenizer_path=Qwen/Qwen3-30B-A3B-Base \
+run_name=$WORKLOAD_NAME \
+async_scheduling=True \
+base_output_directory=$BASE_OUTPUT_DIRECTORY \
+chips_per_vm=8 \
+num_batches=500 \
+num_test_batches=10 \
+rl.num_generations=8 \
+rl.grpo_beta=0.05 \
+rl.grpo_epsilon=0.2 \
+gradient_clipping_threshold=1.0 \
+decode_sampling_temperature=0.8 \
+decode_sampling_top_k=50 \
+decode_sampling_nucleus_p=0.95 \
+dataset_name=nvidia/OpenMathInstruct-2 \
+hf_train_files=hf://datasets/nvidia/OpenMathInstruct-2/data/train_1M-*.parquet \
+train_split=train_1M \
+eval_dataset_name=nvidia/OpenMathInstruct-2 \
+eval_mode=pass_at_1 \
+num_eval_passes=4 \
+max_target_length=8192 \
+max_prefill_predict_length=512 \
+learning_rate=1e-6 \
+batch_size=128 \
+train_micro_batch_size=16 \
+rollout_micro_batch_size=128 \
+rollout_data_parallelism=16 \
+rollout_tensor_parallelism=4 \
+enable_dp_attention=True \
+hbm_utilization_vllm=0.75 \
+max_num_seqs=256 \
+max_num_batched_tokens=8192 \
+scan_layers=True \
+allow_split_physical_axes=True \
+enable_tunix_perf_metrics=True \
+checkpoint_period=2 \
+max_num_checkpoints_to_keep=1000 \
+enable_checkpointing=true \
+load_parameters_path=$MAXTEXT_CKPT_PATH"
+
+# Workload Creation
+xpk workload create-pathways \
+  --cluster=$CLUSTER_NAME \
+  --project=$PROJECT_ID \
+  --zone=$ZONE \
+  --priority=medium \
+  --max-restarts=0 \
+  --tpu-type=$TPU_TYPE \
+  --num-slices=1 \
+  --docker-image="${DOCKER_IMAGE}" \
+  --workload="${WORKLOAD_NAME}" \
+  --custom-pathways-proxy-server-args="${XLA_FLAGS}" \
+  --command="${MAXTEXT_COMMAND}"
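+
+# Optional follow-up (a sketch, not part of the original launch flow): once the
+# workload is submitted, xpk can also list it or tear it down, e.g.:
+#   xpk workload list --cluster=$CLUSTER_NAME --project=$PROJECT_ID --zone=$ZONE
+#   xpk workload delete --workload=$WORKLOAD_NAME --cluster=$CLUSTER_NAME --project=$PROJECT_ID --zone=$ZONE
+echo "Submitted workload: ${WORKLOAD_NAME}"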