From 6c580d8e1d1556f88f422809a85fafda0c742144 Mon Sep 17 00:00:00 2001 From: Ye Yu Date: Mon, 11 May 2026 10:55:40 -0700 Subject: [PATCH 1/3] feat(okr30): add EAGLE3 Claude Code skills for triage, validation, and new-model support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Four user-invocable skills for the EAGLE3 offline pipeline: - eagle3-triage: diagnose failed pipeline runs step-by-step; failure tables for all 4 tasks (vLLM data synthesis, hidden state dump with 3 backends, training, benchmark); new-model-specific issue checklist - eagle3-validate: verify completed runs; artifact checks; AR threshold (>= 2.1); structured validation report with next-step guidance - eagle3-new-model: guided workflow for adding a new model; architecture lookup, GPU/TP calculation for GB200, backend selection, full YAML template with correct public-launcher script paths - eagle3-review-logs: lightweight log reader; finds sbatch .out files, reads all task logs, produces pass/fail summary with root causes Skills use public launcher paths (common/eagle3/, common/vllm/, etc.) and read sbatch .out files directly — no sandbox-specific tooling required. 
Co-Authored-By: Claude Sonnet 4.6 Signed-off-by: Ye Yu --- .claude/skills/eagle3-new-model/SKILL.md | 215 +++++++++++++++++++++ .claude/skills/eagle3-review-logs/SKILL.md | 96 +++++++++ .claude/skills/eagle3-triage/SKILL.md | 177 +++++++++++++++++ .claude/skills/eagle3-validate/SKILL.md | 121 ++++++++++++ 4 files changed, 609 insertions(+) create mode 100644 .claude/skills/eagle3-new-model/SKILL.md create mode 100644 .claude/skills/eagle3-review-logs/SKILL.md create mode 100644 .claude/skills/eagle3-triage/SKILL.md create mode 100644 .claude/skills/eagle3-validate/SKILL.md diff --git a/.claude/skills/eagle3-new-model/SKILL.md b/.claude/skills/eagle3-new-model/SKILL.md new file mode 100644 index 00000000000..df9cc90a9a9 --- /dev/null +++ b/.claude/skills/eagle3-new-model/SKILL.md @@ -0,0 +1,215 @@ +--- +name: eagle3-new-model +description: > + Add a new model to the EAGLE3 offline pipeline. Generates an hf_offline_eagle3.yaml + launcher config for a new model checkpoint, choosing the right hidden state dump + backend (TRT-LLM / HF / vLLM) and GPU configuration. + Use when user wants to run EAGLE3 on a model that does not yet have a YAML in + tools/launcher/examples/ or asks how to configure the pipeline for a new checkpoint. +--- + +# EAGLE3 New Model Configuration + +This skill guides you through creating `tools/launcher/examples///hf_offline_eagle3.yaml` +for a new model. + +## Step 1 — Look up the model architecture + +Determine these values from the HuggingFace model card, `config.json`, and vLLM docs: + +| Property | Where to find it | +|---|---| +| Total / active parameters | Model card | +| Dense or MoE? | `config.json` → `num_experts`, `num_experts_per_tok` | +| Attention type (MHA / GQA / MLA / SWA) | Model card | +| Multimodal? 
(vision encoder) | Model card | +| BF16 weight size (GB) | `total_params × 2 bytes` | +| Special serving flags | vLLM docs, model README (`--trust-remote-code`, parsers) | + +## Step 2 — Calculate GPU requirements (OCI-HSG / GB200) + +OCI-HSG nodes: **4 GPUs × 192 GB HBM3e = 768 GB per node** + +``` +BF16 weight size = total_params × 2 bytes +GPUs needed = ceil(weight_size_GB / 192) +nodes = ceil(gpus_needed / 4) +tp = min(gpus_needed, 4) +``` + +| Model | Weights (BF16) | GPUs | nodes | tp | +|---|---|---|---|---| +| 8B dense | ~16 GB | 1 | 1 | 4 | +| 70B dense | ~140 GB | 1 | 1 | 4 | +| 685B MoE | ~340 GB | 2 | 1 | 4 | +| 1T MoE | ~595 GB | 4 | 1 | 4 | + +## Step 3 — Choose the hidden state dump backend + +| Backend | Script | When to use | +|---------|--------|-------------| +| vLLM | `common/eagle3/dump_offline_data_vllm.sh` | Default; broad coverage via vLLM + speculators | +| HF | `common/eagle3/dump_offline_data_hf.sh` | VLMs, custom-code models, SWA attention | +| TRT-LLM | `common/eagle3/dump_offline_data.sh` | Pure-text models with TRT-LLM support (needs `--tp`/`--moe-ep`) | + +Use **HF** when the model is a VLM or uses sliding window attention (TRT-LLM does not support these). +Use **vLLM** for everything else as the default. + +## Step 4 — Write the YAML + +Create `tools/launcher/examples///hf_offline_eagle3.yaml`. +Use an existing config as a reference (e.g., `tools/launcher/examples/Qwen/Qwen3.5-35B-A3B/hf_offline_eagle3.yaml`). + +### Header comment + +```yaml +# EAGLE3 offline speculative decoding pipeline for /. +# +# is a model. +# BF16 weights ~ GB — fits on GB200 node(s) ( × 192 GB). 
+# +# +# +# 4-step pipeline: +# task_0: Data synthesis — query vLLM server to generate prompt samples +# task_1: Dump hidden states — run target model to capture hidden states +# task_2: Offline training — train the EAGLE3 draft head +# task_3: Benchmark — evaluate speculative decoding speedup via VLLM +# +# Usage: +# uv run launch.py --yaml examples///hf_offline_eagle3.yaml --yes +# uv run slurm.py --yaml modules/Model-Optimizer/tools/launcher/examples///hf_offline_eagle3.yaml --yes + +job_name: _EAGLE3_offline +pipeline: + allow_to_fail: false + skip: false + note: + + global_vars: + hf_model: /hf-local// +``` + +### task_0 — Data synthesis (`common/vllm/query.sh`) + +Args before `--` go to the vLLM server; args after `--` go to `query.py`. + +```yaml + task_0: + script: common/vllm/query.sh + args: + - --model <> + - --tensor-parallel-size + - --trust-remote-code # add only if required + - -- # separator + - --data /hf-local/modelopt/Speculative-Decoding-Dataset-v2-default + - --save /scratchspace/data + environment: + - HF_LOCAL: /hf-local + slurm_config: + _factory_: "slurm_factory" + nodes: + ntasks_per_node: 1 + gpus_per_node: 4 + container: vllm/vllm-openai:latest +``` + +### task_1 — Hidden states (vLLM backend, default) + +```yaml + task_1: + script: common/eagle3/dump_offline_data_vllm.sh + args: + - --input-data /scratchspace/data + - --output-dir /scratchspace/offline_hidden_states + - --max-seq-len 8192 + environment: + - HF_MODEL_CKPT: <> + slurm_config: + _factory_: "slurm_factory" + nodes: + ntasks_per_node: 1 + gpus_per_node: 4 + container: vllm/vllm-openai:latest +``` + +For **HF backend** (VLMs, SWA models), use `dump_offline_data_hf.sh` instead — same args, no TP flags needed. + +For **TRT-LLM backend**, use `dump_offline_data.sh` and add `--tp ` and `--moe-ep 1` (or appropriate EP). 
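The GPU sizing from Step 2 feeds directly into the `nodes`/`tp` values used in these task configs. A minimal Python sketch of that arithmetic, under the stated BF16 and GB200 assumptions (the helper name is illustrative):

```python
import math

GPU_HBM_GB = 192    # GB200: HBM3e per GPU
GPUS_PER_NODE = 4   # OCI-HSG node geometry

def gpu_plan(total_params: float) -> tuple[int, int, int]:
    """Apply the Step 2 formulas: (gpus_needed, nodes, tp) for BF16 weights."""
    weight_gb = total_params * 2 / 1e9           # 2 bytes per parameter
    gpus = math.ceil(weight_gb / GPU_HBM_GB)
    nodes = math.ceil(gpus / GPUS_PER_NODE)
    tp = min(gpus, GPUS_PER_NODE)
    return gpus, nodes, tp

print(gpu_plan(70e9))  # 70B dense: 140 GB of weights fits on a single 192 GB GPU
```

Note that the formula's `tp = min(gpus_needed, 4)` is a lower bound; the example configs in this repo allocate a full node (`gpus_per_node: 4`, `tp: 4`) even when fewer GPUs would hold the weights.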
+ +### task_2 — Offline training (`common/eagle3/train_eagle.sh`) + +```yaml + task_2: + script: common/eagle3/train_eagle.sh + args: + - --config modules/Model-Optimizer/modelopt_recipes/general/speculative_decoding/eagle3.yaml + - model.model_name_or_path=<> + - data.offline_data_path=/scratchspace/offline_hidden_states + - training.output_dir=/scratchspace/eagle3 + - training.training_seq_len=4096 + - training.disable_tqdm=true + - training.ar_validate_steps=500000 + slurm_config: + _factory_: "slurm_factory" + nodes: 1 + ntasks_per_node: 1 + gpus_per_node: 4 + container: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc10 +``` + +> **MoE note:** For MoE models with large per-expert hidden dims, consider increasing +> `intermediate_size` in `eagle_config.json` to match the model's `moe_intermediate_size`. + +### task_3 — Benchmark (`common/specdec_bench/quick_check.sh`) + +```yaml + task_3: + script: common/specdec_bench/quick_check.sh + args: + - --draft_model_dir /scratchspace/export + - --draft_length 3 + - --output_length 4096 + - --engine VLLM + - --tp_size + - --ep_size 1 + - --speculative_algorithm EAGLE3 + - --mtbench /hf-local/HuggingFaceH4/mt_bench_prompts/raw/question.jsonl + - --concurrency 1 + environment: + - HF_LOCAL: /hf-local + - HF_MODEL_CKPT: <> + slurm_config: + _factory_: "slurm_factory" + nodes: + ntasks_per_node: 1 + gpus_per_node: 4 + container: vllm/vllm-openai:latest +``` + +## Step 5 — Common model-specific adjustments + +| Situation | What to change | +|---|---| +| Requires `--trust-remote-code` | Add to task_0 vLLM args (before `--`) | +| VLM / multimodal | Use `dump_offline_data_hf.sh` for task_1 | +| Sliding window attention | Use `dump_offline_data_hf.sh` or `_vllm.sh` for task_1 | +| MoE with large expert hidden dim | Increase `intermediate_size` in eagle_config.json | +| Non-standard attention (MLA) | Verify `eagle_decoder_type` in the eagle3 recipe YAML | +| Custom tokenizer (e.g., tiktoken) | Set `TIKTOKEN_RS_CACHE_DIR` env var in 
task_0 and task_1 | +| NVFP4 quant model | task_0/task_3 use quant container; task_1/task_2 use BF16 base model — add `hf_model_bf16` global_var | +| Model needs `trust_remote_code` at benchmark | Add `--trust-remote-code` to task_3 args | + +## Step 6 — Test with dry run + +Preview the resolved config before submitting: + +```bash +uv run launch.py --yaml examples///hf_offline_eagle3.yaml --dryrun --yes -v +``` + +## Step 7 — Update triage chart + +After adding a new model, add a row to the test matrix in +`tools/launcher/examples/EAGLE3_TRIAGE.md` with status 🔲 (not yet tested). +Fill in results after running. diff --git a/.claude/skills/eagle3-review-logs/SKILL.md b/.claude/skills/eagle3-review-logs/SKILL.md new file mode 100644 index 00000000000..e9e519c5a5d --- /dev/null +++ b/.claude/skills/eagle3-review-logs/SKILL.md @@ -0,0 +1,96 @@ +--- +name: eagle3-review-logs +description: > + Review EAGLE3 pipeline experiment logs from the launcher's experiments/ directory. + Summarizes pass/fail status for all 4 tasks, diagnoses failures with root causes + and fixes, and flags warnings. Use when the user asks to review job logs, + check experiment results, or diagnose why a specific task failed. +user_invocable: true +--- + +# Review EAGLE3 Experiment Logs + +Analyze output logs from an EAGLE3 pipeline run launched via `launch.py` or `slurm.py`. + +## Step 0 — Find experiment logs + +Locate the experiment directory. The default is `experiments/` relative to the launcher root, +or wherever `--job-dir` was pointed. + +```bash +ls -td experiments/cicd/cicd_* | head -10 +``` + +Each experiment has one subdirectory per task (0–3). Logs are `sbatch_*.out` files inside: + +```bash +find experiments// -name "sbatch_*.out" | sort +``` + +Do this in a single Bash call. If no experiments exist, ask the user for the directory. + +## Step 1 — Read all task logs + +Read the last 200 lines of each log in parallel. 
Errors appear at the end: + +```bash +for f in $(find experiments// -name "sbatch_*.out" | sort); do + echo "=== $f ==="; tail -200 "$f"; echo +done +``` + +## Step 2 — Analyze + +For each task log, check: + +- **Exit / cancellation**: `DUE TO TIME LIMIT`, `FAILED`, signal (e.g., `signal 15`) +- **Python exceptions / tracebacks**: last exception is usually the root cause +- **CUDA errors**: OOM, NCCL timeout +- **Slurm state**: COMPLETED, FAILED, TIMEOUT, OUT_OF_MEMORY +- **Success indicators**: "Saved N samples", "Successfully processed N conversations", training loss line, AR output + +## Step 3 — Produce report + +Output a structured markdown report: + +### Summary +- Overall status: PASSED / FAILED / MIXED / PARTIAL +- Task breakdown: e.g., task_0 TIMEOUT, task_1 FAIL, task_2 skipped, task_3 skipped + +### Task Results + +For each task (0–3): + +**Task N — \: PASS / FAIL / TIMEOUT** +- Key output: (e.g., "3277/3295 samples generated" or "Script not found") +- Error (if failed): quoted error message, max 10 lines +- Root cause: one-line diagnosis +- Suggested fix: actionable step + +### Warnings +Non-fatal issues worth noting (near-OOM, tokenizer warnings, slow throughput). + +## Step 4 — Suggest next steps + +Based on results: + +- If a task failed due to a known issue, suggest the fix and how to re-run from that task: + ```bash + uv run launch.py --yaml examples///hf_offline_eagle3.yaml \ + pipeline.task_0.skip=true \ + --yes + ``` + +- If the failure pattern is new (not in `tools/launcher/examples/EAGLE3_TRIAGE.md`), + suggest adding it to the triage chart using `/eagle3-triage` guidance. + +- If all tasks passed, suggest running `/eagle3-validate` to confirm AR meets threshold. + +## Known benign patterns (do NOT mark as failures) + +| Pattern | Explanation | +|---|---| +| vLLM server exit code 143 | SIGTERM — server was killed after queries completed. Expected. | +| `CANCELLED AT ... 
DUE TO TASK FAILURE` after `exit code: 0` | Slurm cleanup of worker nodes after main task succeeded. | +| `destroy_process_group() was not called` | Benign PyTorch shutdown warning. | +| `tokenizer class ... not equal to the registered tokenizer class` | Harmless tokenizer mismatch warning. | diff --git a/.claude/skills/eagle3-triage/SKILL.md b/.claude/skills/eagle3-triage/SKILL.md new file mode 100644 index 00000000000..7009b4d0523 --- /dev/null +++ b/.claude/skills/eagle3-triage/SKILL.md @@ -0,0 +1,177 @@ +--- +name: eagle3-triage +description: > + Triage a failed EAGLE3 pipeline run. Identifies which step failed (data synthesis, + hidden state dump, training, or benchmark), diagnoses root cause from logs, and + suggests fixes. Use when user reports an EAGLE3 pipeline failure or asks why a + specific step failed. Also helps debug new model support issues. +user_invocable: true +--- + +# EAGLE3 Pipeline Triage + +Diagnose failures in the 4-step EAGLE3 offline pipeline. This skill walks through +each step, identifies the failure point, and provides actionable fixes. 
+ +## Pipeline Overview + +| Step | Script | Purpose | Common failure area | +|------|--------|---------|---------------------| +| task_0 | `common/vllm/query.sh` | Data synthesis via vLLM server | Server startup, model loading, OOM | +| task_1 | `common/eagle3/dump_offline_data_vllm.sh` (or `_hf.sh` / `.sh`) | Dump hidden states | Backend selection, OOM, unsupported arch | +| task_2 | `common/eagle3/train_eagle.sh` | Train EAGLE3 draft head | Dependencies, training crash, export | +| task_3 | `common/specdec_bench/quick_check.sh` | Benchmark acceptance rate | Engine startup, draft model loading | + +## Step 0 — Locate the experiment + +Ask the user for one of: +- Experiment directory (e.g., the `--job-dir` passed to `launch.py` or `slurm.py`) +- The model name / YAML they ran + +Find recent experiments under the job directory: + +```bash +ls -td experiments/cicd/cicd_* | head -10 +# or wherever --job-dir was pointed +``` + +Each experiment directory contains one subdirectory per task (task_0 through task_3), +each with a `sbatch_*.out` log file. + +## Step 1 — Fetch logs for the failed task + +Locate and read the Slurm output file for the failed task: + +```bash +find experiments/ -name "sbatch_*.out" | sort +``` + +Read the last 200 lines — errors appear at the end: + +```bash +tail -200 experiments///sbatch__.out +``` + +Look for the first task with a non-zero exit code or error message. + +## Step 2 — Diagnose by step + +### task_0 failures (Data Synthesis) + +**How it works:** Launches a vLLM OpenAI-compatible server, polls `/health` until ready, +then runs `query.py` to generate synthetic prompt/response pairs. +Output goes to `/scratchspace/data/`. + +| Error pattern | Root cause | Fix | +|---|---|---| +| Server never becomes healthy (hangs at health check) | Model too large for allocated GPUs, or vLLM startup crash | Check BF16 weight size vs GPU memory. GB200: 192 GB/GPU × 4 GPUs/node = 768 GB. Increase TP. 
| +| `CUDA out of memory` during model load | Insufficient GPU memory | Reduce `--max-model-len` or increase `--tensor-parallel-size` | +| `trust_remote_code` error | Model requires custom code but flag not set | Add `--trust-remote-code` before the `--` separator in task_0 args | +| Vocab / tokenizer error | Missing tokenizer cache (e.g., GPT-OSS-20B needs `TIKTOKEN_RS_CACHE_DIR`) | Set `TIKTOKEN_RS_CACHE_DIR` to a pre-populated cache path in the environment | +| Architecture not supported | vLLM version doesn't support this model | Try a newer vLLM container (`vllm/vllm-openai:latest`) | +| `CANCELLED ... DUE TO TIME LIMIT` | Job wall-clock limit too short | Increase Slurm `--time`. Note: `afterany` deps let task_1 still start. | +| Empty `/scratchspace/data/` | query.py ran but produced no output | Check `--data` path exists and contains prompts. Check query.py logs. | + +### task_1 failures (Hidden State Dump) + +**How it works:** Loads the target model and runs a forward pass on each conversation, +saving hidden states as `.pt` files in `/scratchspace/offline_hidden_states/`. 
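A quick sanity check on the dump output can be sketched as follows (the path handling and helper name are illustrative; run it against the actual output directory on the cluster):

```python
from pathlib import Path

def hidden_state_summary(output_dir: str) -> tuple[int, int]:
    """Count dumped .pt shards and their total bytes.

    (0, 0) means the dump ran but extracted nothing — check --max-seq-len
    and the input data format before re-running.
    """
    shards = list(Path(output_dir).glob("*.pt"))
    return len(shards), sum(p.stat().st_size for p in shards)
```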
+ +Three backends are available: + +| Backend | Script | When to use | +|---------|--------|-------------| +| vLLM | `dump_offline_data_vllm.sh` | Broad model coverage; uses `speculators.VllmHiddenStatesGenerator` | +| HF | `dump_offline_data_hf.sh` | VLMs, custom-code models, SWA attention; uses `device_map="auto"` | +| TRT-LLM | `dump_offline_data.sh` | Pure-text models with TRT-LLM support; needs `--tp`/`--moe-ep` args | + +| Error pattern | Root cause | Fix | +|---|---|---| +| `No such file or directory: dump_offline_data_vllm.sh` | Wrong script path in YAML | Use the correct path under `common/eagle3/` | +| `FileNotFoundError: /scratchspace/data` | task_0 failed or produced no output | Re-run task_0 first, or point `--input-data` to existing data | +| `CUDA out of memory` | Model too large | Switch to `_hf.sh` (device_map="auto") or increase TP | +| `RuntimeError` / unsupported arch | Model not supported by TRT-LLM backend | Switch to `dump_offline_data_hf.sh` or `dump_offline_data_vllm.sh` | +| `NCCL timeout` / `NCCL error` | Multi-node communication failure | Retry. Reduce EP. | +| No `.pt` files in output dir | Script ran but extraction produced nothing | Check `--max-seq-len` and input data format | +| `pyxis: child terminated with signal 15` | SIGTERM — likely OOM | Increase TP or switch backends | + +### task_2 failures (Training) + +**How it works:** Installs requirements, runs `launch_train.sh` (Accelerate + FSDP) with the +config from `modelopt_recipes/general/speculative_decoding/eagle3.yaml`, then exports via +`export_hf_checkpoint.py`. Output: `/scratchspace/eagle3/` and `/scratchspace/export/`. 
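Loss health can be checked mechanically from the training log. The sketch below assumes HF-Trainer-style log lines such as `{'loss': 1.234, ...}` — adapt the regex if this training stack logs losses differently; both helper names are illustrative.

```python
import math
import re

def loss_series(log_text: str) -> list[float]:
    """Pull loss values from lines like "{'loss': 1.234, 'lr': ...}" (assumed format)."""
    return [float(v) for v in re.findall(r"'loss':\s*(nan|[0-9.eE+-]+)", log_text)]

def looks_unhealthy(losses: list[float]) -> bool:
    """Flag NaN anywhere, or a last loss above the first (not decreasing overall)."""
    if not losses:
        return True
    return any(math.isnan(x) for x in losses) or losses[-1] > losses[0]
```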
+ +| Error pattern | Root cause | Fix | +|---|---|---| +| `pip install` failure | Network issue or incompatible dependency | Check container has network access | +| `ImportError: modelopt` | ModelOpt not installed or path issue | Check container version | +| `FileNotFoundError: /scratchspace/offline_hidden_states` | task_1 failed or produced no output | Re-run task_1 first | +| `CUDA out of memory` during training | Batch size too large | Reduce `training.train_bs` or `training.training_seq_len` | +| `KeyError` / `AttributeError` in model loading | Model architecture not recognized by EAGLE3 | Check `eagle_decoder_type` in config. Model may need code changes in modelopt. | +| `HFValidationError: Repo id must be in the form...` | Old `offline_training.sh` trying to upload to HF Hub | Use `train_eagle.sh` which does local export only | +| Loss is NaN or diverges | LR too high or data quality issue | Reduce `training.lr`. Check hidden state data. | +| `export_hf_checkpoint.py` fails | Training produced incomplete checkpoint | Check `/scratchspace/eagle3/` for `model.safetensors` | + +### task_3 failures (Benchmark) + +**How it works:** Launches vLLM with the target + draft model, runs acceptance rate and +throughput benchmarks. Output: JSON files. + +| Error pattern | Root cause | Fix | +|---|---|---| +| `FileNotFoundError: /scratchspace/export` | task_2 failed or export step failed | Re-run task_2. Check export output. 
| +| `trust_remote_code` error at benchmark | Model requires it but `quick_check.sh` doesn't forward the flag | Pass `--trust-remote-code` in task_3 args | +| Server fails with draft model | Draft model config incompatible with engine | Check `eagle_config.json` and engine version | +| AR below threshold / exit code 1 | Draft model quality too low | More epochs, data, or hyperparameter tuning | +| `CUDA out of memory` | Target + draft exceeds GPU memory | Increase TP | +| vLLM EAGLE3 not supported | vLLM version too old | Use `vllm/vllm-openai:latest` (≥ v0.15.0 for NVFP4) | + +## Step 3 — Check for new-model-specific issues + +If the user is adding support for a new model, also check: + +1. **Is the model a VLM?** → Use `dump_offline_data_hf.sh` (text-only path, no vision encoder invoked) +2. **Does the model use sliding window attention (SWA)?** → TRT-LLM backend won't work; use HF or vLLM +3. **Does the model need `trust_remote_code`?** → Add to task_0 args AND task_3 args +4. **Is the model MoE?** → Check `eagle_config.json` `intermediate_size` matches model's `moe_intermediate_size` +5. **Is the model architecture recognized by EAGLE3 training?** → Check `modelopt/torch/speculative/` for the model type +6. **Custom tokenizer?** → May need additional environment vars (e.g., `TIKTOKEN_RS_CACHE_DIR`) + +## Step 4 — Suggest fix and next steps + +After diagnosis, provide: + +1. **Root cause** — one-line summary +2. **Fix** — specific config change, code edit, or command to run +3. 
**How to re-run** — skip earlier successful steps by pointing to existing scratchspace artifacts + +To skip task_0 and task_1 and re-run from task_2: +```bash +uv run launch.py --yaml examples///hf_offline_eagle3.yaml \ + pipeline.task_0.skip=true \ + pipeline.task_1.skip=true \ + --yes +``` + +To run only task_1 standalone (using existing task_0 data): +```bash +uv run launch.py --yaml examples///hf_offline_eagle3.yaml \ + pipeline.task_0.skip=true \ + pipeline.task_2.skip=true \ + pipeline.task_3.skip=true \ + --yes +``` + +If the fix requires code changes in ModelOpt (e.g., adding a new `eagle_decoder_type`), +note that a separate PR in the modelopt repo is needed. + +## Step 5 — Update triage chart + +If you encounter a failure pattern not in the triage chart at +`tools/launcher/examples/EAGLE3_TRIAGE.md`, add it: + +1. Add a new branch in the Mermaid flowchart under the relevant step node +2. Add a new issue entry in the "Known Issues" section +3. Update the model's row in the test matrix + +This keeps the chart current for the next engineer debugging the same issue. diff --git a/.claude/skills/eagle3-validate/SKILL.md b/.claude/skills/eagle3-validate/SKILL.md new file mode 100644 index 00000000000..1b3665cf8b1 --- /dev/null +++ b/.claude/skills/eagle3-validate/SKILL.md @@ -0,0 +1,121 @@ +--- +name: eagle3-validate +description: > + Validate that an EAGLE3 pipeline run completed successfully end-to-end. + Checks all 4 steps produced expected artifacts, verifies acceptance rate + meets threshold (>= 2.1), and produces a summary report. + Use when user wants to verify a pipeline run or check benchmark results. +user_invocable: true +--- + +# EAGLE3 Pipeline Validation + +Verify that an EAGLE3 pipeline run completed successfully and meets quality criteria. 
+ +## Step 0 — Identify the experiment + +Find the most recent experiment directory (or ask the user for the path): + +```bash +ls -td experiments/cicd/cicd_* | head -5 +``` + +Each experiment directory has one subdirectory per task (numbered 0–3), each containing +a `sbatch_*.out` log file. + +## Step 1 — Check task outcomes + +Read the last 50 lines of each task's log file: + +```bash +find experiments// -name "sbatch_*.out" | sort | while read f; do + echo "=== $f ==="; tail -50 "$f"; echo +done +``` + +All 4 tasks must complete without error. Look for: +- `exit code: 0` or no error — success +- `DUE TO TIME LIMIT` — timeout +- `FAILED` / `signal` / exception traceback — failure + +If any task failed, suggest running `/eagle3-triage` instead. + +## Step 2 — Verify artifacts exist + +Check each step produced the expected output (artifacts live on the cluster at `/scratchspace/`). +Confirm via log messages: + +| Step | Expected log evidence | Artifact | +|------|-----------------------|----------| +| task_0 | "Saved N samples" or progress bar completing | `/scratchspace/data/*.jsonl` | +| task_1 | "Successfully processed N conversations" | `/scratchspace/offline_hidden_states/*.pt` | +| task_2 | Training loss decreasing, "export complete" | `/scratchspace/eagle3/model.safetensors`, `/scratchspace/export/` | +| task_3 | `Average Acceptance Length ... ratio: X.XX` | JSON result files | + +## Step 3 — Check acceptance rate + +In the task_3 log, find: + +``` +Average Acceptance Length {'accept': X, 'count': Y, 'ratio': Z.ZZ} +``` + +The `ratio` field is the acceptance rate (AR). + +| Criterion | Threshold | Status | +|-----------|-----------|--------| +| AR (MT-Bench) | >= 2.1 | PASS / FAIL | + +If the log shows `AR ... < lower bound`, the run already triggered a threshold failure (exit code 1). 
+ +## Step 4 — Check training quality + +In the task_2 log look for: +- **Final training loss** — should be decreasing, not NaN +- **AR validation during training** (if `training.ar_validate_steps` was set) +- **Number of training steps** — confirms full training duration + +## Step 5 — Produce validation report + +```markdown +## EAGLE3 Pipeline Validation Report + +**Experiment:** +**Model:** +**Date:** +**Pipeline config:** + +### Step Status +| Step | Task | Status | Notes | +|------|------|--------|-------| +| 0 | Data synthesis | PASS/FAIL/TIMEOUT | N samples generated | +| 1 | Hidden state dump | PASS/FAIL | N .pt files | +| 2 | Training + export | PASS/FAIL | Final loss: X.XX | +| 3 | Benchmark | PASS/FAIL | AR: X.XX | + +### Acceptance Rate +- MT-Bench AR: X.XX (threshold: >= 2.1) — PASS/FAIL + +### Training Summary +- Final loss: X.XX +- Training steps: N +- AR during training: X.XX (if validated) + +### Overall: PASS / FAIL + +``` + +## Step 6 — Suggest next steps + +**If PASS:** +- The model's row in `tools/launcher/examples/EAGLE3_TRIAGE.md` can be updated to ✅ +- Note the checkpoint path for downstream use + +**If FAIL:** +- Identify which step or metric failed +- Suggest running `/eagle3-triage` for diagnosis +- If AR is close but below threshold, suggest: + - More training epochs (`training.num_epochs` override) + - More training data (re-run task_0 with larger dataset) + - Larger draft head (`num_hidden_layers` in `eagle_config.json`) + - Hyperparameter tuning (`training.lr`, `training.train_bs`) From 01724172100ff3f9d1b9474e6c53e5727f26b0de Mon Sep 17 00:00:00 2001 From: Ye Yu Date: Mon, 11 May 2026 11:47:29 -0700 Subject: [PATCH 2/3] fix(okr30): fix markdownlint MD040 in eagle3 skill files Add `text` language specifiers to bare fenced code blocks: - eagle3-new-model/SKILL.md: GPU calculation formula block - eagle3-validate/SKILL.md: acceptance rate log output block Co-Authored-By: Claude Sonnet 4.6 Signed-off-by: Ye Yu --- 
.claude/skills/eagle3-new-model/SKILL.md | 2 +- .claude/skills/eagle3-validate/SKILL.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.claude/skills/eagle3-new-model/SKILL.md b/.claude/skills/eagle3-new-model/SKILL.md index df9cc90a9a9..c908ae7393e 100644 --- a/.claude/skills/eagle3-new-model/SKILL.md +++ b/.claude/skills/eagle3-new-model/SKILL.md @@ -30,7 +30,7 @@ Determine these values from the HuggingFace model card, `config.json`, and vLLM OCI-HSG nodes: **4 GPUs × 192 GB HBM3e = 768 GB per node** -``` +```text BF16 weight size = total_params × 2 bytes GPUs needed = ceil(weight_size_GB / 192) nodes = ceil(gpus_needed / 4) diff --git a/.claude/skills/eagle3-validate/SKILL.md b/.claude/skills/eagle3-validate/SKILL.md index 1b3665cf8b1..2a37318f3a4 100644 --- a/.claude/skills/eagle3-validate/SKILL.md +++ b/.claude/skills/eagle3-validate/SKILL.md @@ -56,7 +56,7 @@ Confirm via log messages: In the task_3 log, find: -``` +```text Average Acceptance Length {'accept': X, 'count': Y, 'ratio': Z.ZZ} ``` From 7ccae94f137949d83293511f2186a67ed945b48c Mon Sep 17 00:00:00 2001 From: Ye Yu Date: Mon, 11 May 2026 11:55:02 -0700 Subject: [PATCH 3/3] fix(okr30): fix markdownlint MD031 in eagle3 skill files Add blank lines before fenced code blocks as required by MD031: - eagle3-triage/SKILL.md: two re-run command blocks - eagle3-review-logs/SKILL.md: suggested fix block and section headers Co-Authored-By: Claude Sonnet 4.6 Signed-off-by: Ye Yu --- .claude/skills/eagle3-review-logs/SKILL.md | 3 +++ .claude/skills/eagle3-triage/SKILL.md | 2 ++ 2 files changed, 5 insertions(+) diff --git a/.claude/skills/eagle3-review-logs/SKILL.md b/.claude/skills/eagle3-review-logs/SKILL.md index e9e519c5a5d..18027a69096 100644 --- a/.claude/skills/eagle3-review-logs/SKILL.md +++ b/.claude/skills/eagle3-review-logs/SKILL.md @@ -54,6 +54,7 @@ For each task log, check: Output a structured markdown report: ### Summary + - Overall status: PASSED / FAILED / MIXED / PARTIAL - 
Task breakdown: e.g., task_0 TIMEOUT, task_1 FAIL, task_2 skipped, task_3 skipped @@ -68,6 +69,7 @@ For each task (0–3): - Suggested fix: actionable step ### Warnings + Non-fatal issues worth noting (near-OOM, tokenizer warnings, slow throughput). ## Step 4 — Suggest next steps @@ -75,6 +77,7 @@ Non-fatal issues worth noting (near-OOM, tokenizer warnings, slow throughput). Based on results: - If a task failed due to a known issue, suggest the fix and how to re-run from that task: + ```bash uv run launch.py --yaml examples///hf_offline_eagle3.yaml \ pipeline.task_0.skip=true \ diff --git a/.claude/skills/eagle3-triage/SKILL.md b/.claude/skills/eagle3-triage/SKILL.md index 7009b4d0523..ed2422e6f63 100644 --- a/.claude/skills/eagle3-triage/SKILL.md +++ b/.claude/skills/eagle3-triage/SKILL.md @@ -146,6 +146,7 @@ After diagnosis, provide: 3. **How to re-run** — skip earlier successful steps by pointing to existing scratchspace artifacts To skip task_0 and task_1 and re-run from task_2: + ```bash uv run launch.py --yaml examples///hf_offline_eagle3.yaml \ pipeline.task_0.skip=true \ @@ -154,6 +155,7 @@ uv run launch.py --yaml examples///hf_offline_eagle3.yaml \ ``` To run only task_1 standalone (using existing task_0 data): + ```bash uv run launch.py --yaml examples///hf_offline_eagle3.yaml \ pipeline.task_0.skip=true \