diff --git a/.agents/TOOLING.md b/.agents/TOOLING.md
new file mode 100644
index 00000000000..ecef150f4ec
--- /dev/null
+++ b/.agents/TOOLING.md
@@ -0,0 +1,17 @@
+# Agent Tooling Notes
+
+These notes are for humans maintaining repository agent setup. They are not part
+of the always-loaded agent instructions.
+
+## Shared Instructions
+
+Update `AGENTS.md` for repository-wide agent instructions. `CLAUDE.md` is
+symlinked to `AGENTS.md`, so changes there apply to both Codex and Claude Code.
+
+## Local Overrides
+
+For private local instructions, use the tool-specific override file:
+
+- Claude Code: `CLAUDE.local.md` is additive; it is read after `CLAUDE.md`.
+- Codex: `AGENTS.override.md` replaces `AGENTS.md` in the same directory, so it
+  is not additive. Restate any shared instructions that should still apply.
diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
new file mode 100644
index 00000000000..10b3d901499
--- /dev/null
+++ b/.agents/developer-guidelines.md
@@ -0,0 +1,58 @@
+# Coding Principles
+
+Guidelines for production code in ModelOpt. Key values: simplicity, modularity,
+and conciseness.
+
+## Principles
+
+- **Prefer simple, surgical changes.** Touch only what the task requires. Avoid speculative
+  refactors, broad rewrites, and "while we're here" cleanups.
+- **Design for simplicity and readability.** Choose the design that is easiest to understand and maintain.
+  Code is read top to bottom: put high-level behavior first, hide lower-level details behind well-named helpers,
+  and treat heavy branching as a signal to reconsider the design.
+- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding.
+  Use existing extension points when they fit. If none fit, add a simple, focused helper,
+  class, or plugin that cleanly captures the new behavior. Keep scope limited to known cases.
+- **Respect inheritance boundaries.** Parent abstractions should define shared contracts and
+  shared behavior, not child-specific special cases.
+- **Don't repeat yourself; keep a single source of truth.** Consolidate repeated logic or intent with a shared helper, API,
+  or abstraction when doing so keeps the design simpler. Avoid duplication that can drift out of sync.
+- **Comment cautiously.** Comments should add context, not translate code into English.
+  Prefer making the code self-explanatory first. Use comments only for non-obvious
+  intent or constraints that remain unclear from the code. Apply this guidance to new
+  comments only; do not rewrite or delete existing comments just for style.
+- **Document public APIs.** Public and higher-level APIs should have docstrings, including examples when useful.
+  Internal helpers should usually be self-documenting through clear names and structure.
+- **Fix the root cause, not the side effect.** For bug fixes, find and fix the root cause instead of patching its side effects.
+- **Validate external input once.** Check types and values at the interface boundary. Internal code can trust those
+  checks and avoid redundant assertions.
+- **Remove dead code.** Delete unused imports, unreachable branches, and obsolete helpers.
+- **Use relative paths** from the repo root in commands and file references.
+
+## Testing
+
+- **Develop with focused tests.** During development, write as many focused
+  tests as needed, including lower-level unit tests or internal probes, to
+  understand and harden behavior.
+- **Curate production tests and keep them lean.** Before staging or committing,
+  decide which tests should be checked in. Checked-in tests should document
+  expected behavior, protect against regressions, or flag backward-incompatible
+  behavior changes. Remove redundant lower-level tests when a higher-level test
+  already covers the same behavior, keeping CI/CD fast and lean.
+
+## Performant AI Code
+
+- **Keep tensor work on the GPU and avoid unnecessary CPU-GPU syncs.** Reading metadata such as `tensor.shape` is fine.
+  Avoid Python scalar extraction and built-ins such as `tensor.item()`, `float(tensor)`, or `min(tensor)` because they
+  can trigger CPU-GPU syncs. Use PyTorch tensor ops such as `tensor.min()` by default, and only extract Python scalars
+  when the CPU needs the value. Python branching on tensor values can also break CUDA graphs.
+- **Develop with distributed processing in mind.** For example, use `print_rank_0` or `warn_rank_0`
+  when possible to avoid noisy logs, and guard shared side effects, such as
+  file writes or shared state updates, against race conditions between ranks.
+
+## Compatibility
+
+- **Preserve config and checkpoint backward compatibility.** ModelOpt checkpoints include serialized
+  `ModeloptBaseConfig` instances such as `QuantizeConfig`. If these Pydantic-based configs change
+  without backward compatibility handling, older checkpoints may no longer load. Make breaking changes
+  explicit and intentional.
diff --git a/.github/workflows/claude_review.yml b/.github/workflows/claude_review.yml
index 69119089fee..a41c7571acc 100644
--- a/.github/workflows/claude_review.yml
+++ b/.github/workflows/claude_review.yml
@@ -81,7 +81,8 @@ jobs:
 
             Mandatory workflow — never skip or reorder:
             1. Read the PR diff first (gh pr diff).
-            2. Read CLAUDE.md and CONTRIBUTING.md for project conventions and architecture.
+            2. Read AGENTS.md, .agents/developer-guidelines.md,
+               and CONTRIBUTING.md for project conventions, coding principles, and architecture.
             3. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
                `__init__.py` plus any `mode.py` / `config.py` to understand mode registration
                and config schema.
diff --git a/.gitignore b/.gitignore
index 09a61233b6a..66ce5568ee0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -61,6 +61,8 @@ venv/
 
 # Ignore claude local settings
 .claude/settings.local.json
+CLAUDE.local.md
+AGENTS.override.md
 
 # Ignore SonarQube analysis
 .sonar/
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000000..3000fce922b
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,39 @@
+# Agent Instructions for ModelOpt
+
+These instructions apply to AI-assisted work in this repository.
+
+## Repository orientation
+
+- Start with `README.md` for project overview and install.
+- Use `modelopt/` for source, `tests/` for focused test coverage, and
+  `examples/` or `docs/` for usage patterns.
+
+## Coding guidelines
+
+- **Coding guide:** Code development and review require reading and following
+  [.agents/developer-guidelines.md](.agents/developer-guidelines.md);
+  do not skip this step.
+
+## Iterative development
+
+- **Running tests:** Follow the
+  [writing and running tests](CONTRIBUTING.md#-writing-and-running-tests)
+  instructions. For fast initial iteration, choose focused tests for the
+  changed area from `tests/`.
+- **Running pre-commit:** Follow the
+  [pre-commit hook instructions](CONTRIBUTING.md#pre-commit-hooks). Hooks may
+  modify files; review and re-stage those changes before committing.
+- **Signed commit:** Use `git commit -s -S -m "<message>"` for commits so they
+  follow the [signing your work](CONTRIBUTING.md#-signing-your-work)
+  requirements.
+- **Never `git push` without explicit approval in the current turn.** Committing
+  locally is fine; publishing to a remote is not.
+- After `git commit`, stop and wait for the user to say "push", "publish",
+  "ship", or equivalent before running `git push`, `gh pr create`, or any
+  push-option flags like `-o merge_request.create`.
+
+## Contributing and PR readiness
+
+- Before opening or marking a PR ready for review, read the
+  [submitting your code](CONTRIBUTING.md#submitting-your-code) guidance.
+- Read `.github/PULL_REQUEST_TEMPLATE.md` and satisfy the checklist.
diff --git a/CLAUDE.md b/CLAUDE.md
deleted file mode 100644
index 346ac17eb5f..00000000000
--- a/CLAUDE.md
+++ /dev/null
@@ -1,133 +0,0 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-NVIDIA Model Optimizer (ModelOpt): open-source library for model optimization techniques including
-quantization, pruning, distillation, sparsity, and speculative decoding to accelerate inference.
-Primarily Python codebase with optional C++/CUDA extensions supporting PyTorch, ONNX, and Hugging Face/Megatron models.
-
-> If a `CLAUDE.local.md` file exists alongside this file, read and respect it — it contains
-> developer-specific overrides that supplement this shared guidance.
-
-## Rules (Read First)
-
-**CRITICAL (YOU MUST):**
-
-- NVIDIA Apache 2.0 license header on ALL new Python/C++/CUDA files — use the SPDX format from `LICENSE_HEADER` (auto-inserted by pre-commit for most files, but must be added manually for files copied from third-party sources, which are excluded from the hook)
-- `git commit -s -S` (DCO sign-off + cryptographic signing required). Never attribute AI tools in
-  sign-off line
-- `pre-commit` hooks run on commit — if files are modified by hooks, re-stage and commit again
-- PRs require CODEOWNERS review (auto-assigned based on `.github/CODEOWNERS`)
-- When creating PRs (`gh pr create`), fill in `.github/PULL_REQUEST_TEMPLATE.md` verbatim — do NOT substitute the harness's default `## Summary` / `## Test plan` format
-- For non-trivial PRs, run `/claude review` to get Claude approval before merging (NVIDIA org members can self-trigger; orthogonal to CodeRabbit)
-- After rebasing, always re-run tests locally before pushing
-- All code must follow the security guidelines in `SECURITY.md` — violations are blocked as pre-merge errors
-- For contribution guidelines, commit conventions, and PR requirements, see `CONTRIBUTING.md`
-- New PIP dependencies require license verification — non-permissive licenses need justification and approval from `@NVIDIA/modelopt-setup-codeowners`
-
-## Common Commands
-
-| Task | Command |
-|------|---------|
-| Install (editable + dev) | `pip install -e ".[dev]"` |
-| Enable pre-commit hooks | `pre-commit install` |
-| CPU unit tests | `python -m pytest tests/unit` |
-| GPU unit tests | `python -m pytest tests/gpu` |
-| Megatron GPU tests | `python -m pytest tests/gpu_megatron` |
-| TRT-LLM GPU tests | `python -m pytest tests/gpu_trtllm` |
-| Single test file | `python -m pytest tests/unit/torch/quantization/test_quant_config.py` |
-| Pattern match | `pytest tests/unit -k "test_quantize"` |
-| Lint + format (all files) | `pre-commit run --all-files` |
-| Lint (diff only) | `pre-commit run --from-ref origin/main --to-ref HEAD` |
-| Run via nox (CPU unit) | `nox -s "unit-3.12(torch_211, tf_latest)"` |
-| Build docs | `nox -s docs` |
-| Build wheel | `nox -s build_wheel` |
-
-## Architecture
-
-ModelOpt code base is organized into four top-level namespaces:
-
-| Namespace | Path | Role |
-|-----------|------|------|
-| `modelopt.torch` | `modelopt/torch/` | Core PyTorch optimization library |
-| `modelopt.onnx` | `modelopt/onnx/` | ONNX model quantization and export |
-| `modelopt.deploy` | `modelopt/deploy/` | Deployment utilities for LLMs |
-| `modelopt.recipe` | `modelopt/recipe/` | Recipe loading, parsing, and validation infrastructure |
-
-### `modelopt.torch` Sub-packages
-
-| Sub-package | Path | Role |
-|-------------|------|------|
-| `opt` | `modelopt/torch/opt/` | Core optimization infrastructure (modes, config, state dicts) |
-| `quantization` | `modelopt/torch/quantization/` | PTQ, QAT, and quantization-aware algorithms |
-| `prune` | `modelopt/torch/prune/` | Structured and unstructured pruning |
-| `distill` | `modelopt/torch/distill/` | Knowledge distillation |
-| `sparsity` | `modelopt/torch/sparsity/` | Weight and activation sparsity |
-| `speculative` | `modelopt/torch/speculative/` | Speculative decoding (Medusa, EAGLE, etc.) |
-| `nas` | `modelopt/torch/nas/` | Neural architecture search |
-| `export` | `modelopt/torch/export/` | Checkpoint export for TRT-LLM / Megatron |
-| `peft` | `modelopt/torch/peft/` | QLoRA and PEFT integration |
-| `kernels` | `modelopt/torch/kernels/` | Custom CUDA/Triton kernels grouped by role: `common/attention` (baseline Triton FA), `quantization/{conv,gemm}` (implicit-GEMM CUDA + tensor-quant C++/CUDA + fp4/fp8 Triton), `sparsity/attention` (skip-softmax / N:M / diffusers+LTX backends) |
-| `_deploy` | `modelopt/torch/_deploy/` | Internal deployment utilities |
-| `utils` | `modelopt/torch/utils/` | Shared utilities and plugin infrastructure |
-
-### Core Abstraction: Modes
-
-A **mode** is the unit of model optimization in ModelOpt. Each algorithm (quantization, pruning,
-etc.) is implemented as one or more modes. Modes are recorded in the model's `modelopt_state` so
-optimization workflows can be composed, saved, and restored.
-
-The main entry points are in `modelopt/torch/opt/conversion.py`:
-- `apply_mode(model, mode, ...)` — applies an optimization mode to a model
-- `restore(model, ...)` — restores a model to a previously saved optimization state
-- `save(model, ...)` / `modelopt_state(model)` — captures the current optimization state
-
-### Core Abstraction: Recipes
-
-A **recipe** is a declarative YAML specification of an optimization configuration. Recipes decouple optimization specs from code, enabling reuse, sharing, and version control.
-
-**Built-in recipes** (`modelopt_recipes/`):
-
-- `general/ptq/` — general-purpose PTQ recipes
-- `configs/` — shared configuration units referenced by recipes
-
-## Key Files
-
-| File | Role |
-|------|------|
-| `modelopt/torch/opt/mode.py` | Base class for all optimization modes |
-| `modelopt/torch/opt/config.py` | Configuration system for modes |
-| `modelopt/torch/opt/conversion.py` | `apply_mode()` / `restore()` entry points |
-| `modelopt/torch/quantization/__init__.py` | PTQ/QAT public API |
-| `modelopt/torch/export/unified_export_hf.py` | Unified HF checkpoint export |
-| `modelopt/torch/export/model_config_export.py` | TRT-LLM model config export |
-| `modelopt/deploy/llm/` | LLM deployment utilities |
-| `modelopt/recipe/loader.py` | `load_recipe()` / `load_config()` public API |
-| `modelopt/recipe/config.py` | Recipe Pydantic models (`ModelOptPTQRecipe`, `RecipeType`) |
-| `modelopt_recipes/general/ptq/` | Built-in PTQ recipe YAML files |
-| `pyproject.toml` | Optional dependency groups (`[onnx]`, `[hf]`, `[all]`, `[dev]`); ruff, mypy, pytest, bandit, and coverage config |
-| `.pre-commit-config.yaml` | Pre-commit hooks (ruff, mypy, clang-format, license headers) |
-| `noxfile.py` | Test session definitions |
-
-## Design Patterns
-
-| Pattern | Key Points |
-|---------|------------|
-| **Mode composition** | Optimization algorithms are composed as sequences of modes, each recorded in `modelopt_state` |
-| **Plugin system** | Optional integrations (HuggingFace, Megatron, etc.) loaded lazily via `import_plugin()` |
-| **Optional dependencies** | Features gated by install extras (`[onnx]`, `[hf]`, `[all]`); avoid hard imports at module level |
-| **Config dataclasses** | Each mode has a typed config; use Pydantic or dataclass conventions |
-| **State dict** | Models carry `modelopt_state` for checkpoint save/restore across optimization steps |
-| **Declarative recipes** | YAML-based optimization specs in `modelopt_recipes/`; loaded via `load_recipe()`, passed to the model optimization system |
-
-## CI / Testing
-
-| Layer | Location | Notes |
-|-------|----------|-------|
-| CPU unit tests | `tests/unit/` | Fast, no GPU needed; run in pre-merge CI |
-| GPU unit tests | `tests/gpu/` | Requires CUDA GPU |
-| Megatron GPU tests | `tests/gpu_megatron/` | Requires Megatron-Core + GPU |
-| TRT-LLM GPU tests | `tests/gpu_trtllm/` | Requires TensorRT-LLM + GPU |
-| Example/integration tests | `tests/examples/` | Integration tests for examples; see `tests/examples/README.md` |
-| Pre-commit / lint | `.pre-commit-config.yaml` | ruff, mypy, clang-format, license headers, bandit |
-| Coverage | `pyproject.toml` | 70% minimum on `modelopt/*` |
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 00000000000..47dc3e3d863
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 53e879a7bf3..f7debbbc6ee 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -79,7 +79,7 @@ If you are an external contributor, seek guidance from `@NVIDIA/modelopt-setup-c
 
 See [`modelopt/torch/quantization/utils/calib_utils.py`](./modelopt/torch/quantization/utils/calib_utils.py) for an example of the correct license header format.
 
-## 📝 Writing tests
+## 📝 Writing and running tests
 
 We use [pytest](https://docs.pytest.org/) for all tests. For any new features / examples, make sure to add tests and that the coverage check in your PR passes. The tests are organized into the following directories:
 
@@ -89,7 +89,17 @@ We use [pytest](https://docs.pytest.org/) for all tests. For any new features /
 - `tests/gpu_trtllm`: Fast GPU-based unit tests for the core ModelOpt library for TensorRT-LLM features. In most cases, they should not take more than a few seconds to run.
 - `tests/examples`: Integration tests for ModelOpt examples. They should not take more than a few minutes to run. Please refer to [example test README](./tests/examples/README.md) for more details.
 
-Please refer to [noxfile.py](./noxfile.py) for more details on how to run the tests and their dependencies.
+For lightweight, focused local validation, run `pytest` directly on the relevant test path. For example:
+
+```bash
+pytest tests/unit/torch/quantization
+```
+
+For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py). Run `nox -l` to list available sessions, then run the matching session with `nox -s <session>`. The `unit-3.12(torch_211, tf_latest)` session runs `tests/unit` with a specific Torch and Transformers combination:
+
+```bash
+nox -s "unit-3.12(torch_211, tf_latest)"
+```
 
 ## ✍️ Signing your work
 
diff --git a/README.md b/README.md
index ae17522c613..6a4f023e4f0 100644
--- a/README.md
+++ b/README.md
@@ -151,6 +151,10 @@ Model Optimizer follows a structured approach to managing deprecated features:
 Model Optimizer is now open source! We welcome any feedback, feature requests and PRs.
 Please read our [Contributing](./CONTRIBUTING.md) guidelines for details on how to contribute to this project.
 
+## AI Agents
+
+For AI-assisted development setup, see the [agent tooling notes](./.agents/TOOLING.md).
+
 ### Top Contributors
 
 [![Contributors](https://contrib.rocks/image?repo=NVIDIA/Model-Optimizer)](https://github.com/NVIDIA/Model-Optimizer/graphs/contributors)