From 85fd1fb63322ea10a695b10883a1d28c3c8c264f Mon Sep 17 00:00:00 2001
From: realAsma <akuriparambi@nvidia.com>
Date: Thu, 14 May 2026 01:02:38 +0000
Subject: [PATCH 1/8] docs: consolidate agent instructions

Signed-off-by: realAsma <akuriparambi@nvidia.com>
---
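Note for reviewers: the "Validate at boundaries" principle added in
`.agents/developer-guidelines.md` could be sketched roughly as below. This is
an illustration under stated assumptions, not code from the repository:
`CalibSettings`, `parse_calib_settings`, and `num_quant_levels` are hypothetical
names invented for this note and are not ModelOpt APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibSettings:
    """Hypothetical settings object (not a ModelOpt class)."""

    num_bits: int
    batch_size: int


def _is_plain_int(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def parse_calib_settings(raw: dict) -> CalibSettings:
    """Boundary: check untrusted input once and return a typed object."""
    num_bits = raw.get("num_bits")
    batch_size = raw.get("batch_size", 1)
    if not _is_plain_int(num_bits) or num_bits not in (4, 8):
        raise ValueError(f"num_bits must be 4 or 8, got {num_bits!r}")
    if not _is_plain_int(batch_size) or batch_size < 1:
        raise ValueError(f"batch_size must be a positive int, got {batch_size!r}")
    return CalibSettings(num_bits=num_bits, batch_size=batch_size)


def num_quant_levels(settings: CalibSettings) -> int:
    """Internal helper: trusts the validated type and does not re-check it."""
    return 2**settings.num_bits
```

Callers would run `parse_calib_settings` once where user input enters, then pass
the resulting `CalibSettings` around; internal helpers stay free of repeated
checks for states the boundary has already ruled out.
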
 .agents/README.md               |  38 +++++++++
 .agents/TOOLING.md              |  18 +++++
 .agents/developer-guidelines.md |  77 ++++++++++++++++++
 .gitignore                      |   2 +
 AGENTS.md                       |   1 +
 CLAUDE.md                       | 134 +-------------------------------
 CONTRIBUTING.md                 |  14 +++-
 README.md                       |   5 ++
 8 files changed, 154 insertions(+), 135 deletions(-)
 create mode 100644 .agents/README.md
 create mode 100644 .agents/TOOLING.md
 create mode 100644 .agents/developer-guidelines.md
 create mode 120000 AGENTS.md
 mode change 100644 => 120000 CLAUDE.md
diff --git a/.agents/README.md b/.agents/README.md
new file mode 100644
index 00000000000..c2b6727c582
--- /dev/null
+++ b/.agents/README.md
@@ -0,0 +1,38 @@
+# Agent Instructions for ModelOpt
+
+These instructions apply to AI-assisted work in this repository.
+
+## Repository orientation
+
+- Start with `README.md` for project overview and install.
+- Use `modelopt/` for source, `tests/` for focused test coverage, and
+  `examples/` or `docs/` for usage patterns.
+
+## Coding guidelines
+
+- **Coding guide:** Code development and review require reading and following
+  `.agents/developer-guidelines.md`; do not skip this step.
+
+## Iterative development
+
+- **Running tests:** Follow the
+  [writing and running tests](../CONTRIBUTING.md#-writing-and-running-tests)
+  instructions. For fast initial iteration, choose focused tests for the
+  changed area from `tests/`.
+- **Running pre-commit:** Follow the
+  [pre-commit hook instructions](../CONTRIBUTING.md#pre-commit-hooks). Hooks may
+  modify files; review and re-stage those changes before committing.
+- **Signed commit:** Use `git commit -s -S -m "<message>"` for commits so they
+  follow the [signing your work](../CONTRIBUTING.md#-signing-your-work)
+  requirements.
+- **Never `git push` without explicit approval in the current turn.**
+  Committing locally is fine; publishing to a remote is not.
+- After `git commit`, stop and wait for the user to say "push", "publish",
+  "ship", or equivalent before running `git push`, `gh pr create`, or any
+  push-option flags like `-o merge_request.create`.
+
+## Contributing and PR readiness
+
+- Before opening or marking a PR ready for review, read the
+  [submitting your code](../CONTRIBUTING.md#submitting-your-code) guidance.
+- Read `.github/PULL_REQUEST_TEMPLATE.md` and satisfy the checklist.
diff --git a/.agents/TOOLING.md b/.agents/TOOLING.md
new file mode 100644
index 00000000000..20346f2e955
--- /dev/null
+++ b/.agents/TOOLING.md
@@ -0,0 +1,18 @@
+# Agent Tooling Notes
+
+These notes are for humans maintaining repository agent setup. They are not part
+of the always-loaded agent instructions.
+
+## Shared Instructions
+
+Update `.agents/README.md` for repository-wide agent instructions. The root
+`AGENTS.md` and `CLAUDE.md` files are symlinked to `.agents/README.md`, so
+changes there apply to both Codex and Claude Code.
+
+## Local Overrides
+
+For private local instructions, use the tool-specific override file:
+
+- Claude Code: `CLAUDE.local.md` is additive; it is read after `CLAUDE.md`.
+- Codex: `AGENTS.override.md` replaces `AGENTS.md` in the same directory, so it
+  is not additive. Restate any shared instructions that should still apply.
diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
new file mode 100644
index 00000000000..0c5dd057214
--- /dev/null
+++ b/.agents/developer-guidelines.md
@@ -0,0 +1,77 @@
+# Coding Principles
+
+Guidelines for production code in ModelOpt. Key values: simplicity, minimalism,
+and elegance.
+
+## Principles
+
+- **Be surgical.** Touch the code required to solve the actual problem, whether
+  that is one line or a broader design change. Avoid speculative refactors,
+  drive-by cleanup, unrelated rewrites, and half-finished implementations.
+- **Fix root causes.** Prefer the right fix over the most local patch. Do not
+  paper over symptoms with temporary fixes unless the temporary nature and
+  follow-up are explicit.
+- **Design for simplicity.** Choose the design that keeps code easiest to read
+  and change. Put behavior at the right level, tie extensibility to known needs,
+  and treat heavy branching or conditional logic as bad design smells.
+- **Respect ownership.** Keep behavior in the layer that owns it. Parent
+  abstractions should contain shared contracts and shared behavior, not
+  child-specific special cases.
+- **Keep one source of truth.** Put shared behavior, configuration, constants,
+  validation, and documentation in the single place that owns them. Reuse
+  existing helpers and shared APIs instead of copying logic or duplicating
+  state.
+- **Abstract to simplify.** Use helpers, base classes, registries, adapters,
+  plugins, or extension points when they remove real duplication, clarify
+  ownership, support current variation, or make call sites simpler. Do not add
+  abstractions for speculative future cases.
+- **Make code readable at the point of use.** Names, types, and structure should
+  make intent clear. Keep high-level orchestration clear, move low-level
+  mechanics into well-named helpers when helpful, and put critical code before
+  helper details when local conventions allow it.
+- **Validate at boundaries.** Check user input, files, network responses, and
+  external API results at the edge. Keep internal code simple by trusting types
+  and invariants instead of repeatedly checking for impossible states.
+- **Comment cautiously.** Code is the source of truth for what happens and how.
+  Add comments only when the reason is not obvious. Apply this guidance to new
+  comments only; do not rewrite or delete existing comments as cleanup.
+- **Scale documentation to the API.** Higher-level and user-visible APIs deserve
+  useful docstrings, including examples when helpful. Lower-level internals need
+  docstrings only when names, types, and structure are not enough.
+- **Remove touched dead code.** Delete unused imports, unreachable branches,
+  obsolete placeholders, stale TODOs, and debug code when they are part of the
+  behavior you are already touching.
+- **Use workspace-relative paths.** Use relative paths in commands and file
+  references unless an absolute path is needed to disambiguate.
+
+## Testing
+
+- **Develop with focused tests.** During development, write as many focused
+  tests as needed, including lower-level unit tests or internal probes, to
+  understand and harden behavior.
+- **Curate production tests and keep them lean.** Before staging or committing,
+  decide which tests should be checked in. Checked-in tests should document
+  expected behavior, protect against regressions, or flag backward-incompatible
+  behavior changes. Remove redundant lower-level tests when a higher-level test
+  already covers the same behavior, keeping CI/CD fast and lean.
+
+## Performant AI Code
+
+- **Avoid stray CPU-GPU syncs.** Tensor metadata such as `tensor.shape` is safe
+  to read, but scalar extraction or CPU transfers such as `tensor.item()`,
+  `float(tensor)`, `bool(tensor)`, `tensor.cpu()`, `tensor.numpy()`, etc. can
+  force CPU-GPU synchronization. Keep computation on GPU unless the CPU actually
+  needs the value.
+- **Use rank-aware logging.** Default to `print_rank_0` instead of `print` and
+  `warn_rank_0` instead of generic warnings. Use per-rank output only when each
+  process needs to report distinct state. Generic prints and warnings clog
+  distributed logs.
+- **Respect distributed invariants.** Avoid hidden synchronization, global state,
+  per-rank file races, or assumptions that only hold on a single process.
+
+## Compatibility
+
+- **Preserve config and checkpoint compatibility.** Treat ModelOpt config schemas
+  and checkpoint formats as persisted contracts. When changing configs such as
+  `QuantizeConfig`, maintain backward compatibility with previous ModelOpt
+  checkpoints unless a breaking change is explicit and intentionally handled.
diff --git a/.gitignore b/.gitignore
index 09a61233b6a..66ce5568ee0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -61,6 +61,8 @@ venv/
 
 # Ignore claude local settings
 .claude/settings.local.json
+CLAUDE.local.md
+AGENTS.override.md
 
 # Ignore SonarQube analysis
 .sonar/
diff --git a/AGENTS.md b/AGENTS.md
new file mode 120000
index 00000000000..fa67a2d3e2a
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1 @@
+.agents/README.md
\ No newline at end of file
diff --git a/CLAUDE.md b/CLAUDE.md
deleted file mode 100644
index 346ac17eb5f..00000000000
--- a/CLAUDE.md
+++ /dev/null
@@ -1,133 +0,0 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-NVIDIA Model Optimizer (ModelOpt): open-source library for model optimization techniques including
-quantization, pruning, distillation, sparsity, and speculative decoding to accelerate inference.
-Primarily Python codebase with optional C++/CUDA extensions supporting PyTorch, ONNX, and Hugging Face/Megatron models.
-
-> If a `CLAUDE.local.md` file exists alongside this file, read and respect it — it contains
-> developer-specific overrides that supplement this shared guidance.
-
-## Rules (Read First)
-
-**CRITICAL (YOU MUST):**
-
-- NVIDIA Apache 2.0 license header on ALL new Python/C++/CUDA files — use the SPDX format from `LICENSE_HEADER` (auto-inserted by pre-commit for most files, but must be added manually for files copied from third-party sources, which are excluded from the hook)
-- `git commit -s -S` (DCO sign-off + cryptographic signing required). Never attribute AI tools in
-  sign-off line
-- `pre-commit` hooks run on commit — if files are modified by hooks, re-stage and commit again
-- PRs require CODEOWNERS review (auto-assigned based on `.github/CODEOWNERS`)
-- When creating PRs (`gh pr create`), fill in `.github/PULL_REQUEST_TEMPLATE.md` verbatim — do NOT substitute the harness's default `## Summary` / `## Test plan` format
-- For non-trivial PRs, run `/claude review` to get Claude approval before merging (NVIDIA org members can self-trigger; orthogonal to CodeRabbit)
-- After rebasing, always re-run tests locally before pushing
-- All code must follow the security guidelines in `SECURITY.md` — violations are blocked as pre-merge errors
-- For contribution guidelines, commit conventions, and PR requirements, see `CONTRIBUTING.md`
-- New PIP dependencies require license verification — non-permissive licenses need justification and approval from `@NVIDIA/modelopt-setup-codeowners`
-
-## Common Commands
-
-| Task | Command |
-|------|---------|
-| Install (editable + dev) | `pip install -e ".[dev]"` |
-| Enable pre-commit hooks | `pre-commit install` |
-| CPU unit tests | `python -m pytest tests/unit` |
-| GPU unit tests | `python -m pytest tests/gpu` |
-| Megatron GPU tests | `python -m pytest tests/gpu_megatron` |
-| TRT-LLM GPU tests | `python -m pytest tests/gpu_trtllm` |
-| Single test file | `python -m pytest tests/unit/torch/quantization/test_quant_config.py` |
-| Pattern match | `pytest tests/unit -k "test_quantize"` |
-| Lint + format (all files) | `pre-commit run --all-files` |
-| Lint (diff only) | `pre-commit run --from-ref origin/main --to-ref HEAD` |
-| Run via nox (CPU unit) | `nox -s "unit-3.12(torch_211, tf_latest)"` |
-| Build docs | `nox -s docs` |
-| Build wheel | `nox -s build_wheel` |
-
-## Architecture
-
-ModelOpt code base is organized into four top-level namespaces:
-
-| Namespace | Path | Role |
-|-----------|------|------|
-| `modelopt.torch` | `modelopt/torch/` | Core PyTorch optimization library |
-| `modelopt.onnx` | `modelopt/onnx/` | ONNX model quantization and export |
-| `modelopt.deploy` | `modelopt/deploy/` | Deployment utilities for LLMs |
-| `modelopt.recipe` | `modelopt/recipe/` | Recipe loading, parsing, and validation infrastructure |
-
-### `modelopt.torch` Sub-packages
-
-| Sub-package | Path | Role |
-|-------------|------|------|
-| `opt` | `modelopt/torch/opt/` | Core optimization infrastructure (modes, config, state dicts) |
-| `quantization` | `modelopt/torch/quantization/` | PTQ, QAT, and quantization-aware algorithms |
-| `prune` | `modelopt/torch/prune/` | Structured and unstructured pruning |
-| `distill` | `modelopt/torch/distill/` | Knowledge distillation |
-| `sparsity` | `modelopt/torch/sparsity/` | Weight and activation sparsity |
-| `speculative` | `modelopt/torch/speculative/` | Speculative decoding (Medusa, EAGLE, etc.) |
-| `nas` | `modelopt/torch/nas/` | Neural architecture search |
-| `export` | `modelopt/torch/export/` | Checkpoint export for TRT-LLM / Megatron |
-| `peft` | `modelopt/torch/peft/` | QLoRA and PEFT integration |
-| `kernels` | `modelopt/torch/kernels/` | Custom CUDA/Triton kernels grouped by role: `common/attention` (baseline Triton FA), `quantization/{conv,gemm}` (implicit-GEMM CUDA + tensor-quant C++/CUDA + fp4/fp8 Triton), `sparsity/attention` (skip-softmax / N:M / diffusers+LTX backends) |
-| `_deploy` | `modelopt/torch/_deploy/` | Internal deployment utilities |
-| `utils` | `modelopt/torch/utils/` | Shared utilities and plugin infrastructure |
-
-### Core Abstraction: Modes
-
-A **mode** is the unit of model optimization in ModelOpt. Each algorithm (quantization, pruning,
-etc.) is implemented as one or more modes. Modes are recorded in the model's `modelopt_state` so
-optimization workflows can be composed, saved, and restored.
-
-The main entry points are in `modelopt/torch/opt/conversion.py`:
-- `apply_mode(model, mode, ...)` — applies an optimization mode to a model
-- `restore(model, ...)` — restores a model to a previously saved optimization state
-- `save(model, ...)` / `modelopt_state(model)` — captures the current optimization state
-
-### Core Abstraction: Recipes
-
-A **recipe** is a declarative YAML specification of an optimization configuration. Recipes decouple optimization specs from code, enabling reuse, sharing, and version control.
-
-**Built-in recipes** (`modelopt_recipes/`):
-
-- `general/ptq/` — general-purpose PTQ recipes
-- `configs/` — shared configuration units referenced by recipes
-
-## Key Files
-
-| File | Role |
-|------|------|
-| `modelopt/torch/opt/mode.py` | Base class for all optimization modes |
-| `modelopt/torch/opt/config.py` | Configuration system for modes |
-| `modelopt/torch/opt/conversion.py` | `apply_mode()` / `restore()` entry points |
-| `modelopt/torch/quantization/__init__.py` | PTQ/QAT public API |
-| `modelopt/torch/export/unified_export_hf.py` | Unified HF checkpoint export |
-| `modelopt/torch/export/model_config_export.py` | TRT-LLM model config export |
-| `modelopt/deploy/llm/` | LLM deployment utilities |
-| `modelopt/recipe/loader.py` | `load_recipe()` / `load_config()` public API |
-| `modelopt/recipe/config.py` | Recipe Pydantic models (`ModelOptPTQRecipe`, `RecipeType`) |
-| `modelopt_recipes/general/ptq/` | Built-in PTQ recipe YAML files |
-| `pyproject.toml` | Optional dependency groups (`[onnx]`, `[hf]`, `[all]`, `[dev]`); ruff, mypy, pytest, bandit, and coverage config |
-| `.pre-commit-config.yaml` | Pre-commit hooks (ruff, mypy, clang-format, license headers) |
-| `noxfile.py` | Test session definitions |
-
-## Design Patterns
-
-| Pattern | Key Points |
-|---------|------------|
-| **Mode composition** | Optimization algorithms are composed as sequences of modes, each recorded in `modelopt_state` |
-| **Plugin system** | Optional integrations (HuggingFace, Megatron, etc.) loaded lazily via `import_plugin()` |
-| **Optional dependencies** | Features gated by install extras (`[onnx]`, `[hf]`, `[all]`); avoid hard imports at module level |
-| **Config dataclasses** | Each mode has a typed config; use Pydantic or dataclass conventions |
-| **State dict** | Models carry `modelopt_state` for checkpoint save/restore across optimization steps |
-| **Declarative recipes** | YAML-based optimization specs in `modelopt_recipes/`; loaded via `load_recipe()`, passed to the model optimization system |
-
-## CI / Testing
-
-| Layer | Location | Notes |
-|-------|----------|-------|
-| CPU unit tests | `tests/unit/` | Fast, no GPU needed; run in pre-merge CI |
-| GPU unit tests | `tests/gpu/` | Requires CUDA GPU |
-| Megatron GPU tests | `tests/gpu_megatron/` | Requires Megatron-Core + GPU |
-| TRT-LLM GPU tests | `tests/gpu_trtllm/` | Requires TensorRT-LLM + GPU |
-| Example/integration tests | `tests/examples/` | Integration tests for examples; see `tests/examples/README.md` |
-| Pre-commit / lint | `.pre-commit-config.yaml` | ruff, mypy, clang-format, license headers, bandit |
-| Coverage | `pyproject.toml` | 70% minimum on `modelopt/*` |
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 00000000000..fa67a2d3e2a
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+.agents/README.md
\ No newline at end of file
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 53e879a7bf3..a1f2d114351 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -79,7 +79,7 @@ If you are an external contributor, seek guidance from `@NVIDIA/modelopt-setup-c
 
 See [`modelopt/torch/quantization/utils/calib_utils.py`](./modelopt/torch/quantization/utils/calib_utils.py) for an example of the correct license header format.
 
-## 📝 Writing tests
+## 📝 Writing and running tests
 
 We use [pytest](https://docs.pytest.org/) for all tests. For any new features / examples, make sure to add tests and that the coverage check in your PR passes. The tests are organized into the following directories:
 
@@ -89,7 +89,17 @@ We use [pytest](https://docs.pytest.org/) for all tests. For any new features /
 - `tests/gpu_trtllm`: Fast GPU-based unit tests for the core ModelOpt library for TensorRT-LLM features. In most cases, they should not take more than a few seconds to run.
 - `tests/examples`: Integration tests for ModelOpt examples. They should not take more than a few minutes to run. Please refer to [example test README](./tests/examples/README.md) for more details.
 
-Please refer to [noxfile.py](./noxfile.py) for more details on how to run the tests and their dependencies.
+For lightweight focused local validation, run `pytest` directly on the relevant test path. For example:
+
+```bash
+pytest tests/unit/torch/quantization
+```
+
+For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py). Run `nox -l` to list available sessions, then run the matching session with `nox -s <session>`. The `partial_unit-3.12(torch)` session covers the broader torch unit test suite and installs heavier dependencies, including `megatron-core`:
+
+```bash
+nox -s "partial_unit-3.12(torch)"
+```
 
 ## ✍️ Signing your work
 
diff --git a/README.md b/README.md
index ae17522c613..cfd6c0adb3e 100644
--- a/README.md
+++ b/README.md
@@ -151,6 +151,11 @@ Model Optimizer follows a structured approach to managing deprecated features:
 Model Optimizer is now open source! We welcome any feedback, feature requests and PRs.
 Please read our [Contributing](./CONTRIBUTING.md) guidelines for details on how to contribute to this project.
 
+## AI Agents
+
+For AI-assisted development setup, including local Claude Code and Codex
+override files, see the [agent tooling notes](./.agents/TOOLING.md).
+
 ### Top Contributors
 
 [![Contributors](https://contrib.rocks/image?repo=NVIDIA/Model-Optimizer)](https://github.com/NVIDIA/Model-Optimizer/graphs/contributors)

From ed53a34ac1b5ec24bd6dd505af1847cc57e54fa2 Mon Sep 17 00:00:00 2001
From: realAsma <akuriparambi@nvidia.com>
Date: Thu, 14 May 2026 01:31:11 +0000
Subject: [PATCH 2/8] Refine developer comment guidance

Signed-off-by: realAsma <akuriparambi@nvidia.com>
---
 .agents/developer-guidelines.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
index 0c5dd057214..2feb1484830 100644
--- a/.agents/developer-guidelines.md
+++ b/.agents/developer-guidelines.md
@@ -29,15 +29,17 @@ and elegance.
   make intent clear. Keep high-level orchestration clear, move low-level
   mechanics into well-named helpers when helpful, and put critical code before
   helper details when local conventions allow it.
-- **Validate at boundaries.** Check user input, files, network responses, and
-  external API results at the edge. Keep internal code simple by trusting types
-  and invariants instead of repeatedly checking for impossible states.
-- **Comment cautiously.** Code is the source of truth for what happens and how.
-  Add comments only when the reason is not obvious. Apply this guidance to new
-  comments only; do not rewrite or delete existing comments as cleanup.
+- **Comment cautiously.** Code should be clear and be the source of truth
+  for what happens, how it happens, and why; use comments only when the why is
+  not obvious from the code. First ask whether better names, clearer structure,
+  or simpler code can explain the intent without a comment. (Apply this guidance
+  to new comments only; do not rewrite or delete existing comments.)
 - **Scale documentation to the API.** Higher-level and user-visible APIs deserve
   useful docstrings, including examples when helpful. Lower-level internals need
   docstrings only when names, types, and structure are not enough.
+- **Validate at boundaries.** Check user input, files, network responses, and
+  external API results at the edge. Keep internal code simple by trusting types
+  and invariants instead of repeatedly checking for impossible states.
 - **Remove touched dead code.** Delete unused imports, unreachable branches,
   obsolete placeholders, stale TODOs, and debug code when they are part of the
   behavior you are already touching.

From 69e2c49686fe0c33c7602b008b414854c75115c0 Mon Sep 17 00:00:00 2001
From: realAsma <akuriparambi@nvidia.com>
Date: Thu, 14 May 2026 16:18:27 +0000
Subject: [PATCH 3/8] docs: update developer guidelines

Signed-off-by: realAsma <akuriparambi@nvidia.com>
---
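Note for reviewers: a minimal sketch of the distributed guidance in this patch
(quiet logs, shared side effects guarded against races between ranks). It
assumes the launcher exports a `RANK` environment variable, as torchrun does.
`get_rank`, `log_on_rank_0`, and `write_summary_once` are hypothetical helpers
written for this note; they do not show how ModelOpt's `print_rank_0` is
implemented.

```python
import os
from pathlib import Path


def get_rank() -> int:
    """Read the process rank from the RANK env var; default to 0 if unset."""
    return int(os.environ.get("RANK", "0"))


def log_on_rank_0(message: str) -> bool:
    """Print only on rank 0 and report whether the message was printed."""
    if get_rank() != 0:
        return False
    print(message)
    return True


def write_summary_once(path: Path, text: str) -> bool:
    """Write a shared file from rank 0 only, so ranks never race on one path."""
    if get_rank() != 0:
        return False
    # Write to a sibling temp file, then rename, so readers never observe
    # a partially written file.
    tmp_path = path.with_name(path.name + ".tmp")
    tmp_path.write_text(text)
    os.replace(tmp_path, path)
    return True
```

In a real job, the other ranks would typically wait on a barrier before reading
the file; that step is left out of this sketch.
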
 .agents/developer-guidelines.md | 94 +++++++++++++--------------------
 1 file changed, 37 insertions(+), 57 deletions(-)

diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
index 2feb1484830..8034c531165 100644
--- a/.agents/developer-guidelines.md
+++ b/.agents/developer-guidelines.md
@@ -1,50 +1,34 @@
 # Coding Principles
 
-Guidelines for production code in ModelOpt. Key values: simplicity, minimalism,
-and elegance.
+Guidelines for production code in ModelOpt. Key values: simplicity, modularity,
+and conciseness.
 
 ## Principles
 
-- **Be surgical.** Touch the code required to solve the actual problem, whether
-  that is one line or a broader design change. Avoid speculative refactors,
-  drive-by cleanup, unrelated rewrites, and half-finished implementations.
-- **Fix root causes.** Prefer the right fix over the most local patch. Do not
-  paper over symptoms with temporary fixes unless the temporary nature and
-  follow-up are explicit.
-- **Design for simplicity.** Choose the design that keeps code easiest to read
-  and change. Put behavior at the right level, tie extensibility to known needs,
-  and treat heavy branching or conditional logic as bad design smells.
-- **Respect ownership.** Keep behavior in the layer that owns it. Parent
-  abstractions should contain shared contracts and shared behavior, not
-  child-specific special cases.
-- **Keep one source of truth.** Put shared behavior, configuration, constants,
-  validation, and documentation in the single place that owns them. Reuse
-  existing helpers and shared APIs instead of copying logic or duplicating
-  state.
-- **Abstract to simplify.** Use helpers, base classes, registries, adapters,
-  plugins, or extension points when they remove real duplication, clarify
-  ownership, support current variation, or make call sites simpler. Do not add
-  abstractions for speculative future cases.
-- **Make code readable at the point of use.** Names, types, and structure should
-  make intent clear. Keep high-level orchestration clear, move low-level
-  mechanics into well-named helpers when helpful, and put critical code before
-  helper details when local conventions allow it.
-- **Comment cautiously.** Code should be clear and be the source of truth
-  for what happens, how it happens, and why; use comments only when the why is
-  not obvious from the code. First ask whether better names, clearer structure,
-  or simpler code can explain the intent without a comment. (Apply this guidance
-  to new comments only; do not rewrite or delete existing comments.)
-- **Scale documentation to the API.** Higher-level and user-visible APIs deserve
-  useful docstrings, including examples when helpful. Lower-level internals need
-  docstrings only when names, types, and structure are not enough.
-- **Validate at boundaries.** Check user input, files, network responses, and
-  external API results at the edge. Keep internal code simple by trusting types
-  and invariants instead of repeatedly checking for impossible states.
-- **Remove touched dead code.** Delete unused imports, unreachable branches,
-  obsolete placeholders, stale TODOs, and debug code when they are part of the
-  behavior you are already touching.
-- **Use workspace-relative paths.** Use relative paths in commands and file
-  references unless an absolute path is needed to disambiguate.
+- **Prefer simple, surgical changes.** Touch only what the task requires. Avoid speculative
+  refactors, broad rewrites, and "while we're here" cleanups.
+- **Design for simplicity and readability.** Choose the design that is easiest to understand and maintain.
+  Code is read top to bottom: put high-level behavior first, hide lower-level details behind well-named helpers,
+  and treat heavy branching as a signal to reconsider the design.
+- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding. Use
+  existing helpers, base classes, registries, and plugins when they fit. Keep scope limited to known
+  cases; don't add complexity for speculative future needs.
+- **Respect inheritance.** Parent abstractions should contain shared contracts and
+  shared behavior, not child-specific special cases.
+- **Don't repeat yourself.** Reuse existing helpers, APIs, and classes when possible. Prefer reusing or
+  generalizing existing code when it keeps the design simpler. If the same logic or intent appears elsewhere,
+  consolidate it.
+- **Comment cautiously.** Comments should add context, not translate code into English.
+  Prefer making the code self-explanatory first. Use comments only for non-obvious
+  intent or constraints that remain unclear from the code. Apply this guidance to new
+  comments only; do not rewrite or delete existing comments just for style.
+- **Document public APIs.** Public and higher-level APIs should have docstrings, including examples when useful.
+  Internal helpers should usually be self-documenting through clear names and structure.
+- **Fix the root cause, not the side effect.** For bug fixes, find the root cause instead of patching around its side effects.
+- **Validate external input once.** Check types and values at the interface boundary. Internal code can trust those
+  checks and avoid redundant assertions.
+- **Remove dead code.** Delete unused imports, unreachable branches, and obsolete helpers.
+- **Use relative paths** from the repo root in commands and file references.
 
 ## Testing
 
@@ -59,21 +43,17 @@ and elegance.
 
 ## Performant AI Code
 
-- **Avoid stray CPU-GPU syncs.** Tensor metadata such as `tensor.shape` is safe
-  to read, but scalar extraction or CPU transfers such as `tensor.item()`,
-  `float(tensor)`, `bool(tensor)`, `tensor.cpu()`, `tensor.numpy()`, etc. can
-  force CPU-GPU synchronization. Keep computation on GPU unless the CPU actually
-  needs the value.
-- **Use rank-aware logging.** Default to `print_rank_0` instead of `print` and
-  `warn_rank_0` instead of generic warnings. Use per-rank output only when each
-  process needs to report distinct state. Generic prints and warnings clog
-  distributed logs.
-- **Respect distributed invariants.** Avoid hidden synchronization, global state,
-  per-rank file races, or assumptions that only hold on a single process.
+- **Keep tensor work on the GPU and avoid unnecessary CPU-GPU syncs.** Reading metadata such as `tensor.shape` is fine.
+  Avoid Python scalar extraction and operators such as `tensor.item()`, `float(tensor)`, or `min(tensor)` because they
+  can trigger CPU-GPU syncs. Use PyTorch tensor ops such as `tensor.min()` by default, and only extract Python scalars
+  when the CPU needs the value. Tensor-value-based Python branching can also break CUDA graphs.
+- **Develop with distributed processing in mind.** Examples: Use `print_rank_0` or `warn_rank_0`
+  when possible to avoid noisy logs. Guard shared side effects, such as
+  file writes or shared state updates, against race conditions between ranks.
 
 ## Compatibility
 
-- **Preserve config and checkpoint compatibility.** Treat ModelOpt config schemas
-  and checkpoint formats as persisted contracts. When changing configs such as
-  `QuantizeConfig`, maintain backward compatibility with previous ModelOpt
-  checkpoints unless a breaking change is explicit and intentionally handled.
+- **Preserve config and checkpoint backward compatibility.** ModelOpt checkpoints include serialized
+  `ModeloptBaseConfig` instances such as `QuantizeConfig`. If these Pydantic-based configs change
+  without backward compatibility handling, older checkpoints may no longer load. Make breaking changes
+  explicit and intentional.

From 6419d7739e7d5442f28d56c9dbb5c7c055e7c54d Mon Sep 17 00:00:00 2001
From: realAsma <akuriparambi@nvidia.com>
Date: Thu, 14 May 2026 16:45:33 +0000
Subject: [PATCH 4/8] docs: address agent guidance review follow-ups

Signed-off-by: realAsma <akuriparambi@nvidia.com>
---
 .github/workflows/claude_review.yml | 3 ++-
 CONTRIBUTING.md                     | 4 ++--
 README.md                           | 3 +--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/.github/workflows/claude_review.yml b/.github/workflows/claude_review.yml
index 69119089fee..b62985e24e0 100644
--- a/.github/workflows/claude_review.yml
+++ b/.github/workflows/claude_review.yml
@@ -81,7 +81,8 @@ jobs:
 
             Mandatory workflow — never skip or reorder:
             1. Read the PR diff first (gh pr diff).
-            2. Read CLAUDE.md and CONTRIBUTING.md for project conventions and architecture.
+            2. Read CLAUDE.md (symlinked to .agents/README.md), .agents/developer-guidelines.md,
+               and CONTRIBUTING.md for project conventions, coding principles, and architecture.
             3. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
                `__init__.py` plus any `mode.py` / `config.py` to understand mode registration
                and config schema.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index a1f2d114351..f7debbbc6ee 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -95,10 +95,10 @@ For lightweight focused local validation, run `pytest` directly on the relevant
 pytest tests/unit/torch/quantization
 ```
 
-For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py). Run `nox -l` to list available sessions, then run the matching session with `nox -s <session>`. The `partial_unit-3.12(torch)` session covers the broader torch unit test suite and installs heavier dependencies, including `megatron-core`:
+For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py). Run `nox -l` to list available sessions, then run the matching session with `nox -s <session>`. The `unit-3.12(torch_211, tf_latest)` session runs `tests/unit` with a specific Torch and Transformers combination:
 
 ```bash
-nox -s "partial_unit-3.12(torch)"
+nox -s "unit-3.12(torch_211, tf_latest)"
 ```
 
 ## ✍️ Signing your work
diff --git a/README.md b/README.md
index cfd6c0adb3e..6a4f023e4f0 100644
--- a/README.md
+++ b/README.md
@@ -153,8 +153,7 @@ Please read our [Contributing](./CONTRIBUTING.md) guidelines for details on how
 
 ## AI Agents
 
-For AI-assisted development setup, including local Claude Code and Codex
-override files, see the [agent tooling notes](./.agents/TOOLING.md).
+For AI-assisted development setup, see the [agent tooling notes](./.agents/TOOLING.md).
 
 ### Top Contributors
 

From 9cdcc1a7963bdaf3b81553b8a22d2ca319daf559 Mon Sep 17 00:00:00 2001
From: realAsma <akuriparambi@nvidia.com>
Date: Thu, 14 May 2026 17:36:08 +0000
Subject: [PATCH 5/8] docs: refine agent entrypoint guidance

Signed-off-by: realAsma <akuriparambi@nvidia.com>
---
 .agents/developer-guidelines.md | 13 +++++------
 AGENTS.md                       | 40 ++++++++++++++++++++++++++++++++-
 CLAUDE.md                       |  2 +-
 3 files changed, 46 insertions(+), 9 deletions(-)
 mode change 120000 => 100644 AGENTS.md
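
Reviewer note: this patch turns `AGENTS.md` into a regular file and repoints the `CLAUDE.md` symlink at it. The layout can be checked in a local checkout with a minimal sketch like the one below. The helper name `check_entrypoints` is hypothetical and not part of the repository; it only assumes the two root files named in this patch.

```python
import os


def check_entrypoints(repo_root: str) -> list[str]:
    """Return a list of layout problems (empty when the layout matches this patch)."""
    problems = []
    agents = os.path.join(repo_root, "AGENTS.md")
    claude = os.path.join(repo_root, "CLAUDE.md")

    # AGENTS.md should be a regular file (mode 100644), not a symlink.
    if os.path.islink(agents) or not os.path.isfile(agents):
        problems.append("AGENTS.md is not a regular file")

    # CLAUDE.md should be a symlink (mode 120000) whose target is AGENTS.md.
    if not os.path.islink(claude):
        problems.append("CLAUDE.md is not a symlink")
    elif os.readlink(claude) != "AGENTS.md":
        problems.append(f"CLAUDE.md points at {os.readlink(claude)!r}")

    return problems
```

Calling `check_entrypoints(".")` from the repository root should return an empty list once this patch is applied; a stale symlink to `.agents/README.md` would be reported instead.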

diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
index 8034c531165..d9fbbed2ecd 100644
--- a/.agents/developer-guidelines.md
+++ b/.agents/developer-guidelines.md
@@ -10,14 +10,13 @@ and conciseness.
 - **Design for simplicity and readability.** Choose the design that is easiest to understand and maintain.
   Code is read top to bottom: put high-level behavior first, hide lower-level details behind well-named helpers,
   and treat heavy branching as a signal to reconsider the design.
-- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding. Use
-  existing helpers, base classes, registries, and plugins when they fit. Keep scope limited to known
-  cases; don't add complexity for speculative future needs.
-- **Respect inheritance.** Parent abstractions should contain shared contracts and
+- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding.
+  Use existing extension points when they fit. If none fit, add a simple, focused helper,
+  class, or plugin that cleanly captures the new behavior. Keep scope limited to known cases.
+- **Respect inheritance boundaries.** Parent abstractions should define shared contracts and
   shared behavior, not child-specific special cases.
-- **Don't repeat yourself.** Reuse existing helpers, APIs, and classes when possible. Prefer reusing or
-  generalizing existing code when it keeps the design simpler. If the same logic or intent appears elsewhere,
-  consolidate it.
+- **Don't repeat yourself.** Consolidate repeated logic or intent with a shared helper, API,
+  or abstraction when doing so keeps the design simpler.
 - **Comment cautiously.** Comments should add context, not translate code into English.
   Prefer making the code self-explanatory first. Use comments only for non-obvious
   intent or constraints that remain unclear from the code. Apply this guidance to new
diff --git a/AGENTS.md b/AGENTS.md
deleted file mode 120000
index fa67a2d3e2a..00000000000
--- a/AGENTS.md
+++ /dev/null
@@ -1 +0,0 @@
-.agents/README.md
\ No newline at end of file
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000000..2ca840f8a5d
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,39 @@
+# Agent Instructions for ModelOpt
+
+These instructions apply to AI-assisted work in this repository.
+
+## Repository orientation
+
+- Start with `README.md` for project overview and install.
+- Use `modelopt/` for source, `tests/` for focused test coverage, and
+  `examples/` or `docs/` for usage patterns.
+
+## Coding guidelines
+
+- **Coding guide:** Code development and review require reading and following
+  [.agents/developer-guidelines.md](/home/scratch.akuriparambi_coreai/Model-Optimizer-Agent/.agents/developer-guidelines.md);
+  do not skip this step.
+
+## Iterative development
+
+- **Running tests:** Follow the
+  [writing and running tests](CONTRIBUTING.md#-writing-and-running-tests)
+  instructions. For fast initial iteration, choose focused tests for the
+  changed area from `tests/`.
+- **Running pre-commit:** Follow the
+  [pre-commit hook instructions](CONTRIBUTING.md#pre-commit-hooks). Hooks may
+  modify files; review and re-stage those changes before committing.
+- **Signed commit:** Use `git commit -s -S -m "<message>"` for commits so they
+  follow the [signing your work](CONTRIBUTING.md#-signing-your-work)
+  requirements.
+- **Never `git push` without explicit approval in the current turn.** Commit
+  locally is fine; publishing to a remote is not.
+- After `git commit`, stop and wait for the user to say "push", "publish",
+  "ship", or equivalent before running `git push`, `gh pr create`, or any
+  push-option flags like `-o merge_request.create`.
+
+## Contributing and PR readiness
+
+- Before opening or marking a PR ready for review, read the
+  [submitting your code](CONTRIBUTING.md#submitting-your-code) guidance.
+- Read `.github/PULL_REQUEST_TEMPLATE.md` and satisfy the checklist.
diff --git a/CLAUDE.md b/CLAUDE.md
index fa67a2d3e2a..47dc3e3d863 120000
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1 +1 @@
-.agents/README.md
\ No newline at end of file
+AGENTS.md
\ No newline at end of file

From 3cfb9a4a772f9e82052f704e8f7378fdf4f31f5e Mon Sep 17 00:00:00 2001
From: realAsma <akuriparambi@nvidia.com>
Date: Thu, 14 May 2026 17:42:47 +0000
Subject: [PATCH 6/8] docs: use relative agent guide link

Signed-off-by: realAsma <akuriparambi@nvidia.com>
---
 AGENTS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
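
Reviewer note: the link replaced here pointed at an absolute path on a local machine, which does not resolve for anyone else. A rough sketch of a check that would flag such links is shown below. The names `LINK_RE` and `absolute_link_targets` are hypothetical, and the regular expression only handles simple inline Markdown links, not reference-style links or targets containing spaces or parentheses.

```python
import re

# Matches inline Markdown links like [text](target) and captures the target.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def absolute_link_targets(markdown: str) -> list[str]:
    """Return link targets that are absolute filesystem paths (start with '/')."""
    return [target for target in LINK_RE.findall(markdown) if target.startswith("/")]
```

Running it over the contents of `AGENTS.md` before this patch would be expected to report the `/home/...` target, and nothing after it.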

diff --git a/AGENTS.md b/AGENTS.md
index 2ca840f8a5d..3000fce922b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -11,7 +11,7 @@ These instructions apply to AI-assisted work in this repository.
 ## Coding guidelines
 
 - **Coding guide:** Code development and review require reading and following
-  [.agents/developer-guidelines.md](/home/scratch.akuriparambi_coreai/Model-Optimizer-Agent/.agents/developer-guidelines.md);
+  [.agents/developer-guidelines.md](.agents/developer-guidelines.md);
   do not skip this step.
 
 ## Iterative development

From 1bf4c0123f8165920814304b372fa8406ada5576 Mon Sep 17 00:00:00 2001
From: realAsma <akuriparambi@nvidia.com>
Date: Thu, 14 May 2026 17:53:54 +0000
Subject: [PATCH 7/8] docs: align agent entrypoint docs

Signed-off-by: realAsma <akuriparambi@nvidia.com>
---
 .agents/README.md                   | 38 -----------------------------
 .agents/TOOLING.md                  |  5 ++--
 .agents/developer-guidelines.md     |  4 +--
 .github/workflows/claude_review.yml |  2 +-
 4 files changed, 5 insertions(+), 44 deletions(-)
 delete mode 100644 .agents/README.md

diff --git a/.agents/README.md b/.agents/README.md
deleted file mode 100644
index c2b6727c582..00000000000
--- a/.agents/README.md
+++ /dev/null
@@ -1,38 +0,0 @@
-# Agent Instructions for ModelOpt
-
-These instructions apply to AI-assisted work in this repository.
-
-## Repository orientation
-
-- Start with `README.md` for project overview and install.
-- Use `modelopt/` for source, `tests/` for focused test coverage, and
-  `examples/` or `docs/` for usage patterns.
-
-## Coding guidelines
-
-- **Coding guide:** Code development and review require reading and following
-  `.agents/developer-guidelines.md`; do not skip this step.
-
-## Iterative development
-
-- **Running tests:** Follow the
-  [writing and running tests](../CONTRIBUTING.md#-writing-and-running-tests)
-  instructions. For fast initial iteration, choose focused tests for the
-  changed area from `tests/`.
-- **Running pre-commit:** Follow the
-  [pre-commit hook instructions](../CONTRIBUTING.md#pre-commit-hooks). Hooks may
-  modify files; review and re-stage those changes before committing.
-- **Signed commit:** Use `git commit -s -S -m "<message>"` for commits so they
-  follow the [signing your work](../CONTRIBUTING.md#-signing-your-work)
-  requirements.
-- **Never `git push` without explicit approval in the current turn.** Commit
-  locally is fine; publishing to a remote is not.
-- After `git commit`, stop and wait for the user to say "push", "publish",
-  "ship", or equivalent before running `git push`, `gh pr create`, or any
-  push-option flags like `-o merge_request.create`.
-
-## Contributing and PR readiness
-
-- Before opening or marking a PR ready for review, read the
-  [submitting your code](../CONTRIBUTING.md#submitting-your-code) guidance.
-- Read `.github/PULL_REQUEST_TEMPLATE.md` and satisfy the checklist.
diff --git a/.agents/TOOLING.md b/.agents/TOOLING.md
index 20346f2e955..ecef150f4ec 100644
--- a/.agents/TOOLING.md
+++ b/.agents/TOOLING.md
@@ -5,9 +5,8 @@ of the always-loaded agent instructions.
 
 ## Shared Instructions
 
-Update `.agents/README.md` for repository-wide agent instructions. The root
-`AGENTS.md` and `CLAUDE.md` files are symlinked to `.agents/README.md`, so
-changes there apply to both Codex and Claude Code.
+Update `AGENTS.md` for repository-wide agent instructions. `CLAUDE.md` is
+symlinked to `AGENTS.md`, so changes there apply to both Codex and Claude Code.
 
 ## Local Overrides
 
diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
index d9fbbed2ecd..6fc5e7e980f 100644
--- a/.agents/developer-guidelines.md
+++ b/.agents/developer-guidelines.md
@@ -15,8 +15,8 @@ and conciseness.
   class, or plugin that cleanly captures the new behavior. Keep scope limited to known cases.
 - **Respect inheritance boundaries.** Parent abstractions should define shared contracts and
   shared behavior, not child-specific special cases.
-- **Don't repeat yourself.** Consolidate repeated logic or intent with a shared helper, API,
-  or abstraction when doing so keeps the design simpler.
+- **Don't repeat yourself; keep a single source of truth.** Consolidate repeated logic or intent with a shared helper, API,
+  or abstraction when doing so keeps the design simpler. Avoid parallel implementations that can drift out of sync.
 - **Comment cautiously.** Comments should add context, not translate code into English.
   Prefer making the code self-explanatory first. Use comments only for non-obvious
   intent or constraints that remain unclear from the code. Apply this guidance to new
diff --git a/.github/workflows/claude_review.yml b/.github/workflows/claude_review.yml
index b62985e24e0..a41c7571acc 100644
--- a/.github/workflows/claude_review.yml
+++ b/.github/workflows/claude_review.yml
@@ -81,7 +81,7 @@ jobs:
 
             Mandatory workflow — never skip or reorder:
             1. Read the PR diff first (gh pr diff).
-            2. Read CLAUDE.md (symlinked to .agents/README.md), .agents/developer-guidelines.md,
+            2. Read AGENTS.md, .agents/developer-guidelines.md,
                and CONTRIBUTING.md for project conventions, coding principles, and architecture.
             3. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
                `__init__.py` plus any `mode.py` / `config.py` to understand mode registration

From 283f5170c7db95edb1b8c2098a508efa61d54800 Mon Sep 17 00:00:00 2001
From: realAsma <akuriparambi@nvidia.com>
Date: Thu, 14 May 2026 17:57:43 +0000
Subject: [PATCH 8/8] docs: clarify single-source guidance

Signed-off-by: realAsma <akuriparambi@nvidia.com>
---
 .agents/developer-guidelines.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
index 6fc5e7e980f..10b3d901499 100644
--- a/.agents/developer-guidelines.md
+++ b/.agents/developer-guidelines.md
@@ -16,7 +16,7 @@ and conciseness.
 - **Respect inheritance boundaries.** Parent abstractions should define shared contracts and
   shared behavior, not child-specific special cases.
 - **Don't repeat yourself; keep a single source of truth.** Consolidate repeated logic or intent with a shared helper, API,
-  or abstraction when doing so keeps the design simpler. Avoid parallel implementations that can drift out of sync.
+  or abstraction when doing so keeps the design simpler. Avoid duplication that can drift out of sync.
 - **Comment cautiously.** Comments should add context, not translate code into English.
   Prefer making the code self-explanatory first. Use comments only for non-obvious
   intent or constraints that remain unclear from the code. Apply this guidance to new