NVIDIA · realAsma · May 14, 2026
diff --git a/.agents/TOOLING.md b/.agents/TOOLING.md
@@ -0,0 +1,17 @@
+# Agent Tooling Notes
+
+These notes are for humans maintaining repository agent setup. They are not part
+of the always-loaded agent instructions.
+
+## Shared Instructions
+
+Update `AGENTS.md` for repository-wide agent instructions. `CLAUDE.md` is
+symlinked to `AGENTS.md`, so changes there apply to both Codex and Claude Code.
+
+## Local Overrides
+
+For private local instructions, use the tool-specific override file:
+
+- Claude Code: `CLAUDE.local.md` is additive; it is read after `CLAUDE.md`.
+- Codex: `AGENTS.override.md` replaces `AGENTS.md` in the same directory, so it
+  is not additive. Restate any shared instructions that should still apply.
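The difference between the two override styles in `.agents/TOOLING.md` can be modeled with a short sketch. This is not the lookup logic of either tool; `effective_instruction_files` and its `present` argument are hypothetical names used only to illustrate the behavior the notes describe: additive for Claude Code, replacing for Codex.

```python
from typing import List, Set


def effective_instruction_files(tool: str, present: Set[str]) -> List[str]:
    """Return the instruction files that apply in one directory, in read order.

    Hypothetical model of the rules stated in TOOLING.md, not either tool's
    real implementation. `present` is the set of file names in the directory.
    """
    if tool == "claude":
        # Additive: CLAUDE.local.md is read after CLAUDE.md.
        order = ["CLAUDE.md", "CLAUDE.local.md"]
        return [name for name in order if name in present]
    if tool == "codex":
        # Replacing: AGENTS.override.md is used instead of AGENTS.md.
        if "AGENTS.override.md" in present:
            return ["AGENTS.override.md"]
        return ["AGENTS.md"] if "AGENTS.md" in present else []
    raise ValueError(f"unknown tool: {tool!r}")
```

Under this model, a Codex user who adds `AGENTS.override.md` loses everything in `AGENTS.md` unless it is restated, which is why the notes ask for shared instructions to be repeated there.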
diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
@@ -0,0 +1,58 @@
+# Coding Principles
+
+Guidelines for production code in ModelOpt. Key values: simplicity, modularity,
+and conciseness.
+
+## Principles
+
+- **Prefer simple, surgical changes.** Touch only what the task requires. Avoid speculative
+  refactors, broad rewrites, and "while we're here" cleanups.
+- **Design for simplicity and readability.** Choose the design that is easiest to understand and maintain.
+  Code is read top to bottom: put high-level behavior first, hide lower-level details behind well-named helpers,
+  and treat heavy branching as a signal to reconsider the design.
+- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding.
+  Use existing extension points when they fit. If none fit, add a simple, focused helper,
+  class, or plugin that cleanly captures the new behavior. Keep scope limited to known cases.
+- **Respect inheritance boundaries.** Parent abstractions should define shared contracts and
+  shared behavior, not child-specific special cases.
+- **Don't repeat yourself; keep a single source of truth.** Consolidate repeated logic or intent with a shared helper, API,
+  or abstraction when doing so keeps the design simpler. Avoid duplication that can drift out of sync.
+- **Comment cautiously.** Comments should add context, not translate code into English.
+  Prefer making the code self-explanatory first. Use comments only for non-obvious
+  intent or constraints that remain unclear from the code. Apply this guidance to new
+  comments only; do not rewrite or delete existing comments just for style.
+- **Document public APIs.** Public and higher-level APIs should have docstrings, including examples when useful.
+  Internal helpers should usually be self-documenting through clear names and structure.
+- **Fix the root cause, not the symptom.** For bug fixes, find and fix the root cause instead of patching its side effects.
+- **Validate external input once.** Check types and values at the interface boundary. Internal code can trust those
+  checks and avoid redundant assertions.
+- **Remove dead code.** Delete unused imports, unreachable branches, and obsolete helpers.
+- **Use relative paths** from the repo root in commands and file references.
+
+## Testing
+
+- **Develop with focused tests.** During development, write as many focused
+  tests as needed, including lower-level unit tests or internal probes, to
+  understand and harden behavior.
+- **Curate production tests and keep them lean.** Before staging or committing,
+  decide which tests should be checked in. Checked-in tests should document
+  expected behavior, protect against regressions, or flag backward-incompatible
+  behavior changes. Remove redundant lower-level tests when a higher-level test
+  already covers the same behavior, keeping CI/CD fast and lean.
+
+## Performant AI Code
+
+- **Keep tensor work on the GPU and avoid unnecessary CPU-GPU syncs.** Reading metadata such as `tensor.shape` is fine.
+  Avoid Python scalar extraction and built-ins such as `tensor.item()`, `float(tensor)`, or `min(tensor)` because they
+  can trigger CPU-GPU syncs. Use PyTorch tensor ops such as `tensor.min()` by default, and only extract Python scalars
+  when the CPU needs the value. Python branching on tensor values can also break CUDA graphs.
+- **Develop with distributed processing in mind.** For example, use `print_rank_0` or `warn_rank_0`
+  when possible to avoid noisy logs, and guard shared side effects, such as
+  file writes or shared state updates, against race conditions between ranks.
+
+## Compatibility
+
+- **Preserve config and checkpoint backward compatibility.** ModelOpt checkpoints include serialized
+  `ModeloptBaseConfig` instances such as `QuantizeConfig`. If these Pydantic-based configs change
+  without backward compatibility handling, older checkpoints may no longer load. Make breaking changes
+  explicit and intentional.
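The compatibility guideline at the end of `.agents/developer-guidelines.md` can be illustrated with a minimal sketch. It uses a standard-library dataclass as a stand-in for the Pydantic-based configs, so it runs without dependencies. `ExampleQuantConfig`, the `bits` to `num_bits` rename, and `load_config` are all hypothetical and do not describe ModelOpt's actual schema or loading code.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ExampleQuantConfig:
    """Hypothetical config; real ModelOpt configs are Pydantic-based."""

    num_bits: int = 8
    axis: Optional[int] = None


# Assumed rename for illustration: older checkpoints stored "bits".
_RENAMED_FIELDS = {"bits": "num_bits"}


def load_config(saved: Dict[str, Any]) -> ExampleQuantConfig:
    """Build a config from a serialized dict, accepting legacy field names."""
    # Map legacy keys to their current names; current keys pass through.
    migrated = {_RENAMED_FIELDS.get(key, key): value for key, value in saved.items()}
    known = {f.name for f in fields(ExampleQuantConfig)}
    unknown = sorted(set(migrated) - known)
    if unknown:
        # Fail loudly so an unhandled schema change is explicit, not silent.
        raise ValueError(f"unrecognized config fields: {unknown}")
    return ExampleQuantConfig(**migrated)
```

With the rename table in place, a dict saved under the old schema such as `{"bits": 4}` still loads, while a key that was never part of the schema raises an error instead of being dropped silently.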
@@ -81,7 +81,8 @@ jobs:
 
             Mandatory workflow — never skip or reorder:
             1. Read the PR diff first (gh pr diff).
-            2. Read CLAUDE.md and CONTRIBUTING.md for project conventions and architecture.
+            2. Read AGENTS.md, .agents/developer-guidelines.md,
+               and CONTRIBUTING.md for project conventions, coding principles, and architecture.
             3. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
                `__init__.py` plus any `mode.py` / `config.py` to understand mode registration
                and config schema.

diff --git a/.gitignore b/.gitignore
@@ -61,6 +61,8 @@ venv/
 
 # Ignore claude local settings
 .claude/settings.local.json
+CLAUDE.local.md
+AGENTS.override.md
 
 # Ignore SonarQube analysis
 .sonar/
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,39 @@
+# Agent Instructions for ModelOpt
+
+These instructions apply to AI-assisted work in this repository.
+
+## Repository orientation
+
+- Start with `README.md` for project overview and install.
+- Use `modelopt/` for source, `tests/` for focused test coverage, and
+  `examples/` or `docs/` for usage patterns.
+
+## Coding guidelines
+
+- **Coding guide:** Code development and review require reading and following
+  [.agents/developer-guidelines.md](.agents/developer-guidelines.md);
+  do not skip this step.
+
+## Iterative development
+
+- **Running tests:** Follow the
+  [writing and running tests](CONTRIBUTING.md#-writing-and-running-tests)
+  instructions. For fast initial iteration, choose focused tests for the
+  changed area from `tests/`.
+- **Running pre-commit:** Follow the
+  [pre-commit hook instructions](CONTRIBUTING.md#pre-commit-hooks). Hooks may
+  modify files; review and re-stage those changes before committing.
+- **Signed commit:** Use `git commit -s -S -m "<message>"` for commits so they
+  follow the [signing your work](CONTRIBUTING.md#-signing-your-work)
+  requirements.
+- **Never `git push` without explicit approval in the current turn.** Committing
+  locally is fine; publishing to a remote is not.
+- After `git commit`, stop and wait for the user to say "push", "publish",
+  "ship", or equivalent before running `git push`, `gh pr create`, or any
+  push-option flags like `-o merge_request.create`.
+
+## Contributing and PR readiness
+
+- Before opening or marking a PR ready for review, read the
+  [submitting your code](CONTRIBUTING.md#submitting-your-code) guidance.
+- Read `.github/PULL_REQUEST_TEMPLATE.md` and satisfy the checklist.
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
@@ -79,7 +79,7 @@ If you are an external contributor, seek guidance from `@NVIDIA/modelopt-setup-c
 
 See [`modelopt/torch/quantization/utils/calib_utils.py`](./modelopt/torch/quantization/utils/calib_utils.py) for an example of the correct license header format.
 
-## 📝 Writing tests
+## 📝 Writing and running tests
 
 We use [pytest](https://docs.pytest.org/) for all tests. For any new features / examples, make sure to add tests and that the coverage check in your PR passes. The tests are organized into the following directories:
 
@@ -89,7 +89,17 @@ We use [pytest](https://docs.pytest.org/) for all tests. For any new features /
 - `tests/gpu_trtllm`: Fast GPU-based unit tests for the core ModelOpt library for TensorRT-LLM features. In most cases, they should not take more than a few seconds to run.
 - `tests/examples`: Integration tests for ModelOpt examples. They should not take more than a few minutes to run. Please refer to [example test README](./tests/examples/README.md) for more details.
 
-Please refer to [noxfile.py](./noxfile.py) for more details on how to run the tests and their dependencies.
+For lightweight, focused local validation, run `pytest` directly on the relevant test path. For example:
+
+```bash
+pytest tests/unit/torch/quantization
+```
+
+For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py). Run `nox -l` to list available sessions, then run the matching session with `nox -s <session>`. The `unit-3.12(torch_211, tf_latest)` session runs `tests/unit` with a specific Torch and Transformers combination:
+
+```bash
+nox -s "unit-3.12(torch_211, tf_latest)"
+```
 
 ## ✍️ Signing your work
 

@@ -151,6 +151,10 @@ Model Optimizer follows a structured approach to managing deprecated features:
 Model Optimizer is now open source! We welcome any feedback, feature requests and PRs.
 Please read our [Contributing](./CONTRIBUTING.md) guidelines for details on how to contribute to this project.
 
+## AI Agents
+
+For AI-assisted development setup, see the [agent tooling notes](./.agents/TOOLING.md).
+
 ### Top Contributors
 
 [![Contributors](https://contrib.rocks/image?repo=NVIDIA/Model-Optimizer)](https://github.com/NVIDIA/Model-Optimizer/graphs/contributors)