From 85fd1fb63322ea10a695b10883a1d28c3c8c264f Mon Sep 17 00:00:00 2001
From: realAsma
Date: Thu, 14 May 2026 01:02:38 +0000
Subject: [PATCH 1/8] docs: consolidate agent instructions

Signed-off-by: realAsma
---
 .agents/README.md               |  38 +++++++++
 .agents/TOOLING.md              |  18 +++++
 .agents/developer-guidelines.md |  77 ++++++++++++++++++
 .gitignore                      |   2 +
 AGENTS.md                       |   1 +
 CLAUDE.md                       | 134 +-------------------------------
 CONTRIBUTING.md                 |  14 +++-
 README.md                       |   5 ++
 8 files changed, 154 insertions(+), 135 deletions(-)
 create mode 100644 .agents/README.md
 create mode 100644 .agents/TOOLING.md
 create mode 100644 .agents/developer-guidelines.md
 create mode 120000 AGENTS.md
 mode change 100644 => 120000 CLAUDE.md

diff --git a/.agents/README.md b/.agents/README.md
new file mode 100644
index 00000000000..c2b6727c582
--- /dev/null
+++ b/.agents/README.md
@@ -0,0 +1,38 @@
+# Agent Instructions for ModelOpt
+
+These instructions apply to AI-assisted work in this repository.
+
+## Repository orientation
+
+- Start with `README.md` for project overview and install.
+- Use `modelopt/` for source, `tests/` for focused test coverage, and
+  `examples/` or `docs/` for usage patterns.
+
+## Coding guidelines
+
+- **Coding guide:** Code development and review require reading and following
+  `.agents/developer-guidelines.md`; do not skip this step.
+
+## Iterative development
+
+- **Running tests:** Follow the
+  [writing and running tests](../CONTRIBUTING.md#-writing-and-running-tests)
+  instructions. For fast initial iteration, choose focused tests for the
+  changed area from `tests/`.
+- **Running pre-commit:** Follow the
+  [pre-commit hook instructions](../CONTRIBUTING.md#pre-commit-hooks). Hooks may
+  modify files; review and re-stage those changes before committing.
+- **Signed commit:** Use `git commit -s -S -m ""` for commits so they
+  follow the [signing your work](../CONTRIBUTING.md#-signing-your-work)
+  requirements.
+- **Never `git push` without explicit approval in the current turn.** Commit
+  locally is fine; publishing to a remote is not.
+- After `git commit`, stop and wait for the user to say "push", "publish",
+  "ship", or equivalent before running `git push`, `gh pr create`, or any
+  push-option flags like `-o merge_request.create`.
+
+## Contributing and PR readiness
+
+- Before opening or marking a PR ready for review, read the
+  [submitting your code](../CONTRIBUTING.md#submitting-your-code) guidance.
+- Read `.github/PULL_REQUEST_TEMPLATE.md` and satisfy the checklist.
diff --git a/.agents/TOOLING.md b/.agents/TOOLING.md
new file mode 100644
index 00000000000..20346f2e955
--- /dev/null
+++ b/.agents/TOOLING.md
@@ -0,0 +1,18 @@
+# Agent Tooling Notes
+
+These notes are for humans maintaining repository agent setup. They are not part
+of the always-loaded agent instructions.
+
+## Shared Instructions
+
+Update `.agents/README.md` for repository-wide agent instructions. The root
+`AGENTS.md` and `CLAUDE.md` files are symlinked to `.agents/README.md`, so
+changes there apply to both Codex and Claude Code.
+
+## Local Overrides
+
+For private local instructions, use the tool-specific override file:
+
+- Claude Code: `CLAUDE.local.md` is additive; it is read after `CLAUDE.md`.
+- Codex: `AGENTS.override.md` replaces `AGENTS.md` in the same directory, so it
+  is not additive. Restate any shared instructions that should still apply.
diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
new file mode 100644
index 00000000000..0c5dd057214
--- /dev/null
+++ b/.agents/developer-guidelines.md
@@ -0,0 +1,77 @@
+# Coding Principles
+
+Guidelines for production code in ModelOpt. Key values: simplicity, minimalism,
+and elegance.
+
+## Principles
+
+- **Be surgical.** Touch the code required to solve the actual problem, whether
+  that is one line or a broader design change. Avoid speculative refactors,
+  drive-by cleanup, unrelated rewrites, and half-finished implementations.
+- **Fix root causes.** Prefer the right fix over the most local patch. Do not
+  paper over symptoms with temporary fixes unless the temporary nature and
+  follow-up are explicit.
+- **Design for simplicity.** Choose the design that keeps code easiest to read
+  and change. Put behavior at the right level, tie extensibility to known needs,
+  and treat heavy branching or conditional logic as bad design smells.
+- **Respect ownership.** Keep behavior in the layer that owns it. Parent
+  abstractions should contain shared contracts and shared behavior, not
+  child-specific special cases.
+- **Keep one source of truth.** Put shared behavior, configuration, constants,
+  validation, and documentation in the single place that owns them. Reuse
+  existing helpers and shared APIs instead of copying logic or duplicating
+  state.
+- **Abstract to simplify.** Use helpers, base classes, registries, adapters,
+  plugins, or extension points when they remove real duplication, clarify
+  ownership, support current variation, or make call sites simpler. Do not add
+  abstractions for speculative future cases.
+- **Make code readable at the point of use.** Names, types, and structure should
+  make intent clear. Keep high-level orchestration clear, move low-level
+  mechanics into well-named helpers when helpful, and put critical code before
+  helper details when local conventions allow it.
+- **Validate at boundaries.** Check user input, files, network responses, and
+  external API results at the edge. Keep internal code simple by trusting types
+  and invariants instead of repeatedly checking for impossible states.
+- **Comment cautiously.** Code is the source of truth for what happens and how.
+  Add comments only when the reason is not obvious. Apply this guidance to new
+  comments only; do not rewrite or delete existing comments as cleanup.
+- **Scale documentation to the API.** Higher-level and user-visible APIs deserve
+  useful docstrings, including examples when helpful. Lower-level internals need
+  docstrings only when names, types, and structure are not enough.
+- **Remove touched dead code.** Delete unused imports, unreachable branches,
+  obsolete placeholders, stale TODOs, and debug code when they are part of the
+  behavior you are already touching.
+- **Use workspace-relative paths.** Use relative paths in commands and file
+  references unless an absolute path is needed to disambiguate.
+
+## Testing
+
+- **Develop with focused tests.** During development, write as many focused
+  tests as needed, including lower-level unit tests or internal probes, to
+  understand and harden behavior.
+- **Curate production tests and keep them lean.** Before staging or committing,
+  decide which tests should be checked in. Checked-in tests should document
+  expected behavior, protect against regressions, or flag backward-incompatible
+  behavior changes. Remove redundant lower-level tests when a higher-level test
+  already covers the same behavior, keeping CI/CD fast and lean.
+
+## Performant AI Code
+
+- **Avoid stray CPU-GPU syncs.** Tensor metadata such as `tensor.shape` is safe
+  to read, but scalar extraction or CPU transfers such as `tensor.item()`,
+  `float(tensor)`, `bool(tensor)`, `tensor.cpu()`, `tensor.numpy()`, etc. can
+  force CPU-GPU synchronization. Keep computation on GPU unless the CPU actually
+  needs the value.
+- **Use rank-aware logging.** Default to `print_rank_0` instead of `print` and
+  `warn_rank_0` instead of generic warnings. Use per-rank output only when each
+  process needs to report distinct state. Generic prints and warnings clog
+  distributed logs.
+- **Respect distributed invariants.** Avoid hidden synchronization, global state,
+  per-rank file races, or assumptions that only hold on a single process.
+
+## Compatibility
+
+- **Preserve config and checkpoint compatibility.** Treat ModelOpt config schemas
+  and checkpoint formats as persisted contracts. When changing configs such as
+  `QuantizeConfig`, maintain backward compatibility with previous ModelOpt
+  checkpoints unless a breaking change is explicit and intentionally handled.
diff --git a/.gitignore b/.gitignore
index 09a61233b6a..66ce5568ee0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -61,6 +61,8 @@ venv/
 
 # Ignore claude local settings
 .claude/settings.local.json
+CLAUDE.local.md
+AGENTS.override.md
 
 # Ignore SonarQube analysis
 .sonar/
diff --git a/AGENTS.md b/AGENTS.md
new file mode 120000
index 00000000000..fa67a2d3e2a
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1 @@
+.agents/README.md
\ No newline at end of file
diff --git a/CLAUDE.md b/CLAUDE.md
deleted file mode 100644
index 346ac17eb5f..00000000000
--- a/CLAUDE.md
+++ /dev/null
@@ -1,133 +0,0 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-NVIDIA Model Optimizer (ModelOpt): open-source library for model optimization techniques including
-quantization, pruning, distillation, sparsity, and speculative decoding to accelerate inference.
-Primarily Python codebase with optional C++/CUDA extensions supporting PyTorch, ONNX, and Hugging Face/Megatron models.
-
-> If a `CLAUDE.local.md` file exists alongside this file, read and respect it — it contains
-> developer-specific overrides that supplement this shared guidance.
-
-## Rules (Read First)
-
-**CRITICAL (YOU MUST):**
-
-- NVIDIA Apache 2.0 license header on ALL new Python/C++/CUDA files — use the SPDX format from `LICENSE_HEADER` (auto-inserted by pre-commit for most files, but must be added manually for files copied from third-party sources, which are excluded from the hook)
-- `git commit -s -S` (DCO sign-off + cryptographic signing required). Never attribute AI tools in
-  sign-off line
-- `pre-commit` hooks run on commit — if files are modified by hooks, re-stage and commit again
-- PRs require CODEOWNERS review (auto-assigned based on `.github/CODEOWNERS`)
-- When creating PRs (`gh pr create`), fill in `.github/PULL_REQUEST_TEMPLATE.md` verbatim — do NOT substitute the harness's default `## Summary` / `## Test plan` format
-- For non-trivial PRs, run `/claude review` to get Claude approval before merging (NVIDIA org members can self-trigger; orthogonal to CodeRabbit)
-- After rebasing, always re-run tests locally before pushing
-- All code must follow the security guidelines in `SECURITY.md` — violations are blocked as pre-merge errors
-- For contribution guidelines, commit conventions, and PR requirements, see `CONTRIBUTING.md`
-- New PIP dependencies require license verification — non-permissive licenses need justification and approval from `@NVIDIA/modelopt-setup-codeowners`
-
-## Common Commands
-
-| Task | Command |
-|------|---------|
-| Install (editable + dev) | `pip install -e ".[dev]"` |
-| Enable pre-commit hooks | `pre-commit install` |
-| CPU unit tests | `python -m pytest tests/unit` |
-| GPU unit tests | `python -m pytest tests/gpu` |
-| Megatron GPU tests | `python -m pytest tests/gpu_megatron` |
-| TRT-LLM GPU tests | `python -m pytest tests/gpu_trtllm` |
-| Single test file | `python -m pytest tests/unit/torch/quantization/test_quant_config.py` |
-| Pattern match | `pytest tests/unit -k "test_quantize"` |
-| Lint + format (all files) | `pre-commit run --all-files` |
-| Lint (diff only) | `pre-commit run --from-ref origin/main --to-ref HEAD` |
-| Run via nox (CPU unit) | `nox -s "unit-3.12(torch_211, tf_latest)"` |
-| Build docs | `nox -s docs` |
-| Build wheel | `nox -s build_wheel` |
-
-## Architecture
-
-ModelOpt code base is organized into four top-level namespaces:
-
-| Namespace | Path | Role |
-|-----------|------|------|
-| `modelopt.torch` | `modelopt/torch/` | Core PyTorch optimization library |
-| `modelopt.onnx` | `modelopt/onnx/` | ONNX model quantization and export |
-| `modelopt.deploy` | `modelopt/deploy/` | Deployment utilities for LLMs |
-| `modelopt.recipe` | `modelopt/recipe/` | Recipe loading, parsing, and validation infrastructure |
-
-### `modelopt.torch` Sub-packages
-
-| Sub-package | Path | Role |
-|-------------|------|------|
-| `opt` | `modelopt/torch/opt/` | Core optimization infrastructure (modes, config, state dicts) |
-| `quantization` | `modelopt/torch/quantization/` | PTQ, QAT, and quantization-aware algorithms |
-| `prune` | `modelopt/torch/prune/` | Structured and unstructured pruning |
-| `distill` | `modelopt/torch/distill/` | Knowledge distillation |
-| `sparsity` | `modelopt/torch/sparsity/` | Weight and activation sparsity |
-| `speculative` | `modelopt/torch/speculative/` | Speculative decoding (Medusa, EAGLE, etc.) |
-| `nas` | `modelopt/torch/nas/` | Neural architecture search |
-| `export` | `modelopt/torch/export/` | Checkpoint export for TRT-LLM / Megatron |
-| `peft` | `modelopt/torch/peft/` | QLoRA and PEFT integration |
-| `kernels` | `modelopt/torch/kernels/` | Custom CUDA/Triton kernels grouped by role: `common/attention` (baseline Triton FA), `quantization/{conv,gemm}` (implicit-GEMM CUDA + tensor-quant C++/CUDA + fp4/fp8 Triton), `sparsity/attention` (skip-softmax / N:M / diffusers+LTX backends) |
-| `_deploy` | `modelopt/torch/_deploy/` | Internal deployment utilities |
-| `utils` | `modelopt/torch/utils/` | Shared utilities and plugin infrastructure |
-
-### Core Abstraction: Modes
-
-A **mode** is the unit of model optimization in ModelOpt. Each algorithm (quantization, pruning,
-etc.) is implemented as one or more modes. Modes are recorded in the model's `modelopt_state` so
-optimization workflows can be composed, saved, and restored.
-
-The main entry points are in `modelopt/torch/opt/conversion.py`:
-- `apply_mode(model, mode, ...)` — applies an optimization mode to a model
-- `restore(model, ...)` — restores a model to a previously saved optimization state
-- `save(model, ...)` / `modelopt_state(model)` — captures the current optimization state
-
-### Core Abstraction: Recipes
-
-A **recipe** is a declarative YAML specification of an optimization configuration. Recipes decouple optimization specs from code, enabling reuse, sharing, and version control.
-
-**Built-in recipes** (`modelopt_recipes/`):
-
-- `general/ptq/` — general-purpose PTQ recipes
-- `configs/` — shared configuration units referenced by recipes
-
-## Key Files
-
-| File | Role |
-|------|------|
-| `modelopt/torch/opt/mode.py` | Base class for all optimization modes |
-| `modelopt/torch/opt/config.py` | Configuration system for modes |
-| `modelopt/torch/opt/conversion.py` | `apply_mode()` / `restore()` entry points |
-| `modelopt/torch/quantization/__init__.py` | PTQ/QAT public API |
-| `modelopt/torch/export/unified_export_hf.py` | Unified HF checkpoint export |
-| `modelopt/torch/export/model_config_export.py` | TRT-LLM model config export |
-| `modelopt/deploy/llm/` | LLM deployment utilities |
-| `modelopt/recipe/loader.py` | `load_recipe()` / `load_config()` public API |
-| `modelopt/recipe/config.py` | Recipe Pydantic models (`ModelOptPTQRecipe`, `RecipeType`) |
-| `modelopt_recipes/general/ptq/` | Built-in PTQ recipe YAML files |
-| `pyproject.toml` | Optional dependency groups (`[onnx]`, `[hf]`, `[all]`, `[dev]`); ruff, mypy, pytest, bandit, and coverage config |
-| `.pre-commit-config.yaml` | Pre-commit hooks (ruff, mypy, clang-format, license headers) |
-| `noxfile.py` | Test session definitions |
-
-## Design Patterns
-
-| Pattern | Key Points |
-|---------|------------|
-| **Mode composition** | Optimization algorithms are composed as sequences of modes, each recorded in `modelopt_state` |
-| **Plugin system** | Optional integrations (HuggingFace, Megatron, etc.) loaded lazily via `import_plugin()` |
-| **Optional dependencies** | Features gated by install extras (`[onnx]`, `[hf]`, `[all]`); avoid hard imports at module level |
-| **Config dataclasses** | Each mode has a typed config; use Pydantic or dataclass conventions |
-| **State dict** | Models carry `modelopt_state` for checkpoint save/restore across optimization steps |
-| **Declarative recipes** | YAML-based optimization specs in `modelopt_recipes/`; loaded via `load_recipe()`, passed to the model optimization system |
-
-## CI / Testing
-
-| Layer | Location | Notes |
-|-------|----------|-------|
-| CPU unit tests | `tests/unit/` | Fast, no GPU needed; run in pre-merge CI |
-| GPU unit tests | `tests/gpu/` | Requires CUDA GPU |
-| Megatron GPU tests | `tests/gpu_megatron/` | Requires Megatron-Core + GPU |
-| TRT-LLM GPU tests | `tests/gpu_trtllm/` | Requires TensorRT-LLM + GPU |
-| Example/integration tests | `tests/examples/` | Integration tests for examples; see `tests/examples/README.md` |
-| Pre-commit / lint | `.pre-commit-config.yaml` | ruff, mypy, clang-format, license headers, bandit |
-| Coverage | `pyproject.toml` | 70% minimum on `modelopt/*` |
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 00000000000..fa67a2d3e2a
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+.agents/README.md
\ No newline at end of file
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 53e879a7bf3..a1f2d114351 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -79,7 +79,7 @@ If you are an external contributor, seek guidance from `@NVIDIA/modelopt-setup-c
 
 See [`modelopt/torch/quantization/utils/calib_utils.py`](./modelopt/torch/quantization/utils/calib_utils.py) for an example of the correct license header format.
 
-## 📝 Writing tests
+## 📝 Writing and running tests
 
 We use [pytest](https://docs.pytest.org/) for all tests. For any new features / examples, make sure to add tests and that the coverage check in your PR passes. The tests are organized into the following directories:
 
@@ -89,7 +89,17 @@ We use [pytest](https://docs.pytest.org/) for all tests. For any new features /
 - `tests/gpu_trtllm`: Fast GPU-based unit tests for the core ModelOpt library for TensorRT-LLM features. In most cases, they should not take more than a few seconds to run.
 - `tests/examples`: Integration tests for ModelOpt examples. They should not take more than a few minutes to run. Please refer to [example test README](./tests/examples/README.md) for more details.
 
-Please refer to [noxfile.py](./noxfile.py) for more details on how to run the tests and their dependencies.
+For lightweight focused local validation, run `pytest` directly on the relevant test path. For example:
+
+```bash
+pytest tests/unit/torch/quantization
+```
+
+For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py). Run `nox -l` to list available sessions, then run the matching session with `nox -s `. The `partial_unit-3.12(torch)` session covers the broader torch unit test suite and installs heavier dependencies, including `megatron-core`:
+
+```bash
+nox -s "partial_unit-3.12(torch)"
+```
 
 ## ✍️ Signing your work
 
diff --git a/README.md b/README.md
index ae17522c613..cfd6c0adb3e 100644
--- a/README.md
+++ b/README.md
@@ -151,6 +151,11 @@ Model Optimizer follows a structured approach to managing deprecated features:
 Model Optimizer is now open source! We welcome any feedback, feature requests and PRs.
 Please read our [Contributing](./CONTRIBUTING.md) guidelines for details on how to contribute to this project.
 
+## AI Agents
+
+For AI-assisted development setup, including local Claude Code and Codex
+override files, see the [agent tooling notes](./.agents/TOOLING.md).
+
 ### Top Contributors
 
 [![Contributors](https://contrib.rocks/image?repo=NVIDIA/Model-Optimizer)](https://github.com/NVIDIA/Model-Optimizer/graphs/contributors)

From ed53a34ac1b5ec24bd6dd505af1847cc57e54fa2 Mon Sep 17 00:00:00 2001
From: realAsma
Date: Thu, 14 May 2026 01:31:11 +0000
Subject: [PATCH 2/8] Refine developer comment guidance

Signed-off-by: realAsma
---
 .agents/developer-guidelines.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
index 0c5dd057214..2feb1484830 100644
--- a/.agents/developer-guidelines.md
+++ b/.agents/developer-guidelines.md
@@ -29,15 +29,17 @@ and elegance.
   make intent clear. Keep high-level orchestration clear, move low-level
   mechanics into well-named helpers when helpful, and put critical code before
   helper details when local conventions allow it.
-- **Validate at boundaries.** Check user input, files, network responses, and
-  external API results at the edge. Keep internal code simple by trusting types
-  and invariants instead of repeatedly checking for impossible states.
-- **Comment cautiously.** Code is the source of truth for what happens and how.
-  Add comments only when the reason is not obvious. Apply this guidance to new
-  comments only; do not rewrite or delete existing comments as cleanup.
+- **Comment cautiously.** Code should be clear and be the source of truth
+  for what happens, how it happens, and why; use comments only when the why is
+  not obvious from the code. First ask whether better names, clearer structure,
+  or simpler code can explain the intent without a comment. (Apply this guidance
+  to new comments only; do not rewrite or delete existing comments.)
 - **Scale documentation to the API.** Higher-level and user-visible APIs deserve
   useful docstrings, including examples when helpful. Lower-level internals need
   docstrings only when names, types, and structure are not enough.
+- **Validate at boundaries.** Check user input, files, network responses, and
+  external API results at the edge. Keep internal code simple by trusting types
+  and invariants instead of repeatedly checking for impossible states.
 - **Remove touched dead code.** Delete unused imports, unreachable branches,
   obsolete placeholders, stale TODOs, and debug code when they are part of the
   behavior you are already touching.

From 69e2c49686fe0c33c7602b008b414854c75115c0 Mon Sep 17 00:00:00 2001
From: realAsma
Date: Thu, 14 May 2026 16:18:27 +0000
Subject: [PATCH 3/8] docs: update developer guidelines

Signed-off-by: realAsma
---
 .agents/developer-guidelines.md | 94 +++++++++++++--------------------
 1 file changed, 37 insertions(+), 57 deletions(-)

diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
index 2feb1484830..8034c531165 100644
--- a/.agents/developer-guidelines.md
+++ b/.agents/developer-guidelines.md
@@ -1,50 +1,34 @@
 # Coding Principles
 
-Guidelines for production code in ModelOpt. Key values: simplicity, minimalism,
-and elegance.
+Guidelines for production code in ModelOpt. Key values: simplicity, modularity,
+and conciseness.
 
 ## Principles
 
-- **Be surgical.** Touch the code required to solve the actual problem, whether
-  that is one line or a broader design change. Avoid speculative refactors,
-  drive-by cleanup, unrelated rewrites, and half-finished implementations.
-- **Fix root causes.** Prefer the right fix over the most local patch. Do not
-  paper over symptoms with temporary fixes unless the temporary nature and
-  follow-up are explicit.
-- **Design for simplicity.** Choose the design that keeps code easiest to read
-  and change. Put behavior at the right level, tie extensibility to known needs,
-  and treat heavy branching or conditional logic as bad design smells.
-- **Respect ownership.** Keep behavior in the layer that owns it. Parent
-  abstractions should contain shared contracts and shared behavior, not
-  child-specific special cases.
-- **Keep one source of truth.** Put shared behavior, configuration, constants,
-  validation, and documentation in the single place that owns them. Reuse
-  existing helpers and shared APIs instead of copying logic or duplicating
-  state.
-- **Abstract to simplify.** Use helpers, base classes, registries, adapters,
-  plugins, or extension points when they remove real duplication, clarify
-  ownership, support current variation, or make call sites simpler. Do not add
-  abstractions for speculative future cases.
-- **Make code readable at the point of use.** Names, types, and structure should
-  make intent clear. Keep high-level orchestration clear, move low-level
-  mechanics into well-named helpers when helpful, and put critical code before
-  helper details when local conventions allow it.
-- **Comment cautiously.** Code should be clear and be the source of truth
-  for what happens, how it happens, and why; use comments only when the why is
-  not obvious from the code. First ask whether better names, clearer structure,
-  or simpler code can explain the intent without a comment. (Apply this guidance
-  to new comments only; do not rewrite or delete existing comments.)
-- **Scale documentation to the API.** Higher-level and user-visible APIs deserve
-  useful docstrings, including examples when helpful. Lower-level internals need
-  docstrings only when names, types, and structure are not enough.
-- **Validate at boundaries.** Check user input, files, network responses, and
-  external API results at the edge. Keep internal code simple by trusting types
-  and invariants instead of repeatedly checking for impossible states.
-- **Remove touched dead code.** Delete unused imports, unreachable branches,
-  obsolete placeholders, stale TODOs, and debug code when they are part of the
-  behavior you are already touching.
-- **Use workspace-relative paths.** Use relative paths in commands and file
-  references unless an absolute path is needed to disambiguate.
+- **Prefer simple, surgical changes.** Touch only what the task requires. Avoid speculative
+  refactors, broad rewrites, and "while we're here" cleanups.
+- **Design for simplicity and readability.** Choose the design that is easiest to understand and maintain.
+  Code is read top to bottom: put high-level behavior first, hide lower-level details behind well-named helpers,
+  and treat heavy branching as a signal to reconsider the design.
+- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding. Use
+  existing helpers, base classes, registries, and plugins when they fit. Keep scope limited to known
+  cases; don't add complexity for speculative future needs.
+- **Respect inheritance.** Parent abstractions should contain shared contracts and
+  shared behavior, not child-specific special cases.
+- **Don't repeat yourself.** Reuse existing helpers, APIs, and classes when possible. Prefer reusing or
+  generalizing existing code when it keeps the design simpler. If the same logic or intent appears elsewhere,
+  consolidate it.
+- **Comment cautiously.** Comments should add context, not translate code into English.
+  Prefer making the code self-explanatory first. Use comments only for non-obvious
+  intent or constraints that remain unclear from the code. Apply this guidance to new
+  comments only; do not rewrite or delete existing comments just for style.
+- **Document public APIs.** Public and higher-level APIs should have docstrings, including examples when useful.
+  Internal helpers should usually be self-documenting through clear names and structure.
+- **Fix the bug cause, not the side effect.** For bug fixes, find the root cause instead of patching for its side effect.
+- **Validate external input once.** Check types and values at the interface boundary. Internal code can trust those
+  checks and avoid redundant assertions.
+- **Remove dead code.** Delete unused imports, unreachable branches, and obsolete helpers.
+- **Use relative paths** from the repo root in commands and file references.
 
 ## Testing
 
@@ -59,21 +43,17 @@ and elegance.
 
 ## Performant AI Code
 
-- **Avoid stray CPU-GPU syncs.** Tensor metadata such as `tensor.shape` is safe
-  to read, but scalar extraction or CPU transfers such as `tensor.item()`,
-  `float(tensor)`, `bool(tensor)`, `tensor.cpu()`, `tensor.numpy()`, etc. can
-  force CPU-GPU synchronization. Keep computation on GPU unless the CPU actually
-  needs the value.
-- **Use rank-aware logging.** Default to `print_rank_0` instead of `print` and
-  `warn_rank_0` instead of generic warnings. Use per-rank output only when each
-  process needs to report distinct state. Generic prints and warnings clog
-  distributed logs.
-- **Respect distributed invariants.** Avoid hidden synchronization, global state,
-  per-rank file races, or assumptions that only hold on a single process.
+- **Keep tensor work on the GPU and avoid unnecessary CPU-GPU syncs.** Reading metadata such as `tensor.shape` is fine.
+  Avoid Python scalar extraction and operators such as `tensor.item()`, `float(tensor)`, or `min(tensor)` because they
+  can trigger CPU-GPU syncs. Use PyTorch tensor ops such as `tensor.min()` by default, and only extract Python scalars
+  when the CPU needs the value. Tensor-value-based Python branching can also break CUDA graphs.
+- **Develop with distributed processing in mind.** Examples: Use `print_rank_0` or `warn_rank_0`
+  when possible to avoid noisy logs. Guard shared side effects, such as
+  file writes or shared state updates, against race conditions between ranks.
 
 ## Compatibility
 
-- **Preserve config and checkpoint compatibility.** Treat ModelOpt config schemas
-  and checkpoint formats as persisted contracts. When changing configs such as
-  `QuantizeConfig`, maintain backward compatibility with previous ModelOpt
-  checkpoints unless a breaking change is explicit and intentionally handled.
+- **Preserve config and checkpoint backward compatibility.** ModelOpt checkpoints include serialized
+  `ModeloptBaseConfig` instances such as `QuantizeConfig`. If these Pydantic-based configs change
+  without backward compatibility handling, older checkpoints may no longer load. Make breaking changes
+  explicit and intentional.

From 6419d7739e7d5442f28d56c9dbb5c7c055e7c54d Mon Sep 17 00:00:00 2001
From: realAsma
Date: Thu, 14 May 2026 16:45:33 +0000
Subject: [PATCH 4/8] docs: address agent guidance review follow-ups

Signed-off-by: realAsma
---
 .github/workflows/claude_review.yml | 3 ++-
 CONTRIBUTING.md                     | 4 ++--
 README.md                           | 3 +--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/.github/workflows/claude_review.yml b/.github/workflows/claude_review.yml
index 69119089fee..b62985e24e0 100644
--- a/.github/workflows/claude_review.yml
+++ b/.github/workflows/claude_review.yml
@@ -81,7 +81,8 @@ jobs:
 
             Mandatory workflow — never skip or reorder:
             1. Read the PR diff first (gh pr diff).
-            2. Read CLAUDE.md and CONTRIBUTING.md for project conventions and architecture.
+            2. Read CLAUDE.md (symlinked to .agents/README.md), .agents/developer-guidelines.md,
+               and CONTRIBUTING.md for project conventions, coding principles, and architecture.
             3. For changed files under `modelopt/torch//`, read the sub-package's
                `__init__.py` plus any `mode.py` / `config.py` to understand mode registration
                and config schema.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index a1f2d114351..f7debbbc6ee 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -95,10 +95,10 @@ For lightweight focused local validation, run `pytest` directly on the relevant
 pytest tests/unit/torch/quantization
 ```
 
-For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py). Run `nox -l` to list available sessions, then run the matching session with `nox -s `. The `partial_unit-3.12(torch)` session covers the broader torch unit test suite and installs heavier dependencies, including `megatron-core`:
+For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py). Run `nox -l` to list available sessions, then run the matching session with `nox -s `. The `unit-3.12(torch_211, tf_latest)` session runs `tests/unit` with a specific Torch and Transformers combination:
 
 ```bash
-nox -s "partial_unit-3.12(torch)"
+nox -s "unit-3.12(torch_211, tf_latest)"
 ```
 
 ## ✍️ Signing your work
diff --git a/README.md b/README.md
index cfd6c0adb3e..6a4f023e4f0 100644
--- a/README.md
+++ b/README.md
@@ -153,8 +153,7 @@ Please read our [Contributing](./CONTRIBUTING.md) guidelines for details on how
 
 ## AI Agents
 
-For AI-assisted development setup, including local Claude Code and Codex
-override files, see the [agent tooling notes](./.agents/TOOLING.md).
+For AI-assisted development setup, see the [agent tooling notes](./.agents/TOOLING.md).
 
 ### Top Contributors
 

From 9cdcc1a7963bdaf3b81553b8a22d2ca319daf559 Mon Sep 17 00:00:00 2001
From: realAsma
Date: Thu, 14 May 2026 17:36:08 +0000
Subject: [PATCH 5/8] docs: refine agent entrypoint guidance

Signed-off-by: realAsma
---
 .agents/developer-guidelines.md | 13 +++++------
 AGENTS.md                       | 40 ++++++++++++++++++++++++++++++++-
 CLAUDE.md                       |  2 +-
 3 files changed, 46 insertions(+), 9 deletions(-)
 mode change 120000 => 100644 AGENTS.md

diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
index 8034c531165..d9fbbed2ecd 100644
--- a/.agents/developer-guidelines.md
+++ b/.agents/developer-guidelines.md
@@ -10,14 +10,13 @@ and conciseness.
 - **Design for simplicity and readability.** Choose the design that is easiest to understand and maintain.
   Code is read top to bottom: put high-level behavior first, hide lower-level details behind well-named helpers,
   and treat heavy branching as a signal to reconsider the design.
-- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding. Use
-  existing helpers, base classes, registries, and plugins when they fit. Keep scope limited to known
-  cases; don't add complexity for speculative future needs.
-- **Respect inheritance.** Parent abstractions should contain shared contracts and
+- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding.
+  Use existing extension points when they fit. If none fit, add a simple, focused helper,
+  class, or plugin that cleanly captures the new behavior. Keep scope limited to known cases.
+- **Respect inheritance boundaries.** Parent abstractions should define shared contracts and
   shared behavior, not child-specific special cases.
-- **Don't repeat yourself.** Reuse existing helpers, APIs, and classes when possible. Prefer reusing or
-  generalizing existing code when it keeps the design simpler. If the same logic or intent appears elsewhere,
-  consolidate it.
+- **Don't repeat yourself.** Consolidate repeated logic or intent with a shared helper, API,
+  or abstraction when doing so keeps the design simpler.
 - **Comment cautiously.** Comments should add context, not translate code into English.
   Prefer making the code self-explanatory first. Use comments only for non-obvious
   intent or constraints that remain unclear from the code. Apply this guidance to new
diff --git a/AGENTS.md b/AGENTS.md
deleted file mode 120000
index fa67a2d3e2a..00000000000
--- a/AGENTS.md
+++ /dev/null
@@ -1 +0,0 @@
-.agents/README.md
\ No newline at end of file
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000000..2ca840f8a5d
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,39 @@
+# Agent Instructions for ModelOpt
+
+These instructions apply to AI-assisted work in this repository.
+
+## Repository orientation
+
+- Start with `README.md` for project overview and install.
+- Use `modelopt/` for source, `tests/` for focused test coverage, and
+  `examples/` or `docs/` for usage patterns.
+
+## Coding guidelines
+
+- **Coding guide:** Code development and review require reading and following
+  [.agents/developer-guidelines.md](/home/scratch.akuriparambi_coreai/Model-Optimizer-Agent/.agents/developer-guidelines.md);
+  do not skip this step.
+
+## Iterative development
+
+- **Running tests:** Follow the
+  [writing and running tests](CONTRIBUTING.md#-writing-and-running-tests)
+  instructions. For fast initial iteration, choose focused tests for the
+  changed area from `tests/`.
+- **Running pre-commit:** Follow the
+  [pre-commit hook instructions](CONTRIBUTING.md#pre-commit-hooks). Hooks may
+  modify files; review and re-stage those changes before committing.
+- **Signed commit:** Use `git commit -s -S -m ""` for commits so they
+  follow the [signing your work](CONTRIBUTING.md#-signing-your-work)
+  requirements.
+- **Never `git push` without explicit approval in the current turn.** Commit
+  locally is fine; publishing to a remote is not.
+- After `git commit`, stop and wait for the user to say "push", "publish",
+  "ship", or equivalent before running `git push`, `gh pr create`, or any
+  push-option flags like `-o merge_request.create`.
+
+## Contributing and PR readiness
+
+- Before opening or marking a PR ready for review, read the
+  [submitting your code](CONTRIBUTING.md#submitting-your-code) guidance.
+- Read `.github/PULL_REQUEST_TEMPLATE.md` and satisfy the checklist. diff --git a/CLAUDE.md b/CLAUDE.md index fa67a2d3e2a..47dc3e3d863 120000 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1 +1 @@ -.agents/README.md \ No newline at end of file +AGENTS.md \ No newline at end of file From 3cfb9a4a772f9e82052f704e8f7378fdf4f31f5e Mon Sep 17 00:00:00 2001 From: realAsma Date: Thu, 14 May 2026 17:42:47 +0000 Subject: [PATCH 6/8] docs: use relative agent guide link Signed-off-by: realAsma --- AGENTS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index 2ca840f8a5d..3000fce922b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -11,7 +11,7 @@ These instructions apply to AI-assisted work in this repository. ## Coding guidelines - **Coding guide:** Code development and review require reading and following - [.agents/developer-guidelines.md](/home/scratch.akuriparambi_coreai/Model-Optimizer-Agent/.agents/developer-guidelines.md); + [.agents/developer-guidelines.md](.agents/developer-guidelines.md); do not skip this step. ## Iterative development From 1bf4c0123f8165920814304b372fa8406ada5576 Mon Sep 17 00:00:00 2001 From: realAsma Date: Thu, 14 May 2026 17:53:54 +0000 Subject: [PATCH 7/8] docs: align agent entrypoint docs Signed-off-by: realAsma --- .agents/README.md | 38 ----------------------------- .agents/TOOLING.md | 5 ++-- .agents/developer-guidelines.md | 4 +-- .github/workflows/claude_review.yml | 2 +- 4 files changed, 5 insertions(+), 44 deletions(-) delete mode 100644 .agents/README.md diff --git a/.agents/README.md b/.agents/README.md deleted file mode 100644 index c2b6727c582..00000000000 --- a/.agents/README.md +++ /dev/null @@ -1,38 +0,0 @@ -# Agent Instructions for ModelOpt - -These instructions apply to AI-assisted work in this repository. - -## Repository orientation - -- Start with `README.md` for project overview and install. 
-- Use `modelopt/` for source, `tests/` for focused test coverage, and - `examples/` or `docs/` for usage patterns. - -## Coding guidelines - -- **Coding guide:** Code development and review require reading and following - `.agents/developer-guidelines.md`; do not skip this step. - -## Iterative development - -- **Running tests:** Follow the - [writing and running tests](../CONTRIBUTING.md#-writing-and-running-tests) - instructions. For fast initial iteration, choose focused tests for the - changed area from `tests/`. -- **Running pre-commit:** Follow the - [pre-commit hook instructions](../CONTRIBUTING.md#pre-commit-hooks). Hooks may - modify files; review and re-stage those changes before committing. -- **Signed commit:** Use `git commit -s -S -m ""` for commits so they - follow the [signing your work](../CONTRIBUTING.md#-signing-your-work) - requirements. -- **Never `git push` without explicit approval in the current turn.** Commit - locally is fine; publishing to a remote is not. -- After `git commit`, stop and wait for the user to say "push", "publish", - "ship", or equivalent before running `git push`, `gh pr create`, or any - push-option flags like `-o merge_request.create`. - -## Contributing and PR readiness - -- Before opening or marking a PR ready for review, read the - [submitting your code](../CONTRIBUTING.md#submitting-your-code) guidance. -- Read `.github/PULL_REQUEST_TEMPLATE.md` and satisfy the checklist. diff --git a/.agents/TOOLING.md b/.agents/TOOLING.md index 20346f2e955..ecef150f4ec 100644 --- a/.agents/TOOLING.md +++ b/.agents/TOOLING.md @@ -5,9 +5,8 @@ of the always-loaded agent instructions. ## Shared Instructions -Update `.agents/README.md` for repository-wide agent instructions. The root -`AGENTS.md` and `CLAUDE.md` files are symlinked to `.agents/README.md`, so -changes there apply to both Codex and Claude Code. +Update `AGENTS.md` for repository-wide agent instructions. 
`CLAUDE.md` is +symlinked to `AGENTS.md`, so changes there apply to both Codex and Claude Code. ## Local Overrides diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md index d9fbbed2ecd..6fc5e7e980f 100644 --- a/.agents/developer-guidelines.md +++ b/.agents/developer-guidelines.md @@ -15,8 +15,8 @@ and conciseness. class, or plugin that cleanly captures the new behavior. Keep scope limited to known cases. - **Respect inheritance boundaries.** Parent abstractions should define shared contracts and shared behavior, not child-specific special cases. -- **Don't repeat yourself.** Consolidate repeated logic or intent with a shared helper, API, - or abstraction when doing so keeps the design simpler. +- **Don't repeat yourself; keep a single source of truth.** Consolidate repeated logic or intent with a shared helper, API, + or abstraction when doing so keeps the design simpler. Avoid parallel implementations that can drift out of sync. - **Comment cautiously.** Comments should add context, not translate code into English. Prefer making the code self-explanatory first. Use comments only for non-obvious intent or constraints that remain unclear from the code. Apply this guidance to new diff --git a/.github/workflows/claude_review.yml b/.github/workflows/claude_review.yml index b62985e24e0..a41c7571acc 100644 --- a/.github/workflows/claude_review.yml +++ b/.github/workflows/claude_review.yml @@ -81,7 +81,7 @@ jobs: Mandatory workflow — never skip or reorder: 1. Read the PR diff first (gh pr diff). - 2. Read CLAUDE.md (symlinked to .agents/README.md), .agents/developer-guidelines.md, + 2. Read AGENTS.md, .agents/developer-guidelines.md, and CONTRIBUTING.md for project conventions, coding principles, and architecture. 3. 
For changed files under `modelopt/torch//`, read the sub-package's `__init__.py` plus any `mode.py` / `config.py` to understand mode registration From 283f5170c7db95edb1b8c2098a508efa61d54800 Mon Sep 17 00:00:00 2001 From: realAsma Date: Thu, 14 May 2026 17:57:43 +0000 Subject: [PATCH 8/8] docs: clarify single-source guidance Signed-off-by: realAsma --- .agents/developer-guidelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md index 6fc5e7e980f..10b3d901499 100644 --- a/.agents/developer-guidelines.md +++ b/.agents/developer-guidelines.md @@ -16,7 +16,7 @@ and conciseness. - **Respect inheritance boundaries.** Parent abstractions should define shared contracts and shared behavior, not child-specific special cases. - **Don't repeat yourself; keep a single source of truth.** Consolidate repeated logic or intent with a shared helper, API, - or abstraction when doing so keeps the design simpler. Avoid parallel implementations that can drift out of sync. + or abstraction when doing so keeps the design simpler. Avoid duplication that can drift out of sync. - **Comment cautiously.** Comments should add context, not translate code into English. Prefer making the code self-explanatory first. Use comments only for non-obvious intent or constraints that remain unclear from the code. Apply this guidance to new