diff --git a/.agents/TOOLING.md b/.agents/TOOLING.md new file mode 100644 index 00000000000..ecef150f4ec --- /dev/null +++ b/.agents/TOOLING.md @@ -0,0 +1,17 @@ +# Agent Tooling Notes + +These notes are for humans maintaining repository agent setup. They are not part +of the always-loaded agent instructions. + +## Shared Instructions + +Update `AGENTS.md` for repository-wide agent instructions. `CLAUDE.md` is +symlinked to `AGENTS.md`, so changes there apply to both Codex and Claude Code. + +## Local Overrides + +For private local instructions, use the tool-specific override file: + +- Claude Code: `CLAUDE.local.md` is additive; it is read after `CLAUDE.md`. +- Codex: `AGENTS.override.md` replaces `AGENTS.md` in the same directory, so it + is not additive. Restate any shared instructions that should still apply. diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md new file mode 100644 index 00000000000..10b3d901499 --- /dev/null +++ b/.agents/developer-guidelines.md @@ -0,0 +1,58 @@ +# Coding Principles + +Guidelines for production code in ModelOpt. Key values: simplicity, modularity, +and conciseness. + +## Principles + +- **Prefer simple, surgical changes.** Touch only what the task requires. Avoid speculative + refactors, broad rewrites, and "while we're here" cleanups. +- **Design for simplicity and readability.** Choose the design that is easiest to understand and maintain. + Code is read top to bottom: put high-level behavior first, hide lower-level details behind well-named helpers, + and treat heavy branching as a signal to reconsider the design. +- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding. + Use existing extension points when they fit. If none fit, add a simple, focused helper, + class, or plugin that cleanly captures the new behavior. Keep scope limited to known cases. 
+- **Respect inheritance boundaries.** Parent abstractions should define shared contracts and + shared behavior, not child-specific special cases. +- **Don't repeat yourself; keep a single source of truth.** Consolidate repeated logic or intent with a shared helper, API, + or abstraction when doing so keeps the design simpler. Avoid duplication that can drift out of sync. +- **Comment cautiously.** Comments should add context, not translate code into English. + Prefer making the code self-explanatory first. Use comments only for non-obvious + intent or constraints that remain unclear from the code. Apply this guidance to new + comments only; do not rewrite or delete existing comments just for style. +- **Document public APIs.** Public and higher-level APIs should have docstrings, including examples when useful. + Internal helpers should usually be self-documenting through clear names and structure. +- **Fix the bug cause, not the side effect.** For bug fixes, find the root cause instead of patching for its side effect. +- **Validate external input once.** Check types and values at the interface boundary. Internal code can trust those + checks and avoid redundant assertions. +- **Remove dead code.** Delete unused imports, unreachable branches, and obsolete helpers. +- **Use relative paths** from the repo root in commands and file references. + +## Testing + +- **Develop with focused tests.** During development, write as many focused + tests as needed, including lower-level unit tests or internal probes, to + understand and harden behavior. +- **Curate production tests and keep them lean.** Before staging or committing, + decide which tests should be checked in. Checked-in tests should document + expected behavior, protect against regressions, or flag backward-incompatible + behavior changes. Remove redundant lower-level tests when a higher-level test + already covers the same behavior, keeping CI/CD fast and lean. 
+ +## Performant AI Code + +- **Keep tensor work on the GPU and avoid unnecessary CPU-GPU syncs.** Reading metadata such as `tensor.shape` is fine. + Avoid Python scalar extraction and operators such as `tensor.item()`, `float(tensor)`, or `min(tensor)` because they + can trigger CPU-GPU syncs. Use PyTorch tensor ops such as `tensor.min()` by default, and only extract Python scalars + when the CPU needs the value. Tensor-value-based Python branching can also break CUDA graphs. +- **Develop with distributed processing in mind.** Examples: Use `print_rank_0` or `warn_rank_0` + when possible to avoid noisy logs. Guard shared side effects, such as + file writes or shared state updates, against race conditions between ranks. + +## Compatibility + +- **Preserve config and checkpoint backward compatibility.** ModelOpt checkpoints include serialized + `ModeloptBaseConfig` instances such as `QuantizeConfig`. If these Pydantic-based configs change + without backward compatibility handling, older checkpoints may no longer load. Make breaking changes + explicit and intentional. diff --git a/.github/workflows/claude_review.yml b/.github/workflows/claude_review.yml index 69119089fee..a41c7571acc 100644 --- a/.github/workflows/claude_review.yml +++ b/.github/workflows/claude_review.yml @@ -81,7 +81,8 @@ jobs: Mandatory workflow — never skip or reorder: 1. Read the PR diff first (gh pr diff). - 2. Read CLAUDE.md and CONTRIBUTING.md for project conventions and architecture. + 2. Read AGENTS.md, .agents/developer-guidelines.md, + and CONTRIBUTING.md for project conventions, coding principles, and architecture. 3. For changed files under `modelopt/torch//`, read the sub-package's `__init__.py` plus any `mode.py` / `config.py` to understand mode registration and config schema. 
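Review note on the "Develop with distributed processing in mind" bullet in `.agents/developer-guidelines.md` above: the rank guard it describes can be sketched roughly as follows. This is a minimal illustration, not ModelOpt's implementation. It assumes a `torchrun`-style launcher that exports a `RANK` environment variable, and the helper names `get_rank`, `print_on_rank_0`, and `write_text_on_rank_0` are hypothetical stand-ins rather than the repository's `print_rank_0` / `warn_rank_0` utilities.

```python
import os


def get_rank() -> int:
    """Return this process's global rank.

    Assumption: the launcher exports RANK (as torchrun does); a plain
    single-process run without RANK is treated as rank 0.
    """
    return int(os.environ.get("RANK", "0"))


def print_on_rank_0(*args, **kwargs) -> None:
    """Hypothetical helper: print only on rank 0 to keep logs quiet."""
    if get_rank() == 0:
        print(*args, **kwargs)


def write_text_on_rank_0(path: str, text: str) -> bool:
    """Hypothetical helper: let exactly one rank perform a shared file write.

    Returns True if this process wrote the file, False if it skipped.
    """
    if get_rank() != 0:
        return False
    # Write to a temporary file first, then rename it into place, so a
    # concurrent reader never observes a half-written file.
    tmp_path = f"{path}.tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp_path, path)
    return True
```

In a real multi-rank job, the other ranks would typically also need a synchronization point (for example `torch.distributed.barrier()`) before reading the file that rank 0 wrote; that step is left out here to keep the sketch free of third-party dependencies.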
diff --git a/.gitignore b/.gitignore index 09a61233b6a..66ce5568ee0 100644 --- a/.gitignore +++ b/.gitignore @@ -61,6 +61,8 @@ venv/ # Ignore claude local settings .claude/settings.local.json +CLAUDE.local.md +AGENTS.override.md # Ignore SonarQube analysis .sonar/ diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000000..3000fce922b --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,39 @@ +# Agent Instructions for ModelOpt + +These instructions apply to AI-assisted work in this repository. + +## Repository orientation + +- Start with `README.md` for project overview and install. +- Use `modelopt/` for source, `tests/` for focused test coverage, and + `examples/` or `docs/` for usage patterns. + +## Coding guidelines + +- **Coding guide:** Code development and review require reading and following + [.agents/developer-guidelines.md](.agents/developer-guidelines.md); + do not skip this step. + +## Iterative development + +- **Running tests:** Follow the + [writing and running tests](CONTRIBUTING.md#-writing-and-running-tests) + instructions. For fast initial iteration, choose focused tests for the + changed area from `tests/`. +- **Running pre-commit:** Follow the + [pre-commit hook instructions](CONTRIBUTING.md#pre-commit-hooks). Hooks may + modify files; review and re-stage those changes before committing. +- **Signed commit:** Use `git commit -s -S -m ""` for commits so they + follow the [signing your work](CONTRIBUTING.md#-signing-your-work) + requirements. +- **Never `git push` without explicit approval in the current turn.** Commit + locally is fine; publishing to a remote is not. +- After `git commit`, stop and wait for the user to say "push", "publish", + "ship", or equivalent before running `git push`, `gh pr create`, or any + push-option flags like `-o merge_request.create`. + +## Contributing and PR readiness + +- Before opening or marking a PR ready for review, read the + [submitting your code](CONTRIBUTING.md#submitting-your-code) guidance. 
+- Read `.github/PULL_REQUEST_TEMPLATE.md` and satisfy the checklist. diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index 346ac17eb5f..00000000000 --- a/CLAUDE.md +++ /dev/null @@ -1,133 +0,0 @@ -# CLAUDE.md - -This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. - -NVIDIA Model Optimizer (ModelOpt): open-source library for model optimization techniques including -quantization, pruning, distillation, sparsity, and speculative decoding to accelerate inference. -Primarily Python codebase with optional C++/CUDA extensions supporting PyTorch, ONNX, and Hugging Face/Megatron models. - -> If a `CLAUDE.local.md` file exists alongside this file, read and respect it — it contains -> developer-specific overrides that supplement this shared guidance. - -## Rules (Read First) - -**CRITICAL (YOU MUST):** - -- NVIDIA Apache 2.0 license header on ALL new Python/C++/CUDA files — use the SPDX format from `LICENSE_HEADER` (auto-inserted by pre-commit for most files, but must be added manually for files copied from third-party sources, which are excluded from the hook) -- `git commit -s -S` (DCO sign-off + cryptographic signing required). 
Never attribute AI tools in - sign-off line -- `pre-commit` hooks run on commit — if files are modified by hooks, re-stage and commit again -- PRs require CODEOWNERS review (auto-assigned based on `.github/CODEOWNERS`) -- When creating PRs (`gh pr create`), fill in `.github/PULL_REQUEST_TEMPLATE.md` verbatim — do NOT substitute the harness's default `## Summary` / `## Test plan` format -- For non-trivial PRs, run `/claude review` to get Claude approval before merging (NVIDIA org members can self-trigger; orthogonal to CodeRabbit) -- After rebasing, always re-run tests locally before pushing -- All code must follow the security guidelines in `SECURITY.md` — violations are blocked as pre-merge errors -- For contribution guidelines, commit conventions, and PR requirements, see `CONTRIBUTING.md` -- New PIP dependencies require license verification — non-permissive licenses need justification and approval from `@NVIDIA/modelopt-setup-codeowners` - -## Common Commands - -| Task | Command | -|------|---------| -| Install (editable + dev) | `pip install -e ".[dev]"` | -| Enable pre-commit hooks | `pre-commit install` | -| CPU unit tests | `python -m pytest tests/unit` | -| GPU unit tests | `python -m pytest tests/gpu` | -| Megatron GPU tests | `python -m pytest tests/gpu_megatron` | -| TRT-LLM GPU tests | `python -m pytest tests/gpu_trtllm` | -| Single test file | `python -m pytest tests/unit/torch/quantization/test_quant_config.py` | -| Pattern match | `pytest tests/unit -k "test_quantize"` | -| Lint + format (all files) | `pre-commit run --all-files` | -| Lint (diff only) | `pre-commit run --from-ref origin/main --to-ref HEAD` | -| Run via nox (CPU unit) | `nox -s "unit-3.12(torch_211, tf_latest)"` | -| Build docs | `nox -s docs` | -| Build wheel | `nox -s build_wheel` | - -## Architecture - -ModelOpt code base is organized into four top-level namespaces: - -| Namespace | Path | Role | -|-----------|------|------| -| `modelopt.torch` | `modelopt/torch/` | Core PyTorch 
optimization library | -| `modelopt.onnx` | `modelopt/onnx/` | ONNX model quantization and export | -| `modelopt.deploy` | `modelopt/deploy/` | Deployment utilities for LLMs | -| `modelopt.recipe` | `modelopt/recipe/` | Recipe loading, parsing, and validation infrastructure | - -### `modelopt.torch` Sub-packages - -| Sub-package | Path | Role | -|-------------|------|------| -| `opt` | `modelopt/torch/opt/` | Core optimization infrastructure (modes, config, state dicts) | -| `quantization` | `modelopt/torch/quantization/` | PTQ, QAT, and quantization-aware algorithms | -| `prune` | `modelopt/torch/prune/` | Structured and unstructured pruning | -| `distill` | `modelopt/torch/distill/` | Knowledge distillation | -| `sparsity` | `modelopt/torch/sparsity/` | Weight and activation sparsity | -| `speculative` | `modelopt/torch/speculative/` | Speculative decoding (Medusa, EAGLE, etc.) | -| `nas` | `modelopt/torch/nas/` | Neural architecture search | -| `export` | `modelopt/torch/export/` | Checkpoint export for TRT-LLM / Megatron | -| `peft` | `modelopt/torch/peft/` | QLoRA and PEFT integration | -| `kernels` | `modelopt/torch/kernels/` | Custom CUDA/Triton kernels grouped by role: `common/attention` (baseline Triton FA), `quantization/{conv,gemm}` (implicit-GEMM CUDA + tensor-quant C++/CUDA + fp4/fp8 Triton), `sparsity/attention` (skip-softmax / N:M / diffusers+LTX backends) | -| `_deploy` | `modelopt/torch/_deploy/` | Internal deployment utilities | -| `utils` | `modelopt/torch/utils/` | Shared utilities and plugin infrastructure | - -### Core Abstraction: Modes - -A **mode** is the unit of model optimization in ModelOpt. Each algorithm (quantization, pruning, -etc.) is implemented as one or more modes. Modes are recorded in the model's `modelopt_state` so -optimization workflows can be composed, saved, and restored. 
- -The main entry points are in `modelopt/torch/opt/conversion.py`: -- `apply_mode(model, mode, ...)` — applies an optimization mode to a model -- `restore(model, ...)` — restores a model to a previously saved optimization state -- `save(model, ...)` / `modelopt_state(model)` — captures the current optimization state - -### Core Abstraction: Recipes - -A **recipe** is a declarative YAML specification of an optimization configuration. Recipes decouple optimization specs from code, enabling reuse, sharing, and version control. - -**Built-in recipes** (`modelopt_recipes/`): - -- `general/ptq/` — general-purpose PTQ recipes -- `configs/` — shared configuration units referenced by recipes - -## Key Files - -| File | Role | -|------|------| -| `modelopt/torch/opt/mode.py` | Base class for all optimization modes | -| `modelopt/torch/opt/config.py` | Configuration system for modes | -| `modelopt/torch/opt/conversion.py` | `apply_mode()` / `restore()` entry points | -| `modelopt/torch/quantization/__init__.py` | PTQ/QAT public API | -| `modelopt/torch/export/unified_export_hf.py` | Unified HF checkpoint export | -| `modelopt/torch/export/model_config_export.py` | TRT-LLM model config export | -| `modelopt/deploy/llm/` | LLM deployment utilities | -| `modelopt/recipe/loader.py` | `load_recipe()` / `load_config()` public API | -| `modelopt/recipe/config.py` | Recipe Pydantic models (`ModelOptPTQRecipe`, `RecipeType`) | -| `modelopt_recipes/general/ptq/` | Built-in PTQ recipe YAML files | -| `pyproject.toml` | Optional dependency groups (`[onnx]`, `[hf]`, `[all]`, `[dev]`); ruff, mypy, pytest, bandit, and coverage config | -| `.pre-commit-config.yaml` | Pre-commit hooks (ruff, mypy, clang-format, license headers) | -| `noxfile.py` | Test session definitions | - -## Design Patterns - -| Pattern | Key Points | -|---------|------------| -| **Mode composition** | Optimization algorithms are composed as sequences of modes, each recorded in `modelopt_state` | -| **Plugin system** | 
Optional integrations (HuggingFace, Megatron, etc.) loaded lazily via `import_plugin()` | -| **Optional dependencies** | Features gated by install extras (`[onnx]`, `[hf]`, `[all]`); avoid hard imports at module level | -| **Config dataclasses** | Each mode has a typed config; use Pydantic or dataclass conventions | -| **State dict** | Models carry `modelopt_state` for checkpoint save/restore across optimization steps | -| **Declarative recipes** | YAML-based optimization specs in `modelopt_recipes/`; loaded via `load_recipe()`, passed to the model optimization system | - -## CI / Testing - -| Layer | Location | Notes | -|-------|----------|-------| -| CPU unit tests | `tests/unit/` | Fast, no GPU needed; run in pre-merge CI | -| GPU unit tests | `tests/gpu/` | Requires CUDA GPU | -| Megatron GPU tests | `tests/gpu_megatron/` | Requires Megatron-Core + GPU | -| TRT-LLM GPU tests | `tests/gpu_trtllm/` | Requires TensorRT-LLM + GPU | -| Example/integration tests | `tests/examples/` | Integration tests for examples; see `tests/examples/README.md` | -| Pre-commit / lint | `.pre-commit-config.yaml` | ruff, mypy, clang-format, license headers, bandit | -| Coverage | `pyproject.toml` | 70% minimum on `modelopt/*` | diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 00000000000..47dc3e3d863 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 53e879a7bf3..f7debbbc6ee 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -79,7 +79,7 @@ If you are an external contributor, seek guidance from `@NVIDIA/modelopt-setup-c See [`modelopt/torch/quantization/utils/calib_utils.py`](./modelopt/torch/quantization/utils/calib_utils.py) for an example of the correct license header format. -## 📝 Writing tests +## 📝 Writing and running tests We use [pytest](https://docs.pytest.org/) for all tests. For any new features / examples, make sure to add tests and that the coverage check in your PR passes. 
The tests are organized into the following directories: @@ -89,7 +89,17 @@ We use [pytest](https://docs.pytest.org/) for all tests. For any new features / - `tests/gpu_trtllm`: Fast GPU-based unit tests for the core ModelOpt library for TensorRT-LLM features. In most cases, they should not take more than a few seconds to run. - `tests/examples`: Integration tests for ModelOpt examples. They should not take more than a few minutes to run. Please refer to [example test README](./tests/examples/README.md) for more details. -Please refer to [noxfile.py](./noxfile.py) for more details on how to run the tests and their dependencies. +For lightweight focused local validation, run `pytest` directly on the relevant test path. For example: + +```bash +pytest tests/unit/torch/quantization +``` + +For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py). Run `nox -l` to list available sessions, then run the matching session with `nox -s `. The `unit-3.12(torch_211, tf_latest)` session runs `tests/unit` with a specific Torch and Transformers combination: + +```bash +nox -s "unit-3.12(torch_211, tf_latest)" +``` ## ✍️ Signing your work diff --git a/README.md b/README.md index ae17522c613..6a4f023e4f0 100644 --- a/README.md +++ b/README.md @@ -151,6 +151,10 @@ Model Optimizer follows a structured approach to managing deprecated features: Model Optimizer is now open source! We welcome any feedback, feature requests and PRs. Please read our [Contributing](./CONTRIBUTING.md) guidelines for details on how to contribute to this project. +## AI Agents + +For AI-assisted development setup, see the [agent tooling notes](./.agents/TOOLING.md). + ### Top Contributors [![Contributors](https://contrib.rocks/image?repo=NVIDIA/Model-Optimizer)](https://github.com/NVIDIA/Model-Optimizer/graphs/contributors)