support quarot/spinquant rotation before quantization #1797
lkk12014402 wants to merge 25 commits into
Conversation
Signed-off-by: lkk12014402 <kaokao.lv@intel.com>
for more information, see https://pre-commit.ci
Pull request overview
This PR introduces SpinQuant/QuaRot rotation support as a first-class “transform” in AutoRound, enabling orthogonal rotations (R1–R4) to be applied before quantization via unified config normalization and BaseRotation registry dispatch.
Changes:
- Adds a new `spinquant` transform package (config, preprocessor, online hook/monkeypatch utilities, and optional training helpers/trainer).
- Extends rotation config normalization and AutoRound's new-arch entrypoints to accept `"quarot"`/`"spinquant"` shorthands and SpinQuant configs/dicts (see the sketch below).
- Adds CUDA tests covering config normalization, registry integration, hook lifecycle, and end-to-end pipeline integration.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| test/test_cuda/transform/test_spinquant.py | Adds CUDA tests for SpinQuant/QuaRot config normalization, hook behavior, and pipeline integration. |
| auto_round/compressors_new/spinquant_mixin.py | Introduces a deprecated compatibility mixin that forwards to the unified rotation pipeline. |
| auto_round/compressors_new/entry.py | Extends config resolution / rotation_config handling to accept SpinQuant configs and "quarot"/"spinquant" shorthands. |
| auto_round/algorithms/transforms/spinquant/training.py | Adds experimental SpinQuant training helpers (hooks, callbacks, optimizer/loss utilities, state). |
| auto_round/algorithms/transforms/spinquant/training_core.py | Adds shared primitives for loss computation, reference-model cloning, optimizer creation, and a common training loop. |
| auto_round/algorithms/transforms/spinquant/trainer.py | Adds an experimental "Trainer-like" interface for SpinQuant training + fusion + checkpointing. |
| auto_round/algorithms/transforms/spinquant/rotation_utils.py | Adds SpinQuant rotation math utilities (Hadamard construction, fusion helpers, wrappers). |
| auto_round/algorithms/transforms/spinquant/preprocessor.py | Implements the main SpinQuant/QuaRot preprocessing pipeline (init/train/fuse/hooks/cleanup). |
| auto_round/algorithms/transforms/spinquant/monkeypatch.py | Adds a monkeypatch mechanism to wrap RoPE application for R3 (Q/K rotation after RoPE). |
| auto_round/algorithms/transforms/spinquant/inplace/apply.py | Adds in-place hook registration/removal and a convenience "apply in place" entrypoint. |
| auto_round/algorithms/transforms/spinquant/inplace/__init__.py | Exposes the in-place SpinQuant APIs. |
| auto_round/algorithms/transforms/spinquant/cayley_optimizer.py | Adds/ports the Cayley/SGDG optimizer and a combined Adam+SGDG optimizer. |
| auto_round/algorithms/transforms/spinquant/algorithm.py | Registers the SpinQuant rotation as a BaseRotation algorithm ("spinquant"). |
| auto_round/algorithms/transforms/spinquant/__init__.py | Exposes the SpinQuant public API surface and documents feature status/limitations. |
| auto_round/algorithms/transforms/base.py | Ensures the BaseRotation registry imports rotation and spinquant. |
| auto_round/algorithms/transforms/__init__.py | Extends rotation config normalization to dispatch spinquant and string shorthands. |
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
for more information, see https://pre-commit.ci
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
for more information, see https://pre-commit.ci
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
for more information, see https://pre-commit.ci
Qwen/Qwen3-0.6B
- RTN: Average Accuracy (across 4 tasks: hellaswag, piqa, winogrande, lambada_openai)
- tuning (iters=200): Average Accuracy (across 4 tasks: hellaswag, piqa, winogrande, lambada_openai)
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
for more information, see https://pre-commit.ci
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
for more information, see https://pre-commit.ci
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
for more information, see https://pre-commit.ci
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
Please sync with Heng and try to port it to the new architecture, which uses block-wise quantization. Otherwise, it will be difficult to support multiple algorithms simultaneously since it consumes a large amount of RAM.
The current implementation supports block-wise quantization.
Nice, then it's better to align with the code, e.g., the transformation should inherit this class.
The code is for quantization? This pull request is for rotation, which should align with this code: https://github.com/intel/auto-round/blob/main/auto_round/algorithms/transforms/base.py#L51, right?
All algorithms should be compatible with base_compressor or the base quantizer so that different algorithms can be scheduled easily. If it's hard to align with them, it should at least support applying apply_rotation + apply_awq + apply_autoround per block, to save RAM and stay aware of the other algorithms, instead of running apply_rotation over the whole model, then apply_awq, and so on (see the sketch below).
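A hypothetical sketch of the per-block scheduling described above. All names here (quantize_blockwise, block_inputs_fn, the transform callables) are illustrative, not AutoRound's actual API; the point is only that each decoder block is processed once, with every algorithm applied back to back, so a single block occupies the GPU at a time.

```python
from typing import Callable, Iterable

import torch

def quantize_blockwise(
    blocks: Iterable[torch.nn.Module],
    block_inputs_fn: Callable[[torch.nn.Module], torch.Tensor],
    transforms: list[Callable[[torch.nn.Module, torch.Tensor], None]],
) -> None:
    """Apply every transform to one block before moving on to the next.

    `transforms` would be something like [apply_rotation, apply_awq, apply_autoround].
    """
    for block in blocks:                     # e.g. model.model.layers
        block = block.cuda()                 # move only this block onto the device
        inputs = block_inputs_fn(block)      # cached calibration inputs for this block
        for transform in transforms:
            transform(block, inputs)         # rotation -> AWQ -> AutoRound, in order
        block.cpu()                          # release GPU memory before the next block
        torch.cuda.empty_cache()
```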
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
for more information, see https://pre-commit.ci
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
for more information, see https://pre-commit.ci

    lr: float = 1e-4,
    stiefel: bool = True,
    momentum: float = 0.0,
    weight_decay: float = 0.0,
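These defaults (note the stiefel flag) appear to belong to the Cayley/SGDG optimizer added in cayley_optimizer.py. Under that assumption, the snippet below is a generic illustration of the Cayley-transform step such an optimizer uses to keep a rotation matrix orthogonal while training it; it is not the PR's implementation, and sign conventions for the descent direction vary.

```python
import torch

def cayley_step(R: torch.Tensor, grad: torch.Tensor, lr: float = 1e-4) -> torch.Tensor:
    """Illustrative Cayley-retraction update that keeps a square matrix R orthogonal."""
    # Project the Euclidean gradient into a skew-symmetric direction (A^T = -A).
    A = grad @ R.T - R @ grad.T
    I = torch.eye(R.shape[0], dtype=R.dtype, device=R.device)
    # The Cayley transform of a skew-symmetric matrix is orthogonal, so Q @ R stays orthogonal.
    Q = torch.linalg.solve(I + (lr / 2) * A, I - (lr / 2) * A)
    return Q @ R

# Quick check: orthogonality is preserved after a step with a random gradient.
R = torch.linalg.qr(torch.randn(8, 8, dtype=torch.float64)).Q
R_next = cayley_step(R, torch.randn(8, 8, dtype=torch.float64))
assert torch.allclose(R_next.T @ R_next, torch.eye(8, dtype=torch.float64), atol=1e-8)
```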
1. For the accuracy-related part, at minimum the Hadamard rotation case should be included.
2. The accuracy impact should also be properly documented. I left a similar comment on your Hadamard PR, but I still haven't seen the corresponding documentation yet.
3. SpinQuant should be updated to follow the new architecture and switch to the block-wise implementation approach.
Answer to comment 1:
The SpinQuant algorithm is still in the experimental stage in this release and has not yet been fully validated for accuracy. There is already a note in the code indicating that enabling the training of rotation matrices is part of this experimental phase.
As we know, apart from the training component, the QuaRot algorithm shares the same rotation structure as SpinQuant, including R1/R2/R3/R4. In this PR, we mainly support QuaRot combined with quantization (AutoRound, RTN, and tuning). The implementation of QuaRot’s R1/R2/R3/R4 is largely aligned with community implementations such as Quark and LLMC.
Answer to comment 2:
I didn't notice the "accuracy impact" comment you mentioned; could you explain it a bit more?
Answer to comment 3:
In fact, I already have a block-wise rotation implementation that can be combined with block-wise quantization in AutoRound. However, I need to wait until @n1ck-guo finishes the API, after which I will submit another PR to add block-wise rotation support.
1. As far as I know, QuaRot + AutoRound has already been supported by your previous PR. I also noticed that you are continuing to fix bugs, and since the implementation is largely self-contained, I don't think we need to rush merging this PR. It would be better to refine it further toward product-level quality and provide more accuracy results that demonstrate either better accuracy than plain Hadamard rotation or accuracy comparable to other repositories.
2. We should also document the benchmark data so users can clearly understand the accuracy improvements, computational cost, and potential side effects.
3. Feel free to handle this in a separate PR.
Signed-off-by: lkk12014402 <kakao.lv@intel.com>
@copilot resolve the merge conflicts in this pull request
Description
What Problem Does Rotation Solve?
Quantization accuracy degrades when weight/activation distributions have outlier channels —
a few dimensions with magnitudes 10–100× larger than the rest. Rotation applies an orthogonal
transform (Hadamard matrix) to redistribute these outliers uniformly across all channels, making
the distribution more quantization-friendly.
Since orthogonal transforms preserve mathematical equivalence (Q @ Q^T = I), the model's
FP16 output is unchanged — only quantization behavior improves.
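A quick numerical check of the equivalence claim above: folding an orthogonal matrix Q into a linear layer's weight and rotating its input leaves the full-precision output unchanged. This is a generic PyTorch sketch, not code from this PR.

```python
import torch

torch.manual_seed(0)
d = 64
x = torch.randn(8, d)
linear = torch.nn.Linear(d, d, bias=False)

# Any orthogonal matrix works; QuaRot/SpinQuant use (randomized) Hadamard matrices.
Q = torch.linalg.qr(torch.randn(d, d)).Q

y_ref = linear(x)                            # original output: x @ W^T
rotated = torch.nn.Linear(d, d, bias=False)
with torch.no_grad():
    rotated.weight.copy_(linear.weight @ Q)  # fold Q into the weight: W' = W Q
y_rot = rotated(x @ Q)                       # rotate the activation: x' = x Q

# (x Q)(W Q)^T = x Q Q^T W^T = x W^T, so the outputs match up to float noise.
assert torch.allclose(y_ref, y_rot, atol=1e-4)
```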
QuaRot vs SpinQuant
In auto-round, both share the same codebase. The difference is a config flag:
- `trainable_rotation=False` → QuaRot
- `trainable_rotation=True` → SpinQuant

Quick Start
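A hedged usage sketch assembled from this PR's description: the entrypoints are said to accept `"quarot"`/`"spinquant"` shorthands via `rotation_config`, so that argument is an assumption based on the description above and the exact signature may differ from the merged code. The model/tokenizer setup is standard transformers + auto-round usage, not part of this PR.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from auto_round import AutoRound

model_name = "Qwen/Qwen3-0.6B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# "quarot": fixed Hadamard rotations (R1-R4) applied/fused before quantization.
# "spinquant": additionally trains the rotation matrices (experimental in this PR).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, rotation_config="quarot")
autoround.quantize()
```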