AWQ algorithm #1749
Conversation
…uo/new_ar_arch
Signed-off-by: n1ck-guo <heng.guo@intel.com>
for more information, see https://pre-commit.ci
…ound into hengguo/new_ar_arch
Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>
…nto awq_algorithm
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
@copilot resolve the merge conflicts in this pull request |
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (1)
auto_round/compressors_new/base.py:1
`format.save_quantized(...)` is being called with a new `algorithm=` kwarg. Unless every format implementation's `save_quantized` signature (and any downstream wrappers) has been updated to accept this parameter, this will raise a `TypeError` at runtime for unrelated formats. Consider either (a) updating the base format interface to accept `algorithm: str | None = None` everywhere, or (b) only passing `algorithm` when the target format explicitly supports it (e.g., by feature-detecting via `inspect.signature`, or by routing through `kwargs` with a known-safe key for that formatter).
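Option (b) could be sketched as follows. The helper, the formatter classes, and their signatures are stand-ins invented for illustration, not the actual AutoRound format interface; the point is only the `inspect.signature` feature-detection:

```python
import inspect


def call_save_quantized(formatter, output_dir, algorithm=None, **kwargs):
    """Pass `algorithm` only if the formatter's save_quantized accepts it.

    Avoids a TypeError for format implementations whose signature was
    never updated to take the new keyword.
    """
    params = inspect.signature(formatter.save_quantized).parameters
    accepts_algorithm = "algorithm" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_algorithm and algorithm is not None:
        kwargs["algorithm"] = algorithm
    return formatter.save_quantized(output_dir, **kwargs)


class LegacyFormat:
    """Stand-in for a format that predates the `algorithm` kwarg."""

    def save_quantized(self, output_dir):
        return ("legacy", output_dir)


class NewFormat:
    """Stand-in for a format whose signature was updated."""

    def save_quantized(self, output_dir, algorithm=None):
        return ("new", output_dir, algorithm)
```

With this shim, `LegacyFormat` is called without the kwarg while `NewFormat` receives it, so neither path raises.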
# Copyright (c) 2026 Intel Corporation
Co-authored-by: WeiweiZhang1 <109071285+WeiweiZhang1@users.noreply.github.com>
You’re right. The AR time shown there is slower than expected; that number was from a non-optimized run and doesn’t reflect the expected <1200s (no compile) / <800s (with torch.compile) target. I’ll rerun and refresh the benchmark numbers accordingly.
It’s W8A8 (fake quant), not W4A16; the benchmark section header says W8A8_INT8 (fake).
Resolved by merging
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>
for more information, see https://pre-commit.ci
# Cohere / Command-R
"CohereForCausalLM": _cohere_mappings,
"Cohere2ForCausalLM": _cohere_mappings,
"Cohere2VisionForConditionalGeneration": _cohere_mappings,
Is this mapping the same as in llmc or gptqmodel? Since we may not have time to keep this file updated, it would be better to leverage the same file used in those repos.
from auto_round.logger import logger


class AWQQuantizer(RTNQuantizer):
Why inherit from RTNQuantizer here?
@@ -225,6 +227,10 @@ def __new__(
if isinstance(quant_config, SignRoundConfig):
    return _get_compressor_class(model_type, CalibCompressor)(alg_configs, **local_args, **kwargs)
@n1ck-guo It would be better to use a registry or something similar. An algorithm developer should only need to care about the code inside their own alg folder.
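A registry along those lines could look like the sketch below. The decorator name and the config/compressor classes are hypothetical stand-ins; the idea is that each algorithm folder registers its own compressor, so `__new__` becomes a single lookup instead of an `isinstance` chain:

```python
# Hypothetical registry mapping a quant-config class to its compressor class.
_COMPRESSOR_REGISTRY = {}


def register_compressor(config_cls):
    """Decorator an algorithm folder uses to self-register its compressor."""
    def wrapper(compressor_cls):
        _COMPRESSOR_REGISTRY[config_cls] = compressor_cls
        return compressor_cls
    return wrapper


def get_compressor_for(quant_config):
    """Single lookup replacing the per-algorithm isinstance checks."""
    for config_cls, compressor_cls in _COMPRESSOR_REGISTRY.items():
        if isinstance(quant_config, config_cls):
            return compressor_cls
    raise ValueError(f"No compressor registered for {type(quant_config).__name__}")


# Example registrations with stand-in classes:
class SignRoundConfig: ...
class AWQConfig: ...


@register_compressor(SignRoundConfig)
class CalibCompressor: ...


@register_compressor(AWQConfig)
class AWQCompressor: ...
```

Adding a new algorithm then only touches its own folder: define the config, decorate the compressor, and the dispatch code never changes.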
act_sym=act_sym,
act_data_type=act_data_type,
act_dynamic=act_dynamic,
duo_scaling=duo_scaling,
type=str.lower,
choices=["auto_round", "rtn", "awq"],
help="Quantization algorithm to use. "
"auto_round: SignSGD-based optimization (default when iters > 0). "
How do we set multiple algorithms?
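One possible way to support multiple algorithms on the command line (purely a sketch, not the current CLI) is a comma-separated value parsed and validated into a list:

```python
import argparse

# Assumed to mirror the `choices` list above; not the real CLI definition.
_KNOWN_ALGS = {"auto_round", "rtn", "awq"}


def alg_list(value):
    """Parse 'rtn,awq' into ['rtn', 'awq'], validating each name."""
    algs = [v.strip().lower() for v in value.split(",") if v.strip()]
    unknown = set(algs) - _KNOWN_ALGS
    if unknown:
        raise argparse.ArgumentTypeError(f"unknown algorithm(s): {sorted(unknown)}")
    return algs


parser = argparse.ArgumentParser()
parser.add_argument(
    "--alg",
    type=alg_list,
    default=["auto_round"],
    help="Comma-separated list of quantization algorithms, e.g. --alg rtn,awq",
)
```

A single value still works unchanged (`--alg awq` yields `["awq"]`), so this stays backward compatible with the single-choice flag.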
return


# Resolve mappings
self._resolved_mappings = resolve_mappings(model, self._user_mappings)
@n1ck-guo The algorithm should control the activation hook.
# subsequent mappings.)
seen_parents = set()
for mapping in block_mappings:
    pid = id(mapping.parent)
Where is the entry point that quantizes the layer?
self.n_grid = config.n_grid


# Populated during calibration
self._user_mappings = config.mappings
Please add two more args: 1) enable_minmax_tuning (this name cannot change) and 2) apply_smooth (feel free to suggest a better name) to control the two algorithm parts in AWQ.
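A sketch of what those two switches might look like on the AWQ config. The class name, defaults, and the `active_parts` helper are placeholders for whatever the final API settles on; only `enable_minmax_tuning` is fixed per the review:

```python
from dataclasses import dataclass


@dataclass
class AWQAlgorithmConfig:
    """Hypothetical config splitting AWQ into its two algorithmic parts."""

    # Part 1: min-max tuning of the quantization range (name fixed by review).
    enable_minmax_tuning: bool = True
    # Part 2: activation-aware smoothing / scale search (name open per review).
    apply_smooth: bool = True

    def active_parts(self):
        """Return which algorithm parts are enabled, in run order."""
        parts = []
        if self.apply_smooth:
            parts.append("smooth")
        if self.enable_minmax_tuning:
            parts.append("minmax")
        return parts
```

Independent flags let a user run smoothing alone, min-max tuning alone, or both, which is what splitting the two parts is for.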
Description
Add AWQ algorithm support for the new architecture
Type of Change
New feature
Related Issues
Fixes or relates to #1469
Checklist Before Submitting
/azp run Unit-Test-CUDA-AutoRound

Benchmark Results (W8A8_INT8, group_size=128, sym=True, fake)
Still WIP; more results to come.
Llama-3.1-8B-Instruct
Qwen3-8B