[bug fix] model free: fix copy folder of diffusion model, fix unresponsive disable_model_free, and cuda UT #1809
xin3he wants to merge 5 commits into
Signed-off-by: Xin He <xin3.he@intel.com>
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR fixes and improves “model-free” quantization behavior (especially for diffusion models), simplifies CLI routing by delegating model-free selection to AutoRound, and expands unit tests to cover metadata/subfolder copying and CUDA model block-name detection.
Changes:
- Refactors `auto_round/__main__.py` to remove explicit model-free routing and instead pass `model_free`/`disable_model_free` through to `AutoRound`.
- Improves diffusion model metadata copying in model-free mode to include non-transformer subcomponent directories while preserving the quantized `transformer/`.
- Adds/updates tests for diffusion/non-diffusion metadata copying and adjusts Gemma3 CUDA test expectations for tiny-model layer counts.
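The first change above (the CLI forwarding flags instead of routing) can be sketched as follows. This is a minimal illustration, not the code in `auto_round/__main__.py`: the parser, the `--model_free` flag shape, the default of 200 iterations, and the `resolve_model_free` helper are all assumptions made for the example. Only the rule itself (model-free is the default for `--iters 0 --disable_opt_rtn`, and `disable_model_free` turns it off) comes from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; flag names follow the PR text, defaults are assumed.
    parser = argparse.ArgumentParser(prog="auto-round-sketch")
    parser.add_argument("--iters", type=int, default=200)
    parser.add_argument("--disable_opt_rtn", action="store_true")
    parser.add_argument("--model_free", action="store_true")
    parser.add_argument("--disable_model_free", action="store_true")
    return parser


def collect_forwarded_kwargs(argv: list[str]) -> dict:
    """Parse argv and return the flags unchanged; the CLI makes no routing decision."""
    args = build_parser().parse_args(argv)
    return {
        "iters": args.iters,
        "disable_opt_rtn": args.disable_opt_rtn,
        "model_free": args.model_free,
        "disable_model_free": args.disable_model_free,
    }


def resolve_model_free(iters: int, disable_opt_rtn: bool,
                       model_free: bool, disable_model_free: bool) -> bool:
    """Hypothetical stand-in for the decision made on the quantizer side."""
    if disable_model_free:
        # An explicit opt-out always wins, which is the "unresponsive flag" fix.
        return False
    return model_free or (iters == 0 and disable_opt_rtn)
```

Keeping the decision in one place means the Python API and the CLI cannot disagree about when model-free mode applies, which is presumably why the routing was moved out of the entry point.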
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `auto_round/__main__.py` | Removes CLI-side model-free routing logic; forwards flags to `AutoRound`. |
| `auto_round/compressors/model_free.py` | Copies diffusion root metadata + subcomponent directories to output without overwriting quantized transformer. |
| `auto_round/eval/evaluation.py` | Moves `model.eval()` to evaluation entry point. |
| `test/test_cpu/quantization/test_model_free.py` | Adds tests for metadata/subfolder copying (diffusion + non-diffusion) and non-overwrite of quantized transformer. |
| `test/test_cuda/models/test_get_block_name.py` | Fixes Gemma3 tiny-model layer count expectations. |
| `README.md` | Documents updated model-free default behavior for `--iters 0 --disable_opt_rtn`. |
| `README_CN.md` | Chinese translation of the same model-free default documentation update. |
Is it possible to copy the folders regardless of whether it's model-free or not?
These are different saving paths; I had enhanced auto-round/auto_round/utils/model.py (Line 1742 in 5bbe39b).
Description
This pull request introduces several improvements and fixes related to model-free quantization, especially for diffusion models, and enhances test coverage for metadata copying. The most significant changes include updating the documentation to clarify model-free quantization defaults, refactoring how model-free mode is triggered in the main entry point, improving metadata and subdirectory copying for diffusion models, and expanding the test suite to ensure correct behavior. Additionally, a minor fix is made to a Gemma3 model test.
Model-free quantization improvements:
- Updated `README.md` and `README_CN.md` to clarify that model-free quantization is now the default when using `--iters 0 --disable_opt_rtn`, with links to relevant documentation.
- Refactored `auto_round/__main__.py` to remove the explicit model-free routing logic, as model-free mode is now handled internally by `AutoRound`. The main entry point now simply passes the relevant flags.
Diffusion model metadata handling:
- Improved the `_copy_metadata_files` method in `auto_round/compressors/model_free.py` to copy both root-level files and all sub-component directories (e.g., `vae`, `scheduler`, `tokenizer`) for diffusion models, ensuring the output directory is a faithful replica without overwriting the quantized transformer.
Testing improvements:
- Extended `test/test_cpu/quantization/test_model_free.py` to verify correct copying of metadata and subfolders for both standard and diffusion models, and to ensure the quantized transformer is not overwritten during the process.
Evaluation and model handling:
- Moved the `model.eval()` call from the main quantization routine to the evaluation phase to ensure the model is in evaluation mode only when needed.
Minor fixes:
- Adjusted the Gemma3 layer-count expectations in `test/test_cuda/models/test_get_block_name.py` for the tiny model.
Type of Change
Bug fix
Related Issues
Fixes or relates to #1800, #1778
Checklist Before Submitting