[bug fix] model free: fix copy folder of diffusion model, fix unresponsive disable_model_free, and cuda UT by xin3he · Pull Request #1809 · intel/auto-round

xin3he · 2026-05-13T07:38:31Z

Description

This pull request introduces several improvements and fixes related to model-free quantization, especially for diffusion models, and enhances test coverage for metadata copying. The most significant changes include updating the documentation to clarify model-free quantization defaults, refactoring how model-free mode is triggered in the main entry point, improving metadata and subdirectory copying for diffusion models, and expanding the test suite to ensure correct behavior. Additionally, a minor fix is made to a Gemma3 model test.

Model-free quantization improvements:

Updated documentation in both README.md and README_CN.md to clarify that model-free quantization is now the default when using --iters 0 --disable_opt_rtn, with links to relevant documentation.
Refactored auto_round/__main__.py to remove the explicit model-free routing logic, as model-free mode is now handled internally by AutoRound. The main entry point now simply passes the relevant flags.
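
The default described above can be pictured with a small decision function. This is only a sketch: the flag names (`iters`, `disable_opt_rtn`, `model_free`, `disable_model_free`) come from this PR, but the `CliArgs` container, the `wants_model_free` helper, and the exact precedence between the flags are assumptions for illustration and are not taken from the AutoRound source.

```python
from dataclasses import dataclass


@dataclass
class CliArgs:
    """Hypothetical subset of parsed CLI flags; only the field names come from the PR."""

    iters: int
    disable_opt_rtn: bool = False
    model_free: bool = False
    disable_model_free: bool = False


def wants_model_free(args: CliArgs) -> bool:
    """Assumed decision rule, not the actual AutoRound logic."""
    if args.disable_model_free:
        # An explicit opt-out takes precedence over everything else.
        return False
    if args.model_free:
        # An explicit opt-in is honored next.
        return True
    # Plain RTN (no tuning iterations, optimized RTN disabled) defaults to model-free.
    return args.iters == 0 and args.disable_opt_rtn


print(wants_model_free(CliArgs(iters=0, disable_opt_rtn=True)))
print(wants_model_free(CliArgs(iters=0, disable_opt_rtn=True, disable_model_free=True)))
```

Keeping the rule in one place, rather than in the CLI entry point, is one way an opt-out flag such as disable_model_free could be guaranteed to take effect regardless of how AutoRound is invoked.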

Diffusion model metadata handling:

Enhanced the _copy_metadata_files method in auto_round/compressors/model_free.py to copy both root-level files and all sub-component directories (e.g., vae, scheduler, tokenizer) for diffusion models, ensuring the output directory is a faithful replica without overwriting the quantized transformer.
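
A minimal sketch of the copying behavior described above, assuming a diffusion pipeline laid out as a root folder with one sub-directory per component. The function name `copy_pipeline_metadata` and the `skip_dirs` parameter are hypothetical; this is not the body of `_copy_metadata_files`, only an illustration of copying everything except the folder that already holds quantized weights.

```python
import shutil
from pathlib import Path


def copy_pipeline_metadata(src: Path, dst: Path, skip_dirs: tuple[str, ...] = ("transformer",)) -> None:
    """Copy root-level files and sub-component folders from src to dst.

    Folders named in skip_dirs are not copied, so anything already written
    there (for example a quantized transformer) stays untouched.
    """
    dst.mkdir(parents=True, exist_ok=True)
    for entry in src.iterdir():
        target = dst / entry.name
        if entry.is_file():
            # Root-level metadata, e.g. model_index.json.
            shutil.copy2(entry, target)
        elif entry.is_dir() and entry.name not in skip_dirs:
            # Sub-components such as vae/, scheduler/, tokenizer/.
            shutil.copytree(entry, target, dirs_exist_ok=True)
```

Calling `copy_pipeline_metadata(Path("original_model"), Path("quantized_output"))` (both paths are placeholders) would leave `quantized_output/transformer/` exactly as the quantizer wrote it while filling in the remaining components.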

Testing improvements:

Added comprehensive tests in test/test_cpu/quantization/test_model_free.py to verify correct copying of metadata and subfolders for both standard and diffusion models, and to ensure the quantized transformer is not overwritten during the process.

Evaluation and model handling:

Moved the model.eval() call from the main quantization routine to the evaluation phase to ensure the model is in evaluation mode only when needed.
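
The effect of this move can be illustrated without PyTorch. In the sketch below, `TinyModel` is a stand-in that mimics only the `training` flag and `eval()` method of `torch.nn.Module`; `quantize` and `evaluate` are hypothetical placeholders, not the functions in auto_round.

```python
class TinyModel:
    """Stand-in for torch.nn.Module: eval() clears the training flag and returns self."""

    def __init__(self) -> None:
        self.training = True

    def eval(self) -> "TinyModel":
        self.training = False
        return self


def quantize(model: TinyModel) -> TinyModel:
    """Hypothetical quantization step; it leaves the model's mode unchanged."""
    return model


def evaluate(model: TinyModel) -> dict:
    """Hypothetical evaluation entry point; switches to eval mode on entry."""
    model.eval()
    return {"training": model.training}


model = quantize(TinyModel())
print(model.training)   # mode is untouched by quantization
print(evaluate(model))  # eval mode is set only once evaluation starts
```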

Minor fixes:

Updated the Gemma3 model test to use the correct number of layers, improving test reliability.

Type of Change

Bug fix

Related Issues

Fixes or relates to #1800, #1778

Checklist Before Submitting

My code has been tested locally.
Documentation has been updated as needed.
New or updated tests are included where applicable.
The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

Signed-off-by: Xin He <xin3.he@intel.com>

xin3he · 2026-05-13T07:38:54Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-05-13T07:39:02Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot

Pull request overview

This PR fixes and improves “model-free” quantization behavior (especially for diffusion models), simplifies CLI routing by delegating model-free selection to AutoRound, and expands unit tests to cover metadata/subfolder copying and CUDA model block-name detection.

Changes:

Refactors auto_round/__main__.py to remove explicit model-free routing and instead pass model_free / disable_model_free through to AutoRound.
Improves diffusion model metadata copying in model-free mode to include non-transformer subcomponent directories while preserving the quantized transformer/.
Adds/updates tests for diffusion/non-diffusion metadata copying and adjusts Gemma3 CUDA test expectations for tiny-model layer counts.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `auto_round/__main__.py` | Removes CLI-side model-free routing logic; forwards flags to `AutoRound`. |
| `auto_round/compressors/model_free.py` | Copies diffusion root metadata + subcomponent directories to output without overwriting quantized transformer. |
| `auto_round/eval/evaluation.py` | Moves `model.eval()` to evaluation entry point. |
| `test/test_cpu/quantization/test_model_free.py` | Adds tests for metadata/subfolder copying (diffusion + non-diffusion) and non-overwrite of quantized transformer. |
| `test/test_cuda/models/test_get_block_name.py` | Fixes Gemma3 tiny-model layer count expectations. |
| `README.md` | Documents updated model-free default behavior for `--iters 0 --disable_opt_rtn`. |
| `README_CN.md` | Chinese translation of the same model-free default documentation update. |

xin3he · 2026-05-14T01:55:41Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-05-14T01:55:50Z

Azure Pipelines successfully started running 1 pipeline(s).

wenhuach21 · 2026-05-14T01:57:54Z

Is it possible to copy the folders regardless of whether it's model-free or not?

xin3he · 2026-05-14T03:08:08Z

> Is it possible to copy the folders regardless of whether it's model-free or not?

These are different saving paths. I previously extended copy_python_files_from_model_cache with a copy_folders parameter.

auto-round/auto_round/utils/model.py

Line 1742 in 5bbe39b

    def copy_python_files_from_model_cache(model, save_path: str, copy_folders: bool | list[str] | tuple[str, ...] = False):
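
The `copy_folders` annotation above accepts either a boolean or a sequence of folder names. One plausible reading is sketched below; the helper `resolve_folders` and the semantics (False copies nothing, True copies every sub-folder, a list or tuple copies only the named ones that exist) are assumptions for illustration and are not taken from `auto_round/utils/model.py`. The annotation is written with `typing.Union` only so the sketch also runs on Python 3.9.

```python
from pathlib import Path
from typing import Union


def resolve_folders(src: Path, copy_folders: Union[bool, list[str], tuple[str, ...]] = False) -> list[str]:
    """Return the names of sub-folders of src that should be copied (assumed semantics)."""
    if copy_folders is False:
        # Default: copy no folders at all.
        return []
    available = sorted(p.name for p in src.iterdir() if p.is_dir())
    if copy_folders is True:
        # Copy every sub-folder found under src.
        return available
    # Copy only the requested folders that actually exist.
    wanted = set(copy_folders)
    return [name for name in available if name in wanted]
```

Accepting both forms in one parameter lets callers keep the old behavior by default while opting in to either "everything" or an explicit allow-list.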

xin3he · 2026-05-14T03:08:16Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-05-14T03:08:26Z

Azure Pipelines successfully started running 1 pipeline(s).

xin3he · 2026-05-14T05:16:02Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-05-14T05:16:12Z

Azure Pipelines successfully started running 1 pipeline(s).

xin3he · 2026-05-14T12:14:41Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-05-14T12:14:51Z

Azure Pipelines successfully started running 1 pipeline(s).

xin3he added 3 commits May 13, 2026 10:54

fix CUDA CI and update document

2182eb9

Signed-off-by: Xin He <xin3.he@intel.com>

fix disable_model_free

7072f27

Signed-off-by: Xin He <xin3.he@intel.com>

fix copy file issue

1fd418b

Signed-off-by: Xin He <xin3.he@intel.com>

Copilot AI review requested due to automatic review settings May 13, 2026 07:38

Copilot started reviewing on behalf of xin3he May 13, 2026 07:39

[pre-commit.ci] auto fixes from pre-commit.com hooks

19c519d

for more information, see https://pre-commit.ci

Copilot AI reviewed May 13, 2026

Comment thread auto_round/compressors/model_free.py

Comment thread README.md

Comment thread README_CN.md

chensuyue added this to the 0.13.0 milestone May 13, 2026

xin3he requested review from n1ck-guo, wenhuach21 and yiliu30 May 14, 2026 01:54

Merge branch 'main' into xinhe/5-13

8f764b5

4 participants
