[bug fix] model free: fix copy folder of diffusion model, fix unresponsive disable_model_free, and cuda UT #1809

Open: xin3he wants to merge 5 commits into main from xinhe/5-13


@xin3he
Contributor

@xin3he xin3he commented May 13, 2026

Description

This pull request introduces several improvements and fixes related to model-free quantization, especially for diffusion models, and enhances test coverage for metadata copying. The most significant changes include updating the documentation to clarify model-free quantization defaults, refactoring how model-free mode is triggered in the main entry point, improving metadata and subdirectory copying for diffusion models, and expanding the test suite to ensure correct behavior. Additionally, a minor fix is made to a Gemma3 model test.

Model-free quantization improvements:

  • Updated documentation in both README.md and README_CN.md to clarify that model-free quantization is now the default when using --iters 0 --disable_opt_rtn, with links to relevant documentation.
  • Refactored auto_round/__main__.py to remove the explicit model-free routing logic, as model-free mode is now handled internally by AutoRound. The main entry point now simply passes the relevant flags.
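
To illustrate the routing described above, here is a minimal sketch of how the two flags and the default might combine once the decision lives in one place. The function name resolve_model_free and the precedence order (disable_model_free first, then model_free, then the --iters 0 --disable_opt_rtn default) are assumptions made for illustration; the actual logic inside AutoRound may differ.

```python
def resolve_model_free(iters, disable_opt_rtn, model_free=False, disable_model_free=False):
    """Hypothetical helper: decide whether to take the model-free path.

    Assumed precedence (not confirmed by the PR):
      1. disable_model_free always wins, so the opt-out is never ignored.
      2. model_free explicitly opts in.
      3. Otherwise model-free is the default for iters == 0 with opt-RTN disabled.
    """
    if disable_model_free:
        return False
    if model_free:
        return True
    return iters == 0 and bool(disable_opt_rtn)


if __name__ == "__main__":
    # Default path for `--iters 0 --disable_opt_rtn`.
    print(resolve_model_free(iters=0, disable_opt_rtn=True))
    # Same flags, but the user explicitly opts out.
    print(resolve_model_free(iters=0, disable_opt_rtn=True, disable_model_free=True))
```

Keeping the decision in a single function like this means the CLI only forwards flags, which is the shape of the refactor described in the bullet above.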

Diffusion model metadata handling:

  • Enhanced the _copy_metadata_files method in auto_round/compressors/model_free.py to copy both root-level files and all sub-component directories (e.g., vae, scheduler, tokenizer) for diffusion models, ensuring the output directory is a faithful replica without overwriting the quantized transformer.
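
The copying behavior described above can be sketched roughly as follows. This is not the code of _copy_metadata_files; the function name copy_diffusion_metadata, its parameters, and the choice to skip a directory literally named transformer are assumptions used only to show the idea of copying root-level files and sub-component folders while leaving the quantized output untouched.

```python
import shutil
from pathlib import Path


def copy_diffusion_metadata(src_dir, dst_dir, skip_dirs=("transformer",)):
    """Hypothetical sketch: mirror src_dir into dst_dir except for skip_dirs.

    Root-level files and every sub-directory (e.g. vae, scheduler, tokenizer)
    are copied. Directories named in skip_dirs are left alone so that an
    already-written quantized transformer in dst_dir is not overwritten.
    Returns the names of the top-level entries that were copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(src.iterdir()):
        target = dst / entry.name
        if entry.is_dir():
            if entry.name in skip_dirs:
                continue  # quantized weights live here; do not touch them
            # dirs_exist_ok (Python 3.8+) lets repeated runs merge into dst.
            shutil.copytree(entry, target, dirs_exist_ok=True)
        elif entry.is_file():
            shutil.copy2(entry, target)  # copy2 also preserves timestamps
        else:
            continue  # ignore sockets, broken symlinks, and similar entries
        copied.append(entry.name)
    return copied
```

A caller would run the quantization step first, writing the quantized transformer into the output directory, and then call copy_diffusion_metadata(original_model_dir, output_dir) so the remaining components are filled in around it.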

Testing improvements:

  • Added comprehensive tests in test/test_cpu/quantization/test_model_free.py to verify correct copying of metadata and subfolders for both standard and diffusion models, and to ensure the quantized transformer is not overwritten during the process.

Evaluation and model handling:

  • Moved the model.eval() call from the main quantization routine to the evaluation phase to ensure the model is in evaluation mode only when needed.

Minor fixes:

  • Updated the Gemma3 model test to use the correct number of layers, improving test reliability.

Type of Change

Bug fix

Related Issues

Fixes or relates to #1800, #1778

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.
  • The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

xin3he added 3 commits May 13, 2026 10:54
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Copilot AI review requested due to automatic review settings May 13, 2026 07:38
@xin3he
Contributor Author

xin3he commented May 13, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes and improves “model-free” quantization behavior (especially for diffusion models), simplifies CLI routing by delegating model-free selection to AutoRound, and expands unit tests to cover metadata/subfolder copying and CUDA model block-name detection.

Changes:

  • Refactors auto_round/__main__.py to remove explicit model-free routing and instead pass model_free / disable_model_free through to AutoRound.
  • Improves diffusion model metadata copying in model-free mode to include non-transformer subcomponent directories while preserving the quantized transformer/.
  • Adds/updates tests for diffusion/non-diffusion metadata copying and adjusts Gemma3 CUDA test expectations for tiny-model layer counts.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| auto_round/__main__.py | Removes CLI-side model-free routing logic; forwards flags to AutoRound. |
| auto_round/compressors/model_free.py | Copies diffusion root metadata + subcomponent directories to output without overwriting quantized transformer. |
| auto_round/eval/evaluation.py | Moves model.eval() to evaluation entry point. |
| test/test_cpu/quantization/test_model_free.py | Adds tests for metadata/subfolder copying (diffusion + non-diffusion) and non-overwrite of quantized transformer. |
| test/test_cuda/models/test_get_block_name.py | Fixes Gemma3 tiny-model layer count expectations. |
| README.md | Documents updated model-free default behavior for --iters 0 --disable_opt_rtn. |
| README_CN.md | Chinese translation of the same model-free default documentation update. |

Comment thread auto_round/compressors/model_free.py
Comment thread README.md
Comment thread README_CN.md
@chensuyue chensuyue added this to the 0.13.0 milestone May 13, 2026
@xin3he xin3he requested review from n1ck-guo, wenhuach21 and yiliu30 May 14, 2026 01:54
@xin3he
Contributor Author

xin3he commented May 14, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@wenhuach21
Contributor

Is it possible to copy the folders regardless of whether it's model-free or not?

@xin3he
Contributor Author

xin3he commented May 14, 2026

> Is it possible to copy the folders regardless of whether it's model-free or not?

These are different saving paths. I previously extended copy_python_files_from_model_cache with a copy_folders parameter:

def copy_python_files_from_model_cache(model, save_path: str, copy_folders: bool | list[str] | tuple[str, ...] = False):

@xin3he
Contributor Author

xin3he commented May 14, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xin3he
Contributor Author

xin3he commented May 14, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xin3he
Contributor Author

xin3he commented May 14, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


4 participants