
fix save_quantized log conflict #1845

Open
WeiweiZhang1 wants to merge 5 commits into main from fix_save_quantized_logging_conflict


WeiweiZhang1 (Contributor) commented May 22, 2026

#1841

Description

Please briefly describe your main changes and the motivation.

Type of Change

Bug fix

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.
  • The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>
Copilot AI review requested due to automatic review settings May 22, 2026 07:15
Copilot AI (Contributor) left a comment


Pull request overview

This PR addresses the noisy “output_dir already exists, this may cause model conflict” warning that occurs when immediate saving (ShardWriter) creates the output directory during quantization, making the subsequent export step think it’s an overwrite scenario.

Changes:

  • Adds an “immediate saving mode” detection helper and propagates an immediate_saving flag into export/save paths.
  • Suppresses “already exists” conflict warnings when the directory existence is expected due to immediate saving.
  • Adds a CPU test to validate immediate-saving exports don’t emit the spurious warning and produce loadable output artifacts.
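The detect-then-suppress logic summarized above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the PR's code: `is_immediate_saving` and `prepare_output_dir` are hypothetical helper names, and the warning text only approximates the message quoted in the overview. The idea is that a directory already created by immediate saving (ShardWriter) is expected, so only a pre-existing directory in normal mode triggers the conflict warning.

```python
import logging
import os
import tempfile

logger = logging.getLogger("export_sketch")


def is_immediate_saving(serialization_dict):
    """Hypothetical helper: read the immediate-saving flag, defaulting to False."""
    return bool((serialization_dict or {}).get("is_immediate_saving", False))


def prepare_output_dir(output_dir, immediate_saving):
    """Hypothetical helper: create output_dir and return True if a conflict warning was logged.

    With immediate saving, the directory was already created during quantization,
    so its existence is expected and no warning is emitted.
    """
    warned = False
    if os.path.exists(output_dir) and not immediate_saving:
        logger.warning("%s already exists, this may cause model conflict", output_dir)
        warned = True
    os.makedirs(output_dir, exist_ok=True)
    return warned


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "qmodel")
        os.makedirs(out)  # simulate ShardWriter creating the directory mid-quantization
        prepare_output_dir(out, immediate_saving=True)   # expected existence: silent
        prepare_output_dir(out, immediate_saving=False)  # genuine overwrite: warns
```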

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `test/test_cpu/export/test_export.py` | Adds a test for immediate-saving export behavior and log noise. |
| `auto_round/export/utils.py` | Introduces immediate-saving detection and extends `save_model()` with an `immediate_saving` option. |
| `auto_round/export/export_to_llmcompressor/export.py` | Passes immediate-saving status into `save_model()`. |
| `auto_round/export/export_to_llmcompressor/export_to_static_fp.py` | Suppresses the "already exists" warning when immediate saving is active; passes the flag to `save_model()`. |
| `auto_round/export/export_to_llmcompressor/export_to_fp.py` | Suppresses the "already exists" warning when immediate saving is active; passes the flag to `save_model()`. |
| `auto_round/export/export_to_awq/export.py` | Suppresses the "already exists" warning when immediate saving is active; passes the flag to `save_model()`. |
| `auto_round/export/export_to_autoround/export.py` | Adds a conflict warning gated by immediate saving; passes the flag to `save_model()`. |
| `auto_round/export/export_to_autoround/export_to_nvfp_mx.py` | Suppresses the "already exists" warning when immediate saving is active; passes the flag to `save_model()`. |
| `auto_round/export/export_to_autoround/export_to_fp8.py` | Passes immediate-saving status into `save_model()`. |
| `auto_round/export/export_to_autogptq/export.py` | Adds immediate-saving detection and passes the flag into `save_model()`. |
| `auto_round/compressors/base.py` | Includes `is_immediate_saving` in `serialization_dict` so exporters can reliably detect immediate-saving mode. |

Comment thread auto_round/export/utils.py
Comment thread test/test_cpu/export/test_export.py
WeiweiZhang1 (Contributor, Author)

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


```python
from auto_round.version import __version__

serialization_dict["autoround_version"] = __version__
serialization_dict["is_immediate_saving"] = getattr(self.compress_context, "is_immediate_saving", False)
```
Contributor


We'd better not put is_immediate_saving into serialization_dict. serialization_dict is dumped to config.json, so it should only hold args that affect accuracy or format.
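One way to follow this suggestion, sketched under assumptions: read the flag from the runtime context and hand it to the exporter as a plain argument, so that `config.json` is built only from `serialization_dict`. `CompressContext`, `save_quantized`, and the `quantization_config` wrapper key are hypothetical stand-ins for illustration, not AutoRound's actual API; `bits` and `group_size` are just example entries.

```python
import json
import logging
import os
import tempfile
from dataclasses import dataclass

logger = logging.getLogger("export_sketch")


@dataclass
class CompressContext:
    """Hypothetical stand-in for the runtime compress_context."""
    is_immediate_saving: bool = False


def save_quantized(output_dir, serialization_dict, compress_context):
    """Hypothetical exporter: the runtime flag travels separately from config.json."""
    immediate_saving = getattr(compress_context, "is_immediate_saving", False)
    # The flag only drives runtime behavior (here: whether to warn).
    if os.path.exists(output_dir) and not immediate_saving:
        logger.warning("%s already exists, this may cause model conflict", output_dir)
    os.makedirs(output_dir, exist_ok=True)
    config_path = os.path.join(output_dir, "config.json")
    with open(config_path, "w", encoding="utf-8") as f:
        # Only format/accuracy-relevant args are dumped; immediate_saving is never written.
        json.dump({"quantization_config": serialization_dict}, f, indent=2)
    return immediate_saving, config_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        flag, path = save_quantized(tmp, {"bits": 4, "group_size": 128}, CompressContext(True))
        with open(path, encoding="utf-8") as f:
            print(flag, json.load(f))
```

Keeping the flag out of the dict means a checkpoint's `config.json` looks the same whether or not immediate saving was used, which matches the reviewer's point that only accuracy- or format-affecting args belong there.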

Comment thread docs/step_by_step.md
AutoScheme automatically generates adaptive mixed-bit and mixed-data-type quantization recipes. For accuracy results, see [AutoScheme Accuracy Report](./auto_scheme_acc.md).

**Please note that mixed data types are supported during tuning, but cannot be exported to real models at this time.**
**Note:** Mixed data-type recipes (e.g., MXFP4/MXFP8, W2/W4/W8) are supported for both tuning and export.
Contributor


Better to revert or refine this change, since the new line changes the meaning.

Comment thread docs/step_by_step.md
#### CLI Usage
use `iters=200` for tuning.

Use `--iters 0` for RTN-based scheme search (fastest). Add `--iters 200` if you want tuning-aware scheme selection.
wenhuach21 (Contributor), May 25, 2026


Please refine this; it is not easy to follow.
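To make the `--iters` guidance in this docs change concrete, here is a small hypothetical helper that picks the value: `--iters 0` for RTN-based scheme search (fastest), `--iters 200` for tuning-aware scheme selection. Only `--iters` comes from the docs text; the `auto-round` entry point and the `--model <model>` placeholder are assumptions, and any AutoScheme-specific flags are left for the caller to pass in.

```python
import shlex


def scheme_search_argv(base_args, tuning_aware=False):
    """Hypothetical helper: build an auto-round command line for scheme search.

    tuning_aware=False -> --iters 0   (RTN-based scheme search, fastest)
    tuning_aware=True  -> --iters 200 (tuning-aware scheme selection)
    """
    iters = 200 if tuning_aware else 0
    # base_args is passed through unchanged; no other flags are assumed here.
    return ["auto-round", *base_args, "--iters", str(iters)]


if __name__ == "__main__":
    print(shlex.join(scheme_search_argv(["--model", "<model>"])))
    print(shlex.join(scheme_search_argv(["--model", "<model>"], tuning_aware=True)))
```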

chensuyue added this to the 0.13.0 milestone May 25, 2026

Labels

None yet

Projects

None yet

5 participants