fix bug of qwen and gguf export by n1ck-guo · Pull Request #1846 · intel/auto-round

n1ck-guo · 2026-05-22T09:23:25Z

Description

This PR updates AutoRound’s GGUF export integration to match the latest llama.cpp converter architecture.

Why

llama.cpp moved model-specific GGUF conversion logic out of convert_hf_to_gguf.py and into the new conversion/ package. AutoRound previously depended on symbols from convert_hf_to_gguf.py, which is now only a CLI wrapper and no longer sufficient for GGUF export.

What Changed

Added a bundled llama.cpp conversion/ snapshot for out-of-the-box GGUF export.
Added llama_cpp_conversion.py as the single adapter for GGUF conversion loading.
Removed AutoRound’s dependency on convert_hf_to_gguf.py.
Added support for LLAMA_CPP_ROOT to use a local llama.cpp checkout.
Added optional AUTO_ROUND_GGUF_AUTO_UPDATE=1 support to dynamically download the minimal required conversion files when a model is unsupported by the bundled snapshot.
Added a sync script to refresh the bundled conversion/ snapshot from a pinned llama.cpp commit.
Excluded bundled upstream conversion files from formatting/lint/autofix hooks to keep them close to upstream.

New Behavior

By default, users can continue using GGUF export without extra setup. For newer models not supported by the bundled converter, users can either set LLAMA_CPP_ROOT or enable AUTO_ROUND_GGUF_AUTO_UPDATE=1 to try the latest llama.cpp conversion logic dynamically.
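
The fallback behavior described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper name `pick_conversion_source`, its return labels, and the precedence order (local checkout first, then bundled snapshot, then auto-update) are assumptions. Only the two environment variable names come from the description.

```python
import os
from typing import Mapping, Optional


def pick_conversion_source(
    model_supported_by_bundle: bool,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: decide where the GGUF conversion logic comes from."""
    env = os.environ if env is None else env

    # Assumption: an explicit local llama.cpp checkout wins over everything else.
    if env.get("LLAMA_CPP_ROOT"):
        return "local_checkout"

    # Default path: the bundled conversion/ snapshot, no extra setup needed.
    if model_supported_by_bundle:
        return "bundled_snapshot"

    # Opt-in path: fetch the minimal required conversion files dynamically.
    if env.get("AUTO_ROUND_GGUF_AUTO_UPDATE") == "1":
        return "auto_update"

    raise RuntimeError(
        "Model is not supported by the bundled converter; set LLAMA_CPP_ROOT "
        "or AUTO_ROUND_GGUF_AUTO_UPDATE=1."
    )
```

Passing `env` explicitly keeps the decision easy to test without touching the real process environment.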

Type of Change

Bug fix

Related Issues

Fixes or relates to #

Checklist Before Submitting

My code has been tested locally.
Documentation has been updated as needed.
New or updated tests are included where applicable.
The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

Signed-off-by: n1ck-guo <heng.guo@intel.com>

Copilot

Pull request overview

This PR updates AutoRound’s GGUF export and quantization pipeline to support additional quantization/export behaviors (notably ModelOpt NVFP4 repacking, optional MoE expert tensor fusion, and some broader model/config handling tweaks), plus a small API-compatibility improvement in the SignRound quantizer.

Changes:

Add ModelOpt NVFP4 tensor repacking/writing logic and related metadata handling in the HF→GGUF converter.
Add optional MoE gate/up expert tensor fusion during GGUF tensor emission.
Improve robustness/compatibility in a few spots (safe checkpoint mapping access, config aliasing, extra tensor filtering, SignRound kwargs passthrough, log message cleanup).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| auto_round/export/export_to_gguf/convert.py | Avoids assuming `_checkpoint_conversion_mapping` exists on models during tensor enumeration. |
| auto_round/export/export_to_gguf/convert_hf_to_gguf.py | Adds NVFP4 repacking + scale tensor writing, optional expert fusion, and several model/config export tweaks. |
| auto_round/compressors/entry.py | Simplifies one-time mode-selection log messages. |
| auto_round/algorithms/quantization/sign_round/quantizer.py | Allows passing through kwargs (e.g. `disable_opt_rtn`) when quantizing layers outside blocks. |
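
The "safe checkpoint mapping access" item for convert.py can be illustrated with a small sketch. The helper name `get_checkpoint_conversion_mapping` is hypothetical and the actual change in this PR may look different; the only name taken from the review is the `_checkpoint_conversion_mapping` attribute.

```python
from typing import Any, Dict


def get_checkpoint_conversion_mapping(model: Any) -> Dict[str, str]:
    """Hypothetical helper: read the mapping without assuming it exists."""
    # getattr with a default avoids AttributeError on models that never
    # define the attribute; a None or empty value also becomes {}.
    mapping = getattr(model, "_checkpoint_conversion_mapping", None)
    return dict(mapping) if mapping else {}
```

Callers can then iterate over the result unconditionally during tensor enumeration.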

wenhuach21 · 2026-05-25T06:56:27Z

Please check that the installed gguf version is >= xxx.
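
One possible shape for such a check, as a stdlib-only sketch. The helper names are hypothetical, the minimum version is left as a parameter because the comment does not specify it, and pre-release suffixes such as `rc1` are simply ignored here.

```python
from importlib import metadata
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep the leading numeric components of a dotted version string."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def require_gguf(minimum: str) -> None:
    """Hypothetical guard: fail early if the installed gguf package is too old."""
    # metadata.version raises PackageNotFoundError if gguf is not installed.
    installed = metadata.version("gguf")
    if not version_at_least(installed, minimum):
        raise ImportError(f"gguf>={minimum} is required, found {installed}")
```

Calling `require_gguf(...)` once before loading the conversion modules would surface a clear error instead of a later import failure.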

n1ck-guo · 2026-05-26T05:20:22Z