fix bug of qwen and gguf export #1846

Open
n1ck-guo wants to merge 10 commits into main from hengguo/bug_fix_522

@n1ck-guo
Contributor

@n1ck-guo n1ck-guo commented May 22, 2026

Description

This PR updates AutoRound’s GGUF export integration to match the latest llama.cpp converter architecture.
Why
llama.cpp moved model-specific GGUF conversion logic out of convert_hf_to_gguf.py and into the new conversion/ package. AutoRound previously depended on symbols from convert_hf_to_gguf.py, which is now only a CLI wrapper and no longer sufficient for GGUF export.

What Changed

  • Added a bundled llama.cpp conversion/ snapshot for out-of-the-box GGUF export.
  • Added llama_cpp_conversion.py as the single adapter for GGUF conversion loading.
  • Removed AutoRound’s dependency on convert_hf_to_gguf.py.
  • Added support for LLAMA_CPP_ROOT to use a local llama.cpp checkout.
  • Added optional AUTO_ROUND_GGUF_AUTO_UPDATE=1 support to dynamically download the minimal required conversion files when a model is unsupported by the bundled snapshot.
  • Added a sync script to refresh the bundled conversion/ snapshot from a pinned llama.cpp commit.
  • Excluded bundled upstream conversion files from formatting/lint/autofix hooks to keep them close to upstream.

New Behavior
By default, users can continue using GGUF export without extra setup. For newer models not supported by the bundled converter, users can either set LLAMA_CPP_ROOT or enable AUTO_ROUND_GGUF_AUTO_UPDATE=1 to try the latest llama.cpp conversion logic dynamically.
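
The selection behavior described above could be sketched roughly as follows. This is a minimal illustration, not AutoRound's actual adapter: the function name `pick_conversion_source`, the returned labels, and the precedence (local checkout first, then the bundled snapshot, then auto-update as a fallback) are assumptions. Only the two environment variable names come from this PR.

```python
from typing import Mapping


def pick_conversion_source(env: Mapping[str, str], bundled_supports_model: bool) -> str:
    """Return a label for where the conversion code would be loaded from.

    Hypothetical helper; the precedence below is an assumption, not taken
    from the PR.
    """
    # A configured local llama.cpp checkout is assumed to take priority.
    if env.get("LLAMA_CPP_ROOT"):
        return "local_checkout"
    # The bundled snapshot covers the default, no-setup case.
    if bundled_supports_model:
        return "bundled_snapshot"
    # Opt-in fallback: fetch the minimal conversion files dynamically.
    if env.get("AUTO_ROUND_GGUF_AUTO_UPDATE") == "1":
        return "auto_update"
    raise RuntimeError(
        "Model not supported by the bundled converter; set LLAMA_CPP_ROOT "
        "or AUTO_ROUND_GGUF_AUTO_UPDATE=1."
    )
```

In real use the mapping passed in would typically be `os.environ`; taking it as a parameter keeps the sketch easy to test.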

Type of Change

Bug fix

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.
  • The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR updates AutoRound’s GGUF export and quantization pipeline to support additional quantization/export behaviors (notably ModelOpt NVFP4 repacking, optional MoE expert tensor fusion, and some broader model/config handling tweaks), plus a small API-compatibility improvement in the SignRound quantizer.

Changes:

  • Add ModelOpt NVFP4 tensor repacking/writing logic and related metadata handling in the HF→GGUF converter.
  • Add optional MoE gate/up expert tensor fusion during GGUF tensor emission.
  • Improve robustness/compatibility in a few spots (safe checkpoint mapping access, config aliasing, extra tensor filtering, SignRound kwargs passthrough, log message cleanup).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| auto_round/export/export_to_gguf/convert.py | Avoids assuming _checkpoint_conversion_mapping exists on models during tensor enumeration. |
| auto_round/export/export_to_gguf/convert_hf_to_gguf.py | Adds NVFP4 repacking + scale tensor writing, optional expert fusion, and several model/config export tweaks. |
| auto_round/compressors/entry.py | Simplifies one-time mode-selection log messages. |
| auto_round/algorithms/quantization/sign_round/quantizer.py | Allows passing through kwargs (e.g. disable_opt_rtn) when quantizing layers outside blocks. |

Comment thread auto_round/export/export_to_gguf/convert_hf_to_gguf.py Outdated
Comment thread auto_round/export/export_to_gguf/convert_hf_to_gguf.py Outdated
Comment thread auto_round/export/export_to_gguf/convert_hf_to_gguf.py Outdated
@n1ck-guo n1ck-guo changed the title from "refactor and support for multi algs fusion" to "fix bug of qwen and gguf export" May 25, 2026
@wenhuach21
Contributor

check gguf version>=xxx
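
One way such a version check could be written is sketched below. This is an assumption-laden illustration rather than the check adopted in the PR: the helper names are hypothetical, the minimum version is left as a parameter because the comment does not state it, and the parser only handles plain dotted numeric releases.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of a dotted version string."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. "dev3"
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    """Compare two versions, padding with zeros so "0.17" equals "0.17.0"."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_at_least(package: str, minimum: str) -> Optional[bool]:
    """Return None if the package is not installed, else the comparison."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return at_least(installed, minimum)
```

A caller would pass the distribution name (for example `"gguf"`) and whatever minimum version the maintainers settle on.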

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Comment thread auto_round/export/export_to_gguf/convert.py Outdated
@chensuyue chensuyue added this to the 0.13.0 milestone May 25, 2026
n1ck-guo added 2 commits May 25, 2026 16:31
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
n1ck-guo added 2 commits May 26, 2026 08:34
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines failed to run 1 pipeline(s).

@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

4 participants