
gguf better support for transformers 5.0 and fix bug of Qwen3Next #1474

Merged
n1ck-guo merged 5 commits into main from hengguo/gguf_transformers5.0 on Mar 3, 2026
Conversation

@n1ck-guo
Contributor

n1ck-guo commented on Feb 27, 2026

Description

gguf better support for transformers5.0 and fix bug of Qwen3Next

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #1454

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

…oder-Next

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

This PR aims to improve GGUF export compatibility with Transformers v5.0 and address a Qwen3Next GGUF export failure reported in #1454.

Changes:

  • Re-enable GGUF-related CUDA/CPU tests for Transformers ≥ 5.0 by removing version-based skips.
  • Adjust the GGUF export conversion logic with Qwen3Next-specific tensor handling and tweak memory-clearing behavior.
  • Improve test fixture model saving by copying tokenizer.model when present, and reduce repeated calibration warnings via warning_once.
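The tokenizer.model copy mentioned above can be sketched as follows. This is a minimal illustration of the described behavior, not the actual test/helpers.py code; the function name and directory arguments are hypothetical.

```python
import os
import shutil


def copy_tokenizer_model(model_dir: str, save_dir: str) -> None:
    """Sketch: copy tokenizer.model into the saved tiny-model directory.

    GGUF conversion expects the SentencePiece tokenizer.model file alongside
    the checkpoint, so we copy it over only when the source model provides one.
    (Illustrative only; the real helper in test/helpers.py may differ.)
    """
    os.makedirs(save_dir, exist_ok=True)
    src = os.path.join(model_dir, "tokenizer.model")
    if os.path.isfile(src):
        shutil.copy(src, os.path.join(save_dir, "tokenizer.model"))
```

The existence check keeps the helper safe for models that use a fast tokenizer without a SentencePiece file.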

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

Summary per file:

  • test/test_cuda/export/test_gguf.py: Removes the Transformers v5 skip so GGUF export tests run on v5+.
  • test/test_cuda/advanced/test_fp8_input.py: Removes the Transformers v5 skip for GGUF-related FP8 tests.
  • test/test_cpu/export/test_gguf_format.py: Removes the Transformers v5 skip for CPU GGUF format tests.
  • test/helpers.py: Copies tokenizer.model into the saved tiny model directory for GGUF/tokenizer compatibility.
  • auto_round/export/export_to_gguf/convert.py: Adds a Qwen3Next-specific tensor modification hook and changes memory-clearing device selection.
  • auto_round/compressors/utils.py: Wraps gguf-py architecture detection to provide a clearer "upgrade gguf-py" error path.
  • auto_round/compressors/base.py: Switches an "insufficient samples" warning to logger.warning_once.
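The warning_once switch in base.py deduplicates a warning that previously fired on every calibration pass. A minimal self-contained sketch of the pattern (not auto_round's actual logger implementation; the logger name and message are illustrative):

```python
import logging
from functools import lru_cache

logger = logging.getLogger("auto_round_sketch")


@lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    """Emit each distinct warning message at most once per process.

    lru_cache memoizes calls by message text, so a repeated call with the
    same message returns the cached None without logging again. This mirrors
    the logger.warning_once behavior described above (illustrative sketch).
    """
    logger.warning(message)
```

With this in place, a calibration loop that hits the "insufficient samples" condition many times produces a single log line instead of one per iteration.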

Comment thread on auto_round/export/export_to_gguf/convert.py (outdated)
n1ck-guo and others added 4 commits February 27, 2026 15:03
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo n1ck-guo merged commit ab14698 into main Mar 3, 2026
29 checks passed
@n1ck-guo n1ck-guo deleted the hengguo/gguf_transformers5.0 branch March 3, 2026 07:43
lvliang-intel pushed a commit that referenced this pull request Mar 3, 2026
WeiweiZhang1 pushed a commit that referenced this pull request Mar 4, 2026
