fix qwen3.6 vllm infer bug #1746

Open
n1ck-guo wants to merge 9 commits into main from hengguo/fix_qwen_bug

Conversation

@n1ck-guo
Contributor

Description

Please briefly describe your main changes, the motivation.

Type of Change

Bug fix

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.
  • The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

Copilot AI review requested due to automatic review settings April 27, 2026 09:11
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR adjusts the special-casing logic for the W4A16_MIXED quantization recipe when running on multimodal (MLLM) models, likely to address an inference incompatibility observed with Qwen3.6 + vLLM.

Changes:

  • In W4A16_MIXED, stop applying the MLLM-specific per-layer override that previously forced non-lm_head layers to bits=16.
  • Leave affected layers to be handled by the downstream/default layer-config normalization logic.
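
The behavior change summarized above can be sketched roughly as follows. This is an illustration only, not the code in auto_round/schemes.py: the function name `mllm_overrides`, the `legacy` flag, and the example layer names are all hypothetical, and the sketch assumes the old override was keyed on the last component of the layer name.

```python
from typing import Dict, Iterable


def mllm_overrides(layer_names: Iterable[str], is_mllm: bool, legacy: bool) -> Dict[str, dict]:
    """Hypothetical helper: return per-layer overrides for a W4A16_MIXED-style recipe.

    legacy=True mimics the old behavior described in the overview (MLLM models
    get bits=16 on every layer except lm_head). legacy=False mimics the new
    behavior: no MLLM-specific override is emitted, so the layers fall through
    to whatever default layer-config normalization runs afterwards.
    """
    overrides: Dict[str, dict] = {}
    if not (is_mllm and legacy):
        return overrides
    for name in layer_names:
        # Assumption: lm_head is identified by the final dotted name component.
        if name.split(".")[-1] != "lm_head":
            overrides[name] = {"bits": 16}
    return overrides


# Hypothetical layer names, for illustration only.
layers = [
    "model.layers.0.self_attn.q_proj",
    "model.visual.blocks.0.attn.qkv",
    "lm_head",
]
old = mllm_overrides(layers, is_mllm=True, legacy=True)
new = mllm_overrides(layers, is_mllm=True, legacy=False)
```

In this sketch, `old` carries a bits=16 entry for each non-lm_head layer, while `new` is empty and leaves those layers to the downstream defaults.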

Comment thread auto_round/schemes.py
Comment thread auto_round/schemes.py Outdated
Comment thread auto_round/schemes.py Outdated
@wenhuach21 added this to the 0.13.0 milestone May 9, 2026
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@n1ck-guo added the "ready" label (only add when the PR is ready to merge) May 12, 2026
Labels

ready only add when the PR is ready to merge

4 participants