fix qwen3.6 vllm infer bug by n1ck-guo · Pull Request #1746 · intel/auto-round

n1ck-guo · 2026-04-27T09:11:43Z

Description

Please briefly describe your main changes, the motivation.

Type of Change

Bug fix

Related Issues

Fixes or relates to #

Checklist Before Submitting

My code has been tested locally.
Documentation has been updated as needed.
New or updated tests are included where applicable.
The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

Signed-off-by: n1ck-guo <heng.guo@intel.com>

Copilot

Pull request overview

This PR adjusts the special-casing logic for the W4A16_MIXED quantization recipe when running on multimodal (MLLM) models, likely to address an inference incompatibility observed with Qwen3.6 + vLLM.

Changes:

In W4A16_MIXED, skip the MLLM-specific per-layer override that previously forced non-lm_head layers to bits=16.
Leave the affected layers to the downstream/default layer-config normalization logic.
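
The behavioural difference described in the overview can be sketched in a few lines. This is an illustration only, not the code in auto_round/schemes.py: the helper `build_layer_overrides`, its `apply_mllm_override` flag, and the layer names are all hypothetical, and the sketch assumes the old override simply matched every layer whose name does not contain `lm_head`.

```python
from typing import Dict, Iterable


def build_layer_overrides(
    layer_names: Iterable[str],
    is_mllm: bool,
    apply_mllm_override: bool,
) -> Dict[str, dict]:
    """Hypothetical helper: per-layer overrides for a mixed recipe.

    With apply_mllm_override=True it mimics the behaviour the overview
    calls "previous": on an MLLM, every layer other than lm_head gets
    an explicit bits=16 entry. With False it emits no override, so the
    layers fall through to whatever default handling runs afterwards.
    """
    overrides: Dict[str, dict] = {}
    if is_mllm and apply_mllm_override:
        for name in layer_names:
            if "lm_head" not in name:
                overrides[name] = {"bits": 16}
    return overrides


# Made-up layer names, for illustration only.
layers = [
    "model.visual.blocks.0.attn.qkv",
    "model.language_model.layers.0.mlp.down_proj",
    "lm_head",
]

before = build_layer_overrides(layers, is_mllm=True, apply_mllm_override=True)
after = build_layer_overrides(layers, is_mllm=True, apply_mllm_override=False)
```

In the sketch, `before` carries a `bits=16` entry for the two non-lm_head layers, while `after` is empty, which corresponds to leaving those layers to the default normalization step.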

n1ck-guo · 2026-05-12T00:44:54Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-05-12T00:45:04Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI review requested due to automatic review settings April 27, 2026 09:11

fix qwen3.6 vllm infer bug

8812101

Signed-off-by: n1ck-guo <heng.guo@intel.com>

Copilot started reviewing on behalf of n1ck-guo April 27, 2026 09:12

Copilot AI reviewed Apr 27, 2026

Comment thread auto_round/schemes.py

Comment thread auto_round/schemes.py Outdated

Comment thread auto_round/schemes.py Outdated

n1ck-guo and others added 2 commits May 7, 2026 09:23

Merge branch 'main' into hengguo/fix_qwen_bug

1a90dca

Merge branch 'main' into hengguo/fix_qwen_bug

0265899

wenhuach21 added this to the 0.13.0 milestone May 9, 2026

wenhuach21 and others added 4 commits May 9, 2026 16:17

Merge branch 'main' into hengguo/fix_qwen_bug

a28fd9d

update

f00f680

Signed-off-by: n1ck-guo <heng.guo@intel.com>

Merge branch 'hengguo/fix_qwen_bug' of https://github.com/intel/auto-…

d9cd1f4

…round into hengguo/fix_qwen_bug

Merge branch 'main' into hengguo/fix_qwen_bug

9534e54

n1ck-guo added the "ready" label (only add when the PR is ready to merge) May 12, 2026

n1ck-guo and others added 2 commits May 13, 2026 10:36

Merge branch 'main' into hengguo/fix_qwen_bug

f962397

Merge branch 'main' into hengguo/fix_qwen_bug

053077f

fix qwen3.6 vllm infer bug #1746
n1ck-guo wants to merge 9 commits into main from hengguo/fix_qwen_bug

4 participants