Describe the Issue
Using Qwen3.5 models via CUDA on version 1.109.2 is significantly slower than on version 1.108.2 (Windows 11). Performance via Vulkan, however, remains fast, and is strangely even faster than CUDA.
Additional Information:
Windows 11, Ryzen 8700G, NVIDIA RTX 3090
All results were measured by translating the same text of roughly 5,000 tokens (PP = prompt processing, TG = token generation, in tokens/s).
Mistral Small 24B (behaves as expected: Vulkan is slightly slower than CUDA):
1.108.2 CUDA:   PP 1713, TG 40
1.109.2 CUDA:   PP 1758, TG 40
1.109.2 Vulkan: PP 1593, TG 37
Qwen3.5-35B-A3B-Q4_K_M:
1.108.2 CUDA:   PP 2200, TG 76
1.108.2 Vulkan: PP 2541, TG 88
1.109.2 CUDA:   PP 2131, TG 51 (TG is much lower)
1.109.2 Vulkan: PP 2534, TG 87
Qwen3.5-9B-Q5_K_M:
1.108.2 CUDA:   PP 2992, TG 67
1.109.2 CUDA:   PP 3173, TG 52 (TG is much lower)
1.109.2 Vulkan: PP 2895, TG 70
I see similar results on a second computer (Windows 11, Ryzen 5900X, NVIDIA RTX 3090). Token generation in particular is slower with the Qwen3.5 models on CUDA in the 1.109 builds, while prompt processing is largely unaffected.
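To put a number on the regression, the CUDA TG figures reported above can be compared directly between the two versions; a minimal sketch using only the values from this report:

```python
# Quantify the CUDA token-generation (TG) drop between 1.108.2 and
# 1.109.2, using the benchmark numbers reported in this issue.
def pct_drop(old: float, new: float) -> float:
    """Percentage decrease from old to new, rounded to one decimal."""
    return round((old - new) / old * 100, 1)

# Qwen3.5-35B-A3B-Q4_K_M, CUDA: TG 76 (1.108.2) -> 51 (1.109.2)
print(pct_drop(76, 51))  # 32.9 (about a third slower)

# Qwen3.5-9B-Q5_K_M, CUDA: TG 67 (1.108.2) -> 52 (1.109.2)
print(pct_drop(67, 52))  # 22.4

# For comparison, Mistral Small 24B CUDA TG is unchanged (40 -> 40),
# and Qwen3.5 Vulkan TG barely moves (88 -> 87, 70 vs 67).
```

So the regression is specific to Qwen3.5 on the CUDA backend, at roughly 20-33% lower TG, rather than a general slowdown.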