Commit ef8005d
fix: Qwen3.5-MoE uses standard RMS norm (not residual), fix garbled output
The Qwen3.5-MoE model stores full RMS norm weights (centered ~1.0), not
residuals. With residual_rms_norm=true, cake was adding 1.0 to the weights,
roughly doubling the norm output and causing activations to explode through 40 layers.
Also removes debug logging from GPTQ dequant paths.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 225396a commit ef8005d
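The scaling error described in the commit message can be illustrated with a minimal sketch. This is not cake's implementation: the function name `rms_norm`, the `residual` flag, the `eps` value, and the sample inputs are all hypothetical, chosen only to show what happens when weights stored as full values (near 1.0) are treated as residuals and have 1.0 added to them.

```python
import math

def rms_norm(x, weight, eps=1e-6, residual=False):
    """RMS-normalize x, then scale by weight (or by 1 + weight if residual).

    Hypothetical helper for illustration; not taken from the cake codebase.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    # Residual convention: stored weights are offsets from 1.0.
    # Standard convention: stored weights are the full scale factors.
    scale = [1.0 + w for w in weight] if residual else list(weight)
    return [v / rms * s for v, s in zip(x, scale)]

x = [0.5, -1.0, 2.0, 0.25]            # made-up activations
full_weights = [1.0, 1.0, 1.0, 1.0]   # full weights, centered near 1.0

correct = rms_norm(x, full_weights, residual=False)
buggy = rms_norm(x, full_weights, residual=True)

# With weights of exactly 1.0, the residual interpretation scales by 2.0.
ratios = [b / c for b, c in zip(buggy, correct)]
print(ratios)
```

With weights of exactly 1.0 every ratio is 2.0; with real weights that are only centered near 1.0, the factor would vary per channel around that value.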
2 files changed
Lines changed: 12 additions & 2 deletions
[Diff body not captured. Hunk 1: line 132 replaced (context lines 129-135). Hunk 2: line 215 replaced (context lines 212-218).]