Commit ef8005d

evilsocket and claude committed
fix: Qwen3.5-MoE uses standard RMS norm (not residual), fix garbled output
The Qwen3.5-MoE model stores full RMS norm weights (centered ~1.0), not residuals. With residual_rms_norm=true, cake was adding 1.0 to the weights, doubling the norm output and causing activations to explode through 40 layers.

Also removes debug logging from GPTQ dequant paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
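
The effect described in the commit message can be reproduced in isolation. The sketch below is not cake's implementation: `rms_norm` and its `residual` parameter are hypothetical stand-ins for whatever `residual_rms_norm` selects internally. It only shows that, for weights stored as full gains near 1.0, applying the residual convention (gain = 1 + weight) roughly doubles every output channel.

```rust
/// Hypothetical stand-in for an RMS norm layer (not cake's actual code).
/// `residual == true`  : `weight` is an offset from 1.0, so gain = 1 + weight.
/// `residual == false` : `weight` is the full gain and is used as-is.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32, residual: bool) -> Vec<f32> {
    assert_eq!(x.len(), weight.len());
    // Root-mean-square of the input vector.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter()
        .zip(weight)
        .map(|(&v, &w)| {
            let gain = if residual { 1.0 + w } else { w };
            v * inv_rms * gain
        })
        .collect()
}

fn main() {
    // Illustrative values only: weights stored as full gains, all exactly 1.0.
    let x = [0.5_f32, -1.0, 2.0, 1.5];
    let weight = [1.0_f32; 4];

    let standard = rms_norm(&x, &weight, 1e-6, false);
    let residual = rms_norm(&x, &weight, 1e-6, true);

    for (s, r) in standard.iter().zip(&residual) {
        println!("standard = {s:+.4}, residual = {r:+.4}, ratio = {:.2}", r / s);
    }
}
```

With these inputs every channel of the residual variant is twice the standard one, which matches the "doubling the norm output" described above.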
1 parent 225396a commit ef8005d

2 files changed: 12 additions & 2 deletions

Cargo.lock

Lines changed: 10 additions & 0 deletions
Generated file; diff not rendered.

cake-core/src/models/qwen3_5_moe/config.rs

Lines changed: 2 additions & 2 deletions
@@ -129,7 +129,7 @@ impl Qwen3_5MoeConfig {
             head_dim: Some(tc.head_dim),
             partial_rotary_factor,
             linear_attn,
-            residual_rms_norm: true,
+            residual_rms_norm: false,
             use_qk_norm: false,
             pre_reshape_qk_norm: false,
             sliding_window: None,
@@ -212,7 +212,7 @@ mod tests {
         assert_eq!(c.shared_expert_intermediate_size, Some(8192));
         assert!(c.attn_output_gate);
         assert!(c.norm_topk_prob);
-        assert!(c.residual_rms_norm);
+        assert!(!c.residual_rms_norm);
         assert_eq!(c.model_prefix, "model.language_model");
         assert_eq!(c.head_dim, Some(256));
         assert_eq!(c.rope_theta, 10_000_000.0);
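
Since the flag hinges on whether the checkpoint stores norm weights centered near 1.0 (full gains) or near 0.0 (residuals), a quick sanity check on a loaded weight vector might look like the sketch below. `NormWeightKind` and `guess_norm_weight_kind` are hypothetical names that do not exist in cake, and the mean-based rule is only a heuristic assumption, not a guarantee about any particular checkpoint.

```rust
/// Hypothetical classification of how a checkpoint stores RMS norm weights.
#[derive(Debug, PartialEq)]
enum NormWeightKind {
    /// Weights are the full gain, centered around 1.0.
    Full,
    /// Weights are offsets from 1.0, centered around 0.0.
    Residual,
}

/// Heuristic (an assumption, not cake's logic): compare the mean weight
/// against 1.0 and 0.0 and pick whichever is closer.
fn guess_norm_weight_kind(weights: &[f32]) -> Option<NormWeightKind> {
    if weights.is_empty() {
        return None;
    }
    let mean = weights.iter().sum::<f32>() / weights.len() as f32;
    if (mean - 1.0).abs() < mean.abs() {
        Some(NormWeightKind::Full)
    } else {
        Some(NormWeightKind::Residual)
    }
}

fn main() {
    // Made-up sample values for illustration, not taken from a real checkpoint.
    let full_like = [0.98_f32, 1.03, 1.01, 0.97];
    let residual_like = [-0.02_f32, 0.01, 0.03, -0.01];

    println!("{:?}", guess_norm_weight_kind(&full_like));
    println!("{:?}", guess_norm_weight_kind(&residual_like));
}
```

A check like this is cheap to run when adding a new architecture and would flag a mismatch between the stored weights and the `residual_rms_norm` setting before it shows up as garbled output.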
