Just reporting that https://github.com/LostRuins/koboldcpp/actions/runs/21299565080 introduced a bug causing garbled thinking/output, at least on the GLM 4.7 Flash and GLM 4.5 Air models, regardless of whether Flash Attention is on or off. Rolling back to https://github.com/LostRuins/koboldcpp/actions/runs/21210595986 fixes the issue, so the regression was likely introduced by something merged in between. Could it be b70d251?