fix: Q1_0_g128 CPU kernel - correct output and AVX-512 SIMD #3
jordankzf wants to merge 1 commit into PrismML-Eng:prism from
Conversation
|
Worked like a charm on my Ryzen 5700U.

rafaelfrequiao@ideapad:~$
echo "=== 1. Preparing the repository with PR #3 ==="
cd ~/ai-lab
rm -rf llama.cpp-bonsai
git clone https://github.com/PrismML-Eng/llama.cpp.git llama.cpp-bonsai
cd llama.cpp-bonsai
# Here is the magic: fetching the exact fix from Pull Request 3
git fetch origin pull/3/head:correcao-cpu
git checkout correcao-cpu
echo "=== 2. Building with the fix ==="
cmake -B build
cmake --build build -j$(nproc) --target llama-cli llama-server
echo "=== 3. Checking for the 8B model ==="
if [ ! -f ~/ai-lab/Bonsai-demo/models/gguf/8B/Bonsai-8B.gguf ]; then
  echo "Model not found. Downloading Bonsai 8B..."
  mkdir -p ~/ai-lab/Bonsai-demo/models/gguf/8B
  curl -L -o ~/ai-lab/Bonsai-demo/models/gguf/8B/Bonsai-8B.gguf "https://huggingface.co/prism-ml/Bonsai-8B-gguf/resolve/main/Bonsai-8B.gguf"
fi
echo "=== 4. The Trial by Fire ==="
./build/bin/llama-cli \
  -m ~/ai-lab/Bonsai-demo/models/gguf/8B/*.gguf \
  -p "A capital do Brasil é " \
  -n 50 \
  -t 8
=== 1. Preparing the repository with PR #3 ===
Cloning into 'llama.cpp-bonsai'...
remote: Enumerating objects: 66862, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 66862 (delta 8), reused 2 (delta 2), pack-reused 66823 (from 2)
Receiving objects: 100% (66862/66862), 307.07 MiB | 3.32 MiB/s, done.
Resolving deltas: 100% (47439/47439), done.
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 9 (delta 0), reused 0 (delta 0), pack-reused 2 (from 1)
Unpacking objects: 100% (9/9), 31.66 KiB | 810.00 KiB/s, done.
From https://github.com/PrismML-Eng/llama.cpp
 * [new ref]  refs/pull/3/head -> correcao-cpu
Switched to branch 'correcao-cpu'
=== 2. Building with the fix ===
-- The C compiler identification is GNU 14.2.0
-- The CXX compiler identification is GNU 14.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMAKE_BUILD_TYPE=Release
-- Found Git: /usr/bin/git (found version "2.47.3")
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- ggml version: 0.9.7
-- ggml commit: aec184c6b
-- Found OpenSSL: /usr/lib/x86_64-linux-gnu/libcrypto.so (found version "3.5.4")
-- Performing Test OPENSSL_VERSION_SUPPORTED
-- Performing Test OPENSSL_VERSION_SUPPORTED - Success
-- OpenSSL found: 3.5.4
-- Generating embedded license file for target: common
-- Configuring done (3.9s)
-- Generating done (0.3s)
-- Build files have been written to: /home/rafaelfrequiao/ai-lab/llama.cpp-bonsai/build
[ 0%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o
[ 1%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o
[ 1%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o
[ 1%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o
[ 1%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml.cpp.o
[ 3%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o
[ 3%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o
[ 3%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o
[ 3%] Building CXX object vendor/cpp-httplib/CMakeFiles/cpp-httplib.dir/httplib.cpp.o
[ 3%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 3%] Built target build_info
[ 3%] Linking CXX static library libcpp-httplib.a
[ 3%] Built target cpp-httplib
[ 3%] Linking CXX shared library ../../bin/libggml-base.so
[ 3%] Built target ggml-base
[ 5%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/unary-ops.cpp.o
[ 5%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o
[ 5%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/quants.c.o
[ 7%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
[ 7%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/repack.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/vec.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/traits.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ops.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o
[ 9%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/x86/quants.c.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/binary-ops.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/hbm.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 9%] Linking CXX shared library ../../bin/libggml-cpu.so
[ 9%] Built target ggml-cpu
[ 11%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o
[ 11%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-backend-dl.cpp.o
[ 11%] Linking CXX shared library ../../bin/libggml.so
[ 11%] Built target ggml
[ 13%] Building CXX object src/CMakeFiles/llama.dir/llama.cpp.o
[ 13%] Building CXX object src/CMakeFiles/llama.dir/llama-arch.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-chat.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-batch.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-cparams.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-adapter.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-context.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-kv-cache.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-io.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-hparams.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-graph.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-impl.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-kv-cache-iswa.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-grammar.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-memory.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-memory-hybrid.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-quant.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-mmap.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-memory-hybrid-iswa.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-model.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-model-saver.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-memory-recurrent.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-sampler.cpp.o
[ 23%] Building CXX object src/CMakeFiles/llama.dir/llama-vocab.cpp.o
[ 23%] Building CXX object src/CMakeFiles/llama.dir/llama-model-loader.cpp.o
[ 23%] Building CXX object src/CMakeFiles/llama.dir/unicode.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/models/arcee.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/models/apertus.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/unicode-data.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/models/arctic.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/models/afmoe.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/models/arwkv7.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/baichuan.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/bert.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/bailingmoe2.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/bailingmoe.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/chameleon.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/chatglm.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/bitnet.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/codeshell.cpp.o
[ 28%] Building CXX object src/CMakeFiles/llama.dir/models/bloom.cpp.o
[ 30%] Building CXX object src/CMakeFiles/llama.dir/models/cohere2-iswa.cpp.o
[ 30%] Building CXX object src/CMakeFiles/llama.dir/models/cogvlm.cpp.o
[ 30%] Building CXX object src/CMakeFiles/llama.dir/models/dbrx.cpp.o
[ 30%] Building CXX object src/CMakeFiles/llama.dir/models/command-r.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/models/deci.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/models/deepseek.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/models/deepseek2.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/models/delta-net-base.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/models/dots1.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/models/dream.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/models/ernie4-5-moe.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/models/ernie4-5.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/models/exaone4.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/models/eurobert.cpp.o
[ 36%] Building CXX object src/CMakeFiles/llama.dir/models/exaone-moe.cpp.o
[ 36%] Building CXX object src/CMakeFiles/llama.dir/models/falcon-h1.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/gemma-embedding.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/exaone.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/gemma2-iswa.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/gemma3.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/gemma.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/falcon.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/gpt2.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/glm4.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/glm4-moe.cpp.o
[ 40%] Building CXX object src/CMakeFiles/llama.dir/models/gemma3n-iswa.cpp.o
[ 40%] Building CXX object src/CMakeFiles/llama.dir/models/grok.cpp.o
[ 40%] Building CXX object src/CMakeFiles/llama.dir/models/grovemoe.cpp.o
[ 42%] Building CXX object src/CMakeFiles/llama.dir/models/gptneox.cpp.o
[ 44%] Building CXX object src/CMakeFiles/llama.dir/models/granite.cpp.o
[ 44%] Building CXX object src/CMakeFiles/llama.dir/models/hunyuan-dense.cpp.o
[ 44%] Building CXX object src/CMakeFiles/llama.dir/models/granite-hybrid.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/jais.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/jais2.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/internlm2.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/kimi-linear.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/jamba.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/hunyuan-moe.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/lfm2.cpp.o
[ 48%] Building CXX object src/CMakeFiles/llama.dir/models/llada-moe.cpp.o
[ 48%] Building CXX object src/CMakeFiles/llama.dir/models/llada.cpp.o
[ 48%] Building CXX object src/CMakeFiles/llama.dir/models/maincoder.cpp.o
[ 48%] Building CXX object src/CMakeFiles/llama.dir/models/llama.cpp.o
[ 48%] Building CXX object src/CMakeFiles/llama.dir/models/llama-iswa.cpp.o
[ 50%] Building CXX object src/CMakeFiles/llama.dir/models/mamba-base.cpp.o
[ 50%] Building CXX object src/CMakeFiles/llama.dir/models/mamba.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/mimo2-iswa.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/minicpm3.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/minimax-m2.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/mpt.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/modern-bert.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/nemotron.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/mistral3.cpp.o
[ 53%] Building CXX object src/CMakeFiles/llama.dir/models/nemotron-h.cpp.o
[ 53%] Building CXX object src/CMakeFiles/llama.dir/models/neo-bert.cpp.o
[ 53%] Building CXX object src/CMakeFiles/llama.dir/models/olmo2.cpp.o
[ 53%] Building CXX object src/CMakeFiles/llama.dir/models/olmo.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/olmoe.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/openelm.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/openai-moe-iswa.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/pangu-embedded.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/orion.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/plamo.cpp.o
[ 57%] Building CXX object src/CMakeFiles/llama.dir/models/paddleocr.cpp.o
[ 57%] Building CXX object src/CMakeFiles/llama.dir/models/phi2.cpp.o
[ 57%] Building CXX object src/CMakeFiles/llama.dir/models/phi3.cpp.o
[ 59%] Building CXX object src/CMakeFiles/llama.dir/models/plamo2.cpp.o
[ 59%] Building CXX object src/CMakeFiles/llama.dir/models/plm.cpp.o
[ 59%] Building CXX object src/CMakeFiles/llama.dir/models/qwen.cpp.o
[ 61%] Building CXX object src/CMakeFiles/llama.dir/models/qwen2moe.cpp.o
[ 61%] Building CXX object src/CMakeFiles/llama.dir/models/qwen2.cpp.o
[ 61%] Building CXX object src/CMakeFiles/llama.dir/models/qwen2vl.cpp.o
[ 61%] Building CXX object src/CMakeFiles/llama.dir/models/plamo3.cpp.o
[ 61%] Building CXX object src/CMakeFiles/llama.dir/models/qwen3.cpp.o
[ 63%] Building CXX object src/CMakeFiles/llama.dir/models/qwen35.cpp.o
[ 63%] Building CXX object src/CMakeFiles/llama.dir/models/qwen35moe.cpp.o
[ 63%] Building CXX object src/CMakeFiles/llama.dir/models/qwen3moe.cpp.o
[ 63%] Building CXX object src/CMakeFiles/llama.dir/models/qwen3next.cpp.o
[ 65%] Building CXX object src/CMakeFiles/llama.dir/models/qwen3vl-moe.cpp.o
[ 65%] Building CXX object src/CMakeFiles/llama.dir/models/qwen3vl.cpp.o
[ 65%] Building CXX object src/CMakeFiles/llama.dir/models/refact.cpp.o
[ 65%] Building CXX object src/CMakeFiles/llama.dir/models/rnd1.cpp.o
[ 67%] Building CXX object src/CMakeFiles/llama.dir/models/rwkv6.cpp.o
[ 67%] Building CXX object src/CMakeFiles/llama.dir/models/rwkv6-base.cpp.o
[ 67%] Building CXX object src/CMakeFiles/llama.dir/models/rwkv6qwen2.cpp.o
[ 67%] Building CXX object src/CMakeFiles/llama.dir/models/rwkv7.cpp.o
[ 67%] Building CXX object src/CMakeFiles/llama.dir/models/rwkv7-base.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/seed-oss.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/smallthinker.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/stablelm.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/step35-iswa.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/starcoder.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/smollm3.cpp.o
[ 71%] Building CXX object src/CMakeFiles/llama.dir/models/starcoder2.cpp.o
[ 71%] Building CXX object src/CMakeFiles/llama.dir/models/t5-dec.cpp.o
[ 71%] Building CXX object src/CMakeFiles/llama.dir/models/t5-enc.cpp.o
[ 73%] Building CXX object src/CMakeFiles/llama.dir/models/xverse.cpp.o
[ 73%] Building CXX object src/CMakeFiles/llama.dir/models/wavtokenizer-dec.cpp.o
[ 73%] Linking CXX shared library ../bin/libllama.so
[ 73%] Built target llama
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/glm4v.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/clip.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/conformer.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/nemotron-v2-vl.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-audio.cpp.o
[ 75%] Building CXX object common/CMakeFiles/common.dir/arg.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/cogvlm.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/kimik25.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/internvl.cpp.o
[ 76%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/llava.cpp.o
[ 76%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-helper.cpp.o
[ 76%] Building CXX object common/CMakeFiles/common.dir/chat-parser-xml-toolcall.cpp.o
[ 76%] Building CXX object common/CMakeFiles/common.dir/chat-parser.cpp.o
[ 78%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/kimivl.cpp.o
[ 78%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/llama4.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/minicpmv.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/chat.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/chat-peg-parser.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/paddleocr.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/pixtral.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/qwen3vl.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/siglip.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/whisper-enc.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/mobilenetv5.cpp.o
[ 82%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/youtuvl.cpp.o
[ 82%] Building CXX object common/CMakeFiles/common.dir/json-partial.cpp.o
[ 82%] Building CXX object common/CMakeFiles/common.dir/download.cpp.o
[ 84%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/qwen2vl.cpp.o
[ 86%] Building CXX object common/CMakeFiles/common.dir/debug.cpp.o
[ 86%] Building CXX object common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 86%] Building CXX object common/CMakeFiles/common.dir/llguidance.cpp.o
[ 86%] Building CXX object common/CMakeFiles/common.dir/ngram-map.cpp.o
[ 88%] Building CXX object common/CMakeFiles/common.dir/log.cpp.o
[ 88%] Building CXX object common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 88%] Building CXX object common/CMakeFiles/common.dir/preset.cpp.o
[ 88%] Building CXX object common/CMakeFiles/common.dir/ngram-mod.cpp.o
[ 90%] Building CXX object common/CMakeFiles/common.dir/peg-parser.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/speculative.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/unicode.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o
[ 92%] Linking CXX shared library ../../bin/libmtmd.so
[ 92%] Building CXX object common/CMakeFiles/common.dir/jinja/lexer.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/regex-partial.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/jinja/runtime.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/jinja/parser.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/jinja/caps.cpp.o
[ 94%] Building CXX object common/CMakeFiles/common.dir/jinja/value.cpp.o
[ 94%] Building CXX object common/CMakeFiles/common.dir/__/license.cpp.o
[ 94%] Building CXX object common/CMakeFiles/common.dir/jinja/string.cpp.o
[ 96%] Linking CXX static library libcommon.a
[ 96%] Built target mtmd
[ 96%] Built target common
[ 96%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-queue.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-task.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-context.cpp.o
[ 98%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-common.cpp.o
[ 98%] Linking CXX static library libserver-context.a
[ 98%] Built target server-context
[100%] Building CXX object tools/cli/CMakeFiles/llama-cli.dir/cli.cpp.o
[100%] Linking CXX executable ../../bin/llama-cli
[100%] Built target llama-cli
[ 0%] Built target build_info
[ 0%] Built target cpp-httplib
[ 3%] Built target ggml-base
[ 9%] Built target ggml-cpu
[ 11%] Built target ggml
[ 73%] Built target llama
[ 82%] Built target mtmd
[ 96%] Built target common
[ 98%] Built target server-context
[ 98%] Generating index.html.gz.hpp
[ 98%] Generating loading.html.hpp
[ 98%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server-http.cpp.o
[100%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server.cpp.o
[100%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server-models.cpp.o
[100%] Linking CXX executable ../../bin/llama-server
[100%] Built target llama-server
=== 3. Checking for the 8B model ===
=== 4. The Trial by Fire ===
Loading model...
▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀
build : b8195-aec184c6b
model : Bonsai-8B.gguf
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file
> A capital do Brasil é
A capital do Brasil é **Brasília**.
[ Prompt: 0,2 t/s | Generation: 0,2 t/s ]
>
Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | - Host | 10627 = 1099 + 9216 + 312 |
rafaelfrequiao@ideapad:~/ai-lab/llama.cpp-bonsai$
|
This looks great, thanks. There were a few CPU kernel fixes that I did not see until I pushed my changes. For now I removed the buggy x86 path and will merge one of the correct AVX ones. Could you run the KL divergence tests described here: #8
|
@khosravipasha I can't run the KL divergence tests.
The Q1_0_g128 vec_dot kernel had a bug where `sumi` was declared as `int` but accumulated `float` partial products (`d1 * sumi_block`), causing float-to-int truncation that destroyed dot-product results and produced gibberish output on CPU. Additionally, the x86 kernel was purely scalar (one bit at a time). This adds an AVX-512BW path that processes 32 elements per iteration using mask_sub + madd + fma, with a single horizontal reduction at the end.

Benchmarks (Bonsai-8B, CPU-only, AVX-512):
Before: 0.73 t/s prompt, 0.65 t/s generation (gibberish output)
After: 23.2 t/s prompt, 13.5 t/s generation (coherent output)
Force-pushed aec184c to 082e830
|
The f16 GGUF isn't available on HuggingFace, so I converted it from prism-ml/Bonsai-1.7B-unpacked (safetensors) using convert_hf_to_gguf.py --outtype f16.

Setup:
f16 reference: converted from prism-ml/Bonsai-1.7B-unpacked safetensors

Same top p: 0.075 +/- 0.017 %
|
@jordankzf Might mean some issue with the kernels. Can you run the same command without your changes? In the meantime I will check the 1.7B unpacked weights to see if they are good. Also, you might not need 100 chunks for this test; a few chunks are okay (at least until you get close to 0). Is the output from the model coherent? Try a few complicated prompts to see if the kernels are working. Two options to get the fp16 GGUF:
|
|
Made the changes, @khosravipasha. Please have a look! KL divergence and coherence test results:

Setup:
f16 reference: dequantized from Bonsai-1.7B.gguf via llama-quantize --allow-requantize ... F16

Same top p: 97.843 +/- 0.288 %

Coherence test (Bonsai-1.7B, complex prompts):
Q: Explain the difference between TCP and UDP in networking
Q: Write a haiku about programming
Q: What causes ocean tides? Explain briefly.

All responses are coherent and factually correct. The 1.7B model runs at 33-40 t/s on CPU (AVX-512).
Pull request overview
Fixes incorrect CPU results for Q1_0_g128 dot products by correcting float accumulation, and adds an AVX-512BW SIMD implementation to improve performance.
Changes:
- Fix float-to-int truncation in the scalar/generic q1_0_g128 dot-product fallback by accumulating into float.
- Add an AVX-512BW-accelerated SIMD path for ggml_vec_dot_q1_0_g128_q8_0 on x86.
Reviewed changes
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| ggml/src/ggml-cpu/arch/x86/quants.c | Replaces generic call with x86 implementation and adds AVX-512BW SIMD + corrected scalar fallback accumulation. |
| ggml/src/ggml-cpu/quants.c | Updates generic q1_0_g128 dot-product fallback to accumulate in float (fixes truncation). |
__m256 h = _mm256_add_ps(_mm512_extractf32x8_ps(acc, 0),
                         _mm512_extractf32x8_ps(acc, 1));
__m128 q = _mm_add_ps(_mm256_extractf128_ps(h, 0),
                      _mm256_extractf128_ps(h, 1));
q = _mm_add_ps(q, _mm_movehl_ps(q, q));
q = _mm_add_ss(q, _mm_movehdup_ps(q));
*s = _mm_cvtss_f32(q);
The AVX-512 path is guarded only by __AVX512BW__, but it uses _mm512_extractf32x8_ps for the horizontal reduction, which depends on additional AVX-512 subsets (typically __AVX512DQ__/__AVX512VL__). In builds where BW is enabled without those subsets, this block can fail to compile. Consider tightening the preprocessor guard to include the required feature macros, or switch the reduction to _mm512_reduce_add_ps (guarded by __AVX512F__) to avoid the extra subset dependency.
Suggested change:
- __m256 h = _mm256_add_ps(_mm512_extractf32x8_ps(acc, 0),
-                          _mm512_extractf32x8_ps(acc, 1));
- __m128 q = _mm_add_ps(_mm256_extractf128_ps(h, 0),
-                       _mm256_extractf128_ps(h, 1));
- q = _mm_add_ps(q, _mm_movehl_ps(q, q));
- q = _mm_add_ss(q, _mm_movehdup_ps(q));
- *s = _mm_cvtss_f32(q);
+ *s = _mm512_reduce_add_ps(acc);
|
Good news: our first CPU PR just got merged into the llama.cpp master branch. If you are still working on this, please rebase with PrismML's master (just pulled the main llama.cpp). Changes: the Q1_0_g128 naming is gone; the original Q1_0 with group size 32 was deleted, and Q1_0_g128 was renamed to Q1_0, which now has group size 128 by default. https://github.com/PrismML-Eng/llama.cpp/tree/master That branch only has the generic CPU path (slow) and an ARM NEON path; I am planning to gather the best x86 kernels from here and send a PR there (and tag all the contributors).
|
There are a lot of CPU PRs; I am planning to gather them all into one and then send it to the main llama.cpp.
Summary

Bug
- `sumi` was declared `int` but accumulated `float` partial products (`d1 * sumi_block`), silently truncating to zero for small scale values. Affects both the x86 and generic fallback kernels.

Changes
ggml/src/ggml-cpu/arch/x86/quants.c
- `int sumi` -> `float sumi` in the scalar fallback
- New `#if defined(__AVX512BW__)` path: sign-extend int8 -> int16, mask-negate via `_mm512_mask_sub_epi16`, pairwise reduce via `_mm512_madd_epi16`, float accumulate via `_mm512_fmadd_ps`, single horizontal sum at the end

ggml/src/ggml-cpu/quants.c
- `int sumi` -> `float sumi` in the generic fallback

Benchmarks (Bonsai-8B, CPU-only, Intel Ice Lake AVX-512)