llama-bee | 0.54.850.645 I srv init: init: chat template, thinking = 1
llama-bee | 0.54.850.829 I srv llama_server: model loaded
llama-bee | 0.54.850.856 I srv llama_server: server is listening on http://0.0.0.0:8001
llama-bee | 0.54.850.915 I srv update_slots: all slots are idle
llama-bee | 1.46.472.187 I srv params_from_: Chat format: peg-native
llama-bee | 1.46.659.935 I slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
llama-bee | 1.46.660.105 I slot get_availabl: id 0 | task -1 | adaptive dm: reset state for LRU slot selection
llama-bee | 1.46.660.140 I srv get_availabl: updating prompt cache
llama-bee | 1.46.660.221 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000
llama-bee | 1.46.660.266 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 0.000 MiB, 100096 tokens, 100096 est)
llama-bee | 1.46.660.354 I srv get_availabl: prompt cache update took 0.14 ms
llama-bee | 1.46.664.695 I reasoning-budget: activated, budget=4096 tokens
llama-bee | 1.46.664.838 I slot launch_slot_: id 0 | task 0 | processing task, is_child = 0
llama-bee | 1.50.895.420 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 4096, progress = 0.13, t = 4.23 s / 968.23 tokens per second
llama-bee | 1.53.062.929 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 6144, progress = 0.19, t = 6.40 s / 960.32 tokens per second
llama-bee | 1.55.270.468 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 8192, progress = 0.25, t = 8.61 s / 951.96 tokens per second
llama-bee | 1.57.532.053 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 10240, progress = 0.32, t = 10.87 s / 942.30 tokens per second
llama-bee | 1.59.853.100 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 12288, progress = 0.38, t = 13.19 s / 931.75 tokens per second
llama-bee | 2.02.226.026 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 14336, progress = 0.44, t = 15.56 s / 921.28 tokens per second
llama-bee | 2.04.660.271 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 16384, progress = 0.50, t = 18.00 s / 910.46 tokens per second
llama-bee | 2.07.136.469 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 18432, progress = 0.57, t = 20.47 s / 900.38 tokens per second
llama-bee | 2.09.678.344 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 20480, progress = 0.63, t = 23.01 s / 889.92 tokens per second
llama-bee | 2.12.277.242 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 22528, progress = 0.69, t = 25.61 s / 879.58 tokens per second
llama-bee | 2.14.923.241 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 24576, progress = 0.76, t = 28.26 s / 869.69 tokens per second
llama-bee | 2.17.637.824 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 26624, progress = 0.82, t = 30.97 s / 859.59 tokens per second
llama-bee | 2.20.397.841 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 28672, progress = 0.88, t = 33.73 s / 849.97 tokens per second
llama-bee | 2.23.206.951 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 30720, progress = 0.95, t = 36.54 s / 840.68 tokens per second
llama-bee | 2.24.921.990 I dflash: drafter K/V projection cache enabled (1024-token window)
llama-bee | 2.24.923.988 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 31930, progress = 0.98, t = 38.26 s / 834.58 tokens per second
llama-bee | 2.25.672.119 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 32421, progress = 1.00, t = 39.01 s / 831.16 tokens per second
llama-bee | 2.25.782.860 I slot create_check: id 0 | task 0 | created context checkpoint 1 of 128 (pos_min = 32420, pos_max = 32420, n_tokens = 32421, size = 247.185 MiB)
llama-bee | 2.25.871.815 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 32442, progress = 1.00, t = 39.21 s / 827.46 tokens per second
llama-bee | 2.28.425.830 I slot create_check: id 0 | task 0 | created context checkpoint 2 of 128 (pos_min = 32441, pos_max = 32441, n_tokens = 32442, size = 249.236 MiB)
llama-bee | 2.28.782.907 I slot operator(): id 0 | task 0 | adaptive dm profit: cur=0 recommended=4 score=12.0 action=apply
llama-bee | /src/tools/server/server-context.cpp:5124: speculative recurrent rollback requires backup sequences when bounded snapshots are unavailable
llama-bee |
llama-bee | /usr/local/lib/libggml-base.so.0(+0x1d44b)[0x7f499970d44b]
llama-bee | /usr/local/lib/libggml-base.so.0(ggml_print_backtrace+0x21c)[0x7f499970d8cc]
llama-bee | /usr/local/lib/libggml-base.so.0(ggml_abort+0x15b)[0x7f499970daab]
llama-bee | /usr/local/lib/libllama-server-impl.so(_ZN19server_context_impl12update_slotsEv+0x103de)[0x7f499a7ca03e]
llama-bee | /usr/local/lib/libllama-server-impl.so(_ZN12server_queue10start_loopEl+0x221)[0x7f499a864541]
llama-bee | /usr/local/lib/libllama-server-impl.so(_Z12llama_serveriPPc+0x250c)[0x7f499a703c9c]
llama-bee | /usr/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7f499a1901ca]
llama-bee | /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7f499a19028b]
llama-bee | llama-server(+0x12a5)[0x5620ec79e2a5]
command:
# === Model ===
- -m
- /cache/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf
- --alias
- "qwen"
- --mmproj
- /cache/mmproj-Qwen-Qwen3.6-27B-Q6_K.gguf
- --no-mmproj-offload
- --spec-ngram-mod-n-match
- "24"
- --spec-ngram-mod-n-min
- "48"
- --spec-ngram-mod-n-max.
- "64"
- --spec-draft-model.
- /cache/Qwen3.6-27B-DFlash-IQ4_XS.gguf
- --spec-type.
- "dflash,ngram-mod"
- --spec-dflash-cross-ctx.
- "1024"
- --spec-draft-ngl.
- "all"
- --cache-ram.
- "-1"
- --host
- "0.0.0.0"
- --port
- "8001"
- -ngl
- "all"
- --ctx-size
- "100000"
- --fit
- "off"
- --no-context-shift
- --checkpoint-min-step
- "0"
- --ctx-checkpoints
- "128"
- --no-warmup
- --swa-full
- --temp
- "1.0" #"0.7" #"0.7" "1.0"
- --top-p.
- "0.6" #"0.8" #"0.8" "0.95"
- --top-k
- "20"
- --min-p
- "0.1"
- --batch-size
- "2048"
- --ubatch-size
- "512"
- --threads
- "10"
- --threads-batch
- "14"
- --no-host
- -ctk
- "q5_0"
- -ctv
- "q4_1"
- --flash-attn
- "on"
- --kv-unified
- --cache-reuse
- "512"
- --perf
- --slot-prompt-similarity
- "0.1"
- --no-mmap
- --mlock
- --parallel
- "1"
- --prio
- "2"
- --jinja
- --chat-template-file
- /cache/qwen-3.6-chat-template-thinking.jinja
- --chat-template-kwargs
- '{"preserve_thinking": true}'
- --reasoning-budget
- "4096"
- --reasoning-budget-message
- "Budżet myślenia wyczerpany. Prawdopodobnie utkwiłem w pętli albo zbyt komplikuję. Muszę natychmiast przestać i przejść do odpowiedzi."
- --reasoning
- "on"
Name and Version
llama-cli --version
version: 10144 (9f1bcc9)
built with GNU 13.3.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
RTX 4000 ADA 20GB
Models
No response
Problem description & steps to reproduce
llama-server crashes at prompt processing
First Bad Commit
No response
Relevant log output