0.30.664.461 I srv llama_server: model loaded
0.30.664.474 I srv llama_server: server is listening on http://0.0.0.0:8010
0.30.664.495 I srv update_slots: all slots are idle
0.39.673.814 I srv params_from_: Chat format: peg-native
0.39.685.962 I slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
0.40.331.044 I srv recurrent_sh: shrunk recurrent state to 1 cells for prompt cache (before prompt cache save/load, removed 1 backup cells)
0.40.331.049 I srv get_availabl: updating prompt cache
0.40.331.057 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000
0.40.331.063 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 262144.000 MiB, 256000 tokens, 274877906944 est)
0.40.928.309 I srv recurrent_ex: expanded recurrent state to 2 cells after prompt cache (after prompt cache save/load)
0.40.928.313 I srv get_availabl: prompt cache update took 597.26 ms
0.40.929.138 I reasoning-budget: activated, budget=2147483647 tokens
0.40.929.166 I slot launch_slot_: id 0 | task 0 | processing task, is_child = 0
/app/ggml/src/ggml-backend-meta.cpp:814: GGML_ASSERT(src_ss[i].axis != GGML_BACKEND_SPLIT_AXIS_UNKNOWN) failed
libggml-base.so.0(+0x1c9bb)[0x7fdf367789bb]
libggml-base.so.0(ggml_print_backtrace+0x21f)[0x7fdf36778e3f]
libggml-base.so.0(ggml_abort+0x152)[0x7fdf36779012]
libggml-base.so.0(+0x49c49)[0x7fdf367a5c49]
libggml-base.so.0(+0x40a99)[0x7fdf3679ca99]
libggml-base.so.0(+0x49dbf)[0x7fdf367a5dbf]
libggml-base.so.0(ggml_gallocr_alloc_graph+0x474)[0x7fdf367902e4]
libggml-base.so.0(ggml_backend_sched_alloc_graph+0x111)[0x7fdf367966e1]
libllama.so.0(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0xd4)[0x7fdf36942344]
libllama.so.0(_ZN13llama_context6decodeERK11llama_batch+0xb19)[0x7fdf36959389]
libllama.so.0(llama_decode+0x10)[0x7fdf3695e1f0]
libllama-common.so.0(_ZN33common_speculative_impl_draft_mtp7processERK11llama_batch+0x2dd)[0x7fdf36fce83d]
libllama-common.so.0(_Z26common_speculative_processP18common_speculativeRK11llama_batch+0x2d)[0x7fdf36fbd18d]
libllama-server-impl.so(_ZN19server_context_impl12update_slotsEv+0x3d98)[0x7fdf37796208]
libllama-server-impl.so(_ZN12server_queue10start_loopEl+0x221)[0x7fdf37824641]
libllama-server-impl.so(_Z12llama_serveriPPc+0x29c7)[0x7fdf376d52b7]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fdf371bbd90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fdf371bbe40]
./llama-server(+0x12b5)[0x5b6bd46a92b5]
Name and Version
Compiled for use with --sm tensor; llama-server crashes with qwen3.6 27b.
Operating systems
Linux
GGML backends
CUDA
Hardware
2x Xeon 8628 + 4x 3090
Models
No response
Problem description & steps to reproduce
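Compile with cuda_nccl on, then run llama-server with --sm tensor. The model loads and the server starts listening, but it aborts with a GGML_ASSERT failure in ggml-backend-meta.cpp as soon as the first task is processed.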
First Bad Commit
No response
Relevant log output
Logs