I rkllm: rkllm-toolkit version: 1.2.1b1, max_context_limit: 16384, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8_G256
I rkllm: Enabled cpus: [0, 1, 2, 3, 4, 5, 6, 7]
I rkllm: Enabled cpus num: 8
2025-10-26 06:53:48,920 - rkllama.worker - INFO - Worker for model qwen3-4b-16k:g256-o1 created and running...
2025-10-26 06:53:50,002 - rkllama.worker - INFO - Running inference for model qwen3-4b-16k:g256-o1...
I'm Terry, your tech assistant. Let me help you update Lobe Chat.
Would you like me to proceed with the update? If so, I'll call the appropriate function to update your Lobe Chat instance.
2025-10-26 06:54:06,903 - werkzeug - INFO - 172.18.0.3 - - [26/Oct/2025 06:54:06] "POST /api/chat HTTP/1.1" 200 -
2025-10-26 06:54:46,015 - werkzeug - INFO - 172.18.0.3 - - [26/Oct/2025 06:54:46] "GET /api/tags HTTP/1.1" 200 -
FROM: Qwen3-8B-rk3588-w8a8_g512-opt-1-hybrid-ratio-1.0.rkllm
HuggingFace Path: dulimov/Qwen3-8B-rk3588-1.2.1-unsloth-16k
I rkllm: rkllm-runtime version: 1.2.2, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from /opt/rkllama/models/qwen3-8b-16k/Qwen3-8B-rk3588-w8a8_g512-opt-1-hybrid-ratio-1.0.rkllm
I rkllm: rkllm-toolkit version: 1.2.1b1, max_context_limit: 16384, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8_G512
E RKNN: [06:55:12.197] failed to allocate handle, ret: -1, errno: 14, errstr: Bad address
E RKNN: [06:55:12.197] failed to malloc npu memory, size: 3900702720, flags: 0x2
E RKNN: [06:55:12.227] load model file error!
E rkllm: rkllm_init failed
2025-10-26 06:55:12,321 - rkllama.worker - ERROR - Failed creating the worker for model 'qwen3-8b-16k': Failed to initialize RKLLM model: -1
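To put the failed allocation in perspective, here is a minimal back-of-the-envelope in Python. The byte count comes from the RKNN error line above; the assumption that the 4B model's weights stay resident while the 8B model loads is mine, not confirmed by the log:

```python
# Sizes taken from the RKNN error line above; this is only a GiB conversion.
failed_alloc_bytes = 3_900_702_720   # "failed to malloc npu memory, size: 3900702720"
total_ram_bytes = 16 * 2**30         # Orange Pi 5 Max: 16 GB (assumes the full 16 GiB is usable)

print(f"requested buffer: {failed_alloc_bytes / 2**30:.2f} GiB")   # -> 3.63 GiB
print(f"share of total RAM: {failed_alloc_bytes / total_ram_bytes:.0%}")  # -> 23%
```

A single contiguous ~3.6 GiB NPU buffer on top of a resident 4B model plus the OS can plausibly exhaust what the driver can map on a 16 GB board.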
Use case:
Loaded and used qwen3-4b-16k, then sent another request for Qwen3-8B.
Device:
Orange Pi 5 Max, 16 GB RAM, NVMe drive
Error:
rkllm_init fails with -1: the RKNN runtime cannot allocate ~3.9 GB of NPU memory (errno 14, "Bad address"); see the log above.
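As a quick sanity check when reproducing, it may help to see how much RAM the kernel reports as available right before the 8B load attempt. This is a sketch: `MemAvailable` is the kernel's estimate for ordinary allocations, and whether it reflects what the RK3588 NPU driver can actually map is an assumption worth verifying.

```shell
# Print the kernel's MemAvailable estimate in GiB (from /proc/meminfo).
# Run this after the 4B model is loaded, before requesting the 8B model.
awk '/^MemAvailable/ {printf "%.2f GiB available\n", $2 / 1048576}' /proc/meminfo
```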