While --mmprojcpu is a nice option, could we offload the mmproj to another GPU instead?
Running it on the CPU can be quite slow, so using an older GPU that's just lying around sounds like a better option.
It would be great if the offloading GPU didn't need to use the same framework, e.g. using CUDA for the LLM and Vulkan for the mmproj.
For example, I was playing around with --visionmintokens at 2048 to improve OCR for screenshots, and it's really slow on the CPU.
Huihui-Qwen3.6-27B-abliterated.mmproj-f16.gguf is only around 900 MB, so even a 4 GB GPU might work?