[Feature Request] Allow offloading mmproj to another GPU

`--mmprojcpu` is a nice option, but could we offload the mmproj to another GPU instead?
Running it on the CPU can be quite slow, so using an older GPU that's just lying around sounds like a better option.
It would be great if the offloading GPU didn't need to use the same framework as the main model, e.g. CUDA for the LLM and Vulkan for the mmproj.

For example, I was playing around with `--visionmintokens` at 2048 to improve OCR for screenshots, and it's really slow on the CPU.
`Huihui-Qwen3.6-27B-abliterated.mmproj-f16.gguf` is only around 900 MB so even a 4 GB GPU might work?

[Feature Request] Allow offloading mmproj to another GPU #2229
