[Feature Request] Allow offloading mmproj to another GPU #2229

@whocares0101

Description

--mmprojcpu is a nice option, but could we offload the mmproj to another GPU instead?
Running it on the CPU can be quite slow, so using an older GPU that's just lying around sounds like a better option.
It would be great if the offloading GPU didn't need to use the same framework, e.g. CUDA for the LLM and Vulkan for the mmproj.

For example, I was playing around with --visionmintokens set to 2048 to improve OCR for screenshots, and it's really slow on the CPU.
Huihui-Qwen3.6-27B-abliterated.mmproj-f16.gguf is only around 900 MB, so even a 4 GB GPU might work?
