Is there an existing issue for this?
What would your feature do?
Wondering if it is possible to incorporate metal-flash-attention into this project. The webui currently uses the MPS backend, while Metal FlashAttention is an open-source alternative based on Dao AI Lab's FlashAttention v2. It is built mainly for Apple Silicon GPUs and is much faster and less resource-hungry than MPS. I'm not sure how it fares on AMD Radeon GPUs on macOS, though. I used it in the Draw Things app and it's really good. I know that it can't do fp64 calculations, among other limitations, but I thought I should share. The linked repo has more information and benchmarks.
https://github.com/philipturner/metal-flash-attention
This article talks about MFA in more detail: https://engineering.drawthings.ai/integrating-metal-flashattention-accelerating-the-heart-of-image-generation-in-the-apple-ecosystem-16a86142eb18
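For context, FlashAttention-style kernels (including metal-flash-attention) compute the same result as standard scaled dot-product attention, just faster and with far less memory, since they avoid materializing the full attention matrix. A minimal NumPy sketch of the operation that such a backend would accelerate (illustration only, not webui code):

```python
import numpy as np

def attention(q, k, v):
    # Standard scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
    # FlashAttention-style kernels produce the same output without
    # materializing the full (seq x seq) score matrix in memory.
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: batch of 1, sequence length 4, head dimension 8 (self-attention)
q = np.random.rand(1, 4, 8).astype(np.float32)
out = attention(q, q, q)
print(out.shape)  # (1, 4, 8)
```

Swapping in MFA would mean replacing this computation inside the MPS attention path, not changing the model's math.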
Proposed workflow
- Go to ....
- Press ....
- ...
Additional information
No response