Is there an existing issue for this?
What would your feature do?
Wondering if it is possible to incorporate metal-flash-attention into this project. The webui currently uses the MPS backend, while Metal FlashAttention is an open-source alternative based on Dao AI Lab's FlashAttention v2. It is built mainly for Apple Silicon GPUs and is much faster and less resource-hungry than MPS. I'm not sure how it fares on AMD Radeon GPUs on macOS, though. I used it in the Draw Things app and it's really good. I know that it can't do fp64 calculations, among other limitations, but I thought I should share. The linked repo has more information and benchmarks.
https://github.com/philipturner/metal-flash-attention
This article talks about MFA in more detail: https://engineering.drawthings.ai/integrating-metal-flashattention-accelerating-the-heart-of-image-generation-in-the-apple-ecosystem-16a86142eb18
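For context, FlashAttention-style kernels (including metal-flash-attention) compute the same result as standard scaled dot-product attention, just faster and with far less memory, since they avoid materializing the full attention matrix. A minimal NumPy sketch of the operation that such a backend would accelerate (illustration only, not webui code):

```python
import numpy as np

def attention(q, k, v):
    # Standard scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
    # FlashAttention-style kernels produce the same output without
    # materializing the full (seq x seq) score matrix in memory.
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: batch of 1, sequence length 4, head dimension 8 (self-attention)
q = np.random.rand(1, 4, 8).astype(np.float32)
out = attention(q, q, q)
print(out.shape)  # (1, 4, 8)
```

Swapping in MFA would mean replacing this computation inside the MPS attention path, not changing the model's math.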
Proposed workflow
- Go to ....
- Press ....
- ...
Additional information
No response