Description
This issue tracks which kernels we should integrate into the library through the kernels package.
Currently, we leverage kernels for different attention backends (FA2, FA3, and SAGE). However, other layers can be optimized as well (RMS Norm, for example), depending on the model size and the input payload used for benchmarking.
I took a crack at this once, replacing the norm layers with their optimized counterparts, but didn't see any noticeable gains. That may be different now.
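For reference, a minimal sketch of what such a replacement looks like with the kernels package. The repo and function below (kernels-community/activation with gelu_fast) mirror the example in the kernels README; swapping a norm layer would follow the same pattern with the corresponding kernel repo:

```python
# Minimal sketch: fetch a Hub-hosted kernel and call it directly.
import torch
from kernels import get_kernel

# Repo and function taken from the kernels README example.
activation = get_kernel("kernels-community/activation")

x = torch.randn(4, 1024, dtype=torch.float16, device="cuda")
out = torch.empty_like(x)
activation.gelu_fast(out, x)  # writes the activation into `out`
```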
Resources / notes
- There are a number of kernels we maintain at https://huggingface.co/kernels-community, which could be repurposed in this case.
- We can also work with the community to port impactful kernels, host them through https://huggingface.co/kernels-community, and make sure they are kernels-compatible. transformers gains benefits from this paradigm in terms of latency improvements (@MekkCyber can provide details). Some relevant PRs can be found here: https://github.com/huggingface/transformers/commits?author=MekkCyber (look for PRs with titles starting with "[kernels]"). A sketch of this paradigm follows the list.
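As a rough illustration of that paradigm (the names below follow the kernels documentation examples; the exact mappings used in the transformers "[kernels]" PRs may differ): a layer name is mapped to a Hub repo, the layer class is decorated, and kernelize() swaps in the optimized forward at load time.

```python
import torch
import torch.nn as nn
from kernels import (
    LayerRepository,
    Mode,
    kernelize,
    register_kernel_mapping,
    use_kernel_forward_from_hub,
)

# Tell kernels which Hub repo/layer backs the "SiluAndMul" layer name on CUDA
# (repo and layer name follow the kernels documentation example).
register_kernel_mapping(
    {
        "SiluAndMul": {
            "cuda": LayerRepository(
                repo_id="kernels-community/activation",
                layer_name="SiluAndMul",
            )
        }
    }
)


@use_kernel_forward_from_hub("SiluAndMul")
class SiluAndMul(nn.Module):
    # Reference forward; kernelize() replaces it with the Hub kernel on CUDA.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[-1] // 2
        return nn.functional.silu(x[..., :d]) * x[..., d:]


model = nn.Sequential(SiluAndMul()).to("cuda", dtype=torch.float16)
model = kernelize(model, mode=Mode.INFERENCE)
```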