Description
This issue tracks which kernels we should integrate into the library through the kernels package.
Currently, we leverage kernels for different attention backends (FA2, FA3, and SAGE). However, other layers can be optimized as well (RMS Norm, for example), depending on the model size and the input payload used for benchmarking.
I took a crack at this once, replacing the norm layers with their optimized counterparts, but didn't see any noticeable gains. That may be different now.
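For reference, a minimal sketch of what such a replacement looks like with the kernels package. The repo and function below (kernels-community/activation with gelu_fast) mirror the example in the kernels README; swapping a norm layer would follow the same pattern with the corresponding kernel repo:

```python
# Minimal sketch: fetch a Hub-hosted kernel and call it directly.
import torch
from kernels import get_kernel

# Repo and function taken from the kernels README example.
activation = get_kernel("kernels-community/activation")

x = torch.randn(4, 1024, dtype=torch.float16, device="cuda")
out = torch.empty_like(x)
activation.gelu_fast(out, x)  # writes the activation into `out`
```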
Resources / notes
- There are a number of kernels we maintain at https://huggingface.co/kernels-community, which could be repurposed in this case.
- We can also work with the community to port impactful kernels, host them through https://huggingface.co/kernels-community, and make sure they are kernels-compatible. transformers gains benefits from this paradigm in terms of latency improvements (@MekkCyber can provide details). Some relevant PRs can be found here: https://github.com/huggingface/transformers/commits?author=MekkCyber (look for PRs with titles starting with "[kernels]"). A sketch of this paradigm follows the list.
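As a rough illustration of that paradigm (the names below follow the kernels documentation examples; the exact mappings used in the transformers "[kernels]" PRs may differ): a layer name is mapped to a Hub repo, the layer class is decorated, and kernelize() swaps in the optimized forward at load time.

```python
import torch
import torch.nn as nn
from kernels import (
    LayerRepository,
    Mode,
    kernelize,
    register_kernel_mapping,
    use_kernel_forward_from_hub,
)

# Tell kernels which Hub repo/layer backs the "SiluAndMul" layer name on CUDA
# (repo and layer name follow the kernels documentation example).
register_kernel_mapping(
    {
        "SiluAndMul": {
            "cuda": LayerRepository(
                repo_id="kernels-community/activation",
                layer_name="SiluAndMul",
            )
        }
    }
)


@use_kernel_forward_from_hub("SiluAndMul")
class SiluAndMul(nn.Module):
    # Reference forward; kernelize() replaces it with the Hub kernel on CUDA.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[-1] // 2
        return nn.functional.silu(x[..., :d]) * x[..., d:]


model = nn.Sequential(SiluAndMul()).to("cuda", dtype=torch.float16)
model = kernelize(model, mode=Mode.INFERENCE)
```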