[Common] Use specialized unfused MXFP8 cast kernels by default #2958
Oleg-Goncharov wants to merge 2 commits into NVIDIA:main from
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Greptile Summary
This PR makes the specialized unfused MXFP8 cast kernels the default by removing the
Confidence Score: 5/5
Safe to merge. The change is a straightforward promotion of already-tested specialized kernels from opt-in to default.
The specialized kernels were previously exercisable via an env var, so the code paths themselves are not new. The COLWISE exclusion guard is logically correct (no COLWISE specialization exists), and removing the dead switch branch eliminates a misleading warning. No behavioral change occurs for COLWISE or GEMM-swizzled-scales paths, which continue to use the original kernels unchanged.
No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[quantize called] --> B{hasSpec AND\nnot GEMM swizzled\nnot COLWISE?}
B -- YES --> C{scaling_type?}
B -- NO --> G[Generic kernel path]
C -- ROWWISE --> D[specialized cast-only kernel\nCastTraits rowwise=true colwise=false]
C -- BIDIMENSIONAL --> E[specialized cast-only kernel\nCastTraits rowwise=true colwise=true]
G --> H{scaling_type?}
H -- ROWWISE --> I[Original ROWWISE kernel]
H -- COLWISE --> J[Original COLWISE kernel]
H -- BIDIMENSIONAL --> K[Original BIDIMENSIONAL kernel]
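
The dispatch in the flowchart above can be sketched in C++ roughly as follows. This is a minimal illustration, not the actual Transformer Engine code: `ScalingType`, `KernelPath`, `QuantizeRequest`, `use_specialized`, and `select_kernel` are hypothetical names chosen to mirror the flowchart labels, and the real kernels are replaced by an enum value naming the path that would be taken.

```cpp
#include <cassert>

// Hypothetical names throughout; they mirror the flowchart, not the real sources.
enum class ScalingType { ROWWISE, COLWISE, BIDIMENSIONAL };

enum class KernelPath {
  SpecializedRowwise,        // CastTraits rowwise=true, colwise=false
  SpecializedBidimensional,  // CastTraits rowwise=true, colwise=true
  GenericRowwise,            // original ROWWISE kernel
  GenericColwise,            // original COLWISE kernel
  GenericBidimensional       // original BIDIMENSIONAL kernel
};

struct QuantizeRequest {
  ScalingType scaling_type;
  bool has_specialization;    // "hasSpec" in the flowchart
  bool gemm_swizzled_scales;  // GEMM-swizzled scales stay on the generic path
};

// Guard from node B: a specialization exists, scales are not GEMM-swizzled,
// and the scaling type is not COLWISE (no COLWISE specialization exists).
inline bool use_specialized(const QuantizeRequest& r) {
  return r.has_specialization && !r.gemm_swizzled_scales &&
         r.scaling_type != ScalingType::COLWISE;
}

inline KernelPath select_kernel(const QuantizeRequest& r) {
  if (use_specialized(r)) {
    // COLWISE is excluded by the guard, so only two cases remain here.
    return r.scaling_type == ScalingType::ROWWISE
               ? KernelPath::SpecializedRowwise
               : KernelPath::SpecializedBidimensional;
  }
  switch (r.scaling_type) {
    case ScalingType::ROWWISE:       return KernelPath::GenericRowwise;
    case ScalingType::COLWISE:       return KernelPath::GenericColwise;
    case ScalingType::BIDIMENSIONAL: return KernelPath::GenericBidimensional;
  }
  return KernelPath::GenericBidimensional;  // not reached for valid enum values
}

// Usage: a few outcomes read directly off the flowchart.
inline void check_flowchart_examples() {
  assert(select_kernel(QuantizeRequest{ScalingType::ROWWISE, true, false}) ==
         KernelPath::SpecializedRowwise);
  assert(select_kernel(QuantizeRequest{ScalingType::COLWISE, true, false}) ==
         KernelPath::GenericColwise);
}
```

Because the guard already excludes COLWISE, the specialized branch needs no COLWISE case, which matches the review's note about the removed dead switch branch.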
Reviews (2): Last reviewed commit: "Removed dead code"
/te-ci
Description
This PR enables the fast unfused MXFP8 cast kernels by default.
Previously, these kernels were gated behind an environment variable and ran only when it was explicitly set. With this change, the specialized cast-only path is the default.
Type of change
Changes
Checklist: