Feature Description
INT8 W8A8 scheme support has been requested. We plan to start by adding an RTN-based (round-to-nearest) quantization scheme. A smoothing algorithm is likely needed for accuracy; that will be tracked in a separate issue.
For deployment, the compressed-tensors format should be supported.
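To make the RTN starting point concrete, here is a minimal pure-Python sketch of a W8A8 linear layer. It assumes symmetric quantization with per-output-channel weight scales and dynamic per-token activation scales; those granularity choices, and the helper names `quantize_rtn_int8` and `w8a8_linear`, are illustrative assumptions rather than the planned implementation. It applies no smoothing and does not touch the serialization format.

```python
def quantize_rtn_int8(row):
    """Symmetric round-to-nearest INT8 quantization of one row of floats.

    Returns (quantized_ints, scale) such that row[i] ~= quantized_ints[i] * scale.
    Note: Python's round() resolves exact ties to the nearest even integer.
    """
    max_abs = max(abs(v) for v in row)
    # Guard against an all-zero row: any positive scale works in that case.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale


def w8a8_linear(x, w):
    """Compute y = x @ w.T with INT8 weights and INT8 activations.

    x: list of tokens, each a list of floats (in_features).
    w: list of output channels, each a list of floats (in_features).
    Weights get one scale per output channel; activations get one
    scale per token, computed at run time (dynamic quantization).
    """
    wq = [quantize_rtn_int8(channel) for channel in w]
    out = []
    for token in x:
        xq, x_scale = quantize_rtn_int8(token)
        row = []
        for q, w_scale in wq:
            # Integer accumulation, then a single rescale back to float.
            acc = sum(a * b for a, b in zip(xq, q))
            row.append(acc * x_scale * w_scale)
        out.append(row)
    return out


if __name__ == "__main__":
    x = [[1.0, -2.0, 0.5]]
    w = [[0.5, 0.25, -1.0], [2.0, 0.0, 1.0]]
    # The float reference result is [[-0.5, 2.5]]; the INT8 result is close to it.
    print(w8a8_linear(x, w))
```

Plain RTN leaves the scales at the mercy of outliers: a single large activation value inflates the per-token scale and costs precision for every other element in that token, which is why a smoothing step is expected to matter for accuracy.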
Motivation and Use Case
The target is current Xeon CPUs, such as GNR (Granite Rapids).
Alternatives Considered
No response
Definition of Done
No response
Additional Context
No response