
Commit c563899

Author: Basu Jindal (committed)
add stats
1 parent a018231 commit c563899

6 files changed: 1448 additions & 1 deletion

File tree

blogs/quantization.md: 4 additions & 0 deletions
@@ -170,6 +170,10 @@ FP8 is only supported in H100 GPUs but storing approximations in fp8 can be accu

 Biases are not converted: to preserve the accuracy of a typical addmm operation, they would have to be quantized with a scale equal to the product of the input and weight scales. That product is extremely small, so representing the bias without clipping would require a very high bitwidth.

+## Latency and Bandwidth analysis
+
+https://www.youtube.com/watch?v=adA9AMu4_Kc
+

 ## Recommended Reading & References
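The bias remark in the diff can be made concrete with a quick back-of-the-envelope calculation. This is an illustrative sketch with made-up scale values, not numbers from any particular model or library:

```python
# Illustrative, assumed per-tensor quantization scales.
s_x = 0.01          # input scale
s_w = 0.002         # weight scale
s_b = s_x * s_w     # scale an integer addmm forces onto the bias: 2e-5

bias = 1.0                      # a typical float bias value
q_bias = round(bias / s_b)      # integer the bias must be stored as: 50000

# 50000 overflows int8 (max 127) and even int16 (max 32767),
# so the bias needs a wide accumulator type such as int32.
print(q_bias, q_bias > 2**15 - 1)
```

This is why quantization schemes typically keep biases in a high-precision type (fp32 or int32) rather than converting them alongside the weights.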
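For the new Latency and Bandwidth section, the core idea behind this kind of analysis can be sketched as a roofline-style arithmetic-intensity calculation. The peak FLOP and bandwidth figures below are assumed, approximate numbers for an H100-class GPU (check the vendor datasheet for exact values), and the function itself is a generic sketch, not code from the linked talk:

```python
def arithmetic_intensity(M, N, K, bytes_per_el=2):
    """FLOPs per byte moved for a dense (M x K) @ (K x N) matmul, bf16 by default."""
    flops = 2 * M * N * K                                   # one multiply-add per term
    bytes_moved = bytes_per_el * (M * K + K * N + M * N)    # read A and B, write C
    return flops / bytes_moved

# Assumed, approximate peaks for an H100-class GPU (illustrative only):
peak_flops = 989e12   # ~989 TFLOPS dense bf16
peak_bw = 3.35e12     # ~3.35 TB/s HBM3
ridge = peak_flops / peak_bw   # ~295 FLOPs/byte; below this a kernel is memory-bound

# Batch-1 LLM decode (M=1): intensity is ~1 FLOP/byte, far below the ridge,
# so latency is set by bandwidth -- and shrinking weight bytes via
# quantization (e.g. fp8) directly shrinks decode latency.
print(arithmetic_intensity(1, 4096, 4096))     # ~1.0, memory-bound
print(arithmetic_intensity(4096, 4096, 4096))  # ~1365, compute-bound
```

The takeaway is that whether quantization speeds up inference depends on which side of the ridge point a workload sits: memory-bound workloads benefit roughly in proportion to the reduction in bytes moved.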
