docs: add energy efficiency considerations to bitsandbytes quantization guide #44407
hongping-zh wants to merge 1 commit into huggingface:main
Hi @SunMarc, thank you for the suggestion in bitsandbytes-foundation/bitsandbytes#1882 to update the transformers documentation. I've added a concise "Energy Efficiency Considerations" section based on systematic benchmarking across multiple GPU architectures. I would greatly appreciate your feedback on this documentation addition. Additionally, I have more comprehensive research on this topic that I'm preparing for academic publication. If you're interested, I'd be happy to discuss potential collaboration opportunities or seek your guidance on the work. Looking forward to your thoughts!
Thank you for reviewing! Please let me know if any changes are needed for the CI checks; I'm happy to address them.
Hi @SunMarc, thank you for the approval! Just checking: is there anything else needed on my end before this can be merged? Happy to fix any CI issues if needed. Thanks!
Attachment: Quantization_Energy_Crossover_Analysis.pdf
Hi Marc! Hope you're doing well. Following up on the energy efficiency guidance that was merged from this PR, I've expanded the study significantly and discovered something I think you'll find interesting: a quantization-energy crossover effect. Key finding: NF4 quantization actually increases energy consumption for models below ~3.3B parameters (up to +29%) due to dequantization overhead, while providing expected savings for larger models. This holds across 5 GPU generations including RTX-5090. I've uploaded the crossover analysis figure showing this effect. Would you be open to reviewing it and potentially providing a brief endorsement for the paper? The findings extend the guidance already in the Optimum docs with hardware-specific thresholds. Happy to share the full details via email if you're interested. Thanks again for your support on the original PR!
Summary
Adds an "Energy Efficiency Considerations" section to the bitsandbytes quantization documentation, providing practical guidance on the energy implications of different quantization configurations.
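For orientation, the two LLM.int8() settings that the new section contrasts could be written as in the sketch below. This is a rough illustration rather than code from the PR diff: it assumes the `BitsAndBytesConfig` class and its `llm_int8_threshold` argument from `transformers`, it assumes the `threshold=0.0` mentioned in this PR refers to that same argument, and the constant and helper names are hypothetical.

```python
# Keyword arguments for transformers.BitsAndBytesConfig.
# The constant and function names here are illustrative, not from the PR.

# 8-bit loading with the outlier threshold set to 6.0.
INT8_DEFAULT_KWARGS = {"load_in_8bit": True, "llm_int8_threshold": 6.0}

# Assumption: the "threshold=0.0" setting the PR advises against.
INT8_ZERO_THRESHOLD_KWARGS = {"load_in_8bit": True, "llm_int8_threshold": 0.0}


def build_config(kwargs):
    """Create a BitsAndBytesConfig from a kwargs dict.

    transformers is imported lazily so the constants above can be
    inspected without the library installed.
    """
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**kwargs)


def load_int8_model(model_id, kwargs=None):
    """Load a causal LM in 8-bit; model_id is a placeholder chosen by the caller."""
    from transformers import AutoModelForCausalLM

    config = build_config(INT8_DEFAULT_KWARGS if kwargs is None else kwargs)
    return AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=config, device_map="auto"
    )
```

Calling `load_int8_model("<model-id>")` would need a GPU with `bitsandbytes` installed. The only difference between the two configurations is the threshold value.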
Motivation
This addresses the suggestion from @SunMarc in bitsandbytes-foundation/bitsandbytes#1882 to update transformers documentation with energy efficiency insights based on systematic benchmarking.
Changes
Added a new section covering:
- llm_int8_threshold=6.0 as a justified accuracy trade-off
- threshold=0.0 is not recommended (only 3% energy savings vs 25% perplexity degradation)
Testing
References