
docs: add energy efficiency considerations to bitsandbytes quantization guide #44407

Open

hongping-zh wants to merge 1 commit into huggingface:main from hongping-zh:docs/bitsandbytes-energy-efficiency

Conversation

@hongping-zh

Summary

Adds an "Energy Efficiency Considerations" section to the bitsandbytes quantization documentation, providing practical guidance on the energy implications of different quantization configurations.

Motivation

This addresses the suggestion from @SunMarc in bitsandbytes-foundation/bitsandbytes#1882 to update transformers documentation with energy efficiency insights based on systematic benchmarking.

Changes

Added a new section covering:

  • INT8 mixed-precision trade-offs: explains the 17-33% energy overhead of the default llm_int8_threshold=6.0 as a justified accuracy trade-off
  • Threshold configuration guidance: documents why llm_int8_threshold=0.0 is not recommended (~3% energy savings against ~25% perplexity degradation)
  • Model size considerations: NF4 efficiency crossover point at ~5B parameters
  • Batch size impact: 84-96% energy reduction when moving from batch_size=1 to batch_size=8-64

Testing

  • Documentation follows existing markdown formatting
  • Content is concise and actionable for users
  • Links verified


@hongping-zh
Author

Hi @SunMarc,

Thank you for the suggestion in bitsandbytes-foundation/bitsandbytes#1882 to update the transformers documentation. I've added a concise "Energy Efficiency Considerations" section based on systematic benchmarking across multiple GPU architectures.

I would greatly appreciate your feedback on this documentation addition. Additionally, I have more comprehensive research on this topic that I'm preparing for academic publication. If you're interested, I'd be happy to discuss potential collaboration opportunities or seek your guidance on the work.

Looking forward to your thoughts!

Member

@SunMarc SunMarc left a comment


Thanks !

@hongping-zh
Author

Thank you for reviewing! Please let me know if any changes are needed for the CI checks; I'm happy to address them.

@hongping-zh
Author

Hi @SunMarc, thank you for the approval! Just checking: is there anything else needed on my end before this can be merged? Happy to fix any CI issues if needed. Thanks!

@hongping-zh
Author

Quantization_Energy_Crossover_Analysis.pdf
@SunMarc

Hi Marc! Hope you're doing well.

Following up on the energy efficiency guidance that was merged from this PR, I've expanded the study significantly and discovered something I think you'll find interesting: a quantization-energy crossover effect.

Key finding: NF4 quantization actually increases energy consumption for models below ~3.3B parameters (up to +29%) due to dequantization overhead, while providing expected savings for larger models. This holds across 5 GPU generations including RTX-5090.

I've uploaded the crossover analysis figure showing this effect. Would you be open to reviewing it and potentially providing a brief endorsement for the paper? The findings extend the guidance already in the Optimum docs with hardware-specific thresholds.

Happy to share the full details via email if you're interested. Thanks again for your support on the original PR!

