
[Discussion] Adding energy consumption metrics to MLPerf Inference Benchmark #2558

@hongping-zh

Description


Discussion: Energy Metrics for MLPerf Inference

Context

MLPerf Inference currently reports throughput and latency metrics. As AI sustainability becomes a key concern, standardized energy efficiency metrics would complement existing benchmarks.

Observation

Through systematic benchmarking of quantized LLM inference (NF4, INT8, FP16) across NVIDIA Ada Lovelace and Blackwell architectures, we found that:

  1. Quantization's energy impact is non-trivial and model-size-dependent
  2. For models <3B parameters, NF4 quantization increases energy by 25-56%
  3. INT8 mixed-precision adds 17-33% energy overhead vs. FP16
  4. These trade-offs are not captured by throughput/latency alone
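The overhead figures above reduce to a ratio of measured energies for a fixed workload. A minimal sketch of that arithmetic, with hypothetical per-query energy numbers (not from our measurements):

```python
def energy_overhead_pct(e_variant_j: float, e_baseline_j: float) -> float:
    """Percent extra energy a quantized variant uses vs. an FP16 baseline,
    for the same workload (same queries, same generated tokens)."""
    return 100.0 * (e_variant_j - e_baseline_j) / e_baseline_j

# Hypothetical example: NF4 uses 1.4 J/query where FP16 uses 1.0 J/query,
# i.e. a 40% overhead, within the 25-56% range observed for <3B models.
print(round(energy_overhead_pct(1.4, 1.0), 1))
```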

Suggestion

Consider adding optional energy reporting to the MLPerf Inference benchmark:

  • Energy per query/token (J)
  • Average power draw (W)
  • Energy efficiency (tokens/J)

This would enable apples-to-apples energy comparison across hardware and quantization configurations.
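All three proposed metrics can be derived from timestamped power samples (e.g. polled via nvidia-smi or NVML) plus a token count. A minimal sketch, assuming `(seconds, watts)` samples; the function names are illustrative, not part of any MLPerf API:

```python
def energy_joules(samples: list[tuple[float, float]]) -> float:
    """Integrate (t_seconds, watts) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total

def energy_report(samples: list[tuple[float, float]], tokens: int) -> dict:
    """Compute the three proposed metrics for one benchmark run."""
    e = energy_joules(samples)
    duration = samples[-1][0] - samples[0][0]
    return {
        "energy_per_token_J": e / tokens,  # J per generated token
        "avg_power_W": e / duration,       # mean power over the run
        "tokens_per_J": tokens / e,        # energy efficiency
    }

# Example: a constant 200 W draw over 10 s while generating 500 tokens
# yields 2000 J total, i.e. 4 J/token and 0.25 tokens/J.
samples = [(float(t), 200.0) for t in range(11)]
print(energy_report(samples, 500))
```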
