This repository contains our MLPerf Inference v6.0 benchmark results and setup documentation.
| Model Category | Model | GPU Configuration | Offline Scenario (throughput) | Server Scenario (throughput) | Software Stack |
|---|---|---|---|---|---|
| Vision Model | Qwen3-VL-235B-A22B | 8x H200 | 18.02 samples/sec | 11.05 samples/sec | RHEL, vLLM |
| Vision Model | Qwen3-VL-235B-A22B | 8x B200 | 79.04 samples/sec | 67.86 samples/sec | RHEL, vLLM |
| Reasoning Model | gpt-oss-120b | 8x H200 | 28,680 tokens/sec | 24,103.19 tokens/sec | OpenShift, llm-d, vLLM |
| Reasoning Model | gpt-oss-120b | 8x B200 (180 GB) | 93,070.70 tokens/sec | 71,588.13 tokens/sec | OpenShift, llm-d, vLLM |
| Speech2Text | Whisper | 2x L40S | 3,646.91 tokens/sec | N/A | RHEL, vLLM |
| Speech2Text | Whisper | 8x H200 | 36,395.70 tokens/sec | N/A | RHEL, vLLM |

| Model Category | Model | GPU Configuration | Offline Scenario (throughput) | Server Scenario (throughput) | Software Stack |
|---|---|---|---|---|---|
| Dense Model | llama-2-70b | 8x MI350x (with SMC) | 91,933.10 tokens/sec | 89,019.65 tokens/sec | vLLM, RHEL, AMD |
| Reasoning Model | gpt-oss-120b | 8x MI350x (with SMC) | 64,293.30 tokens/sec | 58,373.27 tokens/sec | vLLM, RHEL, AMD |
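Every configuration above serves its model through vLLM. As a rough illustration of that stack (not the tuned harness code behind these submissions), an offline-style vLLM run looks like the sketch below; the model identifier, tensor-parallel size, and sampling settings are placeholders.

```python
from vllm import LLM, SamplingParams

# Illustrative only: model name and parallelism are placeholders, not the
# exact settings used to produce the numbers in the tables above.
llm = LLM(model="openai/gpt-oss-120b", tensor_parallel_size=8)
sampling = SamplingParams(temperature=0.0, max_tokens=256)

# Batch generation over a list of prompts maximizes throughput, which is
# what the Offline scenario measures.
outputs = llm.generate(["Explain the MLPerf Offline scenario in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```

The per-benchmark READMEs linked below contain the actual serving configurations used for the submissions.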
For detailed submission results, see the official MLCommons Inference Results.
Detailed setup and configuration instructions for each benchmark:
- GPT-OSS-120B: See harness/README.md for harness setup and configuration
- Whisper: See speech2text/Whisper_Setup.md for setup instructions
- Qwen3-VL: See multimodal/qwen3-vl/README.md for setup instructions
```text
.
├── README.md (this file)
└── mlperf-inference-6.0-redhat/
    ├── harness/      # GPT-OSS-120B harness and configuration
    ├── speech2text/  # Whisper benchmark setup
    ├── multimodal/   # Qwen3-VL vision model setup
    └── language/     # Language model benchmarks
```
MLPerf Inference is a benchmark suite for measuring how fast systems can run models in a variety of deployment scenarios. For more information, visit MLCommons.
- Offline: batch inference with all samples available up front; the metric is maximum throughput.
- Server: online serving with queries arriving at a target rate, subject to TTFT (Time To First Token) and TPOT (Time Per Output Token) latency constraints; a configuration sketch follows below.
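Both scenarios are driven by MLPerf LoadGen. The following minimal sketch uses the `mlperf_loadgen` Python bindings to show how a system under test (SUT) is wired up and how the scenario is selected; the inference step, sample counts, and callbacks are placeholders, not the harness code from this repository.

```python
import mlperf_loadgen as lg

def issue_queries(query_samples):
    # LoadGen hands us a batch of QuerySample objects; run the model on
    # qs.index here, then report each completion back to LoadGen.
    responses = [lg.QuerySampleResponse(qs.id, 0, 0) for qs in query_samples]
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

def load_samples(indices):
    pass  # load the referenced dataset samples into memory

def unload_samples(indices):
    pass  # release them

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline   # or lg.TestScenario.Server
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(24576, 24576, load_samples, unload_samples)  # sample counts are illustrative
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```

In the Server scenario for LLM benchmarks, the harness additionally reports first-token completions so that LoadGen can check TTFT and TPOT; the latency limits themselves come from the benchmark's mlperf.conf/user.conf settings rather than from code.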