performance of device_local_copy does not meet expectations on NVIDIA H20-3e #48

@yoqiu-amd

Hi @esitaridi,

I recently ran the device_local_copy test on NVIDIA H20-3e GPUs. According to the official specifications, GPU memory bandwidth should be 4.8 TB/s, but I only get about 1.1 TB/s per GPU. Is there a problem somewhere? My test results and basic information about the server are below.

Thanks,
Henry

root@sglang-host-cuda:/workspace/yongjie/nvbandwidth# ./nvbandwidth -t device_local_copy
nvbandwidth Version: v0.8
Built from Git version: v0.8

CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 550.127.08

sglang-host-cuda
Device 0: NVIDIA H20-3e (00000000:18:00)
Device 1: NVIDIA H20-3e (00000000:38:00)
Device 2: NVIDIA H20-3e (00000000:49:00)
Device 3: NVIDIA H20-3e (00000000:59:00)
Device 4: NVIDIA H20-3e (00000000:9b:00)
Device 5: NVIDIA H20-3e (00000000:bb:00)
Device 6: NVIDIA H20-3e (00000000:ca:00)
Device 7: NVIDIA H20-3e (00000000:da:00)

Running device_local_copy.
memcpy local GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0   1116.30   1116.30   1115.72   1116.16   1116.16   1116.88   1117.17   1116.30

SUM device_local_copy 8930.99

NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.
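
To put the gap in numbers, here is a small sketch that works only from the figures printed above. It checks the reported SUM, then compares the per-GPU mean against the 4.8 TB/s specification. The factor of two is an assumption, not something the output states: it supposes the tool counts each copied byte once, while a device-local copy both reads and writes device memory.

```python
# Per-GPU figures copied from the device_local_copy output above (GB/s).
measured = [1116.30, 1116.30, 1115.72, 1116.16,
            1116.16, 1116.88, 1117.17, 1116.30]
SPEC_GBPS = 4800.0  # 4.8 TB/s, the specification quoted in this issue

total = sum(measured)
mean = total / len(measured)

# Fraction of the specification reached by the reported copy bandwidth.
copy_fraction = mean / SPEC_GBPS

# Assumption: each copied byte is counted once, but the copy reads and
# writes device memory, so the memory traffic is about twice the figure.
traffic_fraction = 2 * mean / SPEC_GBPS

print(f"sum              = {total:.2f} GB/s")   # 8930.99, matches SUM line
print(f"mean per GPU     = {mean:.2f} GB/s")    # 1116.37
print(f"copy / spec      = {copy_fraction:.1%}")     # 23.3%
print(f"traffic / spec   = {traffic_fraction:.1%}")  # 46.5%
```

Even under the read-plus-write assumption, the result is still less than half of the quoted specification, so that accounting alone would not explain the difference.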

root@sglang-host-cuda:/workspace/yongjie/nvbandwidth# nvidia-smi
Wed Sep 17 02:03:25 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08     CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H20-3e                  Off |   00000000:18:00.0 Off |                    0 |
| N/A   38C    P0            120W /  500W |  137557MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H20-3e                  Off |   00000000:38:00.0 Off |                    0 |
| N/A   33C    P0            114W /  500W |  137703MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H20-3e                  Off |   00000000:49:00.0 Off |                    0 |
| N/A   38C    P0            119W /  500W |  137683MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H20-3e                  Off |   00000000:59:00.0 Off |                    0 |
| N/A   33C    P0            117W /  500W |  137703MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H20-3e                  Off |   00000000:9B:00.0 Off |                    0 |
| N/A   33C    P0            117W /  500W |  137703MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H20-3e                  Off |   00000000:BB:00.0 Off |                    0 |
| N/A   39C    P0            118W /  500W |  137703MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H20-3e                  Off |   00000000:CA:00.0 Off |                    0 |
| N/A   33C    P0            118W /  500W |  137703MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H20-3e                  Off |   00000000:DA:00.0 Off |                    0 |
| N/A   39C    P0            122W /  500W |  136983MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
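
One more detail from the nvidia-smi output: every GPU reports most of its memory in use although the process table is empty. The sketch below only tabulates the numbers shown above. Whether this affects the measured bandwidth is not established here, and the empty process table may simply mean nvidia-smi was run where it cannot see the owning processes (for example inside a container), which is a guess.

```python
# (used MiB, total MiB) per GPU, copied from the nvidia-smi output above.
usage = [
    (137557, 143771), (137703, 143771), (137683, 143771), (137703, 143771),
    (137703, 143771), (137703, 143771), (137703, 143771), (136983, 143771),
]

# Free memory and used fraction for each GPU.
free_mib = [total - used for used, total in usage]
used_fraction = [used / total for used, total in usage]

for gpu, (free, frac) in enumerate(zip(free_mib, used_fraction)):
    print(f"GPU {gpu}: {frac:.1%} used, {free} MiB free")

print(f"smallest free amount: {min(free_mib)} MiB")  # 6068
```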

