NVIDIA Open GPU Kernel Modules Version
595.71.05
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Ubuntu 24.04.4 LTS
Kernel Release
Linux saraoamd 6.17.0-20-generic #20~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 19 01:28:37 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
NVIDIA RTX PRO 5000 Blackwell
Describe the bug
I'm trying to use GPUDirect RDMA so that a ConnectX-7 NIC can transmit data from GPU memory allocated by Vulkan. I export a dma-buf fd from Vulkan and import it into ibverbs with ibv_reg_dmabuf_mr. The import fails because the driver can't obtain the sg_table: it bails out at this line. The Vulkan memory has VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT set, so I assume it lives in BAR1 space and should, in principle, be mappable.
This sounds like it might be related to #1037, which is also about trying to use memory allocated on an NVIDIA GPU from other drivers.
To Reproduce
- RDMA-capable GPU, with driver 595.71.05
- mlx5-based NIC, e.g. ConnectX-7, with DOCA 3.3.0. Mine is in Ethernet mode and configured with an IP address etc., but I doubt that makes a difference.
- Vulkan libraries installed.
- Compile this code with:
  gcc -o nvidia-vulkan-dmabuf nvidia-vulkan-dmabuf.c -Wall -g -lvulkan -libverbs
- Run it.
Output I get (the FD may change from run to run):
Successfully obtained dma-buf: fd = 49
Using IBV device mlx5_0
ibv_reg_dmabuf_mr failed: Cannot allocate memory
Expected output: "Successfully created MR from dmabuf"
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
The nvidia-bug-report output might show that I'm using a modified version of the kernel module; the only changes are pr_debug statements I added to narrow down the code path. The problem was first observed with an unmodified module (and also on 595.58.03).
I ticked the box saying this only happens with the open driver, but I didn't actually test the proprietary driver, because AFAIK VK_EXT_external_memory_dma_buf is only available with the open driver.