Skip vfio-pci unbind when GPUs already bound in VFIO mode #146

Closed
karthikvetrivel wants to merge 1 commit into NVIDIA:main from karthikvetrivel:fix/skip-vfio-unbind-when-already-bound

@karthikvetrivel karthikvetrivel commented Jan 7, 2026

Relevant PR: NVIDIA/gpu-operator#2079

Description

Prevents unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet. Currently, k8s-driver-manager unconditionally unbinds all GPUs from vfio-pci on startup, even when the desired state is already vfio-pci. This disrupts active VM workloads using GPU passthrough (KubeVirt, Kata Containers).

Solution

Check the `GPU_WORKLOAD_CONFIG` environment variable to determine whether the pod is running in VFIO workload mode. When it is set to `vm-passthrough`, skip the vfio-pci unbind operation, since the GPUs should remain bound to vfio-pci.
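A minimal sketch of the check described above, not the actual diff in this PR. The environment variable name and the `vm-passthrough` value come from the description. The function name `shouldUnbindVfio`, the constant names, and the log messages are hypothetical and only illustrate the decision.

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical constant names; the env var name and the value are taken
// from the PR description.
const (
	workloadConfigEnv = "GPU_WORKLOAD_CONFIG"
	vmPassthrough     = "vm-passthrough"
)

// shouldUnbindVfio (hypothetical helper) reports whether the vfio-pci unbind
// step should run for the given workload config value. An unset (empty)
// value keeps the existing behavior of unbinding.
func shouldUnbindVfio(workloadConfig string) bool {
	return workloadConfig != vmPassthrough
}

func main() {
	cfg := os.Getenv(workloadConfigEnv)
	if shouldUnbindVfio(cfg) {
		// Placeholder for the real unbind logic, which is not shown here.
		fmt.Println("unbinding GPUs from vfio-pci")
	} else {
		fmt.Println("skipping vfio-pci unbind: GPUs should stay bound to vfio-pci")
	}
}
```

Keeping the decision in a pure function of the environment value makes it easy to unit test without touching sysfs.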

Behavior by DaemonSet Context

| DaemonSet | GPU_WORKLOAD_CONFIG | Unbind vfio-pci? | Reason |
| --- | --- | --- | --- |
| nvidia-driver | (not set) | Yes | Prepares GPUs for NVIDIA kernel driver |
| vfio-manager | vm-passthrough | No | GPUs should remain bound to vfio-pci |
| vgpu-manager | (not set) | Yes | Prepares GPUs for vGPU manager |

@karthikvetrivel karthikvetrivel marked this pull request as draft January 7, 2026 17:09
@karthikvetrivel karthikvetrivel marked this pull request as ready for review January 7, 2026 20:56
Comment thread cmd/driver-manager/main.go Outdated
Comment thread cmd/driver-manager/main.go Outdated
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
@karthikvetrivel karthikvetrivel force-pushed the fix/skip-vfio-unbind-when-already-bound branch from 750db0f to 263b723 Compare January 29, 2026 15:09
@karthikvetrivel
Member Author

@cdesiniotis @tariq1890

What do you think about this approach? Are there any scenarios where the workload config is vm-passthrough and we'd want to still unbind vfio-pci?
