My GPU is an NVIDIA Corporation TU104GL [Quadro RTX 4000], which has three auxiliary devices.
When I set up the GPU for KubeVirt VM passthrough, the script vfio-mageme.sh fails to bind all of the auxiliary devices to the vfio-pci driver.
The bug is in https://github.com/NVIDIA/gpu-operator/blob/main/assets/state-vfio-manager/0400_configmap.yaml#L128
The function get_grapcs_aux_dev should not use if ls "/sys/bus/pci/devices/$aux_dev/" as its criterion. It should instead return an array of strings, one entry per auxiliary device. The functions bind_device and unbind_device should then loop over this array, check each device, and perform the corresponding bind or unbind operation.
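To illustrate the proposal, here is a minimal sketch of enumerating every sibling PCI function of a GPU and looping over the result. It is not the gpu-operator implementation: the names list_aux_devs, current_driver, report_aux_devs and the SYSFS_PCI_ROOT variable are hypothetical and exist only for this example. It assumes that the auxiliary devices are the other functions at the same domain:bus:device address, as in the lspci output below (functions .1, .2 and .3, all in IOMMU group 35).

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual gpu-operator script.
# SYSFS_PCI_ROOT is an assumed override so the functions can be run
# against a fake directory tree; by default it points at real sysfs.
SYSFS_PCI_ROOT="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"

# Print every sibling function of a GPU (same domain:bus:device,
# different function number), one PCI address per line.
list_aux_devs() {
    local gpu="$1"             # e.g. 0000:52:00.0
    local prefix="${gpu%.*}"   # drop the function number -> 0000:52:00
    local path addr
    for path in "$SYSFS_PCI_ROOT/$prefix".*; do
        [ -e "$path" ] || continue        # glob matched nothing
        addr="${path##*/}"
        [ "$addr" = "$gpu" ] && continue  # skip the GPU itself
        echo "$addr"
    done
}

# Print the name of the driver a device is bound to (empty if unbound).
current_driver() {
    local link="$SYSFS_PCI_ROOT/$1/driver"
    [ -L "$link" ] && basename "$(readlink "$link")"
    return 0
}

# Loop over all auxiliary devices and report each one's driver.
# A real bind/unbind routine would act on each device at this point.
report_aux_devs() {
    local gpu="$1" dev
    for dev in $(list_aux_devs "$gpu"); do
        echo "$dev driver=$(current_driver "$dev")"
    done
}
```

Calling report_aux_devs 0000:52:00.0 on the affected host would be expected to visit 0000:52:00.1, 0000:52:00.2 and 0000:52:00.3 in turn, so that bind_device and unbind_device could handle each of them rather than only the first.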
Output of lspci -Dnnkv -d 10de:
0000:52:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104GL [Quadro RTX 4000] [10de:1eb1] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device [10de:12a0]
Physical Slot: 8191-8
Flags: fast devsel, IRQ 16, NUMA node 0, IOMMU group 35
Memory at b3000000 (32-bit, non-prefetchable) [size=16M]
Memory at 20ffe0000000 (64-bit, prefetchable) [size=256M]
Memory at 20fff0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 7000 [size=128]
Expansion ROM at b4000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Physical Resizable BAR
Kernel driver in use: vfio-pci
Kernel modules: nouveau
0000:52:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:12a0]
Physical Slot: 8191-8
Flags: bus master, fast devsel, latency 0, IRQ 17, NUMA node 0, IOMMU group 35
Memory at b4080000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
0000:52:00.2 USB controller [0c03]: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1ad8] (rev a1) (prog-if 30 [XHCI])
Subsystem: NVIDIA Corporation Device [10de:12a0]
Physical Slot: 8191-8
Flags: fast devsel, IRQ 64, NUMA node 0, IOMMU group 35
Memory at 20fff2000000 (64-bit, prefetchable) [size=256K]
Memory at 20fff2040000 (64-bit, prefetchable) [size=64K]
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: xhci_hcd
0000:52:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller [10de:1ad9] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:12a0]
Physical Slot: 8191-8
Flags: fast devsel, IRQ 255, NUMA node 0, IOMMU group 35
Memory at b4084000 (32-bit, non-prefetchable) [disabled] [size=4K]
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Power Management version 3
Capabilities: [100] Advanced Error Reporting