Fix NULL pointer dereferences in DisplayPort DSC capability checks#1040
Fix NULL pointer dereferences in DisplayPort DSC capability checks#1040olejaco wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
Fixes two race conditions that cause kernel panics during DisplayPort hotplug disconnection: 1. compoundQueryAttachMSTIsDscPossible() - Guard outer conditional with dev->parent check before dereferencing. Falls back to checking device's own FEC capability when parent is NULL. 2. compoundQueryAttachMSTDsc() - Add NULL check for dev->devDoingDscDecompression before calling populateDscCaps(). These race conditions occur when the device hierarchy is torn down during hotplug events. The parent device destructor sets children[i]->parent = 0, while the child's devDoingDscDecompression may still reference the parent. Tested on NVIDIA driver 590.48.01 with DKMS.
|
Thanks for the change. We are actually tackling the same problem you encountered in nvbug 5871511. I tried reading your bug report capture for this information but unfortunately had the following. Public forum post: https://forums.developer.nvidia.com/t/kernel-null-pointer-dereference-in-nvidia-modeset-during-thunderbolt-dock-disconnect/359280 |
| if (dev->parent && | ||
| ((dev->devDoingDscDecompression != dev) || | ||
| ((dev->devDoingDscDecompression == dev) && | ||
| (dev->isLogical() && dev->parent)))) |
There was a problem hiding this comment.
If you change the code to the following, is the issue averted? If not, can you dump dev->devDoingDscDecompression right before the crash using DP_PRINTF? If using a release build, you will need to set nvidia_modeset.debug=1 to see the log.
| if (dev->parent && | |
| ((dev->devDoingDscDecompression != dev) || | |
| ((dev->devDoingDscDecompression == dev) && | |
| (dev->isLogical() && dev->parent)))) | |
| if (((dev->devDoingDscDecompression != NULL && | |
| dev->devDoingDscDecompression != dev) || | |
| ((dev->devDoingDscDecompression == dev) && | |
| (dev->isLogical() && dev->parent)))) |
There was a problem hiding this comment.
Yes, so I have tested your suggestions.
- No NULL ptr crash when only one external monitor connected. Regardless of which dp-port.
- Check for
dev->devDoingDscDecompression != NULLstill caused a nullptr crash. The following last entries of the log:
Feb 26 23:23:53 omarchy kernel: pci_bus 0000:3d: busn_res: [bus 3d-6c] is released
Feb 26 23:23:53 omarchy kernel: pci_bus 0000:3b: busn_res: [bus 3b-6c] is released
Feb 26 23:23:53 omarchy kernel: nvidia-modeset: WARNING: DP> AuxChCtl Failing, if a device is connected you shouldn't be seeing this
Feb 26 23:23:53 omarchy kernel: nvidia-modeset: ERROR: DPCONN> Lost device 0
Feb 26 23:23:53 omarchy kernel: nvidia-modeset: WARNING: DPCONN> Zombie? : 1 000000001f025eb9
Feb 26 23:23:53 omarchy kernel: nvidia-modeset: WARNING: DPCONN> Zombie? : 1 00000000a8ddd29a
Feb 26 23:23:53 omarchy kernel: nvidia-modeset: DP-CONN> dev->devDoingDscDecompression: 00000000969a6008 addr=0 peerDevice=1 plugged=0 multistream=1 videoSink=0 audioSink=0 bDSCPossible=1 bFECSupported=1
Feb 26 23:23:53 omarchy kernel: BUG: kernel NULL pointer dereference, address: 0000000000000409
Feb 26 23:23:53 omarchy kernel: #PF: supervisor read access in kernel mode
Feb 26 23:23:53 omarchy kernel: #PF: error_code(0x0000) - not-present page
Full kerneldump:
kerneldump.txt
There was a problem hiding this comment.
The code used for the debug-print:
DP_USED(sb);
DP_PRINTF(DP_INFO, "DP-CONN> dev->devDoingDscDecompression: %p addr=%s peerDevice=%u plugged=%d multistream=%d videoSink=%d audioSink=%d bDSCPossible=%d bFECSupported=%d",
dev->devDoingDscDecompression,
dev->devDoingDscDecompression ? dev->devDoingDscDecompression->address.toString(sb) : "NULL",
dev->devDoingDscDecompression ? dev->devDoingDscDecompression->peerDevice : 0,
dev->devDoingDscDecompression ? dev->devDoingDscDecompression->plugged : 0,
dev->devDoingDscDecompression ? dev->devDoingDscDecompression->multistream : 0,
dev->devDoingDscDecompression ? dev->devDoingDscDecompression->videoSink : 0,
dev->devDoingDscDecompression ? dev->devDoingDscDecompression->audioSink : 0,
dev->devDoingDscDecompression ? dev->devDoingDscDecompression->bDSCPossible : 0,
dev->devDoingDscDecompression ? dev->devDoingDscDecompression->bFECSupported : 0);|
Hey @Binary-Eater, thanks for the review. I`ll test your suggestions once I get home from work 👍 I have 2 external monitors on display port in my setup. I must have run the bug report capture with my laptop disconnected from thunderbolt 🤔 |
Summary
This PR fixes two NULL pointer dereference race conditions that cause kernel panics during DisplayPort hotplug disconnection in the DSC (Display Stream Compression) capability checking code.
Changes
1.
compoundQueryAttachMSTIsDscPossible()(line 1453)dev->parentNULL guard to the outer conditional before dereferencing2.
compoundQueryAttachMSTDsc()(line 1551)dev->devDoingDscDecompressionbefore callingpopulateDscCaps()Root Cause
During DisplayPort hotplug disconnection, a race condition occurs:
children[i]->parent = 0(dp_deviceimpl.cpp:72)devDoingDscDecompressionmay still reference the parentdiscoveryLostDevice()setsdevDoingDscDecompression = NULL(dp_connectorimpl.cpp:903)Testing
Impact