Fix NULL pointer dereferences in DisplayPort DSC capability checks #1040

Open
olejaco wants to merge 1 commit into NVIDIA:main from olejaco:main

olejaco commented Feb 24, 2026

Summary

This PR fixes two race conditions in the DSC (Display Stream Compression) capability-checking code. Both lead to NULL pointer dereferences, and therefore kernel panics, when a DisplayPort device is hot-unplugged.

Changes

1. compoundQueryAttachMSTIsDscPossible() (line 1453)

  • Added dev->parent NULL guard to the outer conditional before dereferencing
  • Falls back to checking the device's own FEC capability when parent is NULL
  • Consistent with existing pattern at line 1575

2. compoundQueryAttachMSTDsc() (line 1551)

  • Added NULL check for dev->devDoingDscDecompression before calling populateDscCaps()
  • Protects against teardown race where the pointer could be NULL between the capability check and function call
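
A minimal sketch of the two guards described above, under stated assumptions: `Device` and `DscCaps` are hypothetical, simplified stand-ins for the driver's types, `populateDscCapsSketch()` is an illustrative placeholder for `populateDscCaps()` (whose real signature is not shown in this PR), and the function names `fecSupportedForDsc()` and `queryDscCaps()` are invented for the example. It is not the actual diff.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the driver's types. Only members
// named in this PR are modeled; their real types and layout may differ.
struct DscCaps {
    bool valid = false;
    bool decompressionSupported = false;
};

struct Device {
    Device *parent = nullptr;                    // cleared when the parent is destroyed
    Device *devDoingDscDecompression = nullptr;  // cleared when the device is lost
    bool logical = false;
    bool bFECSupported = false;
    bool bDSCPossible = false;
    bool isLogical() const { return logical; }
};

// Change 1 (sketch): consult the parent's FEC capability only when the
// parent pointer is still valid; otherwise fall back to the device's own.
bool fecSupportedForDsc(const Device *dev)
{
    assert(dev != nullptr);
    const Device *decompressor = dev->devDoingDscDecompression;
    const bool needsParentFec =
        (decompressor != dev) || (decompressor == dev && dev->isLogical());
    if (dev->parent != nullptr && needsParentFec) {
        return dev->parent->bFECSupported;
    }
    return dev->bFECSupported;
}

// Illustrative placeholder for populateDscCaps(): copies one capability
// from the device that performs DSC decompression.
void populateDscCapsSketch(const Device *decompressor, DscCaps *caps)
{
    assert(decompressor != nullptr && caps != nullptr);
    caps->decompressionSupported = decompressor->bDSCPossible;
    caps->valid = true;
}

// Change 2 (sketch): bail out before populating capabilities when the
// decompressing device has already been cleared during teardown.
bool queryDscCaps(const Device *dev, DscCaps *caps)
{
    assert(dev != nullptr && caps != nullptr);
    const Device *decompressor = dev->devDoingDscDecompression;
    if (decompressor == nullptr) {
        return false;  // treat DSC as unavailable instead of dereferencing NULL
    }
    populateDscCapsSketch(decompressor, caps);
    return true;
}
```

Note that a NULL check only narrows the window. If the pointer can change concurrently, the check and the use would also need to happen under whatever locking the driver uses for topology changes, which this sketch does not model.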

Root Cause

During DisplayPort hotplug disconnection, a race condition occurs:

  1. Parent device destructor sets children[i]->parent = 0 (dp_deviceimpl.cpp:72)
  2. Child device's devDoingDscDecompression may still reference the parent
  3. discoveryLostDevice() sets devDoingDscDecompression = NULL (dp_connectorimpl.cpp:903)
  4. If DSC capability queries execute during this teardown window, NULL dereferences occur
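
The teardown window in steps 1 to 3 can be illustrated with a single-threaded sketch. Everything here is hypothetical: `Node`, `detachChildren()`, `loseDevice()` and `classify()` are invented names that only mirror the pointer updates described above, not the driver's real classes or destructors.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the two back-pointers involved in the race.
struct Node {
    Node *parent = nullptr;
    Node *devDoingDscDecompression = nullptr;
    std::vector<Node *> children;
};

// Step 1 (sketch): what the parent's destructor does to its children.
void detachChildren(Node &parentNode)
{
    for (Node *child : parentNode.children) {
        assert(child != nullptr);
        child->parent = nullptr;
    }
    parentNode.children.clear();
}

// Step 3 (sketch): what discoveryLostDevice() does to the lost device.
void loseDevice(Node &child)
{
    child.devDoingDscDecompression = nullptr;
}

enum class TeardownState { Intact, ParentCleared, FullyDetached };

// Classifies where a child sits in the teardown sequence.
TeardownState classify(const Node &child)
{
    if (child.parent != nullptr) {
        return TeardownState::Intact;
    }
    if (child.devDoingDscDecompression != nullptr &&
        child.devDoingDscDecompression != &child) {
        // Step 2: parent is gone, but the decompression pointer still
        // refers to another device. A query in this state is the hazard.
        return TeardownState::ParentCleared;
    }
    return TeardownState::FullyDetached;
}
```

A child whose `devDoingDscDecompression` pointed at its parent moves from `Intact` to `ParentCleared` after `detachChildren()`, and only reaches `FullyDetached` after `loseDevice()`. A capability query that runs in the `ParentCleared` state sees a NULL `parent` alongside a non-NULL decompression pointer.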

Testing

  • Tested on NVIDIA driver 590.48.01 with DKMS
  • DisplayPort hotplug stress testing (multiple connect/disconnect cycles)
  • No kernel panics observed after fix
  • Display functionality verified (resolution, refresh rate, hotplug detection)

Impact

  • Prevents kernel crashes during DisplayPort hotplug events
  • Particularly affects systems with MST (Multi-Stream Transport) topologies
  • Improves stability for devices with DSC-capable displays

Fixes two race conditions that cause kernel panics during DisplayPort
hotplug disconnection:

1. compoundQueryAttachMSTIsDscPossible() - Guard outer conditional with
   dev->parent check before dereferencing. Falls back to checking device's
   own FEC capability when parent is NULL.

2. compoundQueryAttachMSTDsc() - Add NULL check for
   dev->devDoingDscDecompression before calling populateDscCaps().

These race conditions occur when the device hierarchy is torn down during
hotplug events. The parent device destructor sets children[i]->parent = 0,
while the child's devDoingDscDecompression may still reference the parent.

Tested on NVIDIA driver 590.48.01 with DKMS.

CLAassistant commented Feb 24, 2026

CLA assistant check
All committers have signed the CLA.

Binary-Eater (Collaborator) commented

Thanks for the change. We are actually tackling the same problem you encountered in nvbug 5871511.
Before I can review, would you mind sharing how many monitors you have connected to the dock? If it's more than one, does the issue stop reproducing on hot unplug of the dock when only a single monitor is connected?

I tried reading your bug report capture for this information, but unfortunately it only contained the following.

*** /proc/driver/nvidia-modeset/dpys
*** ls: -r--r--r-- 1 root root 0 2026-02-06 18:54:24.392145253 +0100 /proc/driver/nvidia-modeset/dpys
deviceId                     : 00
 connector                   : DP-0
  dpy                        : (not connected)
 connector                   : DP-1
  dpy                        : (not connected)
 connector                   : DP-2
  dpy                        : (not connected)
 connector                   : DP-3
  dpy                        : (not connected)
 connector                   : DP-4
  dpy                        : (not connected)
 connector                   : DP-5
  dpy                        : (not connected)

____________________________________________

*** /proc/driver/nvidia-modeset/heads
*** ls: -r--r--r-- 1 root root 0 2026-02-06 18:54:24.392145253 +0100 /proc/driver/nvidia-modeset/heads
deviceId                     : 00
 (not yet initialized)

Public forum post: https://forums.developer.nvidia.com/t/kernel-null-pointer-dereference-in-nvidia-modeset-during-thunderbolt-dock-disconnect/359280

Comment on lines +1453 to +1456
if (dev->parent &&
((dev->devDoingDscDecompression != dev) ||
((dev->devDoingDscDecompression == dev) &&
(dev->isLogical() && dev->parent))))

Binary-Eater (Collaborator) commented Feb 25, 2026

If you change the code to the following, is the issue averted? If not, can you dump dev->devDoingDscDecompression right before the crash using DP_PRINTF? If using a release build, you will need to set nvidia_modeset.debug=1 to see the log.

Suggested change
- if (dev->parent &&
- ((dev->devDoingDscDecompression != dev) ||
- ((dev->devDoingDscDecompression == dev) &&
- (dev->isLogical() && dev->parent))))
+ if (((dev->devDoingDscDecompression != NULL &&
+ dev->devDoingDscDecompression != dev) ||
+ ((dev->devDoingDscDecompression == dev) &&
+ (dev->isLogical() && dev->parent))))

olejaco (Author) commented

Yes, I have tested your suggestions.

  1. With only one external monitor connected, there is no NULL pointer crash, regardless of which DP port is used.
  2. Adding the dev->devDoingDscDecompression != NULL check still resulted in a NULL pointer crash. These are the last log entries before the crash:

Feb 26 23:23:53 omarchy kernel: pci_bus 0000:3d: busn_res: [bus 3d-6c] is released
Feb 26 23:23:53 omarchy kernel: pci_bus 0000:3b: busn_res: [bus 3b-6c] is released
Feb 26 23:23:53 omarchy kernel: nvidia-modeset: WARNING: DP> AuxChCtl Failing, if a device is connected you shouldn't be seeing this
Feb 26 23:23:53 omarchy kernel: nvidia-modeset: ERROR: DPCONN> Lost device 0
Feb 26 23:23:53 omarchy kernel: nvidia-modeset: WARNING: DPCONN> Zombie? : 1 000000001f025eb9
Feb 26 23:23:53 omarchy kernel: nvidia-modeset: WARNING: DPCONN> Zombie? : 1 00000000a8ddd29a
Feb 26 23:23:53 omarchy kernel: nvidia-modeset: DP-CONN> dev->devDoingDscDecompression: 00000000969a6008 addr=0 peerDevice=1 plugged=0 multistream=1 videoSink=0 audioSink=0 bDSCPossible=1 bFECSupported=1
Feb 26 23:23:53 omarchy kernel: BUG: kernel NULL pointer dereference, address: 0000000000000409
Feb 26 23:23:53 omarchy kernel: #PF: supervisor read access in kernel mode
Feb 26 23:23:53 omarchy kernel: #PF: error_code(0x0000) - not-present page

Full kernel dump: kerneldump.txt
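
One possible reading of this log, sketched under assumptions: `dev->devDoingDscDecompression` was printed as non-NULL, so a `!= NULL` test on that pointer alone would still let the branch run, and per the PR description the branch dereferences `dev->parent`. A small fault address such as 0x409 is also typical of reading a member through a NULL base pointer. The snippet below compares the two conditions quoted in this thread on a hypothetical `Dev` struct; the struct and the helper `entersBranchWithNullParent()` are illustrative only, and whether this is the actual crash path is not confirmed here.

```cpp
#include <cassert>

// Hypothetical stand-in modeling only the members used by the conditions.
struct Dev {
    Dev *parent = nullptr;
    Dev *devDoingDscDecompression = nullptr;
    bool logical = false;
    bool isLogical() const { return logical; }
};

// Condition as proposed in this PR (lines +1453 to +1456).
bool prCondition(const Dev *dev)
{
    assert(dev != nullptr);
    return dev->parent &&
           ((dev->devDoingDscDecompression != dev) ||
            ((dev->devDoingDscDecompression == dev) &&
             (dev->isLogical() && dev->parent)));
}

// Condition from the review suggestion.
bool reviewCondition(const Dev *dev)
{
    assert(dev != nullptr);
    return (dev->devDoingDscDecompression != nullptr &&
            dev->devDoingDscDecompression != dev) ||
           ((dev->devDoingDscDecompression == dev) &&
            (dev->isLogical() && dev->parent));
}

// True when the guarded branch would be entered while parent is NULL,
// i.e. when code inside the branch that reads dev->parent would fault.
bool entersBranchWithNullParent(bool (*condition)(const Dev *), const Dev *dev)
{
    return condition(dev) && dev->parent == nullptr;
}
```

With `parent == nullptr` and `devDoingDscDecompression` pointing at some other, non-NULL device, `prCondition` keeps the branch closed while `reviewCondition` still opens it, which would be consistent with the crash persisting after the suggested change.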

olejaco (Author) commented

The code used for the debug-print:

            DP_USED(sb);
            DP_PRINTF(DP_INFO, "DP-CONN> dev->devDoingDscDecompression: %p addr=%s peerDevice=%u plugged=%d multistream=%d videoSink=%d audioSink=%d bDSCPossible=%d bFECSupported=%d",
                      dev->devDoingDscDecompression,
                      dev->devDoingDscDecompression ? dev->devDoingDscDecompression->address.toString(sb) : "NULL",
                      dev->devDoingDscDecompression ? dev->devDoingDscDecompression->peerDevice : 0,
                      dev->devDoingDscDecompression ? dev->devDoingDscDecompression->plugged : 0,
                      dev->devDoingDscDecompression ? dev->devDoingDscDecompression->multistream : 0,
                      dev->devDoingDscDecompression ? dev->devDoingDscDecompression->videoSink : 0,
                      dev->devDoingDscDecompression ? dev->devDoingDscDecompression->audioSink : 0,
                      dev->devDoingDscDecompression ? dev->devDoingDscDecompression->bDSCPossible : 0,
                      dev->devDoingDscDecompression ? dev->devDoingDscDecompression->bFECSupported : 0);


olejaco commented Feb 26, 2026

Hey @Binary-Eater, thanks for the review. I'll test your suggestions once I get home from work 👍

I have two external monitors connected via DisplayPort in my setup. I must have run the bug report capture with my laptop disconnected from Thunderbolt 🤔
