libnvme: guard against NULL transport handle in discovery path #3240

Open
martin-belanger wants to merge 1 commit into linux-nvme:master from martin-belanger:enoent

@martin-belanger

While stress testing with nvme-stas using repeated nvmet create/delete cycles, a segmentation fault was observed during teardown when running nvme connect-all.

The crash occurs in nvme_get_log() due to a NULL handle:

nvme_get_log(hdl=0x0, ...)

Root cause is that the return value of nvme_ctrl_get_transport_handle() is not validated before use. Under certain conditions, a race can occur where the udev-triggered nvmf-connect@.service attempts to operate on a controller (e.g. nvme1) that has already been removed.

Fix this by checking that the transport handle is non-NULL before issuing commands that depend on it.

This prevents a potential SIGSEGV during discovery in transient device removal scenarios.
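The guard described above can be sketched in isolation. This is a minimal illustration, not the actual patch: the types and the functions `ctrl_get_handle()`, `get_log()`, and `fetch_discovery_log()` are hypothetical stand-ins for the libnvme internals, and returning `-ENOENT` is an assumption about the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the libnvme types. */
struct transport_handle { int fd; };
struct ctrl {
	const char *name;
	struct transport_handle *hdl; /* NULL once the device has gone away */
};

/* Stand-in for a getter that may return NULL for a removed controller. */
static struct transport_handle *ctrl_get_handle(struct ctrl *c)
{
	return c->hdl;
}

/* Stand-in for a command that dereferences the handle unconditionally. */
static int get_log(struct transport_handle *hdl)
{
	return hdl->fd >= 0 ? 0 : -EBADF;
}

/* Guarded caller: validate the handle before issuing the command. */
static int fetch_discovery_log(struct ctrl *c)
{
	struct transport_handle *hdl;

	assert(c != NULL);
	hdl = ctrl_get_handle(c);
	if (!hdl)
		return -ENOENT; /* assumed error code: controller vanished */

	return get_log(hdl);
}
```

With this shape, a controller whose handle is NULL yields an error return instead of a NULL dereference inside the command path.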

@igaw
Collaborator

igaw commented Apr 7, 2026

Looks good. Just one question, do we want to log this?

@martin-belanger
Author

> Looks good. Just one question, do we want to log this?

Good idea. I'll add a log message.

@martin-belanger
Author

Done. The log message follows the same "%s: \n" pattern used throughout the function, with LOG_DEBUG since this is an expected transient condition (device removed mid-operation), not an error worth surfacing at higher severity.
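To illustrate why LOG_DEBUG keeps this condition quiet by default, here is a rough sketch using the syslog severity constants. The logger `log_msg()`, the variable `log_threshold`, the helper `report_missing_handle()`, and the message text are all hypothetical; the real function uses libnvme's own logging facility and its own wording.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/* Hypothetical threshold: messages less severe than this are dropped. */
static int log_threshold = LOG_ERR;

/* Hypothetical logger; returns the number of characters written, or 0 if filtered. */
static int log_msg(int level, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	if (level > log_threshold) /* syslog levels: larger value = less severe */
		return 0;
	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return n;
}

/* Report the transient condition at debug severity (message text is made up). */
static int report_missing_handle(const char *ctrl_name)
{
	return log_msg(LOG_DEBUG, "%s: no transport handle, device removed?\n",
		       ctrl_name);
}
```

At the default threshold the message is suppressed; raising the threshold to LOG_DEBUG makes it visible when someone is actively diagnosing the race.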

While stress testing with nvme-stas using repeated nvmet create/delete cycles,
a segmentation fault was observed during teardown when running nvme connect-all.

The crash occurs in nvme_get_log() due to a NULL hdl:

nvme_get_log(hdl=0x0, ...)

Root cause is that the return value of nvme_ctrl_get_transport_handle() is
not validated before use. Under certain conditions, a race can occur where
the udev-triggered nvmf-connect@.service attempts to operate on a controller
(e.g. nvme1) that has already been removed.

Fix this by checking that the transport handle is non-NULL before issuing
commands that depend on it.

This prevents a potential SIGSEGV during discovery in transient device
removal scenarios.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
@igaw
Collaborator

igaw commented Apr 7, 2026

rebase due to merge conflict from the libnvme prefix change.
