Skip to content

daos_server_helper SPDK bindings leak memory during NVMe discovery and initialisation#18427

Open
knard38 wants to merge 2 commits into
masterfrom
ckochhof/fix/master/daos-19019/patch-001
Open

daos_server_helper SPDK bindings leak memory during NVMe discovery and initialisation#18427
knard38 wants to merge 2 commits into
masterfrom
ckochhof/fix/master/daos-19019/patch-001

Conversation

@knard38
Copy link
Copy Markdown
Contributor

@knard38 knard38 commented Jun 3, 2026

Description

Fix two memory leaks in the SPDK C bindings used by daos_server_helper, detected by running the NVMe control unit tests under AddressSanitizer (ASAN_OPTIONS=detect_leaks=1).

Leak 1 — opts.pci_allowed (8 bytes, nvme_control.c)

daos_spdk_init() builds a PCI allowlist by calling opts_add_pci_addr(), which allocates opts.pci_allowed via realloc(). This buffer was never freed after the call to spdk_env_init(). The fix adds free(opts.pci_allowed) at the out: label, which is reached by both the success and error paths.

Leak 2 — ctrlr->pci_addr (257 bytes per controller, nvme_control_common.c)

_collect() allocates a PCI address string per controller via calloc(1, SPDK_NVMF_TRADDR_MAX_LEN + 1) and stores it in ctrlr->pci_addr. free_ctrlr_fields() freed all other heap fields (model, serial, fw_rev, vendor_id, pci_type) but omitted pci_addr. The fix adds the missing free().

Tests

Two new unit tests are added to nvme_control_ut.c, both intended to be run with ASan:

  • test_daos_spdk_init_pci_freed: calls daos_spdk_init() with one PCI address and lets ASan verify that opts.pci_allowed is freed on return. spdk_env_init() is expected to fail in a unit-test context (no hugepages); this is acceptable since the leak check is ASan-driven.

  • test_collect_pci_addr_freed: drives _collect() via mock function pointers (no real SPDK or hardware required), then lets teardown call clean_ret()free_ctrlr_fields() so ASan can verify that pci_addr is freed for each controller.

The mock_spdk_nvme_ctrlr_get_pci_device() mock is also fixed: it previously leaked a heap-allocated spdk_pci_device per call, which was invisible without ASan. It now returns entries from a static local pool (sized by MAX_MOCK_PCI_DEVS), with a module-level counter reset in teardown.

The fix was locally validated with using the following command:

ASAN_OPTIONS=detect_leaks=1 ./nvme_control_ctests

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

kanard38 added 2 commits June 3, 2026 15:57
opts_add_pci_addr() allocates opts.pci_allowed via realloc() but
daos_spdk_init() never frees it after calling spdk_env_init().
Add free(opts.pci_allowed) at the out: label so both the success
and error paths release the buffer.

Add test_daos_spdk_init_pci_freed to nvme_control_ut.c to verify
under ASan that opts.pci_allowed is freed on return. Also fix the
mock_spdk_nvme_ctrlr_get_pci_device() mock to use a static local
pool instead of calloc(), which was itself leaking under ASan.

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
_collect() allocates ctrlr->pci_addr via calloc() for each
controller, but free_ctrlr_fields() freed all other heap fields
(model, serial, fw_rev, vendor_id, pci_type) while omitting
pci_addr. Add the missing free().

Add test_collect_pci_addr_freed to nvme_control_ut.c to verify
under ASan that pci_addr is freed by teardown via
clean_ret() -> free_ctrlr_fields().

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
@knard38 knard38 marked this pull request as ready for review June 3, 2026 16:06
@knard38 knard38 requested a review from tanabarr June 3, 2026 16:06
@knard38 knard38 self-assigned this Jun 3, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

Errors are component not formatted correctly,Title of PR is too long,Ticket number prefix incorrect,PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments,Unable to load ticket data
https://daosio.atlassian.net/browse/daos_server_helper

@knard38 knard38 requested review from Nasf-Fan and NiuYawei June 3, 2026 16:09
@knard38 knard38 changed the title Ckochhof/fix/master/daos 19019/patch 001 daos_server_helper SPDK bindings leak memory during NVMe discovery and initialisation Jun 3, 2026
@daosbuild3
Copy link
Copy Markdown
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants