daos_server_helper SPDK bindings leak memory during NVMe discovery and initialisation#18427
Open
knard38 wants to merge 2 commits into
Open
daos_server_helper SPDK bindings leak memory during NVMe discovery and initialisation#18427knard38 wants to merge 2 commits into
knard38 wants to merge 2 commits into
Conversation
opts_add_pci_addr() allocates opts.pci_allowed via realloc() but daos_spdk_init() never frees it after calling spdk_env_init(). Add free(opts.pci_allowed) at the out: label so both the success and error paths release the buffer. Add test_daos_spdk_init_pci_freed to nvme_control_ut.c to verify under ASan that opts.pci_allowed is freed on return. Also fix the mock_spdk_nvme_ctrlr_get_pci_device() mock to use a static local pool instead of calloc(), which was itself leaking under ASan. Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
_collect() allocates ctrlr->pci_addr via calloc() for each controller, but free_ctrlr_fields() freed all other heap fields (model, serial, fw_rev, vendor_id, pci_type) while omitting pci_addr. Add the missing free(). Add test_collect_pci_addr_freed to nvme_control_ut.c to verify under ASan that pci_addr is freed by teardown via clean_ret() -> free_ctrlr_fields(). Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
|
Errors are component not formatted correctly,Title of PR is too long,Ticket number prefix incorrect,PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments,Unable to load ticket data |
Collaborator
|
Test stage NLT completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18427/1/testReport/ |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Fix two memory leaks in the SPDK C bindings used by
daos_server_helper, detected by running the NVMe control unit tests under AddressSanitizer (ASAN_OPTIONS=detect_leaks=1).Leak 1 —
opts.pci_allowed(8 bytes,nvme_control.c)daos_spdk_init()builds a PCI allowlist by callingopts_add_pci_addr(), which allocatesopts.pci_allowedviarealloc(). This buffer was never freed after the call tospdk_env_init(). The fix addsfree(opts.pci_allowed)at theout:label, which is reached by both the success and error paths.Leak 2 —
ctrlr->pci_addr(257 bytes per controller,nvme_control_common.c)_collect()allocates a PCI address string per controller viacalloc(1, SPDK_NVMF_TRADDR_MAX_LEN + 1)and stores it inctrlr->pci_addr.free_ctrlr_fields()freed all other heap fields (model,serial,fw_rev,vendor_id,pci_type) but omittedpci_addr. The fix adds the missingfree().Tests
Two new unit tests are added to
nvme_control_ut.c, both intended to be run with ASan:test_daos_spdk_init_pci_freed: callsdaos_spdk_init()with one PCI address and lets ASan verify thatopts.pci_allowedis freed on return.spdk_env_init()is expected to fail in a unit-test context (no hugepages); this is acceptable since the leak check is ASan-driven.test_collect_pci_addr_freed: drives_collect()via mock function pointers (no real SPDK or hardware required), then lets teardown callclean_ret()→free_ctrlr_fields()so ASan can verify thatpci_addris freed for each controller.The
mock_spdk_nvme_ctrlr_get_pci_device()mock is also fixed: it previously leaked a heap-allocatedspdk_pci_deviceper call, which was invisible without ASan. It now returns entries from a static local pool (sized byMAX_MOCK_PCI_DEVS), with a module-level counter reset in teardown.The fix was locally validated with using the following command:
Steps for the author:
After all prior steps are complete: