libnvme: NVMe-oF controller ownership registry#3400
Conversation
|
just a quick thoughs as discussed on the matrix channel:
|
|
@igaw - Thanks for the review! On the context/owner naming: agreed, they are the same concept. I already have a follow-up planned that addresses this, so I'd like to handle it there rather than here to avoid a mid-flight rename. On the parallel access test and plugin conversion: both are good ideas. I'd like to keep this PR focused on the core feature introduction and handle those in a follow-up as well, if that's acceptable to you. The concurrent registry access should be safe, but I agree that a test that proves it is worth having. I see that there are conflicts with the PR that I need to resolve. Once I'm done fixing the conflicts, would you be OK merging this as-is with those items tracked as follow-up work? |
Introduce a cooperative registry that records which orchestrator (e.g. nvme-stas, nbft, nvme-discoverd) owns each connected NVMe-oF controller. The registry lives under /run/nvme/registry/ as one JSON file per live controller (e.g. nvme3.json). New public API (fabrics + json-c only): libnvmf_ctx_set_owner() - set the owner name on the global context libnvmf_registry_retrieve() - read a field from a controller's entry libnvmf_registry_update() - write/update a field atomically libnvmf_registry_delete() - remove a controller's entry Internal API (to be called from the connect path in fabrics.c): libnvmf_registry_create() - create a fresh entry on connect Writes are atomic: a temp file is created with mkstemp(), synced, then renamed over the destination in a single syscall. registry_dir() is a weak symbol so test binaries can redirect I/O to a mkdtemp() directory without any runtime conditionals in production code. The owner field on struct libnvme_global_ctx is guarded by CONFIG_FABRICS and freed in libnvme_free_global_ctx(). Signed-off-by: Martin Belanger <martin.belanger@dell.com> Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
When an owner name is set in the global context, a registry entry is automatically written to /run/nvme/registry/ after every successful fabrics connect. Signed-off-by: Martin Belanger <martin.belanger@dell.com> Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
Add four new built-in commands for managing the NVMeoF controller ownership registry: registry-list List all live registry entries registry-retrieve Read a field from a controller's entry registry-update Write a field to a controller's entry registry-delete Remove a controller's entry Modify disconnect-all to be ownership-aware by default: only controllers with no registry entry are disconnected. --owner NAME limits disconnection to controllers owned by NAME. --force restores the original unconditional behavior. --force and --owner are mutually exclusive. Signed-off-by: Martin Belanger <martin.belanger@dell.com> Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
Extend GlobalCtx with an owner parameter and read-only property. Add four standalone module-level functions: registry_retrieve(device) read entry as a dict; None if absent registry_update(device, key, value) write a field (creates entry if absent) registry_delete(device) remove an entry registry_entries() generator yielding live entries as dicts Signed-off-by: Martin Belanger <martin.belanger@dell.com> Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
Add an Orchestrator column to the controllers section of 'nvme list -v'. The column shows which orchestrator manages each controller: kernel PCIe and apple-nvme controllers (kernel-enumerated) stas/nbft/… Fabrics controllers registered in the ownership registry - Unowned fabrics controllers (not in the registry) Signed-off-by: Martin Belanger <martin.belanger@dell.com> Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
When the kernel removes an NVMe-oF controller, we need a way to remove entries from the registry to prevent accumulation of stale entries. The rule is gated on want_fabrics: it is absent in no-fabrics builds since only NVMe-oF controllers ever have registry entries. Signed-off-by: Martin Belanger <martin.belanger@dell.com> Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
Add man pages for the four new registry subcommands: nvme-registry-list(1) nvme-registry-retrieve(1) nvme-registry-update(1) nvme-registry-delete(1) Update nvme-disconnect-all(1) to document the new ownership-aware default behavior and the --owner, --force, and --transport options. The --transport option was already implemented but undocumented. Signed-off-by: Martin Belanger <martin.belanger@dell.com> Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
|
yes, I don't have an issue to go ahead as soon the core concept and API are sound. Adding tests cases later shouldn't really be a problem. |
MichaelRabek
left a comment
There was a problem hiding this comment.
Other than that, I like it!
|
Hi @igaw — all review comments are now addressed and the strdup thread is resolved. Anything else remaining before merging? As discussed yesterday, all remaining items will be addressed in a follow up PR. Thanks. |
|
One thing I think we should also add from the beginning is a schema for the new json file and also a verifier for it. |
@igaw — before you ask for a JSON schema and verifier, I want to be upfront: I'm planning to replace the JSON backend with a sysfs-inspired layout in the follow-up PR. The schema would end up as throwaway code. The directory layout looks like this: One directory per controller, one plain text file per attribute — same convention the kernel uses for sysfs. The advantages over JSON:
The public API is unchanged by this; it's a purely internal storage format change. I'd rather not invest in a JSON schema that will be deleted in the next PR. Would you be OK merging the current PR as-is and letting me land the format switch in the follow-up? |
|
Sounds reasonable to me. The path lookup and then read and write is not trivial, e.g TOCTOU. Though it's doable and if this is done properly in the library it should be fine. |
This PR addresses: #2913
Multiple independent orchestrators (nvme-stas, NBFT, nvme-discoverd, manual nvme-cli) can establish NVMe-oF controller connections on the same host with no coordination.
nvme disconnect-allis indiscriminate and dangerous in such environments.This series introduces a cooperative userspace ownership registry so orchestrators can declare and respect ownership boundaries. Full design rationale and decisions are documented in: nvme-registry-plan.md.
What's new
libnvmf_registry_retrieve/update/delete/for_each), automatic registration on connect whenlibnvmf_context_set_owner()is set, Python bindings, unit tests70-nvmf-registry.rulesremoves stale entries on controller removal--ownerand--forceoptions"nbft"Commits
libnvme: add NVMe controller ownership registry(78c9055)libnvmf: register controller ownership on successful connect(6892470)nvme: add registry CLI commands and ownership-aware disconnect-all(1c4dbdf)libnvme: expose registry API and owner in Python bindings(43818fb)nvme: show orchestrator ownership in verbose list output(dc0823e)libnvme: install udev rule to clean up registry on controller removal(fa9e34b)nvme: add man pages for registry commands and update disconnect-all(e5f500c)