DAOS-18431 bio: Set power management register on NVMe#17355
DAOS-18431 bio: Set power management register on NVMe#17355daltonbohning merged 13 commits intomasterfrom
Conversation
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
|
Ticket title is 'NVMe power control feature' |
|
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17355/1/testReport/ |
5e0062d to
0be43f7
Compare
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
0be43f7 to
8b31c49
Compare
kjacque
left a comment
There was a problem hiding this comment.
Overall looks reasonable. Just trying to understand the right way to interact with the spdk stuff.
If we're letting users configure this generally, it might be good to add an explicit parameter to the config file with more intuitively named values, based on whatever the 0-4 stand for. Not necessary in this PR, but something to think about.
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
| bb->bb_dev_health.bdh_io_channel = channel; | ||
|
|
||
| /* Set NVMe power management */ | ||
| rc = bio_set_power_mgmt(bb->bb_dev, channel); |
There was a problem hiding this comment.
I think we don't want this being skipped when health monitor is disabled? (see bypass_health_collect() at the beginning of this function).
There was a problem hiding this comment.
is it okay to move this function and the channel fetch to before the bypass check?
There was a problem hiding this comment.
Hmm, this function uses the io_channel created by bio_init_health_monitoring().
| cmd.cdw10 = SPDK_NVME_FEAT_POWER_MANAGEMENT; | ||
| cmd.cdw11 = bio_spdk_power_mgmt_val; | ||
|
|
||
| rc = spdk_bdev_nvme_admin_passthru(d_bdev->bb_desc, channel, &cmd, NULL, 0, |
There was a problem hiding this comment.
This uses bdh_io_channel but doesn't hold reference count, that could cause issue when bio_dev_health being finalized before the completion. See bio_fini_health_monitoring().
There was a problem hiding this comment.
I've decoupled implementation from health monitoring, is it better now?
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
cde98e1 to
2b38adf
Compare
|
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17355/7/execution/node/1334/log |
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
|
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17355/8/testReport/ |
| struct spdk_nvme_cmd cmd; | ||
| int rc = 0; | ||
|
|
||
| D_INFO("Power management 1\n"); |
There was a problem hiding this comment.
All these "Power management x" info messages in this function should be removed.
| } | ||
|
|
||
| int | ||
| bio_set_power_mgmt(const char *bdev_name, struct spdk_bdev *bdev) |
There was a problem hiding this comment.
It looks redundant to pass in both "bdev_name" & "bdev".
| } | ||
|
|
||
| if (!spdk_bdev_io_type_supported(bdev, SPDK_BDEV_IO_TYPE_NVME_ADMIN)) { | ||
| D_ERROR("Bdev NVMe admin passthru not supported for %s\n", bdev_name); |
| } | ||
|
|
||
| /* Writable descriptor required for applying power management settings */ | ||
| rc = spdk_bdev_open_ext(bdev_name, true, bdev_event_cb, NULL, &pm_ctx.bdev_desc); |
There was a problem hiding this comment.
I think the bio_bdev_event_cb() should be used as the event callback.
|
|
||
| D_INFO("Power management 3\n"); | ||
| rc = spdk_bdev_nvme_admin_passthru(pm_ctx.bdev_desc, pm_ctx.bdev_io_channel, &cmd, NULL, 0, | ||
| set_power_mgmt_completion, &pm_ctx); |
There was a problem hiding this comment.
bio_xsctxt_alloc() isn't completed at this moment, and NVMe polling ULT isn't created yet, you need to explicitly poll completion by your self (see xs_poll_completion())
So you don't need ABT_eventual for such self polling. You could refer to common_fini_cb() & xs_poll_completion().
|
|
||
| spdk_bdev_free_io(bdev_io); | ||
| spdk_put_io_channel(pm_ctx->bdev_io_channel); | ||
| spdk_bdev_close(pm_ctx->bdev_desc); |
There was a problem hiding this comment.
It's not callback driven model anymore. The io_channel & dev_desc could be put/closed by caller.
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
|
@NiuYawei I tried to address your comments, I'm now getting SPDK_NVME_SC_INVALID_FIELD if I try to set value to 0x01, fine if setting to 0x00 but I think this maybe to do with the SSD not supporting changes. @Michael-Hennecke please could you verify on your side?
|
|
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17355/9/testReport/ |
|
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17355/9/testReport/ |
| } | ||
|
|
||
| struct power_mgmt_context_t { | ||
| ABT_eventual eventual; |
There was a problem hiding this comment.
This isn't useful, can be removed.
There was a problem hiding this comment.
has been addressed
| D_DEBUG(DB_MGMT, "Bdev NVMe admin passthru not supported for %s\n", bdev_name); | ||
| rc = -DER_NOTSUPPORTED; | ||
| goto out_desc; | ||
| } |
There was a problem hiding this comment.
'bdev' can be acquired by spdk_bdev_get_by_name(), these checking should be done before spdk_bdev_open_ext()
There was a problem hiding this comment.
has been addressed
kjacque
left a comment
There was a problem hiding this comment.
A couple minor comments. Nothing I'd block on for this PR.
|
@Michael-Hennecke can you please verify the PR and then we can land the change? TIA |
There was a problem hiding this comment.
in the src/bio/bio_device.c:1096 set_power_mgmt_completion() Power management value set on device Nvme_N0003_0_1_7n1 messages, can we print the actual value that it was set to? like .. value set to 0 on device ...
yes there's the global bio_nvme_init_ext() NVMe power management setting to be applied is 1 message before. but for example you may have different NVMe types in the engine, some may support that value and some may not.
so both the success and the failure messages (src/bio/bio_device.c:1087 set_power_mgmt_completion() Set power management failed for device Nvme_N0003_0_1_7n1, NVMe status code/type: 0x2/0x0) should always print the values, for each NVMe disk operation.
|
I have validated with the Samsung drives (that support power states 0 to 3) that the setting works as expected: but as expected: I also verified that the engines still come up even after the failure to set the power state, which is what we want. |
the supported values (other than 0) and their meaning are vendor-specific, so it's not possible for us to use descriptive names here... |
Michael-Hennecke
left a comment
There was a problem hiding this comment.
looks good, verified to work on Samsung drives. PR can be landed, but see my proposal to print power state values in log messages.
this has been addressed |
|
@tanabarr Any strong reason for not fixing the clang-format? And FWIW if you install it in your dev env the githooks will do it for you |
I didn't want to waste CI cycles on what essentially is an unnecessary reformat of indentation. would you like me to fix and refresh? this has been addressed |
…age-patch Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
3408dc3
|
@Michael-Hennecke do you want to ask about expediating this PR's landing so that we can get it in TB4? @daltonbohning do you think that would be possible? force landing maybe? |
TB4 was tagged an hour ago |
Set NVMe power management values for SSDs by setting the new
engine DAOS_NVME_POWER_MGMT environment variable to an integer
(sets register bits 0-4). Value will be applied by SPDK on devices
attached to an engine process. The value will not be reset on engine
exit.
Steps for the author:
After all prior steps are complete: