Skip to content

DAOS-18431 bio: Set power management register on NVMe#17355

Merged
daltonbohning merged 13 commits intomasterfrom
tanabarr/bio-powmanage-patch
Feb 12, 2026
Merged

DAOS-18431 bio: Set power management register on NVMe#17355
daltonbohning merged 13 commits intomasterfrom
tanabarr/bio-powmanage-patch

Conversation

@tanabarr
Copy link
Copy Markdown
Contributor

@tanabarr tanabarr commented Jan 8, 2026

Set NVMe power management values for SSDs by setting the new
engine DAOS_NVME_POWER_MGMT environment variable to an integer
(sets register bits 0-4). Value will be applied by SPDK on devices
attached to an engine process. The value will not be reset on engine
exit.

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr self-assigned this Jan 8, 2026
@tanabarr tanabarr requested review from a team as code owners January 8, 2026 12:25
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jan 8, 2026

Ticket title is 'NVMe power control feature'
Status is 'Awaiting Verification'
https://daosio.atlassian.net/browse/DAOS-18431

@Michael-Hennecke Michael-Hennecke self-requested a review January 8, 2026 12:51
@daosbuild3
Copy link
Copy Markdown
Collaborator

@tanabarr tanabarr force-pushed the tanabarr/bio-powmanage-patch branch from 5e0062d to 0be43f7 Compare January 8, 2026 17:26
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr requested review from NiuYawei and kjacque January 8, 2026 17:35
@tanabarr tanabarr force-pushed the tanabarr/bio-powmanage-patch branch from 0be43f7 to 8b31c49 Compare January 8, 2026 17:38
Copy link
Copy Markdown
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks reasonable. Just trying to understand the right way to interact with the spdk stuff.

If we're letting users configure this generally, it might be good to add an explicit parameter to the config file with more intuitively named values, based on whatever the 0-4 stand for. Not necessary in this PR, but something to think about.

Comment thread src/bio/bio_device.c
Comment thread src/bio/bio_device.c Outdated
Comment thread src/bio/bio_monitor.c Outdated
Comment thread src/bio/bio_device.c Outdated
Comment thread src/bio/bio_xstream.c
@tanabarr tanabarr requested a review from kjacque January 12, 2026 12:50
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Comment thread src/bio/bio_monitor.c Outdated
bb->bb_dev_health.bdh_io_channel = channel;

/* Set NVMe power management */
rc = bio_set_power_mgmt(bb->bb_dev, channel);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't want this being skipped when health monitor is disabled? (see bypass_health_collect() at the beginning of this function).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it okay to move this function and the channel fetch to before the bypass check?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this function uses the io_channel created by bio_init_health_monitoring().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread src/bio/bio_device.c Outdated
cmd.cdw10 = SPDK_NVME_FEAT_POWER_MANAGEMENT;
cmd.cdw11 = bio_spdk_power_mgmt_val;

rc = spdk_bdev_nvme_admin_passthru(d_bdev->bb_desc, channel, &cmd, NULL, 0,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses bdh_io_channel but doesn't hold reference count, that could cause issue when bio_dev_health being finalized before the completion. See bio_fini_health_monitoring().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decoupled implementation from health monitoring, is it better now?

@tanabarr tanabarr requested a review from NiuYawei January 16, 2026 15:44
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr force-pushed the tanabarr/bio-powmanage-patch branch from cde98e1 to 2b38adf Compare January 16, 2026 15:47
@daosbuild3
Copy link
Copy Markdown
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17355/7/execution/node/1334/log

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@daosbuild3
Copy link
Copy Markdown
Collaborator

Comment thread src/bio/bio_device.c Outdated
struct spdk_nvme_cmd cmd;
int rc = 0;

D_INFO("Power management 1\n");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these "Power management x" info messages in this function should be removed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread src/bio/bio_device.c Outdated
}

int
bio_set_power_mgmt(const char *bdev_name, struct spdk_bdev *bdev)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks redundant to pass in both "bdev_name" & "bdev".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread src/bio/bio_device.c Outdated
}

if (!spdk_bdev_io_type_supported(bdev, SPDK_BDEV_IO_TYPE_NVME_ADMIN)) {
D_ERROR("Bdev NVMe admin passthru not supported for %s\n", bdev_name);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it an error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to debug

Comment thread src/bio/bio_device.c Outdated
}

/* Writable descriptor required for applying power management settings */
rc = spdk_bdev_open_ext(bdev_name, true, bdev_event_cb, NULL, &pm_ctx.bdev_desc);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the bio_bdev_event_cb() should be used as the event callback.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread src/bio/bio_device.c

D_INFO("Power management 3\n");
rc = spdk_bdev_nvme_admin_passthru(pm_ctx.bdev_desc, pm_ctx.bdev_io_channel, &cmd, NULL, 0,
set_power_mgmt_completion, &pm_ctx);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bio_xsctxt_alloc() isn't completed at this moment, and NVMe polling ULT isn't created yet, you need to explicitly poll completion by your self (see xs_poll_completion())

So you don't need ABT_eventual for such self polling. You could refer to common_fini_cb() & xs_poll_completion().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread src/bio/bio_device.c Outdated

spdk_bdev_free_io(bdev_io);
spdk_put_io_channel(pm_ctx->bdev_io_channel);
spdk_bdev_close(pm_ctx->bdev_desc);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not callback driven model anymore. The io_channel & dev_desc could be put/closed by caller.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr
Copy link
Copy Markdown
Contributor Author

tanabarr commented Jan 24, 2026

@NiuYawei I tried to address your comments, I'm now getting SPDK_NVME_SC_INVALID_FIELD if I try to set value to 0x01, fine if setting to 0x00 but I think this maybe to do with the SSD not supporting changes. @Michael-Hennecke please could you verify on your side?
Logging example:

2026/01/25 23:05:46.136270 edaos-15 DAOS[241271/0/3] bio DBUG src/bio/bio_device.c:1166 bio_set_power_mgmt() Setting power state 0 on device Nvme_edaos-15.daos.hpc.amslabs.hpecorp.net_0_1_7n1

2026/01/25 23:05:46.136306 edaos-15 DAOS[241271/0/3] bio INFO src/bio/bio_device.c:1095 set_power_mgmt_completion() Power management value set on device Nvme_edaos-15.daos.hpc.amslabs.hpecorp.net_0_1_7n1

@daosbuild3
Copy link
Copy Markdown
Collaborator

@daosbuild3
Copy link
Copy Markdown
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17355/9/testReport/

NiuYawei
NiuYawei previously approved these changes Jan 26, 2026
Comment thread src/bio/bio_device.c Outdated
}

struct power_mgmt_context_t {
ABT_eventual eventual;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't useful, can be removed.

Copy link
Copy Markdown
Contributor Author

@tanabarr tanabarr Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has been addressed

Comment thread src/bio/bio_device.c
D_DEBUG(DB_MGMT, "Bdev NVMe admin passthru not supported for %s\n", bdev_name);
rc = -DER_NOTSUPPORTED;
goto out_desc;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'bdev' can be acquired by spdk_bdev_get_by_name(), these checking should be done before spdk_bdev_open_ext()

Copy link
Copy Markdown
Contributor Author

@tanabarr tanabarr Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has been addressed

kjacque
kjacque previously approved these changes Jan 26, 2026
Copy link
Copy Markdown
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple minor comments. Nothing I'd block on for this PR.

Comment thread src/bio/bio_device.c Outdated
Comment thread src/bio/bio_device.c Outdated
@tanabarr
Copy link
Copy Markdown
Contributor Author

@Michael-Hennecke can you please verify the PR and then we can land the change? TIA

Copy link
Copy Markdown
Collaborator

@Michael-Hennecke Michael-Hennecke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the src/bio/bio_device.c:1096 set_power_mgmt_completion() Power management value set on device Nvme_N0003_0_1_7n1 messages, can we print the actual value that it was set to? like .. value set to 0 on device ...

yes there's the global bio_nvme_init_ext() NVMe power management setting to be applied is 1 message before. but for example you may have different NVMe types in the engine, some may support that value and some may not.

so both the success and the failure messages (src/bio/bio_device.c:1087 set_power_mgmt_completion() Set power management failed for device Nvme_N0003_0_1_7n1, NVMe status code/type: 0x2/0x0) should always print the values, for each NVMe disk operation.

@Michael-Hennecke Michael-Hennecke self-requested a review February 9, 2026 12:06
@Michael-Hennecke
Copy link
Copy Markdown
Collaborator

Michael-Hennecke commented Feb 9, 2026

I have validated with the Samsung drives (that support power states 0 to 3) that the setting works as expected:

2026/02/09 05:00:44.712366 N0003 DAOS[111568/-1/0] bio  INFO src/bio/bio_xstream.c:277 bio_nvme_init_ext() NVMe power management setting to be applied is 3
2026/02/09 05:00:46.288579 N0003 DAOS[111568/0/1] bio  INFO src/bio/bio_device.c:1096 set_power_mgmt_completion() Power management value set on device Nvme_N0003_0_1_7n1

but as expected:

2026/02/09 04:57:17.216038 N0003 DAOS[107878/-1/0] bio  INFO src/bio/bio_xstream.c:277 bio_nvme_init_ext() NVMe power management setting to be applied is 5
2026/02/09 04:57:18.764518 N0003 DAOS[107878/0/1] bio  ERR  src/bio/bio_device.c:1087 set_power_mgmt_completion() Set power management failed for device Nvme_N0003_0_1_7n1, NVMe status code/type: 0x2/0x0
2026/02/09 04:57:18.764563 N0003 DAOS[107878/0/1] bio  ERR  src/bio/bio_device.c:1091 set_power_mgmt_completion()  - INVALID_FIELD: Device may not support requested power state

I also verified that the engines still come up even after the failure to set the power state, which is what we want.

@Michael-Hennecke
Copy link
Copy Markdown
Collaborator

Overall looks reasonable. Just trying to understand the right way to interact with the spdk stuff.

If we're letting users configure this generally, it might be good to add an explicit parameter to the config file with more intuitively named values, based on whatever the 0-4 stand for. Not necessary in this PR, but something to think about.

the supported values (other than 0) and their meaning are vendor-specific, so it's not possible for us to use descriptive names here...

Copy link
Copy Markdown
Collaborator

@Michael-Hennecke Michael-Hennecke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, verified to work on Samsung drives. PR can be landed, but see my proposal to print power state values in log messages.

@tanabarr
Copy link
Copy Markdown
Contributor Author

tanabarr commented Feb 9, 2026

in the src/bio/bio_device.c:1096 set_power_mgmt_completion() Power management value set on device Nvme_N0003_0_1_7n1 messages, can we print the actual value that it was set to? like .. value set to 0 on device ...

yes there's the global bio_nvme_init_ext() NVMe power management setting to be applied is 1 message before. but for example you may have different NVMe types in the engine, some may support that value and some may not.

so both the success and the failure messages (src/bio/bio_device.c:1087 set_power_mgmt_completion() Set power management failed for device Nvme_N0003_0_1_7n1, NVMe status code/type: 0x2/0x0) should always print the values, for each NVMe disk operation.

this has been addressed

@tanabarr tanabarr requested a review from a team February 9, 2026 13:49
@tanabarr tanabarr added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Feb 9, 2026
@daltonbohning
Copy link
Copy Markdown
Contributor

@tanabarr Any strong reason for not fixing the clang-format? And FWIW if you install it in your dev env the githooks will do it for you
https://github.com/daos-stack/daos/tree/master/utils/githooks#2-install-the-tools

@tanabarr
Copy link
Copy Markdown
Contributor Author

tanabarr commented Feb 9, 2026

@tanabarr Any strong reason for not fixing the clang-format? And FWIW if you install it in your dev env the githooks will do it for you https://github.com/daos-stack/daos/tree/master/utils/githooks#2-install-the-tools

I didn't want to waste CI cycles on what essentially is an unnecessary reformat of indentation. would you like me to fix and refresh?
I have the Hook applied and not sure why it didn't work.

this has been addressed

…age-patch

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr dismissed stale reviews from Michael-Hennecke, kjacque, and NiuYawei via 3408dc3 February 10, 2026 14:22
@tanabarr tanabarr requested review from Michael-Hennecke, NiuYawei and kjacque and removed request for a team February 10, 2026 14:23
@tanabarr
Copy link
Copy Markdown
Contributor Author

@Michael-Hennecke do you want to ask about expediating this PR's landing so that we can get it in TB4? @daltonbohning do you think that would be possible? force landing maybe?

@daltonbohning
Copy link
Copy Markdown
Contributor

@Michael-Hennecke do you want to ask about expediating this PR's landing so that we can get it in TB4? @daltonbohning do you think that would be possible? force landing maybe?

TB4 was tagged an hour ago
103e8b3

@tanabarr tanabarr removed the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Feb 12, 2026
@tanabarr tanabarr requested a review from a team February 12, 2026 10:21
@daltonbohning daltonbohning merged commit 32c1281 into master Feb 12, 2026
57 of 75 checks passed
@daltonbohning daltonbohning deleted the tanabarr/bio-powmanage-patch branch February 12, 2026 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

6 participants