DAOS-18633 rebuild: throttle rebuild status logs to reduce overhead (… #17960

Open
wangshilong wants to merge 1 commit into release/2.6 from shilongw/DAOS-18633-reduce-log-2.6

@wangshilong
Contributor

#17696)

Currently, each target unconditionally dumps its rebuild progress to the log every 2 seconds. In a large-scale scenario, with 100 pools rebuilding concurrently across 16 targets per rank, a 10-hour run can generate several GB of log data per rank. This continuous, high-frequency logging (around 50 log lines per second per xstream) causes severe I/O contention and degrades overall I/O performance and ULT scheduling.
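
The figures above can be checked with a back-of-the-envelope sketch. The helper names below are hypothetical and not part of the DAOS code base, and the sketch assumes one status line per pool per target per interval, with each target served by its own xstream. With 100 pools and a 2-second interval this gives 50 lines per second per xstream, and about 28.8 million lines per rank over 10 hours. If a line averages around 150 bytes (an assumed size, not stated in this PR), that is roughly 4.3 GB per rank, consistent with "several GB".

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a DAOS function): status lines per second emitted
 * by one xstream when each of `pools` rebuilding pools logs once every
 * `interval_sec` seconds on that xstream.
 */
double
lines_per_sec_per_xstream(uint32_t pools, uint32_t interval_sec)
{
	return (double)pools / (double)interval_sec;
}

/*
 * Hypothetical helper (not a DAOS function): total status lines written by
 * one rank over `hours`, assuming one line per pool per target per interval.
 */
uint64_t
lines_per_rank(uint32_t pools, uint32_t targets, uint32_t interval_sec, uint32_t hours)
{
	uint64_t seconds = (uint64_t)hours * 3600;

	return (uint64_t)pools * targets * (seconds / interval_sec);
}
```

Under the same assumptions, `lines_per_rank(100, 16, 300, 10)` yields 192,000 lines, a 150-fold reduction when the interval grows from 2 seconds to 5 minutes.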

There is no need to print background progress logs this frequently. This patch increases the interval between rebuild status log dumps from 2 seconds to 5 minutes. The final status is still printed immediately when a rebuild completes or aborts, which retains sufficient visibility for debugging while avoiding log storms.
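
The throttling behaviour described above might look roughly like the following. This is a minimal sketch, not the code in this patch: the struct, the function name `should_log_status`, and the `STATUS_LOG_INTERVAL_SEC` constant are all hypothetical, and the choice to emit the very first status line immediately is an assumption that the description does not confirm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed constant name; the 5-minute value comes from the PR description. */
#define STATUS_LOG_INTERVAL_SEC 300

/* Hypothetical per-rebuild throttle state; zero-initialize before use. */
struct status_log_throttle {
	uint64_t last_log_sec; /* time of the last emitted status line */
	bool     logged_once;  /* false until the first line is emitted */
};

/*
 * Return true if a status line should be emitted at time `now_sec`.
 * `finished` is true when the rebuild has completed or aborted, in which
 * case the final status is always emitted regardless of the interval.
 * Emitting on the first call is an assumption made for this sketch.
 */
bool
should_log_status(struct status_log_throttle *t, uint64_t now_sec, bool finished)
{
	if (finished || !t->logged_once ||
	    now_sec - t->last_log_sec >= STATUS_LOG_INTERVAL_SEC) {
		t->last_log_sec = now_sec;
		t->logged_once  = true;
		return true;
	}
	return false;
}
```

A caller that still polls every 2 seconds would wrap its existing log statement in `if (should_log_status(&t, now, done))`, so the polling cadence stays the same and only the log output is rate-limited.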

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
@wangshilong requested review from a team as code owners April 9, 2026 14:43
@wangshilong requested review from kccain and liuxuezhao April 9, 2026 14:46
@wangshilong added the unclean-cherry-pick label (indicates that a cherry-pick had merge conflicts that needed resolving) Apr 9, 2026
@github-actions
github-actions bot commented Apr 9, 2026

Errors are Unable to load ticket data
https://daosio.atlassian.net/browse/DAOS-18633
