Skip to content

HDDS-14591. Cache HBASE_SUPPORT finalization status for putBlock#9892

Open
weimingdiit wants to merge 1 commit intoapache:masterfrom
weimingdiit:HDDS-14591
Open

HDDS-14591. Cache HBASE_SUPPORT finalization status for putBlock#9892
weimingdiit wants to merge 1 commit intoapache:masterfrom
weimingdiit:HDDS-14591

Conversation

@weimingdiit
Copy link
Contributor

What changes were proposed in this pull request?

Avoid checking VersionedDatanodeFeatures.isFinalized(HBASE_SUPPORT) on every putBlock request. Cache the finalized state in BlockManagerImpl, and once finalization is observed, reuse the cached value for subsequent requests.

  • Keep existing behavior before finalization: partial chunk list requests are rejected with UNSUPPORTED_REQUEST.
  • Add a unit test to verify the finalization status is cached after it is first detected as finalized.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14591

How was this patch tested?

unit tests

@weimingdiit weimingdiit marked this pull request as ready for review March 10, 2026 14:24
@weimingdiit
Copy link
Contributor Author

@smengcl @xichen01 @adoroszlai Hi, could you help review this PR?

Copy link

@yandrey321 yandrey321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@errose28
Copy link
Contributor

@yandrey321 what is the motivation for this change? Was this causing a performance issue?

@yandrey321
Copy link

@yandrey321 what is the motivation for this change? Was this causing a performance issue?

Yes, its a performance issue, we check for feature finalization on every putBlock request instead of checking it only during upgrade/finalization.

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is used in many places throughout Ozone, and they all use the same AbstractVersionManager#isAllowed method, for which the only potentially expensive operation I can see is a read lock whose corresponding write lock is only held during finalization. If this method has a performance issue it needs to be fixed in the version manager for all checks, not on a per-feature basis like this.

@yandrey321 please provide more details on the issue. Simply referring to something as a "performance issue" is not enough to justify a change. Include what tests were run, what numbers we got before and after, and what specific part of this operation is causing the hang up.

@errose28
Copy link
Contributor

See also #9840

@yandrey321
Copy link

#9840

this one is aimed to make feature check less expensive. But in general IO path should not call feature checks outside of upgrade window unless these features could be altered in the runtime.

@errose28
Copy link
Contributor

But in general IO path should not call feature checks outside of upgrade window unless these features could be altered in the runtime.

Finalization alters what features are available at runtime. Caching in general is a common source of bugs, and the version check is supposed to be so cheap it makes no difference. That is why I'm asking for numbers and not hypotheticals. Microbenchmarks risk adding complexity without real value.

I don't know if #9840 is going to change performance in any meaningful way, but it looks like a good change since it concentrates the concurrency handling to a central point instead of having a lock spread around to each method.

@yandrey321
Copy link

But in general IO path should not call feature checks outside of upgrade window unless these features could be altered in the runtime.

Finalization alters what features are available at runtime. Caching in general is a common source of bugs, and the version check is supposed to be so cheap it makes no difference. That is why I'm asking for numbers and not hypotheticals. Microbenchmarks risk adding complexity without real value.

I don't know if #9840 is going to change performance in any meaningful way, but it looks like a good change since it concentrates the concurrency handling to a central point instead of having a lock spread around to each method.

the idea is that we should avoid locks on the IO path. If we can improve Feature validation code to use atomic operations instead of concurrent maps/list it would be a huge win for code that check enabled/finalized features on IO path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants