
HBASE-30134: Improve CacheAwareLoadBalancer to consider low cache ratio when calculating imbalance (#8197)#8218

Open
wchevreuil wants to merge 3 commits into apache:branch-2 from wchevreuil:HBASE-30134-branch-2

Conversation

@wchevreuil
Contributor

No description provided.

…io when calculating imbalance (apache#8197)

Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
Change-Id: Id388f670f32fafb2ac28037746cfedae0f22ab78
@wchevreuil wchevreuil force-pushed the HBASE-30134-branch-2 branch from d17ce31 to 21ec9f1 Compare May 11, 2026 10:38
Contributor

@VladRodionov VladRodionov left a comment


The patch (or its approach) has a scoped potential issue which I think should be addressed: risk of balancing churn. If the cache ratio changes slowly and free cache fluctuates, the balancer may repeatedly identify "better" placements. The patch adds a cost signal, not necessarily a stable state transition.

This is a common problem for all balancers. I am not very familiar with StochasticLoadBalancer internals, but if it is missing there, some stabilization algorithm should be added to prevent frequent rebalancing; the simplest is a cooling-off period. That should probably be a separate ticket, though.
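A cooling-off period of the kind mentioned could be sketched as follows. This is a minimal illustration under stated assumptions: `MoveCooldown` and its method names are hypothetical, not part of this patch or of any HBase balancer.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical cooling-off guard: veto moving a region that was moved
 * recently, so slowly-changing cache ratios cannot cause repeated moves.
 */
public class MoveCooldown {
  private final long cooldownMillis;
  private final Map<String, Long> lastMoveTime = new HashMap<>();

  public MoveCooldown(long cooldownMillis) {
    this.cooldownMillis = cooldownMillis;
  }

  /** Returns true if the region was never moved, or its cooldown has elapsed. */
  public boolean canMove(String encodedRegionName, long nowMillis) {
    Long last = lastMoveTime.get(encodedRegionName);
    return last == null || nowMillis - last >= cooldownMillis;
  }

  /** Records a move so subsequent balancing rounds skip this region for a while. */
  public void recordMove(String encodedRegionName, long nowMillis) {
    lastMoveTime.put(encodedRegionName, nowMillis);
  }
}
```

A balancer could consult `canMove` before accepting a plan for a region and call `recordMove` after executing it.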

Contributor


potentialBestWeightedFromFreeCache(...) seems to consider any server with enough free cache as evidence that the region has a better possible cache score. Should this be limited to servers that are valid/reasonable candidate destinations for this specific region, considering locality/rack/table skew/other constraints?

Contributor Author


The CacheAwareLoadBalancer was originally intended for use cases where optimal cache allocation is pivotal for overall performance, so it only combines skewness and cache allocation costs when weighing an imbalance. For use cases where locality/rack/load are more critical, the StochasticLoadBalancer should be used.

updateRegionLoad();
}

protected Map<ServerName, Long> getServerBlockCacheFreeBytes() {
Contributor


Free cache capacity does not necessarily mean the moved region will become warm. What mechanism is expected to actually populate the cache after the move — prefetch, time-based priority, HBASE-30135, or normal client reads?

Contributor Author


Yeah, these changes only calculate whether there's an imbalance, considering cache space and uncached regions. HBASE-30135 will deal with generating new plans. Once regions are moved, prefetch runs at region opening and is the means of caching the regions. Maybe we should add extra validation for the prefetch property.

* #potentialCacheRatioAfterMove} * region MB) so placement is not considered optimal solely
* from low ratios when capacity exists somewhere in the cluster.
*/
private double potentialBestWeightedFromFreeCache(BalancerClusterState cluster, int region,
Contributor


The required free cache appears to be based on total region HFile size. Should this instead use cacheable/hot data size?

Contributor Author


Yes, let me fix this in HBASE-30135.

*/
public static final String POTENTIAL_CACHE_RATIO_AFTER_MOVE_KEY =
"hbase.master.balancer.cacheaware.potentialCacheRatioAfterMove";
public static final float POTENTIAL_CACHE_RATIO_AFTER_MOVE_DEFAULT = 0.95f;
Contributor


Are the defaults for potentialCacheRatioAfterMove (0.95) and minFreeCacheSpaceFactor (1.0) based on observed data? They seem important enough to justify in comments or tests.

Contributor Author


For potentialCacheRatioAfterMove, I haven't formally measured it. For root dirs on cloud storage, I've seen performance degradation start to become noticeable when the cache ratio falls below 90%.

For minFreeCacheSpaceFactor, I originally thought of only allowing a move if the free cache space can accommodate the whole region, but decided to make this configurable for flexibility.

/**
* The available cache space on this region server (bytes), if reported in the server load.
*/
default long getCacheFreeSize() {
Contributor


ServerMetrics#getCacheFreeSize() defaults to 0L, and getRegionColdDataSize() defaults to empty map. That is safe for compatibility, but it means missing metrics are indistinguishable from “no free cache” or “no cold data.” Should missing cache-free-size be represented differently from actual zero? Would 0L make old / partially upgraded servers look like they have no free cache and bias balancing?

Contributor Author


Yeah, since there would be no means to determine the free cache space on those servers, they would be excluded as potential candidates, reducing the odds of finding a better assignment plan. It may therefore perform fewer moves, or in the worst case, no moves at all.

// don't want to move regions with low cache ratio due to data classified as cold.
float regionCacheRatioOnOldServer =
regionSizeMB == 0 ? 0.0f : (float) regionSizeInCache / regionSizeMB;
regionSizeMB
Contributor


I think adding cold data size to the numerator makes the value less clear. It turns the metric into an “effective satisfaction ratio” rather than an actual cache ratio.

Would it be better to exclude cold data from the denominator instead?

For example:

cacheRatio = cachedDataSize / (totalRegionSize - coldDataSize)

That would measure how much of the cache-relevant/hot data is actually cached. Cold data is intentionally not expected to be in block cache, so it should not make the region look poorly cached, but it also should not be counted as cached.

We may also need to clamp the result to [0, 1] and handle the case where totalRegionSize <= coldDataSize.
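The suggestion above could be sketched like this. It is a minimal illustration only: `HotDataCacheRatio`, `hotRatio`, and its parameter names are hypothetical, not from the patch, and the choice to return 1.0 for an entirely cold region is an assumption.

```java
/**
 * Hypothetical sketch of the suggested metric: how much of the
 * cache-relevant (hot) data of a region is actually cached.
 */
public class HotDataCacheRatio {
  /**
   * @param cachedBytes bytes of this region currently in the block cache
   * @param totalBytes  total HFile bytes for the region
   * @param coldBytes   bytes classified as cold (intentionally not cached)
   */
  public static float hotRatio(long cachedBytes, long totalBytes, long coldBytes) {
    long hotBytes = totalBytes - coldBytes;
    if (hotBytes <= 0) {
      // Region is entirely cold: treat it as fully satisfied so it is not
      // penalized for data that is intentionally uncached (assumption).
      return 1.0f;
    }
    float ratio = (float) cachedBytes / hotBytes;
    // Clamp to [0, 1]; cached cold blocks could otherwise push it above 1.
    return Math.max(0.0f, Math.min(1.0f, ratio));
  }
}
```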

Change-Id: I0f715c3ff0703eeb56cdf5033242cb9e8234fba5
Change-Id: I07c0d5ba8b95d9036424e8360b685e21d5babfa2
Contributor

@VladRodionov VladRodionov left a comment


OK, lgtm
