HBASE-30134: Improve CacheAwareLoadBalancer to consider low cache ratio when calculating imbalance (#8197) #8218
Conversation
…io when calculating imbalance (apache#8197) Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org> Change-Id: Id388f670f32fafb2ac28037746cfedae0f22ab78
VladRodionov
left a comment
The patch (or approach) has a scoped potential issue which should be addressed, I think: risk of balancing churn. If the cache ratio changes slowly and free cache fluctuates, the balancer may repeatedly identify “better” placements. The patch adds a cost signal, not necessarily a stable state transition.
This is a common problem for all balancers. I am not very familiar with StochasticBalancer internals, but if stabilization is missing there, some algorithm should be added to prevent frequent rebalancing; the simplest is a cooling-off period. That should probably be a separate ticket, though.
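A cooling-off period of the kind suggested here could be as simple as a timestamp guard around balancer runs. The following is a hypothetical sketch, not existing HBase balancer code; the class and method names are illustrative:

```java
// Illustrative cooling-off guard: after a rebalance run, suppress further
// runs for a fixed period so small fluctuations in free cache do not cause
// churn. Assumes non-negative clock values (e.g. System.currentTimeMillis()).
public class RebalanceCooldown {
  private final long cooldownMillis;
  private long lastBalanceTime;

  public RebalanceCooldown(long cooldownMillis) {
    this.cooldownMillis = cooldownMillis;
    // Initialize so the very first call to mayBalance() is allowed.
    this.lastBalanceTime = -cooldownMillis;
  }

  /** Returns true if enough time has passed since the last recorded rebalance. */
  public boolean mayBalance(long nowMillis) {
    return nowMillis - lastBalanceTime >= cooldownMillis;
  }

  /** Record that a rebalance was performed at the given time. */
  public void recordBalance(long nowMillis) {
    lastBalanceTime = nowMillis;
  }
}
```

A balancer loop would call `mayBalance()` before computing plans and `recordBalance()` after applying any moves.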
potentialBestWeightedFromFreeCache(...) seems to consider any server with enough free cache as evidence that the region has a better possible cache score. Should this be limited to servers that are valid/reasonable candidate destinations for this specific region, considering locality/rack/table skew/other constraints?
The CacheAwareLoadBalancer was originally intended for use cases where optimal cache allocation is pivotal to overall performance, so it combines only skewness and cache allocation costs when weighing an imbalance. For use cases where locality/rack/load are more critical, the StochasticLoadBalancer should be used.
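As an illustration of the two-term weighting this reply describes, a normalized weighted sum of the two costs might look like the sketch below. The weights and normalization here are assumptions for illustration, not the actual CacheAwareLoadBalancer implementation:

```java
// Hypothetical two-term cost combination: skewness cost plus cache
// allocation cost, each scaled by a configurable weight and normalized
// so the result stays in [0, 1] when both inputs are in [0, 1].
public class CombinedCost {
  public static double cost(double skewCost, double cacheCost,
      double skewWeight, double cacheWeight) {
    return (skewWeight * skewCost + cacheWeight * cacheCost)
        / (skewWeight + cacheWeight);
  }
}
```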
    updateRegionLoad();
  }

  protected Map<ServerName, Long> getServerBlockCacheFreeBytes() {
Free cache capacity does not necessarily mean the moved region will become warm. What mechanism is expected to actually populate the cache after the move — prefetch, time-based priority, HBASE-30135, or normal client reads?
Yeah, these changes only calculate whether there is an imbalance, considering cache space and uncached regions. HBASE-30135 will deal with generating the new plans. Once regions are moved, prefetch runs at region opening and is the means for caching the regions. Maybe we should add extra validation for the prefetch property.
 * #potentialCacheRatioAfterMove} * region MB) so placement is not considered optimal solely
 * from low ratios when capacity exists somewhere in the cluster.
 */
private double potentialBestWeightedFromFreeCache(BalancerClusterState cluster, int region,
The required free cache appears to be based on total region HFile size. Should this instead use cacheable/hot data size?
 */
public static final String POTENTIAL_CACHE_RATIO_AFTER_MOVE_KEY =
  "hbase.master.balancer.cacheaware.potentialCacheRatioAfterMove";
public static final float POTENTIAL_CACHE_RATIO_AFTER_MOVE_DEFAULT = 0.95f;
Are the defaults for potentialCacheRatioAfterMove (0.95) and minFreeCacheSpaceFactor (1.0) based on observed data? They seem important enough to justify in comments or tests.
For potentialCacheRatioAfterMove, I haven't formally measured it. For root dirs on cloud storage, I've seen performance degradation start to be noticeable when the cache ratio falls below 90%.
For minFreeCacheSpaceFactor, I originally thought of only allowing a move if the free cache space can accommodate the whole region, but decided to make this configurable for flexibility.
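The gate this reply describes reduces to a single comparison. The config name below matches the factor discussed in the thread, while the class and method names are illustrative, not HBase code:

```java
// Sketch of the free-space gate: a destination is considered only when its
// free cache can hold at least minFreeCacheSpaceFactor times the region's
// size. With the discussed default of 1.0, the whole region must fit.
public class FreeCacheGate {
  public static boolean canAccommodate(long freeCacheBytes, long regionSizeBytes,
      float minFreeCacheSpaceFactor) {
    return (double) freeCacheBytes >= (double) regionSizeBytes * minFreeCacheSpaceFactor;
  }
}
```

Lowering the factor below 1.0 trades cache completeness after the move for more candidate destinations.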
/**
 * The available cache space on this region server (bytes), if reported in the server load.
 */
default long getCacheFreeSize() {
ServerMetrics#getCacheFreeSize() defaults to 0L, and getRegionColdDataSize() defaults to empty map. That is safe for compatibility, but it means missing metrics are indistinguishable from “no free cache” or “no cold data.” Should missing cache-free-size be represented differently from actual zero? Would 0L make old / partially upgraded servers look like they have no free cache and bias balancing?
Yeah, since there would be no means to determine the free cache space on those servers, they would be excluded as potential candidates, reducing the odds of finding a better assignment plan, so the balancer may perform fewer moves or, in the worst case, no moves at all.
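One way to make "metric not reported" distinguishable from "genuinely zero free cache" would be a sentinel value instead of the 0L default. This is a sketch under that assumption, not what the patch does; the names are illustrative:

```java
import java.util.OptionalLong;

// Sketch: if a -1 sentinel were adopted for a missing cache-free-size
// metric, callers could tell an old server that never reported the metric
// apart from a server whose cache is actually full.
public class CacheFreeSize {
  public static final long UNREPORTED = -1L;

  /** Wrap the raw metric: empty when the server did not report it. */
  public static OptionalLong reported(long raw) {
    return raw == UNREPORTED ? OptionalLong.empty() : OptionalLong.of(raw);
  }

  /** A balancer could then skip unreported servers instead of treating them as full. */
  public static boolean isCandidate(long raw, long requiredFreeBytes) {
    OptionalLong free = reported(raw);
    return free.isPresent() && free.getAsLong() >= requiredFreeBytes;
  }
}
```

With the 0L default in the patch, unreported and full servers behave identically, which is the compatibility trade-off the comment raises.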
// don't want to move regions with low cache ratio due to data classified as cold.
float regionCacheRatioOnOldServer =
  regionSizeMB == 0 ? 0.0f : (float) regionSizeInCache / regionSizeMB;
I think adding cold data size to the numerator makes the value less clear. It turns the metric into an “effective satisfaction ratio” rather than an actual cache ratio.
Would it be better to exclude cold data from the denominator instead?
For example:
cacheRatio = cachedDataSize / (totalRegionSize - coldDataSize)
That would measure how much of the cache-relevant/hot data is actually cached. Cold data is intentionally not expected to be in block cache, so it should not make the region look poorly cached, but it also should not be counted as cached.
We may also need to clamp the result to [0, 1] and handle the case where totalRegionSize <= coldDataSize.
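A minimal sketch of the proposed ratio, with the clamping and the all-cold edge case handled as suggested; the class and method names are illustrative, not HBase code:

```java
// Proposed hot-data cache ratio:
//   cacheRatio = cachedDataSize / (totalRegionSize - coldDataSize)
// clamped to [0, 1], and defined as fully satisfied when the region has
// no hot data at all (totalRegionSize <= coldDataSize).
public class HotCacheRatio {
  public static float cacheRatio(long cachedDataSize, long totalRegionSize,
      long coldDataSize) {
    long hotSize = totalRegionSize - coldDataSize;
    if (hotSize <= 0) {
      // Entirely cold (or inconsistent sizes): nothing is expected in the
      // block cache, so do not make the region look poorly cached.
      return 1.0f;
    }
    float ratio = (float) cachedDataSize / hotSize;
    return Math.max(0.0f, Math.min(1.0f, ratio));
  }
}
```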