feat: enhance U-shape idle prediction for scale-down scenarios #19562

Open
Fly-Style wants to merge 1 commit into apache:master from Fly-Style:cba-enhance-ushape

Fly-Style (Contributor) commented Jun 5, 2026

Description

The cost-based supervisor autoscaler would not scale down a healthy, over-provisioned supervisor: one running above the ideal idle ratio with low lag stayed pinned at its current task count.

Root cause. The idle projection was linear:

rawIdle = 1.0 - busyFraction / taskRatio; // taskRatio = proposed / current

This assumes busy time is fully conserved when work moves onto fewer tasks, so a reasonable consolidation projects negative idle (e.g. 1 − 0.6/0.5 = −0.2). That value clamps to 0, the worst point of the U-shaped idle cost, and turns the overrun into phantom virtual lag, which pins the task count even at ~0 real lag. In reality, busy time grows sublinearly (an observed 2× consolidation raised busy ~1.25×, not 2×).

Fix. Redistribute busy sublinearly:

projectedBusy = busyFraction * Math.pow((double) currentTaskCount / proposedTaskCount, IDLE_SUBLINEARITY_EXPONENT);  // exponent = 0.32
rawIdle = 1.0 - projectedBusy;

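IDLE_SUBLINEARITY_EXPONENT = 0.32 (≈ log₂(1.25)) is a tuned constant. It matches the observation above: a 2× consolidation that raises busy ~1.25× corresponds to an exponent of log₂(1.25) ≈ 0.32, and testing supported that value.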
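The exponent can be recovered from the single observed data point, assuming busy time follows a power law in the consolidation factor r = currentTaskCount / proposedTaskCount. The symbols busy₀, r and α are illustrative names, not identifiers from the patch:

```latex
\text{busy}(r) = \text{busy}_0 \cdot r^{\alpha}, \qquad
2^{\alpha} = 1.25
\;\Rightarrow\;
\alpha = \log_2 1.25 = \frac{\ln 1.25}{\ln 2} \approx \frac{0.2231}{0.6931} \approx 0.32
```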

A healthy consolidation now lands near the ideal idle ratio instead of going negative, so the supervisor scales down. The exponent stays > 0, so extreme over-consolidation still drives projected busy past 1 and is still penalized.
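The difference between the two projections can be sketched as follows. This is a minimal illustration, not the code in this PR: the class and method names are hypothetical, only `IDLE_SUBLINEARITY_EXPONENT` and the two formulas come from the description above, and clamping to [0, 1] is deliberately left out so the raw values stay visible.

```java
// Hypothetical sketch: class and method names are illustrative, not from the patch.
final class IdleProjectionSketch {
  // Constant value quoted in the PR description.
  static final double IDLE_SUBLINEARITY_EXPONENT = 0.32;

  /** Old model: busy time is fully conserved when work moves onto fewer tasks. */
  static double linearIdle(double busyFraction, int currentTaskCount, int proposedTaskCount) {
    double taskRatio = (double) proposedTaskCount / currentTaskCount;
    return 1.0 - busyFraction / taskRatio;
  }

  /** New model: busy time grows as (current / proposed) raised to the exponent. */
  static double sublinearIdle(double busyFraction, int currentTaskCount, int proposedTaskCount) {
    double consolidation = (double) currentTaskCount / proposedTaskCount;
    double projectedBusy = busyFraction * Math.pow(consolidation, IDLE_SUBLINEARITY_EXPONENT);
    return 1.0 - projectedBusy;
  }

  public static void main(String[] args) {
    // Busy fraction 0.6, halving the task count from 128 to 64.
    System.out.printf("linear raw idle:    %.3f%n", linearIdle(0.6, 128, 64));
    System.out.printf("sublinear raw idle: %.3f%n", sublinearIdle(0.6, 128, 64));
    // A 16x consolidation still projects negative idle under the new model.
    System.out.printf("sublinear, 128 -> 8: %.3f%n", sublinearIdle(0.6, 128, 8));
  }
}
```

For the halving case the linear model gives a raw idle of about −0.2, while the sublinear model gives about 0.25. The 128 → 8 case shows that a positive exponent still pushes aggressive consolidation below zero.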

Validation (plots under the hood)

Optimal task count vs. observed poll-idle ratio across realistic configs (rate = total cluster throughput, split per task):

cost_based_scaledown_medium_7Mpm

The old version stays pinned at 128 until idle reaches ~0.55, while the new version starts consolidating at idle ~0.32.

cost_based_scaledown_large_30Mpm

Safe under load: the new version consolidates earlier on the high-idle side, but at low idle both versions still jump to max, so lag-driven scale-up is unaffected.

cost_based_v1_vs_v2_large_30Mpm_amp0 35

The existing version is flat (pinned at max by the phantom overrun); the new version consolidates and holds more tasks as lag weight rises.

Release note

Fixed an issue where the cost-based supervisor autoscaler would not scale down an over-provisioned supervisor running above its ideal idle ratio with low lag.

  • self-reviewed.
  • added comments explaining the "why".
  • added/updated unit tests.

Fly-Style self-assigned this Jun 5, 2026
Fly-Style requested a review from kfaraz June 5, 2026 13:46

FrankChen021 (Member) left a comment

I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.

Reviewed 3 of 3 changed files.


This is an automated review by Codex GPT-5.5
