fix(pod-scaler): defer CPU limit strip until after authoritative decrease #5265
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: deepsm007, hector-vido |
🧹 Nitpick comments (1)
cmd/pod-scaler/admission.go (1)
402-406: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Add or point to coverage for the relocated CPU cleanup branch.
This production path now owns the key sequencing contract: the authoritative CPU decrease/dry-run must see the configured CPU limit before it is stripped for non-authoritative CPU mode. Please cover at least `mutateResourceLimits=true, authoritativeCPU=false` and an authoritative CPU/dry-run case. As per coding guidelines, “The author should check and ensure the presubmits of the PR run successfully” and reviewers should assess “Tests” and production impact.
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cmd/pod-scaler/admission.go`:
- Around line 402-406: The relocated CPU cleanup branch in the
mutateResourceLimits and authoritativeCPU conditions is missing test coverage
for a critical sequencing contract: authoritative CPU decrease/dry-run
operations must observe the configured CPU limit before it gets stripped in
non-authoritative CPU mode. Add test cases covering the scenario where
mutateResourceLimits is true and authoritativeCPU is false (the delete
containers[i].Resources.Limits corev1.ResourceCPU code path), and add a test for
the authoritative CPU with dry-run scenario. These tests should verify that the
CPU limit is visible to authoritative operations before being removed for
non-authoritative CPU mode, ensuring the production sequencing contract is
properly validated.
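The coverage requested above could be laid out as a small case table. The following is only a sketch under stated assumptions: `admitCPU`, `cpuOutcome`, `sequencingCases`, `failedCases`, and the `dryRun` flag are hypothetical stand-ins invented for illustration, CPU limits are plain millicore integers rather than `resource.Quantity`, and dry-run is assumed to evaluate the decrease without applying it. In the repository this would more naturally be a table-driven test in `cmd/pod-scaler/admission_test.go`.

```go
package main

// cpuOutcome records what a hypothetical admission pass did with the CPU limit.
type cpuOutcome struct {
	observed int64 // CPU limit (millicores) seen by the authoritative step; 0 if it did not run
	final    int64 // CPU limit left on the container afterwards; 0 if stripped
}

// admitCPU is a stand-in for the relocated branch, not the real mutation code.
// dryRun is an assumed flag: the decrease is evaluated but not applied.
func admitCPU(limit, recommended int64, mutateResourceLimits, authoritativeCPU, dryRun bool) cpuOutcome {
	out := cpuOutcome{final: limit}
	if authoritativeCPU {
		// The authoritative step runs first, so it still sees the configured limit.
		out.observed = limit
		if !dryRun && recommended < limit {
			out.final = recommended
		}
	}
	if mutateResourceLimits && !authoritativeCPU {
		// Strip the limit only after the authoritative step has had its chance.
		out.final = 0
	}
	return out
}

// sequencingCases lists the scenarios the review asks to cover.
var sequencingCases = []struct {
	name                 string
	mutateResourceLimits bool
	authoritativeCPU     bool
	dryRun               bool
	want                 cpuOutcome
}{
	{"non-authoritative strips the limit", true, false, false, cpuOutcome{observed: 0, final: 0}},
	{"authoritative applies the decrease", true, true, false, cpuOutcome{observed: 4000, final: 1000}},
	{"authoritative dry-run only observes", true, true, true, cpuOutcome{observed: 4000, final: 4000}},
}

// failedCases returns the names of cases whose outcome differs from want.
func failedCases() []string {
	var failed []string
	for _, tc := range sequencingCases {
		got := admitCPU(4000, 1000, tc.mutateResourceLimits, tc.authoritativeCPU, tc.dryRun)
		if got != tc.want {
			failed = append(failed, tc.name)
		}
	}
	return failed
}
```

Each row pins one side of the sequencing contract: the first checks that the limit is still removed in non-authoritative mode, and the other two check that the authoritative step observes the configured limit before anything strips it.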
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 95737a9a-0777-49e9-9fbf-f4090ca13d29
📒 Files selected for processing (2)
- cmd/pod-scaler/admission.go
- cmd/pod-scaler/admission_test.go
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
- openshift/release (manual)
- openshift/ci-docs (manual)
- openshift/release-controller (manual)
- openshift/ci-chat-bot (manual)
|
/test e2e integration |
|
/hold Revision cb34ba2 was retested 3 times: holding |
|
/retest |
|
/unhold |
|
/override ci/prow/e2e |
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
@deepsm007: Overrode contexts on behalf of deepsm007: ci/prow/e2e, ci/prow/integration
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest-required |
|
/override ci/prow/integration |
|
@deepsm007: Overrode contexts on behalf of deepsm007: ci/prow/integration |
|
/override ci/prow/integration |
|
@deepsm007: Overrode contexts on behalf of deepsm007: ci/prow/integration |
|
/override ci/prow/images |
|
@deepsm007: Overrode contexts on behalf of deepsm007: ci/prow/images |
|
@deepsm007: all tests passed! |
DPTP-2938 moved authoritative decrease to limits-only and runs it after reconcileLimits, which removes CPU limits but keeps memory limits. CPU authoritative dry-run/apply never fired because the limit was already gone; memory worked fine on the same path.
/cc @hector-vido
pod-scaler: Fix CPU limit removal timing to enable authoritative decrease
This fix corrects the order of operations in the pod-scaler's admission webhook to ensure CPU limits can be properly decreased based on measured usage in authoritative mode.
The problem: A previous change moved the authoritative decrease functionality (which reduces CPU and memory limits based on Prometheus-measured usage) to run after `reconcileLimits`. However, `reconcileLimits` was unconditionally removing CPU limits while preserving memory limits. As a result, by the time the authoritative CPU decrease logic executed, the CPU limit had already been stripped from the container, so no CPU limit reduction could be applied. Memory authoritative decrease continued to work because memory limits were preserved.
The solution:
- `reconcileLimits` now only enforces the 200% memory limit threshold and no longer removes CPU limits
- CPU limits are now removed after `applyAuthoritativeLimitDecrease` completes, but only when:
  - `mutateResourceLimits` is enabled (limit mutation is active)
  - `authoritativeCPU` is false (CPU authoritative decrease is disabled)
This ensures both CPU and memory limits remain available for the authoritative decrease logic to operate on before being stripped. CI operators who have enabled authoritative CPU decrease will now see CPU limits properly reduced according to measured usage data.
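The ordering described above can be sketched roughly as follows. This is an illustrative model, not the code in `cmd/pod-scaler/admission.go`: `resourceList`, `container`, `options`, and `mutateContainer` are hypothetical simplifications (the real code works on `corev1.ResourceRequirements` and `resource.Quantity`), the function signatures are invented, the 200% threshold is assumed to mean the memory limit is raised to at least twice the memory request, and `reconcileLimits` is assumed to run only when `mutateResourceLimits` is set.

```go
package main

const (
	cpu    = "cpu"
	memory = "memory"
)

// resourceList is a simplified stand-in for corev1.ResourceList:
// CPU in millicores, memory in bytes.
type resourceList map[string]int64

// container is a hypothetical, minimal model of a pod container's resources.
type container struct {
	Requests resourceList
	Limits   resourceList
}

// options mirrors the two flags named in the description.
type options struct {
	mutateResourceLimits bool
	authoritativeCPU     bool
}

// reconcileLimits only enforces the memory threshold (assumed here to be
// "limit is at least 200% of the request"). It no longer touches the CPU limit.
func reconcileLimits(c *container) {
	limit, ok := c.Limits[memory]
	if !ok {
		return
	}
	if minimum := 2 * c.Requests[memory]; limit < minimum {
		c.Limits[memory] = minimum
	}
}

// applyAuthoritativeLimitDecrease lowers an existing CPU limit to the
// recommended value. It can only act on a limit that is still present,
// which is why the strip has to happen afterwards.
func applyAuthoritativeLimitDecrease(c *container, recommendedCPU int64) bool {
	current, ok := c.Limits[cpu]
	if !ok || recommendedCPU >= current {
		return false
	}
	c.Limits[cpu] = recommendedCPU
	return true
}

// mutateContainer shows the corrected order: reconcile memory, run the
// authoritative decrease, and only then strip the CPU limit when CPU is
// not in authoritative mode. It reports whether the CPU limit was decreased.
func mutateContainer(c *container, opts options, recommendedCPU int64) bool {
	if opts.mutateResourceLimits {
		reconcileLimits(c)
	}
	decreased := false
	if opts.authoritativeCPU {
		decreased = applyAuthoritativeLimitDecrease(c, recommendedCPU)
	}
	if opts.mutateResourceLimits && !opts.authoritativeCPU {
		delete(c.Limits, cpu)
	}
	return decreased
}
```

With `authoritativeCPU` enabled, a container that arrives with a 4000m CPU limit and a 1000m recommendation leaves with a 1000m limit. With it disabled and `mutateResourceLimits` enabled, the CPU limit is removed as before, just one step later.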
Files changed:
- `cmd/pod-scaler/admission.go`: Modified `reconcileLimits` to skip CPU limit removal; added explicit CPU limit deletion after `applyAuthoritativeLimitDecrease` when appropriate
- `cmd/pod-scaler/admission_test.go`: Updated test expectations to reflect that CPU limits are no longer removed by `reconcileLimits`