Skip to content

fix(pod-scaler): defer CPU limit strip until after authoritative decrease#5265

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
deepsm007:fix/authoritative-cpu-limit-decrease
Jun 24, 2026
Merged

fix(pod-scaler): defer CPU limit strip until after authoritative decrease#5265
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
deepsm007:fix/authoritative-cpu-limit-decrease

Conversation

@deepsm007

@deepsm007 deepsm007 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

DPTP-2938 moved authoritative decrease to limits-only and runs it after reconcileLimits, which removes CPU limits but keeps memory limits. CPU authoritative dry-run/apply never fired because the limit was already gone; memory worked fine on the same path.

/cc @hector-vido

pod-scaler: Fix CPU limit removal timing to enable authoritative decrease

This fix corrects the order of operations in the pod-scaler's admission webhook to ensure CPU limits can be properly decreased based on measured usage in authoritative mode.

The problem: A previous change moved the authoritative decrease functionality (which reduces CPU and memory limits based on Prometheus-measured usage) to run after reconcileLimits. However, reconcileLimits was unconditionally removing CPU limits while preserving memory limits. This meant that when the authoritative CPU decrease logic executed, the CPU limit had already been stripped from the container, preventing any CPU limit reduction from being applied. Memory authoritative decrease continued to work since memory limits were preserved.

The solution:

  • reconcileLimits now only enforces the 200% memory limit threshold and no longer removes CPU limits
  • CPU limits are now explicitly deleted after applyAuthoritativeLimitDecrease completes, but only when:
    • mutateResourceLimits is enabled (limit mutation is active)
    • authoritativeCPU is false (CPU authoritative decrease is disabled)

This ensures both CPU and memory limits remain available for the authoritative decrease logic to operate on before being stripped. CI operators who have enabled authoritative CPU decrease will now see CPU limits properly reduced according to measured usage data.

Files changed:

  • cmd/pod-scaler/admission.go: Modified reconcileLimits to skip CPU limit removal; added explicit CPU limit deletion after applyAuthoritativeLimitDecrease when appropriate
  • cmd/pod-scaler/admission_test.go: Updated test expectations to reflect that CPU limits are no longer removed by reconcileLimits

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci openshift-ci Bot requested a review from hector-vido June 22, 2026 17:29
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

reconcileLimits no longer deletes CPU limits; it now only enforces the memory-limit floor (≥200% of memory request). CPU limit deletion is moved into mutatePodResources as a conditional, nil-guarded block that fires only when mutateResourceLimits is true and authoritativeCPU is false. The TestReconcileLimits CPU case is updated to assert the CPU limit is preserved.

Changes

CPU Limit Deletion Relocation

Layer / File(s) Summary
CPU limit cleanup: reconcileLimits, mutatePodResources, and test
cmd/pod-scaler/admission.go, cmd/pod-scaler/admission_test.go
reconcileLimits drops the unconditional ResourceCPU limit deletion and only checks the memory 200% floor. mutatePodResources adds a nil-guarded conditional that deletes ResourceCPU from container limits when mutateResourceLimits is true and authoritativeCPU is false. TestReconcileLimits updated to assert CPU limit value 200 is retained rather than removed.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 15 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Coverage For New Features ⚠️ Warning The PR fixes a bug where CPU limits were removed before applyAuthoritativeLimitDecrease could process them. While TestReconcileLimits was updated to verify CPU limits persist through reconcileLimit... Add a test case in TestMutatePodResources with authoritativeCPU=true and mutateResourceLimits=true that verifies CPU limits are processed by applyAuthoritativeLimitDecrease instead of being deleted before it runs.
✅ Passed checks (15 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: deferring CPU limit removal until after the authoritative decrease operation completes, which is the core fix in this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Go Error Handling ✅ Passed The PR's changes to admission.go follow proper Go error handling patterns: (1) The new CPU limit deletion code (lines 402-406) correctly checks for nil before dereferencing Limits; (2) All fmt.Erro...
Stable And Deterministic Test Names ✅ Passed The test file uses standard Go table-driven tests, not Ginkgo tests. All 59 test case names are static string literals with no dynamic content, timestamps, UUIDs, or environment-dependent values.
Test Structure And Quality ✅ Passed Custom check requires review of Ginkgo test code, but the modified tests use standard Go testing package (testing.T), not Ginkgo. Check is not applicable to this PR.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests added; PR only modifies standard Go unit tests in admission_test.go. Check not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR contains only unit tests in cmd/pod-scaler/, not Ginkgo e2e tests. SNO compatibility check applies only to new e2e tests and is not applicable here.
Topology-Aware Scheduling Compatibility ✅ Passed This PR modifies only pod resource limit mutation logic in an admission webhook (reconcileLimits and applyAuthoritativeLimitDecrease). It does not introduce deployment manifests, controllers, or an...
Ote Binary Stdout Contract ✅ Passed No process-level stdout writes introduced. Changes are in webhook handler functions using logrus (stderr by default), with no fmt.Print, fmt.Println, or klog stdout violations.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR contains only unit test changes to cmd/pod-scaler using standard Go testing framework, not Ginkgo e2e tests. Check is not applicable.
No-Weak-Crypto ✅ Passed Changes are to pod resource scaling logic with no cryptographic operations; no weak crypto patterns detected in modified files.
Container-Privileges ✅ Passed PR contains no container privilege escalation or unsafe security configurations. Changes are limited to resource limit mutation logic in admission.go, with no modifications to security contexts, ca...
No-Sensitive-Data-In-Logs ✅ Passed No logging statements that expose sensitive data (passwords, tokens, API keys, PII, etc.) were added. The code changes introduce CPU limit deletion logic without any logging.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 22, 2026
@hector-vido

Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 22, 2026
@openshift-ci

openshift-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deepsm007, hector-vido

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [deepsm007,hector-vido]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
cmd/pod-scaler/admission.go (1)

402-406: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add or point to coverage for the relocated CPU cleanup branch.

This production path now owns the key sequencing contract: authoritative CPU decrease/dry-run must see the configured CPU limit before it is stripped for non-authoritative CPU mode. Please cover at least mutateResourceLimits=true, authoritativeCPU=false and an authoritative CPU/dry-run case. As per coding guidelines, “The author should check and ensure the presubmits of the PR run successfully” and reviewers should assess “Tests” and production impact.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pod-scaler/admission.go` around lines 402 - 406, The relocated CPU
cleanup branch in the mutateResourceLimits and authoritativeCPU conditions is
missing test coverage for a critical sequencing contract: authoritative CPU
decrease/dry-run operations must observe the configured CPU limit before it gets
stripped in non-authoritative CPU mode. Add test cases covering the scenario
where mutateResourceLimits is true and authoritativeCPU is false (the delete
containers[i].Resources.Limits corev1.ResourceCPU code path), and add a test for
the authoritative CPU with dry-run scenario. These tests should verify that the
CPU limit is visible to authoritative operations before being removed for
non-authoritative CPU mode, ensuring the production sequencing contract is
properly validated.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cmd/pod-scaler/admission.go`:
- Around line 402-406: The relocated CPU cleanup branch in the
mutateResourceLimits and authoritativeCPU conditions is missing test coverage
for a critical sequencing contract: authoritative CPU decrease/dry-run
operations must observe the configured CPU limit before it gets stripped in
non-authoritative CPU mode. Add test cases covering the scenario where
mutateResourceLimits is true and authoritativeCPU is false (the delete
containers[i].Resources.Limits corev1.ResourceCPU code path), and add a test for
the authoritative CPU with dry-run scenario. These tests should verify that the
CPU limit is visible to authoritative operations before being removed for
non-authoritative CPU mode, ensuring the production sequencing contract is
properly validated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 95737a9a-0777-49e9-9fbf-f4090ca13d29

📥 Commits

Reviewing files that changed from the base of the PR and between 2bfca56 and cb34ba2.

📒 Files selected for processing (2)
  • cmd/pod-scaler/admission.go
  • cmd/pod-scaler/admission_test.go
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

  • openshift/release (manual)
  • openshift/ci-docs (manual)
  • openshift/release-controller (manual)
  • openshift/ci-chat-bot (manual)

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

/retest-required

Remaining retests: 0 against base HEAD 2bfca56 and 2 for PR HEAD cb34ba2 in total

@deepsm007

Copy link
Copy Markdown
Contributor Author

/test e2e integration

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

/retest-required

Remaining retests: 0 against base HEAD 7f8c694 and 1 for PR HEAD cb34ba2 in total

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

/retest-required

Remaining retests: 0 against base HEAD e05bba9 and 0 for PR HEAD cb34ba2 in total

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

/hold

Revision cb34ba2 was retested 3 times: holding

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 23, 2026
@deepsm007

Copy link
Copy Markdown
Contributor Author

/retest

@deepsm007

Copy link
Copy Markdown
Contributor Author

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 23, 2026
@deepsm007

Copy link
Copy Markdown
Contributor Author

/override ci/prow/e2e
/override ci/prow/integration

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

@deepsm007: Overrode contexts on behalf of deepsm007: ci/prow/e2e, ci/prow/integration

Details

In response to this:

/override ci/prow/e2e
/override ci/prow/integration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

/retest-required

Remaining retests: 0 against base HEAD a1a4aa3 and 2 for PR HEAD cb34ba2 in total

@deepsm007

Copy link
Copy Markdown
Contributor Author

/retest-required

@deepsm007

Copy link
Copy Markdown
Contributor Author

/override ci/prow/integration

@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

@deepsm007: Overrode contexts on behalf of deepsm007: ci/prow/integration

Details

In response to this:

/override ci/prow/integration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

/retest-required

Remaining retests: 0 against base HEAD 50714d1 and 1 for PR HEAD cb34ba2 in total

@deepsm007

Copy link
Copy Markdown
Contributor Author

/override ci/prow/integration

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@deepsm007: Overrode contexts on behalf of deepsm007: ci/prow/integration

Details

In response to this:

/override ci/prow/integration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@deepsm007

Copy link
Copy Markdown
Contributor Author

/override ci/prow/images

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@deepsm007: Overrode contexts on behalf of deepsm007: ci/prow/images

Details

In response to this:

/override ci/prow/images

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@deepsm007: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit f3d07fd into openshift:main Jun 24, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants