Skip to content

OCPNODE-4487: replace --system-reserved flags with config drop-in#6044

Open
nispriha wants to merge 1 commit into
openshift:mainfrom
nispriha:njagan/auto-sizing-dropin
Open

OCPNODE-4487: replace --system-reserved flags with config drop-in#6044
nispriha wants to merge 1 commit into
openshift:mainfrom
nispriha:njagan/auto-sizing-dropin

Conversation

@nispriha
Copy link
Copy Markdown
Contributor

@nispriha nispriha commented May 14, 2026

- What I did

Remove EnvironmentFile=/etc/node-sizing.env and the --system-reserved command-line flag from kubelet.service. The auto-sizing script (dynamic-system-reserved-calc.sh) now writes a KubeletConfiguration drop-in file to /etc/openshift/kubelet.conf.d/20-auto-sizing.conf, which kubelet reads via --config-dir flag. Add --config-dir to master and arbiter kubelet.service for consistency with workers.

- How to verify it

  • go test ./pkg/controller/kubelet-config/... -v passes
  • go test ./pkg/controller/template/... -v passes (verifies --config-dir is present on all node roles)
  • e2e test TestKubeletAutoSizingDropIn verifies the auto-sizing drop-in exists with systemReserved values

- Description for the changelog

Kubelet now receives systemReserved values via a KubeletConfiguration drop-in instead of --system-reserved command-line flags.

Summary by CodeRabbit

  • Refactor

    • Kubelet now loads system-reserved settings from an OpenShift kubelet config drop-in directory (created at startup) instead of embedding them on the command line or using an external node-sizing env file; unit templates stop sourcing the old env file and adjust environment sources.
  • Tests

    • New e2e test verifies the kubelet auto-sizing drop-in exists on nodes and contains systemReserved settings.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 14, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented May 14, 2026

@nispriha: This pull request references OCPNODE-4487 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

- What I did

Remove EnvironmentFile=/etc/node-sizing.env and the --system-reserved command-line flag from worker kubelet.service. The auto-sizing script (dynamic-system-reserved-calc.sh) now writes a KubeletConfiguration drop-in file to /etc/openshift/kubelet.conf.d/20-auto-sizing.conf, which kubelet reads via its existing --config-dir flag. This eliminates the round trip through the environment file for worker nodes. Master and arbiter nodes remain unchanged.

- How to verify it

  • go test ./pkg/controller/kubelet-config/... -v passes
  • e2e test TestKubeletAutoSizingDropIn verifies worker kubelet.service does not contain --system-reserved or EnvironmentFile=/etc/node-sizing.env
  • On a running cluster, verify /etc/openshift/kubelet.conf.d/20-auto-sizing.conf exists on worker nodes after boot

- Description for the changelog

Worker kubelet now receives systemReserved values via a KubeletConfiguration drop-in instead of --system-reserved command-line flags.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nispriha
Once this PR has been reviewed and has the lgtm label, please assign pablintino for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Move kubelet system-reserved config from ExecStart flags and /etc/node-sizing.env into an auto-generated drop-in at /etc/openshift/kubelet.conf.d/20-auto-sizing.conf. Unit templates create/use that dir and drop the --system-reserved flag. Tests updated accordingly.

Changes

Kubelet Auto-Sizing Configuration Migration

Layer / File(s) Summary
Auto-sizing script helper and integration
templates/common/_base/files/kubelet-auto-sizing.yaml
Script sets SYSTEM_RESERVED_MEMORY, SYSTEM_RESERVED_CPU, SYSTEM_RESERVED_ES and adds write_kubelet_config_dropin to write /etc/openshift/kubelet.conf.d/20-auto-sizing.conf. dynamic_node_sizing and static_node_sizing call this helper and no longer emit or rely on /etc/node-sizing.env.
Master unit templates (base & on-prem)
templates/master/.../kubelet.service.yaml
Create /etc/openshift/kubelet.conf.d at startup, add --config-dir=/etc/openshift/kubelet.conf.d to ExecStart, remove EnvironmentFile=/etc/node-sizing.env, and remove the --system-reserved flag.
Arbiter unit templates (base & on-prem)
templates/arbiter/.../kubelet.service.yaml
Add ExecStartPre to create /etc/openshift/kubelet.conf.d, add --config-dir=/etc/openshift/kubelet.conf.d, remove EnvironmentFile=/etc/node-sizing.env, and drop --system-reserved from ExecStart.
Worker unit templates (base & on-prem)
templates/worker/.../kubelet.service.yaml
Stop sourcing /etc/node-sizing.env (use /etc/os-release plus optional kubelet env files) and remove --system-reserved=cpu=...,memory=...,ephemeral-storage=... from ExecStart.
E2E and unit-file tests
test/e2e-2of2/kubeletcfg_test.go, pkg/controller/template/kubelet_config_dir_test.go
Add TestKubeletAutoSizingDropIn to verify the generated drop-in contains kind: KubeletConfiguration and systemReserved:; update unit-file test to assert --config-dir=/etc/openshift/kubelet.conf.d across roles.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

lgtm

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning TestKubeletAutoSizingDropIn validates only the systemReserved: section header but not concrete entries (cpu, memory, ephemeral-storage) as requested in review comments. Add assertions for concrete systemReserved entries: cpu:, memory:, and ephemeral-storage: to ensure actual values are written.
Single Node Openshift (Sno) Test Compatibility ⚠️ Warning TestKubeletAutoSizingDropIn iterates over roles [worker, master] using GetRandomNode which fails if a role has no nodes. SNO lacks separate worker nodes. Add [Skipped:SingleReplicaTopology] label to test, or guard with IsSingleNode() check and skip on SNO deployments.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main change: replacing --system-reserved flags with a config drop-in mechanism across kubelet service units.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Test names use stable, static values: TestKubeletAutoSizingDropIn uses literal "worker"/"master", TestKubeletConfigDirParameter uses static platform names. No dynamic identifiers in test titles.
Microshift Test Compatibility ✅ Passed The PR adds only standard Go tests (testing.T), not Ginkgo e2e tests. The custom check applies only to new Ginkgo tests (Describe, Context, It, When), which are not present in this PR.
Topology-Aware Scheduling Compatibility ✅ Passed PR only modifies kubelet configuration delivery (bash script, systemd units, tests); introduces no pod scheduling constraints, affinity rules, or topology assumptions.
Ote Binary Stdout Contract ✅ Passed No OTE Binary Stdout Contract violations detected. TestMain only calls os.Exit(m.Run()), all test code is in Test* functions, and no process-level code writes to stdout.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed TestKubeletAutoSizingDropIn reads a local node file with no IPv4 hardcoding, IPv4-only assumptions, or external connectivity requirements. Works in IPv6 and disconnected environments.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@nispriha
Copy link
Copy Markdown
Contributor Author

/retest

1 similar comment
@nispriha
Copy link
Copy Markdown
Contributor Author

/retest

@haircommander
Copy link
Copy Markdown
Member

is the control plane unchanged just because we're anxious about breaking something? I feel it'd be better to set consistently TBH

@nispriha nispriha force-pushed the njagan/auto-sizing-dropin branch from 8c2d700 to de0039f Compare May 21, 2026 10:24
@nispriha
Copy link
Copy Markdown
Contributor Author

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
test/e2e-2of2/kubeletcfg_test.go (1)

361-377: ⚖️ Poor tradeoff

Test implementation doesn't match PR description.

The PR verification steps state: "e2e test TestKubeletAutoSizingDropIn verifies worker kubelet.service does not contain --system-reserved or EnvironmentFile=/etc/node-sizing.env."

However, the test only verifies that the drop-in file exists and contains the expected YAML structure. It does not verify that --system-reserved flags or EnvironmentFile=/etc/node-sizing.env have been removed from the kubelet.service unit file.

Consider either updating the PR description to match the actual test behavior, or enhancing the test to verify the removal of the old mechanism.

Optional: Enhanced test to verify flag removal
// After checking drop-in exists, verify old mechanism removed:
unitContent := helpers.ExecCmdOnNode(t, cs, node, "systemctl", "cat", "kubelet.service")
require.NotContains(t, unitContent, "--system-reserved",
    "%s kubelet.service should not contain --system-reserved flag", role)
require.NotContains(t, unitContent, "EnvironmentFile=/etc/node-sizing.env",
    "%s kubelet.service should not source /etc/node-sizing.env", role)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-2of2/kubeletcfg_test.go` around lines 361 - 377, The test
TestKubeletAutoSizingDropIn only checks the drop-in file content but the PR
description promises verifying removal of the old mechanism; update the test to
also fetch the kubelet unit via helpers.ExecCmdOnNode (e.g., run "systemctl cat
kubelet.service") and add assertions using require.NotContains to ensure the
unit does not include "--system-reserved" and does not reference
"EnvironmentFile=/etc/node-sizing.env", or alternatively change the PR
description to reflect that the test only validates the drop-in YAML.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@templates/master/01-master-kubelet/_base/units/kubelet.service.yaml`:
- Line 13: The PR summary is misleading because the master kubelet unit (because
of the ExecStartPre creating /etc/openshift/kubelet.conf.d and the kubelet
invocation with --config-dir=/etc/openshift/kubelet.conf.d) will now load
drop-in files such as the kubelet-auto-sizing drop-in, so update the PR summary
to explicitly state that masters (and arbiters if applicable) will pick up
systemReserved via the drop-in mechanism; mention the specific symbols
TestKubeletAutoSizingDropIn and
templates/common/_base/files/kubelet-auto-sizing.yaml and clarify exactly what
"remains unchanged" (e.g., default kubelet behavior and existing static config
files) versus what is new (masters now consume drop-ins from
/etc/openshift/kubelet.conf.d created by ExecStartPre and the --config-dir
flag).

In `@test/e2e-2of2/kubeletcfg_test.go`:
- Around line 367-376: The test currently iterates over roles []string{"worker",
"master"} and validates the kubelet drop-in via helpers.GetRandomNode,
helpers.ExecCmdOnNode and require.Contains; update that role slice to include
"arbiter" (i.e. []string{"worker", "master", "arbiter"}) so the t.Run loop
covers the arbiter nodes and exercises the same assertions against the drop-in
content (the require.Contains checks for "kind: KubeletConfiguration" and
"systemReserved") using the existing helpers and dropInPath variable.

---

Nitpick comments:
In `@test/e2e-2of2/kubeletcfg_test.go`:
- Around line 361-377: The test TestKubeletAutoSizingDropIn only checks the
drop-in file content but the PR description promises verifying removal of the
old mechanism; update the test to also fetch the kubelet unit via
helpers.ExecCmdOnNode (e.g., run "systemctl cat kubelet.service") and add
assertions using require.NotContains to ensure the unit does not include
"--system-reserved" and does not reference
"EnvironmentFile=/etc/node-sizing.env", or alternatively change the PR
description to reflect that the test only validates the drop-in YAML.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d9606d45-a14a-49b8-95af-7d23899f0929

📥 Commits

Reviewing files that changed from the base of the PR and between 8c2d700 and de0039f.

📒 Files selected for processing (9)
  • pkg/controller/template/kubelet_config_dir_test.go
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
  • templates/arbiter/01-arbiter-kubelet/on-prem/units/kubelet.service.yaml
  • templates/common/_base/files/kubelet-auto-sizing.yaml
  • templates/master/01-master-kubelet/_base/units/kubelet.service.yaml
  • templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • test/e2e-2of2/kubeletcfg_test.go
💤 Files with no reviewable changes (2)
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml

Comment thread templates/master/01-master-kubelet/_base/units/kubelet.service.yaml
Comment on lines +367 to +376
for _, role := range []string{"worker", "master"} {
t.Run(role, func(t *testing.T) {
node := helpers.GetRandomNode(t, cs, role)
dropIn := helpers.ExecCmdOnNode(t, cs, node, "cat", filepath.Join("/rootfs", dropInPath))
require.Contains(t, dropIn, "kind: KubeletConfiguration",
"%s drop-in should be a KubeletConfiguration", role)
require.Contains(t, dropIn, "systemReserved:",
"%s drop-in should contain systemReserved", role)
})
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test coverage gap: arbiter role not tested.

The test checks worker and master roles but omits arbiter, even though arbiter kubelet.service templates were modified in this PR (added --config-dir in templates/arbiter/01-arbiter-kubelet/*/units/kubelet.service.yaml).

📝 Suggested fix to include arbiter role
-	for _, role := range []string{"worker", "master"} {
+	for _, role := range []string{"worker", "master", "arbiter"} {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for _, role := range []string{"worker", "master"} {
t.Run(role, func(t *testing.T) {
node := helpers.GetRandomNode(t, cs, role)
dropIn := helpers.ExecCmdOnNode(t, cs, node, "cat", filepath.Join("/rootfs", dropInPath))
require.Contains(t, dropIn, "kind: KubeletConfiguration",
"%s drop-in should be a KubeletConfiguration", role)
require.Contains(t, dropIn, "systemReserved:",
"%s drop-in should contain systemReserved", role)
})
}
for _, role := range []string{"worker", "master", "arbiter"} {
t.Run(role, func(t *testing.T) {
node := helpers.GetRandomNode(t, cs, role)
dropIn := helpers.ExecCmdOnNode(t, cs, node, "cat", filepath.Join("/rootfs", dropInPath))
require.Contains(t, dropIn, "kind: KubeletConfiguration",
"%s drop-in should be a KubeletConfiguration", role)
require.Contains(t, dropIn, "systemReserved:",
"%s drop-in should contain systemReserved", role)
})
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-2of2/kubeletcfg_test.go` around lines 367 - 376, The test currently
iterates over roles []string{"worker", "master"} and validates the kubelet
drop-in via helpers.GetRandomNode, helpers.ExecCmdOnNode and require.Contains;
update that role slice to include "arbiter" (i.e. []string{"worker", "master",
"arbiter"}) so the t.Run loop covers the arbiter nodes and exercises the same
assertions against the drop-in content (the require.Contains checks for "kind:
KubeletConfiguration" and "systemReserved") using the existing helpers and
dropInPath variable.

@ngopalak-redhat
Copy link
Copy Markdown
Contributor

/payload-aggregate periodic-ci-openshift-release-master-nightly-5.0-e2e-aws-ovn-upgrade-fips 2
/test e2e-aws-mco-disruptive

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

@ngopalak-redhat: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@ngopalak-redhat
Copy link
Copy Markdown
Contributor

/test e2e-aws-mco-disruptive

@ngopalak-redhat
Copy link
Copy Markdown
Contributor

/payload-aggregate periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-upgrade-fips 1

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

@ngopalak-redhat: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@ngopalak-redhat
Copy link
Copy Markdown
Contributor

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-disruptive-longrunning

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-disruptive-longrunning

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3f2682a0-5502-11f1-948e-a3de092345bc-0

@nispriha nispriha changed the title OCPNODE-4487: replace worker --system-reserved flags with config drop-in OCPNODE-4487: replace --system-reserved flags with config drop-in May 21, 2026
@nispriha
Copy link
Copy Markdown
Contributor Author

/retest e2e-aws-mco-disruptive

@nispriha
Copy link
Copy Markdown
Contributor Author

/test e2e-aws-mco-disruptive

@nispriha
Copy link
Copy Markdown
Contributor Author

/test e2e-openstack

controllerConfig: "./test_data/controller_config_aws.yaml",
expectedWorkerFlag: true,
expectedMasterFlag: false,
expectedMasterFlag: true,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this ever false now?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, since all node roles now have --config-dir, it's always true. I have removed the boolean flags and simplified the test to just assert --config-dir is present for all roles.

recommended_systemreserved_memory=$(echo $recommended_systemreserved_memory | awk '{printf("%d\n",$1 + 0.5)}') # Round off so we avoid float conversions
echo "SYSTEM_RESERVED_MEMORY=${recommended_systemreserved_memory}Gi">> ${NODE_SIZES_ENV}
SYSTEM_RESERVED_MEMORY="${recommended_systemreserved_memory}Gi"
echo "SYSTEM_RESERVED_MEMORY=${SYSTEM_RESERVED_MEMORY}">> ${NODE_SIZES_ENV}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what cases do we still need NODE_SIZES_ENV?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't, since all node roles use the drop-in now, nothing reads /etc/node-sizing.env anymore. I have removed all NODE_SIZES_ENV references from the script.

@nispriha nispriha force-pushed the njagan/auto-sizing-dropin branch from de0039f to f3de103 Compare May 25, 2026 14:18
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e-2of2/kubeletcfg_test.go (1)

371-374: ⚡ Quick win

Strengthen drop-in assertions to validate actual systemReserved fields.

The current checks can pass even if the block exists but lacks required keys. Consider asserting cpu:, memory:, and ephemeral-storage: too.

💡 Suggested enhancement
 			require.Contains(t, dropIn, "kind: KubeletConfiguration",
 				"%s drop-in should be a KubeletConfiguration", role)
 			require.Contains(t, dropIn, "systemReserved:",
 				"%s drop-in should contain systemReserved", role)
+			require.Contains(t, dropIn, "cpu:",
+				"%s drop-in should include systemReserved.cpu", role)
+			require.Contains(t, dropIn, "memory:",
+				"%s drop-in should include systemReserved.memory", role)
+			require.Contains(t, dropIn, "ephemeral-storage:",
+				"%s drop-in should include systemReserved.ephemeral-storage", role)
 		})
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-2of2/kubeletcfg_test.go` around lines 371 - 374, The current
assertions only check for the presence of the "systemReserved:" block but not
that it contains required fields; modify the test in kubeletcfg_test.go (the
assertions using require.Contains on dropIn and role) to also assert that dropIn
includes the specific keys "cpu:", "memory:", and "ephemeral-storage:" (e.g.,
require.Contains(t, dropIn, "cpu:", "%s drop-in should contain systemReserved
cpu", role) and similarly for memory and ephemeral-storage) so the drop-in not
only has the block but the expected reserved resources.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controller/template/kubelet_config_dir_test.go`:
- Around line 59-65: The test iterates ign.Systemd.Units looking for unit.Name
== "kubelet.service" but never asserts that such a unit was found, allowing
false positives; update the loop in kubelet_config_dir_test (the block that
checks ign.Systemd.Units and calls assert.Contains on *unit.Contents) to set a
found boolean when you locate the kubelet.service unit, break, then after the
loop add an explicit assertion (e.g., assert.True or assert.NotNil with a clear
message referencing the platform/role) that the kubelet.service unit was present
before checking its contents.

---

Nitpick comments:
In `@test/e2e-2of2/kubeletcfg_test.go`:
- Around line 371-374: The current assertions only check for the presence of the
"systemReserved:" block but not that it contains required fields; modify the
test in kubeletcfg_test.go (the assertions using require.Contains on dropIn and
role) to also assert that dropIn includes the specific keys "cpu:", "memory:",
and "ephemeral-storage:" (e.g., require.Contains(t, dropIn, "cpu:", "%s drop-in
should contain systemReserved cpu", role) and similarly for memory and
ephemeral-storage) so the drop-in not only has the block but the expected
reserved resources.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c8586419-2d99-49af-a488-66145bf0ae17

📥 Commits

Reviewing files that changed from the base of the PR and between de0039f and f3de103.

📒 Files selected for processing (9)
  • pkg/controller/template/kubelet_config_dir_test.go
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
  • templates/arbiter/01-arbiter-kubelet/on-prem/units/kubelet.service.yaml
  • templates/common/_base/files/kubelet-auto-sizing.yaml
  • templates/master/01-master-kubelet/_base/units/kubelet.service.yaml
  • templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • test/e2e-2of2/kubeletcfg_test.go
💤 Files with no reviewable changes (2)
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml

Comment thread pkg/controller/template/kubelet_config_dir_test.go
@nispriha
Copy link
Copy Markdown
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-disruptive-longrunning

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 25, 2026

@nispriha: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-disruptive-longrunning

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c6368b00-584f-11f1-9876-1c7cc9194f05-0

@nispriha nispriha force-pushed the njagan/auto-sizing-dropin branch from f3de103 to 3292bdc Compare May 25, 2026 16:22
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/e2e-2of2/kubeletcfg_test.go (1)

367-376: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test coverage gap: arbiter role not tested.

The test validates worker and master roles but omits arbiter, even though this PR modified arbiter kubelet.service templates to add --config-dir (per the review stack context).

Before adding "arbiter" to the role slice, verify that helpers.GetRandomNode handles clusters without arbiter nodes gracefully:

#!/bin/bash
# Description: Check if GetRandomNode helper has skip/error handling for missing roles

rg -nP -A10 -B2 'func GetRandomNode' --type=go

If the helper skips gracefully when no nodes match the role, add "arbiter" to the slice. Otherwise, wrap the arbiter test in a conditional check that skips when no arbiter nodes exist.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-2of2/kubeletcfg_test.go` around lines 367 - 376, Add coverage for
the arbiter role: update the role iteration in the test to include "arbiter"
(the slice passed to the loop) if helpers.GetRandomNode supports missing-role
skipping; otherwise, before calling helpers.GetRandomNode, query whether any
arbiter nodes exist and call t.Skip when none are present. Locate the loop that
iterates over []string{"worker","master"} and adjust it (or add a conditional
t.Skip) around the call to helpers.GetRandomNode and subsequent
helpers.ExecCmdOnNode and assertions that inspect the drop-in content ("dropIn",
dropInPath) so the test safely handles clusters without arbiter nodes. Ensure
you reference helpers.GetRandomNode, helpers.ExecCmdOnNode, and the existing
assertions when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@test/e2e-2of2/kubeletcfg_test.go`:
- Around line 367-376: Add coverage for the arbiter role: update the role
iteration in the test to include "arbiter" (the slice passed to the loop) if
helpers.GetRandomNode supports missing-role skipping; otherwise, before calling
helpers.GetRandomNode, query whether any arbiter nodes exist and call t.Skip
when none are present. Locate the loop that iterates over
[]string{"worker","master"} and adjust it (or add a conditional t.Skip) around
the call to helpers.GetRandomNode and subsequent helpers.ExecCmdOnNode and
assertions that inspect the drop-in content ("dropIn", dropInPath) so the test
safely handles clusters without arbiter nodes. Ensure you reference
helpers.GetRandomNode, helpers.ExecCmdOnNode, and the existing assertions when
making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 434dfa8e-c4c4-46a0-a1ec-f100fb21b290

📥 Commits

Reviewing files that changed from the base of the PR and between f3de103 and 3292bdc.

📒 Files selected for processing (9)
  • pkg/controller/template/kubelet_config_dir_test.go
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
  • templates/arbiter/01-arbiter-kubelet/on-prem/units/kubelet.service.yaml
  • templates/common/_base/files/kubelet-auto-sizing.yaml
  • templates/master/01-master-kubelet/_base/units/kubelet.service.yaml
  • templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • test/e2e-2of2/kubeletcfg_test.go
💤 Files with no reviewable changes (2)
  • templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml

Remove EnvironmentFile=/etc/node-sizing.env and the --system-reserved
command-line flag from kubelet.service. The auto-sizing script now writes a KubeletConfiguration
drop-in file to /etc/openshift/kubelet.conf.d/20-auto-sizing.conf,
which kubelet reads via --config-dir. Add --config-dir to master and
arbiter kubelet.service for consistency with workers.
@nispriha nispriha force-pushed the njagan/auto-sizing-dropin branch from 3292bdc to 531c6d4 Compare May 25, 2026 16:50
@nispriha
Copy link
Copy Markdown
Contributor Author

/test e2e-aws-mco-disruptive

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/controller/template/kubelet_config_dir_test.go (1)

59-65: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Require a kubelet.service match so the test can’t silently pass.

On Line 59, the assertion only runs conditionally; if no kubelet.service is found, this test still passes. Please track and assert that at least one kubelet unit was validated (ideally per role) for each platform case.

Suggested fix
 		for _, tc := range testCases {
 			t.Run(tc.name, func(t *testing.T) {
@@
-			for _, cfg := range cfgs {
+			roleHasKubelet := map[string]bool{}
+			for _, cfg := range cfgs {
 				role, ok := cfg.Labels[mcfgv1.MachineConfigRoleLabelKey]
 				require.True(t, ok, "Machine config missing role label")
+				if _, seen := roleHasKubelet[role]; !seen {
+					roleHasKubelet[role] = false
+				}
@@
 				for _, unit := range ign.Systemd.Units {
 					if unit.Name == "kubelet.service" && unit.Contents != nil {
+						roleHasKubelet[role] = true
 						assert.Contains(t, *unit.Contents, "--config-dir=/etc/openshift/kubelet.conf.d",
 							"%s node should have --config-dir for platform %s", role, tc.name)
 						break
 					}
 				}
 			}
+			for role, found := range roleHasKubelet {
+				require.True(t, found, "kubelet.service unit not found for %s on platform %s", role, tc.name)
+			}
 		})
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/template/kubelet_config_dir_test.go` around lines 59 - 65, The
test iterates ign.Systemd.Units and only asserts when it finds a unit with
unit.Name == "kubelet.service", so it can silently pass if no kubelet unit
exists; modify the loop in kubelet_config_dir_test.go to set a boolean flag
(e.g., foundKubelet) when a matching unit is seen and validated (inside the
existing for _, unit := range ign.Systemd.Units { if unit.Name ==
"kubelet.service" ... }), then after the loop assert that foundKubelet is true
(assert.True or require.True) with a clear message mentioning the role and
tc.name so the test fails if no kubelet.service was present for that
platform/role.
test/e2e-2of2/kubeletcfg_test.go (1)

367-375: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include arbiter coverage (with skip when arbiter nodes are absent).

This loop currently validates only worker and master, but this PR also changes arbiter kubelet wiring. Please include arbiter in this test path, guarded so clusters without arbiter nodes are skipped instead of failed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-2of2/kubeletcfg_test.go` around lines 367 - 375, The test loop
should include "arbiter" but skip the subtest if the cluster has no arbiter
nodes; update the role list to include "arbiter" and, inside the t.Run for
arbiter, check for arbiter node existence (preferably via an existing helper
such as helpers.HasRole / helpers.CountNodesWithRole or by calling
helpers.GetRandomNode and skipping on a nil/absent result) before running the
assertions that use helpers.GetRandomNode, helpers.ExecCmdOnNode and dropInPath;
if no arbiter node is present call t.Skipf to quietly skip the arbiter subtest.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-2of2/kubeletcfg_test.go`:
- Around line 371-374: The test currently only asserts the presence of the
"systemReserved:" header; update the assertions that reference dropIn and role
to validate concrete entries under systemReserved (e.g., check for "cpu",
"memory", and "ephemeral-storage" keys/values). Replace or augment the
require.Contains calls on dropIn (in this test function) with additional
require.Contains (or regex matches) that look for the expected
"systemReserved:\n" entries such as "systemReserved:\n  cpu:",
"systemReserved:\n  memory:", and "systemReserved:\n  ephemeral-storage:" so the
test verifies actual values are written rather than an empty section.

---

Duplicate comments:
In `@pkg/controller/template/kubelet_config_dir_test.go`:
- Around line 59-65: The test iterates ign.Systemd.Units and only asserts when
it finds a unit with unit.Name == "kubelet.service", so it can silently pass if
no kubelet unit exists; modify the loop in kubelet_config_dir_test.go to set a
boolean flag (e.g., foundKubelet) when a matching unit is seen and validated
(inside the existing for _, unit := range ign.Systemd.Units { if unit.Name ==
"kubelet.service" ... }), then after the loop assert that foundKubelet is true
(assert.True or require.True) with a clear message mentioning the role and
tc.name so the test fails if no kubelet.service was present for that
platform/role.

In `@test/e2e-2of2/kubeletcfg_test.go`:
- Around line 367-375: The test loop should include "arbiter" but skip the
subtest if the cluster has no arbiter nodes; update the role list to include
"arbiter" and, inside the t.Run for arbiter, check for arbiter node existence
(preferably via an existing helper such as helpers.HasRole /
helpers.CountNodesWithRole or by calling helpers.GetRandomNode and skipping on a
nil/absent result) before running the assertions that use helpers.GetRandomNode,
helpers.ExecCmdOnNode and dropInPath; if no arbiter node is present call t.Skipf
to quietly skip the arbiter subtest.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2d184d49-0a8b-4734-9192-40137339b56b

📥 Commits

Reviewing files that changed from the base of the PR and between 3292bdc and 531c6d4.

📒 Files selected for processing (9)
  • pkg/controller/template/kubelet_config_dir_test.go
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
  • templates/arbiter/01-arbiter-kubelet/on-prem/units/kubelet.service.yaml
  • templates/common/_base/files/kubelet-auto-sizing.yaml
  • templates/master/01-master-kubelet/_base/units/kubelet.service.yaml
  • templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • test/e2e-2of2/kubeletcfg_test.go
💤 Files with no reviewable changes (2)
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml

Comment on lines +371 to +374
require.Contains(t, dropIn, "kind: KubeletConfiguration",
"%s drop-in should be a KubeletConfiguration", role)
require.Contains(t, dropIn, "systemReserved:",
"%s drop-in should contain systemReserved", role)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate concrete systemReserved entries, not only the section header.

The current assertions pass even if systemReserved is present but empty. Add checks for expected entries (for example cpu, memory, and ephemeral-storage) so the test actually verifies values are written.

Suggested fix
 			require.Contains(t, dropIn, "kind: KubeletConfiguration",
 				"%s drop-in should be a KubeletConfiguration", role)
 			require.Contains(t, dropIn, "systemReserved:",
 				"%s drop-in should contain systemReserved", role)
+			require.Contains(t, dropIn, "cpu:",
+				"%s drop-in should contain systemReserved cpu", role)
+			require.Contains(t, dropIn, "memory:",
+				"%s drop-in should contain systemReserved memory", role)
+			require.Contains(t, dropIn, "ephemeral-storage:",
+				"%s drop-in should contain systemReserved ephemeral-storage", role)
 		})
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-2of2/kubeletcfg_test.go` around lines 371 - 374, The test currently
only asserts the presence of the "systemReserved:" header; update the assertions
that reference dropIn and role to validate concrete entries under systemReserved
(e.g., check for "cpu", "memory", and "ephemeral-storage" keys/values). Replace
or augment the require.Contains calls on dropIn (in this test function) with
additional require.Contains (or regex matches) that look for the expected
"systemReserved:\n" entries such as "systemReserved:\n  cpu:",
"systemReserved:\n  memory:", and "systemReserved:\n  ephemeral-storage:" so the
test verifies actual values are written rather than an empty section.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 25, 2026

@nispriha: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-openstack 531c6d4 link false /test e2e-openstack
ci/prow/e2e-aws-mco-disruptive 531c6d4 link false /test e2e-aws-mco-disruptive

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants