MCO-2257: [DO NOT MERGE] ensure tests pass in RHEL9 and RHEL10#6054
MCO-2257: [DO NOT MERGE] ensure tests pass in RHEL9 and RHEL10#6054cheesesashimi wants to merge 3 commits into
Conversation
This is primarily to test the RHEL10 path of the tests in CI.
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@cheesesashimi: This pull request references MCO-2257 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (10)
WalkthroughThis PR introduces RHEL 10 support to the Machine Config Operator by updating image tag defaults from "rhel-coreos" to "rhel-coreos-10", refactoring container test templates into EL9 and EL10 variants, adding node-aware Dockerfile selection logic, and adapting e2e tests throughout to use the new RHEL10-specific containers. ChangesRHEL 10 Support for Machine Config Operator
🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheesesashimi The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-gcp-op-ocl-part-1 e2e-gcp-op-ocl-part-2 e2e-gcp-op-part-1 e2e-gcp-op-part-2 |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-disruptive-techpreview-3of3 |
|
@cheesesashimi: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5a9ee860-52c3-11f1-9d64-d8b71dc98ec6-0 |
|
/test e2e-gcp-op-ocl-part1 |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e-ocl-2of2/onclusterlayering_test.go (1)
1278-1279:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard against empty
current-machine-os-buildannotation before returning success.Line 1279 treats an empty annotation as success, which can return
""as MOSB name and make the subsequentGet(...)fail/flap.🔧 Proposed fix
currentMOSB = mosc.GetAnnotations()[constants.CurrentMachineOSBuildAnnotationKey] - return currentMOSB != mosbName, nil + // stale-annotation handling can briefly clear this annotation + return currentMOSB != "" && currentMOSB != mosbName, nil }))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e-ocl-2of2/onclusterlayering_test.go` around lines 1278 - 1279, The current check treats an empty current-machine-os-build annotation as a mismatch (success) which can propagate an empty MOSB name; update the logic that reads currentMOSB (from mosc.GetAnnotations()[constants.CurrentMachineOSBuildAnnotationKey]) to explicitly guard against an empty string: if currentMOSB is empty, return false, nil (or return an appropriate error) instead of treating it as different from mosbName; otherwise continue comparing currentMOSB != mosbName and return that result. Ensure you reference and update the branch that sets currentMOSB and the return immediately after it (using currentMOSB, constants.CurrentMachineOSBuildAnnotationKey, and mosbName).
🧹 Nitpick comments (1)
test/e2e-ocl-shared/Containerfile.cowsay-rhel10 (1)
2-5: ⚡ Quick winClean DNF metadata after installs to reduce image bloat in e2e builds.
Both install steps keep package metadata/cache in layers; adding cleanup keeps test image builds leaner and faster to transfer.
Suggested change
-RUN dnf install -y epel-release && \ +RUN dnf install -y epel-release && \ sed -i 's/\$stream/10-stream/g' /etc/yum.repos.d/centos*.repo && \ sed -i 's/EPEL\-\$releasever_major/EPEL-10/g' /etc/yum.repos.d/epel*.repo && \ - sed -i -E 's/\$\{releasever_minor\:\+\-z\}//g' /etc/yum.repos.d/epel*.repo + sed -i -E 's/\$\{releasever_minor\:\+\-z\}//g' /etc/yum.repos.d/epel*.repo && \ + dnf clean all @@ -RUN dnf install -y cowsay && \ +RUN dnf install -y cowsay && \ + dnf clean all && \ ostree container commitAlso applies to: 10-11
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e-ocl-shared/Containerfile.cowsay-rhel10` around lines 2 - 5, The Dockerfile RUN block that installs epel-release and edits repos leaves DNF metadata/cache in the image layers; update the RUN command(s) in Containerfile.cowsay-rhel10 (the multi-step RUN that runs dnf install -y epel-release and the sed edits) to add DNF cleanup at the end—e.g., run dnf clean all && rm -rf /var/cache/dnf /var/cache/yum or the appropriate cache paths for RHEL10/RHEL11—so package metadata is removed before the layer is committed; apply the same change to the equivalent blocks for 10-11.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/machine-config-operator/bootstrap.go`:
- Line 112: The change unconditionally sets baseOSContainerImageTag :=
"rhel-coreos-10" in bootstrap.go, which makes a temporary RHEL10 default leak
into production; revert this by restoring the prior default assignment (or gate
it behind an explicit feature flag/env var) instead: remove the hardcoded
"rhel-coreos-10" assignment in baseOSContainerImageTag and either restore the
original value used before the PR or implement a conditional that sets
"rhel-coreos-10" only when a runtime flag/env (e.g., enableRhel10Default or
BASE_OS_IMAGE_TAG) is present; update any tests that relied on the temporary
default to set the flag or env var explicitly.
In `@test/e2e-ocl-shared/containerfiles.go`:
- Around line 43-50: GetCowsayDockerfileForNode currently assumes non-EL9
implies RHEL10; change it to fail fast for unknown OSes: call
helpers.GetOSReleaseForNode(t, cs, node) as before, check nodeOS.OS.IsEL9() and
nodeOS.OS.IsEL10() explicitly, return CowsayDockerfileRHEL9 or
CowsayDockerfileRHEL10 accordingly, and otherwise call t.Fatalf (or t.Fatalf
with nodeOS details) to report an unsupported node OS so unexpected OS types do
not silently default to RHEL10.
In `@test/extended-priv/mco_ocb.go`:
- Around line 175-176: The code currently assumes
NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail() returns a non-empty slice
and indexes [0] directly; change it to capture the slice into a variable (e.g.,
nodes := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()), assert len(nodes)
> 0 and fail with a clear diagnostic (e.g., t.Fatalf or similar) if empty, then
set workerNode := nodes[0] before calling workerNode.GetRHELVersion(); reference
NewNodeList, GetAllWorkerNodesOrFail, workerNode and GetRHELVersion when making
the change.
---
Outside diff comments:
In `@test/e2e-ocl-2of2/onclusterlayering_test.go`:
- Around line 1278-1279: The current check treats an empty
current-machine-os-build annotation as a mismatch (success) which can propagate
an empty MOSB name; update the logic that reads currentMOSB (from
mosc.GetAnnotations()[constants.CurrentMachineOSBuildAnnotationKey]) to
explicitly guard against an empty string: if currentMOSB is empty, return false,
nil (or return an appropriate error) instead of treating it as different from
mosbName; otherwise continue comparing currentMOSB != mosbName and return that
result. Ensure you reference and update the branch that sets currentMOSB and the
return immediately after it (using currentMOSB,
constants.CurrentMachineOSBuildAnnotationKey, and mosbName).
---
Nitpick comments:
In `@test/e2e-ocl-shared/Containerfile.cowsay-rhel10`:
- Around line 2-5: The Dockerfile RUN block that installs epel-release and edits
repos leaves DNF metadata/cache in the image layers; update the RUN command(s)
in Containerfile.cowsay-rhel10 (the multi-step RUN that runs dnf install -y
epel-release and the sed edits) to add DNF cleanup at the end—e.g., run dnf
clean all && rm -rf /var/cache/dnf /var/cache/yum or the appropriate cache paths
for RHEL10/RHEL11—so package metadata is removed before the layer is committed;
apply the same change to the equivalent blocks for 10-11.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: de187bbc-611b-436b-99ac-2085e7afe39b
📒 Files selected for processing (14)
DockerfileDockerfile.rhel7cmd/machine-config-operator/bootstrap.goinstall/image-referencestest/e2e-1of2/mcd_test.gotest/e2e-ocl-1of2/onclusterlayering_test.gotest/e2e-ocl-2of2/layered_image_scaleup_test.gotest/e2e-ocl-2of2/onclusterlayering_test.gotest/e2e-ocl-shared/Containerfile.cowsay-rhel10test/e2e-ocl-shared/Containerfile.cowsay-rhel9test/e2e-ocl-shared/containerfiles.gotest/extended-priv/mco_ocb.gotest/extended-priv/node.gotest/helpers/utils.go
| workerNode := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0] | ||
| rhelVersion, err := workerNode.GetRHELVersion() |
There was a problem hiding this comment.
Guard against empty worker list before indexing.
Direct [0] access can panic when no worker nodes are returned. Add an explicit non-empty assertion before selecting the first node so failures remain diagnosable.
Suggested change
- workerNode := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0]
+ workerNodes := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()
+ o.Expect(workerNodes).NotTo(o.BeEmpty(), "expected at least one worker node for containerfile selection")
+ workerNode := workerNodes[0]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| workerNode := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0] | |
| rhelVersion, err := workerNode.GetRHELVersion() | |
| workerNodes := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail() | |
| o.Expect(workerNodes).NotTo(o.BeEmpty(), "expected at least one worker node for containerfile selection") | |
| workerNode := workerNodes[0] | |
| rhelVersion, err := workerNode.GetRHELVersion() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended-priv/mco_ocb.go` around lines 175 - 176, The code currently
assumes NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail() returns a non-empty
slice and indexes [0] directly; change it to capture the slice into a variable
(e.g., nodes := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()), assert
len(nodes) > 0 and fail with a clear diagnostic (e.g., t.Fatalf or similar) if
empty, then set workerNode := nodes[0] before calling
workerNode.GetRHELVersion(); reference NewNodeList, GetAllWorkerNodesOrFail,
workerNode and GetRHELVersion when making the change.
There was a problem hiding this comment.
The test will stop execution before returning an empty list.
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the implementation of GetAllWorkerNodesOrFail and GetAllWorkerNodes
rg -n "GetAllWorkerNodesOrFail\|GetAllWorkerNodes" --type go -A 8 -B 2
Length of output: 0
🏁 Script executed:
#!/bin/bash
fd "node.go" --type f test/
cat test/extended-priv/node.go 2>/dev/null || echo "File not found"
Length of output: 55347
@cheesesashimi, fair point — PreChecks and GetCompactCompatiblePool running earlier do provide practical protection in a real cluster. That said, for what it's worth, GetAllWorkerNodesOrFail() itself only guards against API errors, not an empty result — unlike some sibling methods in the same file (e.g. GetAllCoreOsWokerNodesOrFail and GetAllCoreOsNodesOrFail) which additionally assert len > 0.
If you're open to it, you could bring GetAllWorkerNodesOrFail in line with those methods by adding an emptiness check there (making the guard reusable across all call sites), but that would be a separate, optional cleanup. Happy to defer to your judgement here.
🧠 Learnings used
Learnt from: HarshwardhanPatil07
Repo: openshift/machine-config-operator PR: 6034
File: test/extended-priv/mco_security.go:1011-1015
Timestamp: 2026-05-13T07:41:05.710Z
Learning: In the extended-priv Go/Ginkgo test suite, it’s safe to use synchronous `o.Expect` assertions on node state immediately after calling `mc.create()`.
Reason: `MachineConfig.create()` internally waits for the MachineConfigPool to complete (`mcp.waitForComplete()`) before returning unless `mc.skipWaitForMcp` is explicitly set to `true`. It also adjusts wait timeouts when extensions are present (`HasExtensionsSafe()`) or when kernel type changes (`GetKernelTypeSafe()`).
So, for node-state checks placed directly after `mc.create()` (and with `skipWaitForMcp` not enabled), you should not require `o.Eventually` for correctness. If `mc.skipWaitForMcp` is set to `true`, then `o.Eventually` may be needed.
70cb748 to
79e8f5e
Compare
|
/test e2e-gcp-op-ocl-part1 |
|
@cheesesashimi: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/test e2e-gcp-op-part1 |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-disruptive-techpreview-3of3 |
|
@cheesesashimi: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/80c24d70-548f-11f1-80de-f3ce07c1ecb5-0 |
|
/test e2e-gcp-op-single-node |
- What I did
This PR introduces several changes in order to select the appropriate test cases for RHEL9 and RHEL10. This PR will cover the RHEL10 side and should not be merged.
- How to verify it
This PR primarily targets fixing the MCO E2E tests, so the MCO E2E test suite can be ran against it for verification.
Summary by CodeRabbit
New Features
Tests