OCPBUGS-86803: e2e: refactor multi-NUMA skip conditions in functests by oblau · Pull Request #1528 · openshift/cluster-node-tuning-operator

oblau · 2026-05-28T17:30:50Z

Tests used "skip if <2 NUMA nodes" as a heuristic to detect bare-metal.

Changes:

llc.go:

BeforeAll (SMT Enabled) — removed len(numaInfo) < 2 skip. Affects: 77725, 77726, 81668, 77728, 77729, 77730, 81670, 81671, 87072, 87073, 87074
BeforeAll (SMT Disabled) — removed len(numaInfo) < 2 skip. Affects: 81672, 81673

updating_profile.go:

50966 — removed NUMA check
50968 — removed NUMA check
50970 — added comment why numa check is required here

Summary by CodeRabbit

Tests
- Removed NUMA-count gating from LLC-aware CPU pinning tests so they run on more hardware.
- Relaxed NUMA-based skip logic in offline-CPU tests so they proceed more often; one test still enforces cores-per-socket and empty-topology checks for reliability.
- Added clarifying test documentation about targeting CPU offlining across NUMA nodes.

coderabbitai · 2026-05-28T17:31:06Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b2f4d80b-06b6-4c01-9883-ce9baa9eb46c

📥 Commits

Reviewing files that changed from the base of the PR and between b299ffe and 3474a34.

📒 Files selected for processing (2)

test/e2e/performanceprofile/functests/13_llc/llc.go
test/e2e/performanceprofile/functests/2_performance_update/updating_profile.go

💤 Files with no reviewable changes (1)

test/e2e/performanceprofile/functests/13_llc/llc.go

🚧 Files skipped from review as they are similar to previous changes (1)

test/e2e/performanceprofile/functests/2_performance_update/updating_profile.go

Walkthrough

Update e2e test precondition gating: LLC-aware CPU pinning tests stop skipping when nodes report fewer than two NUMA nodes; offline-CPU tests adjust NUMA/core-based skips (one test now skips only on empty topology or first NUMA node with <20 cores; another removes the <2-NUMA skip; a third adds clarifying comment).

Changes

Prerequisite Gating Updates

Layer / File(s)	Summary
LLC test NUMA gating removal `test/e2e/performanceprofile/functests/13_llc/llc.go`	SMT-enabled and SMT-disabled setup paths remove conditional skips for `len(numaInfo) < 2`, allowing setup and subsequent NUMA-indexing logic to run regardless of NUMA node count.
Offline CPU test NUMA/core skip adjustments `test/e2e/performanceprofile/functests/2_performance_update/updating_profile.go`	`test_id:50966` now skips only when the NUMA topology is empty or the first NUMA node has <20 cores; `test_id:50968` removes the `<2 NUMA nodes` early skip; `test_id:50970` adds comments documenting expectation that offlined CPUs come from different NUMA nodes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Test Structure And Quality	⚠️ Warning	Multiple assertions lack meaningful failure messages in tests 50966, 50970, and AfterAll block: e.g., Expect(offlinedCPUSet.Equals(offlinedCPUSetProfile)) without message describing what went wrong.	Add meaningful messages to assertions: Expect(offlinedCPUSet.Equals(offlinedCPUSetProfile), "offlined CPUs don't match expected values").

✅ Passed checks (14 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: refactoring NUMA-based skip conditions in multiple e2e functional tests.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All test titles in modified files use static, descriptive strings with no dynamic content (no fmt.Sprintf, string concatenation, variables, or generated values in test names).
Microshift Test Compatibility	✅ Passed	PR modifies existing tests by removing NUMA checks; does not add new Ginkgo tests. Custom check applies only when new tests are added, not when existing tests are modified.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	Tests don't assume multi-node clusters; only need >=1 worker node and will gracefully skip on SNO due to bare-metal/hardware requirements.
Topology-Aware Scheduling Compatibility	✅ Passed	PR modifies only e2e test files, not deployment manifests, operator code, or controllers. The topology-aware scheduling check applies only when those entities are modified.
Ote Binary Stdout Contract	✅ Passed	PR changes are test case logic only (removing NUMA topology gating), with no new process-level stdout writes. Existing logging uses testlog which writes to ginkgo.GinkgoWriter, not stdout.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	The added tests focus on CPU pinning, performance profiles, and kernel configuration. No IPv4 hardcoded addresses, external connectivity requirements, or IPv4-only assumptions found.
No-Weak-Crypto	✅ Passed	PR contains no cryptographic code. Changes are to test files managing NUMA topology and CPU pinning logic; no weak crypto detected.
Container-Privileges	✅ Passed	PR modifies only Go test source files, not Kubernetes manifests. The container-privileges check is not applicable to Go test code.
No-Sensitive-Data-In-Logs	✅ Passed	No sensitive data logging detected. All logging statements use safe information: Kubernetes node names, CPU ranges, cache topology data, and profile names—no passwords, tokens, PII, or credentials.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/RHsyseng/operator-utils@v1.4.13: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/coreos/go-systemd@v0.0.0-20191104093116-d3cd4ed1dbcf: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/coreos/ignition@v0.35.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/coreos/ignition/v2@v2.26.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/docker/go-units@v0.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-logr/stdr@v1.2.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/google/go-cmp@v0.7.0

... [truncated 19340 characters] ...

is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/legacy-cloud-providers: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/metrics: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/mount-utils: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/pod-security-admission: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-apiserver: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tgithub.com/onsi/ginkgo/v2: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-05-28T17:31:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: oblau
Once this PR has been reviewed and has the lgtm label, please assign jmencak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-05-28T17:34:12Z

Actionable comments posted: 0

mrniranjan · 2026-05-29T05:53:24Z

Tests used "skip if <2 NUMA nodes" as a proxy for VM detection.

This is not a proxy for VM detection but rather detection of Baremetal

openshift-ci-robot · 2026-06-01T07:28:42Z

@oblau: This pull request references Jira Issue OCPBUGS-86803, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Tests used "skip if <2 NUMA nodes" as a heuristic to detect bare-metal.

Changes:

llc.go:

BeforeAll (SMT Enabled) — removed len(numaInfo) < 2 skip. Affects: 77725, 77726, 81668, 77728, 77729, 77730, 81670, 81671, 87072, 87073, 87074

BeforeAll (SMT Disabled) — removed len(numaInfo) < 2 skip. Affects: 81672, 81673

updating_profile.go:

50966 — removed NUMA check, added skip if <24 online CPUs

50968 — removed NUMA check, added skip if <24 online CPUs

50970 — added skip if <60 online CPUs, kept NUMA check (test exercises cross-NUMA offlining)

Summary by CodeRabbit

Tests

Removed NUMA topology gating from LLC-aware CPU pinning tests for broader hardware compatibility.

Added online CPU capacity validation to offline CPU test cases to improve reliability.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

oblau · 2026-06-01T07:28:44Z

/jira refresh

openshift-ci-robot · 2026-06-01T07:28:48Z

@oblau: This pull request references Jira Issue OCPBUGS-86803, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

oblau · 2026-06-01T07:36:43Z

/verified by oblau

openshift-ci-robot · 2026-06-01T07:36:46Z

@oblau: The /verified command must be used with one of the following actions: by, later, remove, or bypass. See https://docs.ci.openshift.org/docs/architecture/jira/#premerge-verification for more information.

Details

In response to this:

/verified

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

oblau · 2026-06-01T07:41:57Z

/verified by oblau

openshift-ci-robot · 2026-06-01T07:42:09Z

@oblau: This PR has been marked as verified by oblau.

Details

In response to this:

/verified by oblau

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Tal-or

I'm fine with the change in general if it has a benefit,
but please always explain in the PR+commit the rational for a change.

Right now, it might be clear why we went with a specific change, but as time goes by we eventually forget why we did what we did.
Hence, it become crucial to understand and document the reasons behind the decisions we're making now, for the sake of our future selves :)

Tal-or · 2026-06-01T08:55:53Z

-				Skip("This test needs systems with at least 20 cores per socket")
+			onlineCPUs, err := nodes.GetOnlineCPUsSet(context.TODO(), &workerRTNodes[0])
+			Expect(err).ToNot(HaveOccurred())
+			if onlineCPUs.Size() < 24 {


Those numbers are seems completely random for reviewers, it would be better to add a comment and/or constant that explain the reason for this specific number.

Reverted the cpu changes.
Now the scope of the pr is clear.
Thanks for the constructive criticism

Tal-or · 2026-06-01T08:56:12Z

 			numaTopology := copyNumaCoreSiblings(numaCoreSiblings)
+			onlineCPUs, err := nodes.GetOnlineCPUsSet(context.TODO(), &workerRTNodes[0])
+			Expect(err).ToNot(HaveOccurred())
+			if onlineCPUs.Size() < 60 {


openshift-ci-robot · 2026-06-02T07:46:08Z

@oblau: This pull request references Jira Issue OCPBUGS-86803, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Details

In response to this:

Tests used "skip if <2 NUMA nodes" as a heuristic to detect bare-metal.

Changes:

llc.go:

BeforeAll (SMT Enabled) — removed len(numaInfo) < 2 skip. Affects: 77725, 77726, 81668, 77728, 77729, 77730, 81670, 81671, 87072, 87073, 87074

BeforeAll (SMT Disabled) — removed len(numaInfo) < 2 skip. Affects: 81672, 81673

updating_profile.go:

50966 — removed NUMA check, added skip if <24 online CPUs

50968 — removed NUMA check, added skip if <24 online CPUs

50970 — added skip if <60 online CPUs, kept NUMA check (test exercises cross-NUMA offlining)

Summary by CodeRabbit

Tests

Removed NUMA-count gating from LLC-aware CPU pinning tests to run on more hardware.

Relaxed NUMA-based skip logic in offline-CPU tests so they proceed more often; one test still enforces minimum cores-per-socket and empty-topology checks for reliability.

Added clarifying test documentation about targeting CPU offlining across NUMA nodes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

were used for checking if vm or bm Signed-off-by: oblau <oblau@redhat.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@test/e2e/performanceprofile/functests/2_performance_update/updating_profile.go`:
- Around line 780-781: The current precondition only checks numaCoreSiblings
length and may still yield a single contiguous offlinedSet; update the test to
ensure the generated offlined CPU selection actually contains multiple disjoint
ranges before proceeding: after constructing offlinedSet (the variable that
builds the offline CPU list) assert that its string form contains multiple
ranges (e.g., contains a comma or multiple '-' segments) or alternatively
strengthen the topology guard to require at least two disjoint NUMA/core ranges;
if the check fails, call Skip with a clear message so the test only runs when
offlinedSet will produce multiple ranges.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 784b4316-9523-4b23-8ea4-cacca66f4b12

📥 Commits

Reviewing files that changed from the base of the PR and between 454feb2 and b299ffe.

📒 Files selected for processing (2)

test/e2e/performanceprofile/functests/13_llc/llc.go
test/e2e/performanceprofile/functests/2_performance_update/updating_profile.go

💤 Files with no reviewable changes (1)

test/e2e/performanceprofile/functests/13_llc/llc.go

coderabbitai · 2026-06-02T07:50:29Z

+			if len(numaCoreSiblings) == 0 || len(numaCoreSiblings[0]) < 20 {
 				Skip("This test needs systems with at least 20 cores per socket")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The new skip no longer guarantees this test actually uses multiple offline ranges.

Line 780 only checks that NUMA node 0 has 20 cores, but the constructed offlinedSet can still collapse to a single contiguous range on single-socket or contiguous CPU-ID layouts. That means this test may pass without covering the “multiple ranges” case its name is asserting. Please either keep a topology precondition that guarantees disjoint ranges or add an explicit assertion that the generated offline CPU string contains multiple ranges before updating the profile.

Suggested guard

offlinedSet := performancev2.CPUSet(offlined.String()) + Expect(strings.Count(string(offlinedSet), ",")).To(BeNumerically(">", 0), + "test setup must generate multiple offline CPU ranges") profile, err := profiles.GetByNodeLabels(testutils.NodeSelectorLabels) Expect(err).ToNot(HaveOccurred())

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/performanceprofile/functests/2_performance_update/updating_profile.go` around lines 780 - 781, The current precondition only checks numaCoreSiblings length and may still yield a single contiguous offlinedSet; update the test to ensure the generated offlined CPU selection actually contains multiple disjoint ranges before proceeding: after constructing offlinedSet (the variable that builds the offline CPU list) assert that its string form contains multiple ranges (e.g., contains a comma or multiple '-' segments) or alternatively strengthen the topology guard to require at least two disjoint NUMA/core ranges; if the check fails, call Skip with a clear message so the test only runs when offlinedSet will produce multiple ranges.

oblau · 2026-06-02T10:25:41Z

/verified by oblau

openshift-ci-robot · 2026-06-02T10:25:53Z

@oblau: This PR has been marked as verified by oblau.

Details

In response to this:

/verified by oblau

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci Bot requested review from MarSik and swatisehgal May 28, 2026 17:31

oblau changed the title ~~e2e: refactor multi-NUMA skip conditions in functests~~ OCPBUGS-86803: e2e: refactor multi-NUMA skip conditions in functests Jun 1, 2026

openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jun 1, 2026

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 1, 2026

Tal-or reviewed Jun 1, 2026

View reviewed changes

oblau force-pushed the llc-refactor-bm-detection branch from 454feb2 to b299ffe Compare June 2, 2026 07:45

openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Jun 2, 2026

removed unmeeded numa len checks

3474a34

were used for checking if vm or bm Signed-off-by: oblau <oblau@redhat.com>

oblau force-pushed the llc-refactor-bm-detection branch from b299ffe to 3474a34 Compare June 2, 2026 07:46

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 2, 2026

		if len(numaCoreSiblings) == 0 \|\| len(numaCoreSiblings[0]) < 20 {
		Skip("This test needs systems with at least 20 cores per socket")

Conversation

oblau commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

openshift-ci Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

mrniranjan commented May 29, 2026

Uh oh!

openshift-ci-robot commented Jun 1, 2026

Summary by CodeRabbit

Uh oh!

oblau commented Jun 1, 2026

Uh oh!

openshift-ci-robot commented Jun 1, 2026

Uh oh!

oblau commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci-robot commented Jun 1, 2026

Uh oh!

oblau commented Jun 1, 2026

Uh oh!

openshift-ci-robot commented Jun 1, 2026

Uh oh!

Tal-or left a comment

Choose a reason for hiding this comment

Uh oh!

Tal-or Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

oblau Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tal-or Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

openshift-ci-robot commented Jun 2, 2026

Summary by CodeRabbit

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

oblau commented Jun 2, 2026

Uh oh!

openshift-ci-robot commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

oblau commented May 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading

oblau commented Jun 1, 2026 •

edited

Loading

oblau Jun 2, 2026 •

edited

Loading