Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 20 additions & 16 deletions test/extended/cli/admin.go
Original file line number Diff line number Diff line change
Expand Up @@ -212,38 +212,42 @@ var _ = g.Describe("[sig-cli] oc adm", func() {
o.Expect(err).To(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring("error: rolebinding custom found for role view, not other"))

o.Expect(oc.Run("adm", "policy", "add-scc-to-user").Args("privileged", "fake-user").Execute()).To(o.Succeed())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to keep the tests we have.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it does causes the race, I think, that's why it's being changed. The point of this change is to execute this change in a single oc command so that it's sent together.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are probably right to push back on this, because it's not clear why the test is failing very rarely. There is not enough data in the failing job regarding why this happens. It seems the data is written into etcd even though it should return 409, but I don't manage to find information detailed enough in CI jobs to be able to troubleshoot this 😐 So I ended up just trying to improve the test, which is not particularly clear.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the issue is that the test is not using randomly generated names and there are some leftovers from the previous test run, causing 409 not to happen because the change is actually already there, but I am gonna continue tomorrow. Thanks for pushing back.

o.Expect(oc.Run("adm", "policy", "add-scc-to-user").Args("privileged", "-z", "fake-sa").Execute()).To(o.Succeed())
o.Expect(oc.Run("adm", "policy", "add-scc-to-group").Args("privileged", "fake-group").Execute()).To(o.Succeed())
fakeUser := gen.GenerateName("fake-user-")
fakeSA := gen.GenerateName("fake-sa-")
fakeGroup := gen.GenerateName("fake-group-")

o.Expect(oc.Run("adm", "policy", "add-scc-to-user").Args("privileged", fakeUser).Execute()).To(o.Succeed())
o.Expect(oc.Run("adm", "policy", "add-scc-to-user").Args("privileged", "-z", fakeSA).Execute()).To(o.Succeed())
o.Expect(oc.Run("adm", "policy", "add-scc-to-group").Args("privileged", fakeGroup).Execute()).To(o.Succeed())
Comment on lines +219 to +221
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Batch SCC subject mutations into single oc calls to remove inter-command race windows.

The user and serviceaccount updates are still split across separate invocations; that leaves a cache-propagation window between writes and weakens the race fix intent.

Suggested change
-		o.Expect(oc.Run("adm", "policy", "add-scc-to-user").Args("privileged", fakeUser).Execute()).To(o.Succeed())
-		o.Expect(oc.Run("adm", "policy", "add-scc-to-user").Args("privileged", "-z", fakeSA).Execute()).To(o.Succeed())
+		o.Expect(oc.Run("adm", "policy", "add-scc-to-user").Args("privileged", fakeUser, "-z", fakeSA).Execute()).To(o.Succeed())
 		o.Expect(oc.Run("adm", "policy", "add-scc-to-group").Args("privileged", fakeGroup).Execute()).To(o.Succeed())

-		o.Expect(oc.Run("adm", "policy", "remove-scc-from-user").Args("privileged", fakeUser).Execute()).To(o.Succeed())
-		o.Expect(oc.Run("adm", "policy", "remove-scc-from-user").Args("privileged", "-z", fakeSA).Execute()).To(o.Succeed())
+		o.Expect(oc.Run("adm", "policy", "remove-scc-from-user").Args("privileged", fakeUser, "-z", fakeSA).Execute()).To(o.Succeed())
 		o.Expect(oc.Run("adm", "policy", "remove-scc-from-group").Args("privileged", fakeGroup).Execute()).To(o.Succeed())

Also applies to: 228-230

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/cli/admin.go` around lines 219 - 221, The test issues multiple
separate oc invocations to mutate SCC subjects, leaving cache-propagation race
windows; instead, batch subject mutations into single oc admin policy commands
by combining multiple subject arguments in one call (e.g., replace separate
oc.Run("adm","policy","add-scc-to-user").Args("privileged", fakeUser) and
oc.Run("adm","policy","add-scc-to-user").Args("privileged", "-z", fakeSA) with a
single oc.Run("adm","policy","add-scc-to-user").Args("privileged", fakeUser,
"-z", fakeSA).Execute(), and do the same for
oc.Run("adm","policy","add-scc-to-group").Args(...) instances referenced around
the same section) so all related user/sa/group updates occur in one command and
remove the inter-command race window.

out, err = oc.Run("get").Args("clusterrolebinding/system:openshift:scc:privileged", "-o", "yaml").Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring("fake-user"))
o.Expect(out).To(o.ContainSubstring("fake-sa"))
o.Expect(out).To(o.ContainSubstring("fake-group"))
o.Expect(out).To(o.ContainSubstring(fakeUser))
o.Expect(out).To(o.ContainSubstring(fakeSA))
o.Expect(out).To(o.ContainSubstring(fakeGroup))

o.Expect(oc.Run("adm", "policy", "remove-scc-from-user").Args("privileged", "fake-user").Execute()).To(o.Succeed())
Comment thread
tchap marked this conversation as resolved.
o.Expect(oc.Run("adm", "policy", "remove-scc-from-user").Args("privileged", "-z", "fake-sa").Execute()).To(o.Succeed())
o.Expect(oc.Run("adm", "policy", "remove-scc-from-group").Args("privileged", "fake-group").Execute()).To(o.Succeed())
o.Expect(oc.Run("adm", "policy", "remove-scc-from-user").Args("privileged", fakeUser).Execute()).To(o.Succeed())
o.Expect(oc.Run("adm", "policy", "remove-scc-from-user").Args("privileged", "-z", fakeSA).Execute()).To(o.Succeed())
o.Expect(oc.Run("adm", "policy", "remove-scc-from-group").Args("privileged", fakeGroup).Execute()).To(o.Succeed())
out, err = oc.Run("get").Args("clusterrolebinding/system:openshift:scc:privileged", "-o", "yaml").Output()
// there are two possible outcomes here:
if err == nil {
// 1. the binding exists, but it should not contain the removed entities
o.Expect(out).NotTo(o.ContainSubstring("fake-user"))
o.Expect(out).NotTo(o.ContainSubstring("fake-sa"))
o.Expect(out).NotTo(o.ContainSubstring("fake-group"))
o.Expect(out).NotTo(o.ContainSubstring(fakeUser))
o.Expect(out).NotTo(o.ContainSubstring(fakeSA))
o.Expect(out).NotTo(o.ContainSubstring(fakeGroup))
} else {
// 2. the binding does not exists, if we removed all entities from the binding
o.Expect(out).To(o.ContainSubstring(`clusterrolebindings.rbac.authorization.k8s.io "system:openshift:scc:privileged" not found`))
}
Comment on lines 222 to 241
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use Eventually for post-mutation CRB assertions to avoid stale-cache flakes.

These checks read CRB state immediately after add/remove/prune mutations. In multi-master clusters this is eventually consistent and can transiently fail.

Suggested change
-		out, err = oc.Run("get").Args("clusterrolebinding/system:openshift:scc:privileged", "-o", "yaml").Output()
-		o.Expect(err).NotTo(o.HaveOccurred())
-		o.Expect(out).To(o.ContainSubstring(fakeUser))
-		o.Expect(out).To(o.ContainSubstring(fakeSA))
-		o.Expect(out).To(o.ContainSubstring(fakeGroup))
+		o.Eventually(func() (string, error) {
+			return oc.Run("get").Args("clusterrolebinding/system:openshift:scc:privileged", "-o", "yaml").Output()
+		}, 30*time.Second, 1*time.Second).Should(o.And(
+			o.ContainSubstring(fakeUser),
+			o.ContainSubstring(fakeSA),
+			o.ContainSubstring(fakeGroup),
+		))

-		out, err = oc.Run("get").Args("clusterrolebinding/system:openshift:scc:privileged", "-o", "yaml").Output()
-		// there are two possible outcomes here:
-		if err == nil {
-			// 1. the binding exists, but it should not contain the removed entities
-			o.Expect(out).NotTo(o.ContainSubstring(fakeUser))
-			o.Expect(out).NotTo(o.ContainSubstring(fakeSA))
-			o.Expect(out).NotTo(o.ContainSubstring(fakeGroup))
-		} else {
-			// 2. the binding does not exists, if we removed all entities from the binding
-			o.Expect(out).To(o.ContainSubstring(`clusterrolebindings.rbac.authorization.k8s.io "system:openshift:scc:privileged" not found`))
-		}
+		o.Eventually(func() (string, error) {
+			return oc.Run("get").Args("clusterrolebinding/system:openshift:scc:privileged", "-o", "yaml").Output()
+		}, 30*time.Second, 1*time.Second).Should(o.Or(
+			o.And(
+				o.Not(o.ContainSubstring(fakeUser)),
+				o.Not(o.ContainSubstring(fakeSA)),
+				o.Not(o.ContainSubstring(fakeGroup)),
+			),
+			o.ContainSubstring(`clusterrolebindings.rbac.authorization.k8s.io "system:openshift:scc:privileged" not found`),
+		))

Also applies to: 244-252

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/cli/admin.go` around lines 222 - 241, Replace the immediate
post-mutation CRB assertions with retries using Eventually to avoid stale-cache
flakes: after the adm policy remove-scc-from-* calls (the
oc.Run(...).Args("clusterrolebinding/system:openshift:scc:privileged", "-o",
"yaml").Output() call and subsequent checks that reference fakeUser, fakeSA,
fakeGroup), wrap the get+assert logic in an Eventually that polls until either
the CRB YAML no longer contains the removed fakeUser/fakeSA/fakeGroup or the get
returns the "not found" message; do the same replacement for the other similar
block that spans the later checks (the second occurrence noted at lines
244-252).


// check pruning
o.Expect(oc.Run("adm", "policy", "add-scc-to-user").Args("privileged", "fake-user").Execute()).To(o.Succeed())
out, err = oc.Run("adm", "prune", "auth").Args("users/fake-user").Output()
o.Expect(oc.Run("adm", "policy", "add-scc-to-user").Args("privileged", fakeUser).Execute()).To(o.Succeed())
out, err = oc.Run("adm", "prune", "auth").Args(fmt.Sprintf("users/%s", fakeUser)).Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring("clusterrolebinding.rbac.authorization.k8s.io/system:openshift:scc:privileged updated"))

o.Expect(oc.Run("adm", "policy", "add-scc-to-group").Args("privileged", "fake-group").Execute()).To(o.Succeed())
out, err = oc.Run("adm", "prune", "auth").Args("group/fake-group").Output()
o.Expect(oc.Run("adm", "policy", "add-scc-to-group").Args("privileged", fakeGroup).Execute()).To(o.Succeed())
out, err = oc.Run("adm", "prune", "auth").Args(fmt.Sprintf("group/%s", fakeGroup)).Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring("clusterrolebinding.rbac.authorization.k8s.io/system:openshift:scc:privileged updated"))
})
Expand Down