WIP: pkg/readiness: Add readiness checks and wire into proposal controller by harche · Pull Request #1395 · openshift/cluster-version-operator

harche · 2026-05-27T23:02:35Z

Summary

Add pkg/readiness package with 9 cluster readiness checks that gather pre-upgrade health data:
- Cluster conditions (CV status, upgrade history)
- Operator health (CO conditions, MCP state)
- API deprecations (removed/deprecated API usage vs target version)
- Node capacity (ready/unschedulable nodes)
- PDB drain blockers (zero-budget PDBs)
- Etcd health (member pods, operator conditions)
- Network config (SDN type, proxy, TLS profile)
- CRD compatibility (stored versions vs served versions)
- OLM operator lifecycle (subscriptions, CSV state, OCP compatibility)
Wire readiness.RunAll() into the proposal controller, replacing the hardcoded readinessJSON := "{}" placeholder (from PR OTA-1966: Install namespace/openshift-lightspeed only when TechPreview is enabled #1394) with real per-target readiness data
Plumb dynamic.Interface from pkg/start → cvo.New() → proposal.NewController() to support cluster queries

Test plan

go build ./... passes
go test ./pkg/readiness/ — unit tests for each check + RunAll integration test with fake cluster (all 9 checks exercised with non-trivial data)
go test ./pkg/proposal/ — controller tests including end-to-end test verifying real readiness JSON flows into proposal request body
go test ./pkg/cvo/ — existing tests updated for new signatures
Deploy to test cluster and verify proposals contain readiness data

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added comprehensive readiness checks for upgrades (node capacity, operator health, etcd health, network/proxy/TLS, API deprecations, CRD compatibility, PDB drain, ClusterVersion conditions, and OLM lifecycle).
- Upgrade proposals now embed cluster readiness data; readiness runs concurrently with timing and aggregated status reporting, JSON-serializable.
Tests
- Added extensive unit, integration, and end-to-end tests plus test payloads validating readiness checks, JSON output, proposal embedding, and performance constraints.

openshift-ci · 2026-05-27T23:02:39Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2026-05-27T23:02:49Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 34703eba-b777-47d6-81c6-0cfd7aba1289

📥 Commits

Reviewing files that changed from the base of the PR and between 4bf4b22 and cdba27c.

📒 Files selected for processing (23)

.openshift-tests-extension/openshift_payload_cluster-version-operator.json
pkg/cvo/availableupdates_test.go
pkg/cvo/cvo.go
pkg/cvo/cvo_test.go
pkg/proposal/controller.go
pkg/proposal/controller_test.go
pkg/readiness/api_deprecations.go
pkg/readiness/check.go
pkg/readiness/check_test.go
pkg/readiness/checks_test.go
pkg/readiness/client.go
pkg/readiness/client_test.go
pkg/readiness/cluster_conditions.go
pkg/readiness/crd_compat.go
pkg/readiness/etcd_health.go
pkg/readiness/network.go
pkg/readiness/node_capacity.go
pkg/readiness/olm_lifecycle.go
pkg/readiness/olm_lifecycle_test.go
pkg/readiness/operator_health.go
pkg/readiness/pdb_drain.go
pkg/start/start.go
test/cvo/readiness.go

🚧 Files skipped from review as they are similar to previous changes (22)

pkg/cvo/availableupdates_test.go
pkg/readiness/operator_health.go
.openshift-tests-extension/openshift_payload_cluster-version-operator.json
pkg/readiness/node_capacity.go
pkg/start/start.go
pkg/readiness/api_deprecations.go
pkg/readiness/cluster_conditions.go
pkg/readiness/etcd_health.go
pkg/readiness/olm_lifecycle_test.go
pkg/readiness/client_test.go
pkg/proposal/controller.go
pkg/cvo/cvo_test.go
pkg/cvo/cvo.go
pkg/readiness/olm_lifecycle.go
pkg/readiness/network.go
pkg/readiness/check.go
pkg/readiness/client.go
test/cvo/readiness.go
pkg/proposal/controller_test.go
pkg/readiness/check_test.go
pkg/readiness/checks_test.go
pkg/readiness/pdb_drain.go

Walkthrough

Adds dynamic client wiring to CVO, a readiness-check framework (9 checks) with helpers/tests, and embeds per-target readiness JSON into proposal requests; includes e2e readiness tests and CI payload entries.

Changes

Cluster Readiness Checks

Layer / File(s)	Summary
Dynamic client wiring and operator constructor `pkg/cvo/cvo.go`, `pkg/start/start.go`, `pkg/cvo/availableupdates_test.go`, `pkg/cvo/cvo_test.go`	Adds dynamic client import and field to `Operator`, extends `New(...)` signature to accept `dynamic.Interface`, constructs dynamic client in `NewControllerContext` via `ClientBuilder.DynamicClientOrDie`, and passes it into proposal controller initialization; updates small test callsites.
Proposal controller readiness integration `pkg/proposal/controller.go`, `pkg/proposal/controller_test.go`	Adds `dynamicClient` to Controller, refactors `getProposals` to compute per-target readiness JSON via `readiness.RunAll` using `runReadinessJSON`, updates tests to remove per-case readiness payload and adds a test exercising embedded readiness data with a fake dynamic client.
Readiness framework core `pkg/readiness/check.go`, `pkg/readiness/check_test.go`	Defines `Check` interface, `CheckResult`/`Output`/`Meta` models with custom JSON, `AllChecks` registry, and `RunAll` orchestration that runs checks concurrently with timeouts and panic recovery; adds `SectionError` and framework unit tests.
Dynamic-client helpers `pkg/readiness/client.go`, `pkg/readiness/client_test.go`	Exports GVRs and helper functions to get/list unstructured resources, condition extraction (`GetConditions`), nested field accessors, `CompareVersions`, and `FormatLabelSelector` with unit tests.
Nine readiness checks `pkg/readiness/node_capacity.go`, `pkg/readiness/pdb_drain.go`, `pkg/readiness/etcd_health.go`, `pkg/readiness/operator_health.go`, `pkg/readiness/cluster_conditions.go`, `pkg/readiness/api_deprecations.go`, `pkg/readiness/crd_compat.go`, `pkg/readiness/network.go`, `pkg/readiness/olm_lifecycle.go`	Implements checks for node capacity, PDB disruption, etcd health, operator/MCP health, ClusterVersion conditions/history, API deprecations, CRD stored-version compatibility, network/proxy/TLS profile, and OLM operator lifecycle with OCP compatibility analysis and helpers.
Readiness tests (unit + per-check suites) `pkg/readiness/check_test.go`, `pkg/readiness/checks_test.go`, `pkg/readiness/olm_lifecycle_test.go`, `pkg/proposal/controller_test.go`	Adds unit tests covering framework JSON, RunAll aggregation and panic recovery, client helper behavior, per-check unit tests with fake dynamic clients, comprehensive fake-cluster fixture, OLM lifecycle helper tests, and a proposal test that validates embedded readiness JSON.
End-to-end readiness integration and CI payload `test/cvo/readiness.go`, `.openshift-tests-extension/openshift_payload_cluster-version-operator.json`	Adds Ginkgo suite exercising RunAll against a live cluster (validates check outputs against typed client queries and enforces per-check timing) and extends CI payload JSON with new readiness test cases.

Sequence Diagrams

sequenceDiagram
  participant CVO as cvo.Operator
  participant RunAll as readiness.RunAll
  participant Checks as Check (multiple)
  participant Dyn as dynamic.Interface
  CVO->>RunAll: RunAll(ctx, Dyn, current, target)
  RunAll->>Checks: Run(ctx, Dyn, current, target) (concurrent)
  Checks->>Dyn: GetResource/ListResources calls
  Checks-->>RunAll: per-check map[string]any
  RunAll-->>CVO: *Output{Checks, Meta}

sequenceDiagram
  participant Sync as proposal.Sync
  participant getProposals as getProposals
  participant RunAll as readiness.RunAll
  participant LLM as LLM service
  participant Dyn as dynamic.Interface
  Sync->>getProposals: invoke getProposals(ctx, ...)
  loop per-target
    getProposals->>RunAll: RunAll(ctx, Dyn, current, target)
    RunAll-->>getProposals: readiness Output
    getProposals->>getProposals: marshal readiness Output to JSON
    getProposals->>LLM: POST proposal including readiness JSON
    LLM-->>getProposals: proposals
  end
  getProposals-->>Sync: proposals

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

openshift/cluster-version-operator#1382: Earlier proposal-controller integration work that changes NewController/Sync to accept a dynamic client and generate readiness JSON via readiness.RunAll.

Suggested labels

approved, lgtm, verified

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)

Check name	Status	Explanation	Resolution
No-Sensitive-Data-In-Logs	❌ Error	Network check collects proxy URLs without sanitization, embedding them in proposals. Proxy URLs can contain credentials (user:password@host), exposing sensitive data in stored resources.	Sanitize proxy URLs to redact credentials or exclude them entirely from readiness data. Only include network_type and tls_profile.
Docstring Coverage	⚠️ Warning	Docstring coverage is 27.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	⚠️ Warning	12 critical error assertions in test/cvo/readiness.go lack meaningful messages (e.g., lines 37, 40, 43, 46, 50, 73, 84, 109, 145, 168, 181, 207). Makes test failures harder to diagnose.	Add failure messages to all Expect assertions, especially error checks. Replace bare `o.Expect(err).NotTo(o.HaveOccurred())` with context: `o.Expect(err).NotTo(o.HaveOccurred(), "failed to get RestConfig")`
Microshift Test Compatibility	⚠️ Warning	Test uses ClusterVersion, ClusterOperator, Network (config.openshift.io/v1) and openshift-etcd namespace without MicroShift protections.	Add [apigroup:config.openshift.io] tag to Describe name or add exutil.IsMicroShiftCluster() skip check in BeforeEach.

✅ Passed checks (11 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly references the main change: adding readiness checks to a new pkg/readiness package and wiring them into the proposal controller, which aligns with the primary objective.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All 9 Ginkgo test titles in test/cvo/readiness.go use static, descriptive strings with no dynamic values (variables, timestamps, UUIDs, pod/node names, IP addresses) that change between runs.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	The new Ginkgo test in test/cvo/readiness.go uses dynamic ground-truth values from the actual cluster rather than hardcoded multi-node assumptions, making it SNO-compatible.
Topology-Aware Scheduling Compatibility	✅ Passed	PR adds readiness checks that only read cluster state; no scheduling constraints (affinity/tolerations/nodeSelectors) are introduced.
Ote Binary Stdout Contract	✅ Passed	No process-level stdout writes. Test file uses standard Ginkgo patterns. Readiness package has no init() functions. Test suite follows existing conventions.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	New e2e test suite has no IPv4 assumptions or external connectivity requirements; all tests query cluster-internal Kubernetes resources only via dynamic/typed clients.
No-Weak-Crypto	✅ Passed	PR contains no weak crypto (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time comparisons of secrets.
Container-Privileges	✅ Passed	PR modifies only Go source code and test files, not container or K8s manifests; no privilege escalation settings are introduced or modified.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-05-27T23:02:58Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harche
Once this PR has been reviewed and has the lgtm label, please assign pratikmahajan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 12

🧹 Nitpick comments (3)

test/cvo/readiness.go (1)
20-20: ⚡ Quick win

Add Ginkgo labels to the new e2e suite.

This new suite has no Label(...), which makes targeted execution/selection harder in CI and local runs.

As per coding guidelines, "Use Ginkgo Labels to mark test categories (e.g., TechPreview, serial)".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/cvo/readiness.go` at line 20, The new Ginkgo suite declaration using
g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness
checks`, func() {...}) is missing Labels; update that Describe invocation to
include appropriate Ginkgo Label options (e.g., g.Label("e2e"),
g.Label("serial"), g.Label("TechPreview") or other team-standard tags) by adding
g.Label(...) into the Describe call so the suite can be targeted in CI and local
runs while keeping the existing description string and test body unchanged.
pkg/proposal/controller.go (1)
292-312: 🏗️ Heavy lift

Avoid re-running cluster-wide readiness checks for every target version.

Line 292 notes this already: 7 of 9 checks are target-independent, but Line 300 and Line 311 execute all checks per target. This can make sync latency scale with update count and slow reconciliation on clusters with many targets.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/proposal/controller.go` around lines 292 - 312, The loop currently calls
runReadinessJSON for every target, re-running cluster-wide checks repeatedly;
refactor so you compute target-independent readiness once and reuse it for all
targets while still running only the target-dependent checks per target.
Concretely: call a new or adjusted function (e.g., runClusterWideReadiness or
runReadinessJSON(ctx,dynamicClient,currentVersion,"") returning
baseReadinessJSON) before iterating availableUpdates/conditionalUpdates, then
inside each loop call a lightweight runTargetedReadiness (or runReadinessJSON
with a flag) that merges target-specific checks (api_deprecations,
olm_lifecycle) into the base JSON, and pass the merged readinessJSON to
getProposal; keep collecting errs and proposals as before.
pkg/readiness/checks_test.go (1)
501-612: ⚡ Quick win

Use table-driven subtests for paired scenarios of the same check.

TestAPIDeprecationsCheck/TestAPIDeprecationsCheck_NoBlockers and TestCRDCompatCheck/TestCRDCompatCheck_NoIssues are the same test shape with different fixtures/expected values; folding them into one table-driven test per check will reduce duplication and future drift.

As per coding guidelines, "Prefer table-driven tests over multiple similar test functions. If two test functions differ only in setup values, collapse them into one function with test-case tuples".

Also applies to: 613-708
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/readiness/checks_test.go` around lines 501 - 612, Collapse the two tests
TestAPIDeprecationsCheck and TestAPIDeprecationsCheck_NoBlockers into a single
table-driven test: define a slice of test cases each containing the name, input
objects (runtime.Object slice), from/to versions, and expected counts/entries,
then iterate cases with t.Run; inside each subtest create the fake client via
newFakeDynamicClient, instantiate APIDeprecationsCheck, call check.Run and
assert blocker_apis, warning_apis and summary values for that case. Do the same
refactor for TestCRDCompatCheck/TestCRDCompatCheck_NoIssues (and the other
paired tests mentioned) so each check has one test function that loops over
subtests and performs per-case setup and assertions, preserving the existing
assertions but scoped to the subtest.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/proposal/controller_test.go`:
- Around line 1436-1441: The code currently computes jsonStart by adding
len("```json\n") before checking for a missing fence which can produce incorrect
slices; change the logic in the test to first find start :=
strings.Index(request, "```json\n") and if start < 0 call t.Fatal, then set
jsonStart := start + len("```json\n"), compute jsonEnd :=
strings.Index(request[jsonStart:], "\n```") and if jsonEnd < 0 call t.Fatal, and
finally extract readinessJSON using readinessJSON := request[jsonStart :
jsonStart+jsonEnd]; update references to jsonStart/jsonEnd accordingly.

In `@pkg/proposal/controller.go`:
- Around line 451-454: In runReadinessJSON, when dynamicClient == nil the
function currently returns "{}" silently; add a log entry (info or warning)
before returning that includes the function name and the currentVersion and
targetVersion to make the fallback visible; obtain a logger from the ctx if your
project uses context-based logging (or use the package logger used elsewhere in
this file) and emit something like "dynamicClient nil, returning empty readiness
JSON" with versions attached, then return "{}".

In `@pkg/readiness/check_test.go`:
- Around line 150-224: TestRunAllMixedResults reimplements orchestration instead
of calling RunAll; change it to exercise the real RunAll path by injecting the
fake checks via the AllChecks helper. Replace the manual loop with code that
temporarily overrides the AllChecks function (or uses any existing injection
point) to return []Check{okCheck, failCheck, partialCheck}, call
RunAll(context.Background(), nil, "4.21.5", "4.21.8"), then restore AllChecks
and assert counts and individual CheckResult entries (referencing
TestRunAllMixedResults, RunAll, AllChecks, and fakeCheck to locate the code).
Ensure the test still verifies passing, failing, and partial results and
restores the original AllChecks to avoid test pollution.

In `@pkg/readiness/check.go`:
- Around line 154-158: The SectionError function dereferences err with
err.Error() which will panic if err == nil; update SectionError to guard against
nil by checking if err == nil before calling Error(), and append a safe
representation (e.g. nil or an empty string like "") into the map under the
"error" key when err is nil so the function never calls err.Error() on a nil
error; keep the signature and the append to the errors slice intact while using
this conditional branch inside SectionError.

In `@pkg/readiness/client.go`:
- Around line 154-162: The CompareVersions function currently hides parse errors
by returning 0 on semver.ParseTolerant failures, conflating invalid input with
equal versions; change CompareVersions to return (int, error) (or (int, bool)
validity) instead of just int, and propagate parse errors from
semver.ParseTolerant for both va and vb rather than returning 0—e.g., update the
function signature and replace the two error branches so they return an
appropriate error when semver.ParseTolerant(a) or semver.ParseTolerant(b) fails,
then perform the comparison only after successful parses and return the int
comparison plus nil error (or true validity).

In `@pkg/readiness/cluster_conditions.go`:
- Around line 64-67: Update the stale comment that mentions “upstream” to
accurately describe the emitted fields by reflecting that this block populates
the result map with "channel" and "cluster_id" (i.e., keys result["channel"]
from NestedString(cv.Object, "spec", "channel") and result["cluster_id"] from
NestedString(cv.Object, "spec", "clusterID")); edit the nearby comment in
pkg/readiness/cluster_conditions.go so it no longer references upstream and
instead describes that the code extracts channel and cluster_id from the spec.

In `@pkg/readiness/etcd_health.go`:
- Around line 36-38: The current readiness logic uses phase :=
NestedString(pod.Object, "status", "phase") and treats ready := phase ==
"Running", which is wrong because a Running pod can still be NotReady; change
the check to inspect the pod's status.conditions for the condition with type
"Ready" and set ready only if that condition's status == "True". In practice,
replace the phase-based check in the function handling pod health (where
variables pod.Object, phase, and ready are used) with logic that iterates
status.conditions (or reads the Ready condition via the existing helpers) and
assigns ready = (Ready condition == "True") before updating healthy_members.

In `@pkg/readiness/network.go`:
- Around line 29-31: The current code always sets result["sdn_warning"] when
networkType == "OpenShiftSDN"; change it to only add the warning when the
requested target version crosses the deprecation boundary (e.g., targetVersion
>= 4.17). In pkg/readiness/network.go check networkType == "OpenShiftSDN" AND
compare the requested targetVersion (variable name targetVersion or equivalent)
using a semantic version compare (e.g., golang.org/x/mod/semver or a semver
package) and only set result["sdn_warning"] when the parsed/comparable
targetVersion is >= "4.17"; if targetVersion is not available in scope, add it
as a parameter to the function that builds result or obtain it from the existing
context before performing the conditional. Ensure you use explicit semver
comparison rather than string comparison to avoid incorrect matches.

In `@pkg/readiness/olm_lifecycle.go`:
- Around line 248-252: The current olmProperty type uses Value string which
makes json.Unmarshal fail if any element's "value" is non-string; change the
struct used by parseMinOCPFromProperties so Value is json.RawMessage (or
interface{}) and then in parseMinOCPFromProperties iterate each olmProperty,
attempt to decode Value as a string (falling back to converting numbers/objects
to string safely) and only consider entries where Type ==
"olm.minOpenShiftVersion" and the extracted string is valid; update references
to olmProperty and the parseMinOCPFromProperties logic to handle the new Value
type and avoid returning "" on unmarshal errors caused by non-string values.
- Around line 64-66: pkgIndex is currently built with indexByName and then
looked up by pkgName which ignores catalog/source namespace and can
mis-correlate PackageManifests; change construction to use indexByNamespacedName
and key lookups by the PackageManifest's namespaced name (use
spec.sourceNamespace) instead of plain name (replace indexByName ->
indexByNamespacedName where pkgIndex is created and any pkgIndex[pkgName]
lookups). Also harden parseMinOCPFromProperties to accept non-string
olm.properties values by unmarshaling property.Value as json.RawMessage (or
interface{}) and only treat it as a string when the property.Name ==
"olm.minOpenShiftVersion"; extract the string safely (e.g., unquote or decode
JSON string) and add a test with an object-valued property to ensure non-string
values do not cause json.Unmarshal to fail or return an empty minOCP.

In `@pkg/readiness/pdb_drain.go`:
- Around line 26-27: The code currently uses NestedString(pdb.Object, "spec",
"maxUnavailable") / "minAvailable" which loses IntOrString forms; replace these
calls with unstructured.NestedFieldNoCopy(pdb.Object, "spec", "maxUnavailable")
and unstructured.NestedFieldNoCopy(pdb.Object, "spec", "minAvailable") to fetch
the raw interface{}, then convert that raw value into a preserved string
representation (e.g., if it's already a string keep it, if it's numeric or a
JSON number use fmt.Sprintf("%v", v), or parse into
k8s.io/apimachinery/pkg/util/intstr.IntOrString when needed) so the original
IntOrString value is preserved when reporting; reference the symbols
NestedFieldNoCopy, pdb.Object, maxUnavailable, minAvailable, and
intstr.IntOrString when making the change.

In `@test/cvo/readiness.go`:
- Around line 25-54: Replace the use of context.TODO() (the ctx variable
declared at top and used in BeforeEach and later calls such as readiness.RunAll)
with a timeout-bound context created via context.WithTimeout inside BeforeEach
(e.g., ctx, cancel := context.WithTimeout(context.Background(),
<sensibleDuration>)) and ensure you defer cancel() in the same scope so all
cluster API calls (configClient.ClusterVersions().Get, dynamic/kube client
calls, and readiness.RunAll) use this cancelable context to avoid hanging tests;
update any other test helpers that reference the package-level ctx to accept or
use this new timeout context.

---

Nitpick comments:
In `@pkg/proposal/controller.go`:
- Around line 292-312: The loop currently calls runReadinessJSON for every
target, re-running cluster-wide checks repeatedly; refactor so you compute
target-independent readiness once and reuse it for all targets while still
running only the target-dependent checks per target. Concretely: call a new or
adjusted function (e.g., runClusterWideReadiness or
runReadinessJSON(ctx,dynamicClient,currentVersion,"") returning
baseReadinessJSON) before iterating availableUpdates/conditionalUpdates, then
inside each loop call a lightweight runTargetedReadiness (or runReadinessJSON
with a flag) that merges target-specific checks (api_deprecations,
olm_lifecycle) into the base JSON, and pass the merged readinessJSON to
getProposal; keep collecting errs and proposals as before.

In `@pkg/readiness/checks_test.go`:
- Around line 501-612: Collapse the two tests TestAPIDeprecationsCheck and
TestAPIDeprecationsCheck_NoBlockers into a single table-driven test: define a
slice of test cases each containing the name, input objects (runtime.Object
slice), from/to versions, and expected counts/entries, then iterate cases with
t.Run; inside each subtest create the fake client via newFakeDynamicClient,
instantiate APIDeprecationsCheck, call check.Run and assert blocker_apis,
warning_apis and summary values for that case. Do the same refactor for
TestCRDCompatCheck/TestCRDCompatCheck_NoIssues (and the other paired tests
mentioned) so each check has one test function that loops over subtests and
performs per-case setup and assertions, preserving the existing assertions but
scoped to the subtest.

In `@test/cvo/readiness.go`:
- Line 20: The new Ginkgo suite declaration using g.Describe(`[Jira:"Cluster
Version Operator"] cluster-version-operator readiness checks`, func() {...}) is
missing Labels; update that Describe invocation to include appropriate Ginkgo
Label options (e.g., g.Label("e2e"), g.Label("serial"), g.Label("TechPreview")
or other team-standard tags) by adding g.Label(...) into the Describe call so
the suite can be targeted in CI and local runs while keeping the existing
description string and test body unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0f81fe59-c00a-4053-8218-57d36ea17f93

📥 Commits

Reviewing files that changed from the base of the PR and between 52cc8e3 and e98b557.

📒 Files selected for processing (22)

pkg/cvo/availableupdates_test.go
pkg/cvo/cvo.go
pkg/cvo/cvo_test.go
pkg/proposal/controller.go
pkg/proposal/controller_test.go
pkg/readiness/api_deprecations.go
pkg/readiness/check.go
pkg/readiness/check_test.go
pkg/readiness/checks_test.go
pkg/readiness/client.go
pkg/readiness/client_test.go
pkg/readiness/cluster_conditions.go
pkg/readiness/crd_compat.go
pkg/readiness/etcd_health.go
pkg/readiness/network.go
pkg/readiness/node_capacity.go
pkg/readiness/olm_lifecycle.go
pkg/readiness/olm_lifecycle_test.go
pkg/readiness/operator_health.go
pkg/readiness/pdb_drain.go
pkg/start/start.go
test/cvo/readiness.go

harche · 2026-05-28T03:46:28Z

/hold until tested against the CVO skill in agentic-skills repo.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/readiness/check.go`:
- Around line 92-125: The goroutine launched in RunAll that calls Check.Run can
panic and crash the process; wrap the goroutine body with a defer recover to
catch panics, convert the panic value (and optionally debug.Stack()) into an
error string, set result.Status = StatusError, set result.Error to that string
(and ensure result.Data is a non-nil map), and then continue to store the result
into results[ch.Name()] under the existing mu lock so the panic is recorded
instead of crashing; implement this recovery logic inside the anonymous func
passed to go (the closure that invokes ch.Run) and ensure the defer runs before
wg.Done().

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 97741758-f119-4be3-b58c-be64deb6dbfe

📥 Commits

Reviewing files that changed from the base of the PR and between e98b557 and a3f0869.

📒 Files selected for processing (23)

.openshift-tests-extension/openshift_payload_cluster-version-operator.json
pkg/cvo/availableupdates_test.go
pkg/cvo/cvo.go
pkg/cvo/cvo_test.go
pkg/proposal/controller.go
pkg/proposal/controller_test.go
pkg/readiness/api_deprecations.go
pkg/readiness/check.go
pkg/readiness/check_test.go
pkg/readiness/checks_test.go
pkg/readiness/client.go
pkg/readiness/client_test.go
pkg/readiness/cluster_conditions.go
pkg/readiness/crd_compat.go
pkg/readiness/etcd_health.go
pkg/readiness/network.go
pkg/readiness/node_capacity.go
pkg/readiness/olm_lifecycle.go
pkg/readiness/olm_lifecycle_test.go
pkg/readiness/operator_health.go
pkg/readiness/pdb_drain.go
pkg/start/start.go
test/cvo/readiness.go

🚧 Files skipped from review as they are similar to previous changes (19)

pkg/readiness/node_capacity.go
pkg/cvo/availableupdates_test.go
pkg/readiness/cluster_conditions.go
pkg/readiness/etcd_health.go
pkg/readiness/client_test.go
pkg/start/start.go
pkg/readiness/crd_compat.go
pkg/readiness/client.go
pkg/readiness/network.go
pkg/cvo/cvo.go
pkg/readiness/check_test.go
pkg/proposal/controller.go
pkg/readiness/pdb_drain.go
pkg/readiness/olm_lifecycle_test.go
pkg/readiness/olm_lifecycle.go
test/cvo/readiness.go
pkg/readiness/operator_health.go
pkg/readiness/checks_test.go
pkg/proposal/controller_test.go

harche · 2026-05-28T14:10:45Z

Manual Test Report — Readiness Checks on Live Cluster

Environment


Cluster	OCP 4.21.5 on GCP
Nodes	6 (3 master + 3 worker)
Feature set	Default (proposal controller feature gate bypassed locally for testing)
Lightspeed operator	Installed, ApprovalPolicy `cluster` configured (all stages Manual)
Binary	Cross-compiled on macOS, built on-cluster via OpenShift Build, deployed via `oc adm release new` with internal registry
Binary size match	101,271,851 bytes (pod == local) ✅

Results

Proposals created: 11 (one per available update: 4.21.6 → 4.21.16) ✅

ProposalApprovals created: 11 (by lightspeed operator, all stages Manual — no action taken without human approval) ✅

Readiness checks (sample from ota-4-21-5-to-4-21-16):

Check	Status	Key data
`cluster_conditions`	✅ ok (0.015s)	channel=stable-4.21, Upgradeable=False (reconciling), history present
`operator_health`	✅ ok (0.607s)	34 COs healthy, 0 degraded, 0 not-upgradeable, 2 MCPs (master 3/3, worker 3/3)
`etcd_health`	✅ ok (0.223s)	3/3 members healthy and Ready on all masters
`node_capacity`	✅ ok (0.028s)	6 nodes, 6 ready, 0 unschedulable
`pdb_drain`	✅ ok (0.030s)	21 PDBs, 0 blocking
`crd_compat`	✅ ok (0.571s)	160 CRDs, 0 version issues
`api_deprecations`	✅ ok (0.028s)	0 blockers, 0 warnings
`network`	✅ ok (0.406s)	OVNKubernetes, TLS Intermediate, no proxy
`olm_operator_lifecycle`	✅ ok (2.248s)	0 OLM subscriptions (none on this cluster)

Total readiness time: 2.25s per target | All 9 checks OK, 0 errored ✅

What was verified

CVO binary with readiness wiring deploys and starts successfully
Proposal controller syncs and creates proposals for all available updates
Readiness data (real cluster state) is embedded in each proposal's spec.request as JSON
Readiness JSON contains accurate cluster data (verified node count, operator count, etcd members, network type against ground truth)
ApprovalPolicy gates are respected — proposals created but no agent action without manual approval
All 9 readiness checks run in parallel with per-check timeouts and produce structured output

Not tested

Agent actually processing a proposal (would require approving the Analysis stage)
Conditional updates with risk data (cluster had none)
Target-dependent check accuracy (api_deprecations blocker detection — cluster had no deprecated API usage)

🤖 Generated with Claude Code

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/cvo/readiness.go`:
- Around line 197-199: The direct assertion
result.Data["blocking_pdbs"].([]map[string]any) can panic; change to a safe
check: retrieve value from result.Data, use a comma-ok type assertion for
[]map[string]any (or []any then convert), assert ok with o.Expect (e.g.,
o.Expect(ok).To(BeTrue(), "...")) and only then compute len(blockingPDBs) and
compare to expectedBlocking using the existing o.Expect call; reference symbols:
blockingPDBs, result.Data["blocking_pdbs"], o.Expect, expectedBlocking.
- Line 130: The direct type assertion result.Data["summary"].(map[string]any)
can panic; change the code in test/cvo/readiness.go to use a safe two-value
assertion (e.g., v, ok := result.Data["summary"].(map[string]any) and fail the
test with a clear message if !ok) or replace the assertion with a Gomega matcher
(e.g., Expect(result.Data["summary"]).To(BeAssignableToTypeOf(map[string]any{}))
/ Expect(result.Data["summary"]).To(BeAKindOf(map[string]any))) before using the
variable `summary` so the test reports a failure instead of panicking.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7ff63e7c-fd65-4b18-a2ac-95a18bff8ad1

📥 Commits

Reviewing files that changed from the base of the PR and between a3f0869 and 4bf4b22.

📒 Files selected for processing (23)

.openshift-tests-extension/openshift_payload_cluster-version-operator.json
pkg/cvo/availableupdates_test.go
pkg/cvo/cvo.go
pkg/cvo/cvo_test.go
pkg/proposal/controller.go
pkg/proposal/controller_test.go
pkg/readiness/api_deprecations.go
pkg/readiness/check.go
pkg/readiness/check_test.go
pkg/readiness/checks_test.go
pkg/readiness/client.go
pkg/readiness/client_test.go
pkg/readiness/cluster_conditions.go
pkg/readiness/crd_compat.go
pkg/readiness/etcd_health.go
pkg/readiness/network.go
pkg/readiness/node_capacity.go
pkg/readiness/olm_lifecycle.go
pkg/readiness/olm_lifecycle_test.go
pkg/readiness/operator_health.go
pkg/readiness/pdb_drain.go
pkg/start/start.go
test/cvo/readiness.go

✅ Files skipped from review due to trivial changes (1)

pkg/cvo/cvo_test.go

🚧 Files skipped from review as they are similar to previous changes (21)

pkg/cvo/availableupdates_test.go
pkg/readiness/cluster_conditions.go
pkg/readiness/pdb_drain.go
pkg/readiness/api_deprecations.go
pkg/readiness/network.go
.openshift-tests-extension/openshift_payload_cluster-version-operator.json
pkg/readiness/operator_health.go
pkg/readiness/client.go
pkg/proposal/controller.go
pkg/readiness/client_test.go
pkg/readiness/node_capacity.go
pkg/readiness/check_test.go
pkg/readiness/olm_lifecycle_test.go
pkg/readiness/check.go
pkg/readiness/olm_lifecycle.go
pkg/readiness/crd_compat.go
pkg/cvo/cvo.go
pkg/start/start.go
pkg/readiness/checks_test.go
pkg/proposal/controller_test.go
pkg/readiness/etcd_health.go

Add pkg/readiness package with 9 cluster readiness checks that gather pre-upgrade health data: cluster conditions, operator health, API deprecations, node capacity, PDB drain blockers, etcd health, network config, CRD compatibility, and OLM operator lifecycle. Wire readiness.RunAll() into the proposal controller, replacing the hardcoded readinessJSON placeholder with real per-target readiness data that gets embedded in each proposal's request body. Plumb dynamic.Interface from pkg/start through cvo.New() to the proposal controller to support the readiness checks' cluster queries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-28T14:49:16Z

Actionable comments posted: 0

openshift-ci · 2026-05-28T18:02:38Z

@harche: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-hypershift-conformance	`cdba27c`	link	true	`/test e2e-hypershift-conformance`
ci/prow/e2e-agnostic-ovn-upgrade-into-change	`cdba27c`	link	true	`/test e2e-agnostic-ovn-upgrade-into-change`
ci/prow/e2e-agnostic-ovn-techpreview-serial-1of3	`cdba27c`	link	true	`/test e2e-agnostic-ovn-techpreview-serial-1of3`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 27, 2026

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

harche force-pushed the readiness_pkg branch from a5c516e to 7c21533 Compare May 28, 2026 03:43

openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 28, 2026

harche marked this pull request as ready for review May 28, 2026 03:46

harche force-pushed the readiness_pkg branch 2 times, most recently from 8be9908 to a3f0869 Compare May 28, 2026 11:30

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Comment thread pkg/readiness/check.go

harche force-pushed the readiness_pkg branch from a3f0869 to 4bf4b22 Compare May 28, 2026 14:14

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Comment thread test/cvo/readiness.go Outdated

Comment thread test/cvo/readiness.go Outdated

harche force-pushed the readiness_pkg branch from 4bf4b22 to cdba27c Compare May 28, 2026 14:44

Conversation

harche commented May 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

openshift-ci Bot commented May 27, 2026

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagrams

Estimated code review effort

Possibly related PRs

Suggested labels

Pre-merge checks failed

❌ Failed checks (1 error, 3 warnings)

Uh oh!

openshift-ci Bot commented May 27, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

harche commented May 28, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

harche commented May 28, 2026

Manual Test Report — Readiness Checks on Live Cluster

Environment

Results

What was verified

Not tested

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

openshift-ci Bot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

harche commented May 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 27, 2026 •

edited

Loading