Skip to content

WIP: pkg/readiness: Add readiness checks and wire into proposal controller#1395

Open
harche wants to merge 1 commit into
openshift:mainfrom
harche:readiness_pkg
Open

WIP: pkg/readiness: Add readiness checks and wire into proposal controller#1395
harche wants to merge 1 commit into
openshift:mainfrom
harche:readiness_pkg

Conversation

@harche
Copy link
Copy Markdown
Contributor

@harche harche commented May 27, 2026

Summary

  • Add pkg/readiness package with 9 cluster readiness checks that gather pre-upgrade health data:
    • Cluster conditions (CV status, upgrade history)
    • Operator health (CO conditions, MCP state)
    • API deprecations (removed/deprecated API usage vs target version)
    • Node capacity (ready/unschedulable nodes)
    • PDB drain blockers (zero-budget PDBs)
    • Etcd health (member pods, operator conditions)
    • Network config (SDN type, proxy, TLS profile)
    • CRD compatibility (stored versions vs served versions)
    • OLM operator lifecycle (subscriptions, CSV state, OCP compatibility)
  • Wire readiness.RunAll() into the proposal controller, replacing the hardcoded readinessJSON := "{}" placeholder (from PR OTA-1966: Install namespace/openshift-lightspeed only when TechPreview is enabled #1394) with real per-target readiness data
  • Plumb dynamic.Interface from pkg/startcvo.New()proposal.NewController() to support cluster queries

Test plan

  • go build ./... passes
  • go test ./pkg/readiness/ — unit tests for each check + RunAll integration test with fake cluster (all 9 checks exercised with non-trivial data)
  • go test ./pkg/proposal/ — controller tests including end-to-end test verifying real readiness JSON flows into proposal request body
  • go test ./pkg/cvo/ — existing tests updated for new signatures
  • Deploy to test cluster and verify proposals contain readiness data

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added comprehensive readiness checks for upgrades (node capacity, operator health, etcd health, network/proxy/TLS, API deprecations, CRD compatibility, PDB drain, ClusterVersion conditions, and OLM lifecycle).
    • Upgrade proposals now embed cluster readiness data; readiness runs concurrently with timing and aggregated status reporting, JSON-serializable.
  • Tests

    • Added extensive unit, integration, and end-to-end tests plus test payloads validating readiness checks, JSON output, proposal embedding, and performance constraints.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 27, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 27, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 34703eba-b777-47d6-81c6-0cfd7aba1289

📥 Commits

Reviewing files that changed from the base of the PR and between 4bf4b22 and cdba27c.

📒 Files selected for processing (23)
  • .openshift-tests-extension/openshift_payload_cluster-version-operator.json
  • pkg/cvo/availableupdates_test.go
  • pkg/cvo/cvo.go
  • pkg/cvo/cvo_test.go
  • pkg/proposal/controller.go
  • pkg/proposal/controller_test.go
  • pkg/readiness/api_deprecations.go
  • pkg/readiness/check.go
  • pkg/readiness/check_test.go
  • pkg/readiness/checks_test.go
  • pkg/readiness/client.go
  • pkg/readiness/client_test.go
  • pkg/readiness/cluster_conditions.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/etcd_health.go
  • pkg/readiness/network.go
  • pkg/readiness/node_capacity.go
  • pkg/readiness/olm_lifecycle.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/operator_health.go
  • pkg/readiness/pdb_drain.go
  • pkg/start/start.go
  • test/cvo/readiness.go
🚧 Files skipped from review as they are similar to previous changes (22)
  • pkg/cvo/availableupdates_test.go
  • pkg/readiness/operator_health.go
  • .openshift-tests-extension/openshift_payload_cluster-version-operator.json
  • pkg/readiness/node_capacity.go
  • pkg/start/start.go
  • pkg/readiness/api_deprecations.go
  • pkg/readiness/cluster_conditions.go
  • pkg/readiness/etcd_health.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/client_test.go
  • pkg/proposal/controller.go
  • pkg/cvo/cvo_test.go
  • pkg/cvo/cvo.go
  • pkg/readiness/olm_lifecycle.go
  • pkg/readiness/network.go
  • pkg/readiness/check.go
  • pkg/readiness/client.go
  • test/cvo/readiness.go
  • pkg/proposal/controller_test.go
  • pkg/readiness/check_test.go
  • pkg/readiness/checks_test.go
  • pkg/readiness/pdb_drain.go

Walkthrough

Adds dynamic client wiring to CVO, a readiness-check framework (9 checks) with helpers/tests, and embeds per-target readiness JSON into proposal requests; includes e2e readiness tests and CI payload entries.

Changes

Cluster Readiness Checks

Layer / File(s) Summary
Dynamic client wiring and operator constructor
pkg/cvo/cvo.go, pkg/start/start.go, pkg/cvo/availableupdates_test.go, pkg/cvo/cvo_test.go
Adds dynamic client import and field to Operator, extends New(...) signature to accept dynamic.Interface, constructs dynamic client in NewControllerContext via ClientBuilder.DynamicClientOrDie, and passes it into proposal controller initialization; updates small test callsites.
Proposal controller readiness integration
pkg/proposal/controller.go, pkg/proposal/controller_test.go
Adds dynamicClient to Controller, refactors getProposals to compute per-target readiness JSON via readiness.RunAll using runReadinessJSON, updates tests to remove per-case readiness payload and adds a test exercising embedded readiness data with a fake dynamic client.
Readiness framework core
pkg/readiness/check.go, pkg/readiness/check_test.go
Defines Check interface, CheckResult/Output/Meta models with custom JSON, AllChecks registry, and RunAll orchestration that runs checks concurrently with timeouts and panic recovery; adds SectionError and framework unit tests.
Dynamic-client helpers
pkg/readiness/client.go, pkg/readiness/client_test.go
Exports GVRs and helper functions to get/list unstructured resources, condition extraction (GetConditions), nested field accessors, CompareVersions, and FormatLabelSelector with unit tests.
Nine readiness checks
pkg/readiness/node_capacity.go, pkg/readiness/pdb_drain.go, pkg/readiness/etcd_health.go, pkg/readiness/operator_health.go, pkg/readiness/cluster_conditions.go, pkg/readiness/api_deprecations.go, pkg/readiness/crd_compat.go, pkg/readiness/network.go, pkg/readiness/olm_lifecycle.go
Implements checks for node capacity, PDB disruption, etcd health, operator/MCP health, ClusterVersion conditions/history, API deprecations, CRD stored-version compatibility, network/proxy/TLS profile, and OLM operator lifecycle with OCP compatibility analysis and helpers.
Readiness tests (unit + per-check suites)
pkg/readiness/check_test.go, pkg/readiness/checks_test.go, pkg/readiness/olm_lifecycle_test.go, pkg/proposal/controller_test.go
Adds unit tests covering framework JSON, RunAll aggregation and panic recovery, client helper behavior, per-check unit tests with fake dynamic clients, comprehensive fake-cluster fixture, OLM lifecycle helper tests, and a proposal test that validates embedded readiness JSON.
End-to-end readiness integration and CI payload
test/cvo/readiness.go, .openshift-tests-extension/openshift_payload_cluster-version-operator.json
Adds Ginkgo suite exercising RunAll against a live cluster (validates check outputs against typed client queries and enforces per-check timing) and extends CI payload JSON with new readiness test cases.

Sequence Diagrams

sequenceDiagram
  participant CVO as cvo.Operator
  participant RunAll as readiness.RunAll
  participant Checks as Check (multiple)
  participant Dyn as dynamic.Interface
  CVO->>RunAll: RunAll(ctx, Dyn, current, target)
  RunAll->>Checks: Run(ctx, Dyn, current, target) (concurrent)
  Checks->>Dyn: GetResource/ListResources calls
  Checks-->>RunAll: per-check map[string]any
  RunAll-->>CVO: *Output{Checks, Meta}
Loading
sequenceDiagram
  participant Sync as proposal.Sync
  participant getProposals as getProposals
  participant RunAll as readiness.RunAll
  participant LLM as LLM service
  participant Dyn as dynamic.Interface
  Sync->>getProposals: invoke getProposals(ctx, ...)
  loop per-target
    getProposals->>RunAll: RunAll(ctx, Dyn, current, target)
    RunAll-->>getProposals: readiness Output
    getProposals->>getProposals: marshal readiness Output to JSON
    getProposals->>LLM: POST proposal including readiness JSON
    LLM-->>getProposals: proposals
  end
  getProposals-->>Sync: proposals
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • openshift/cluster-version-operator#1382: Earlier proposal-controller integration work that changes NewController/Sync to accept a dynamic client and generate readiness JSON via readiness.RunAll.

Suggested labels

approved, lgtm, verified


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)

Check name Status Explanation Resolution
No-Sensitive-Data-In-Logs ❌ Error Network check collects proxy URLs without sanitization, embedding them in proposals. Proxy URLs can contain credentials (user:password@host), exposing sensitive data in stored resources. Sanitize proxy URLs to redact credentials or exclude them entirely from readiness data. Only include network_type and tls_profile.
Docstring Coverage ⚠️ Warning Docstring coverage is 27.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning 12 critical error assertions in test/cvo/readiness.go lack meaningful messages (e.g., lines 37, 40, 43, 46, 50, 73, 84, 109, 145, 168, 181, 207). Makes test failures harder to diagnose. Add failure messages to all Expect assertions, especially error checks. Replace bare o.Expect(err).NotTo(o.HaveOccurred()) with context: o.Expect(err).NotTo(o.HaveOccurred(), "failed to get RestConfig")
Microshift Test Compatibility ⚠️ Warning Test uses ClusterVersion, ClusterOperator, Network (config.openshift.io/v1) and openshift-etcd namespace without MicroShift protections. Add [apigroup:config.openshift.io] tag to Describe name or add exutil.IsMicroShiftCluster() skip check in BeforeEach.
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly references the main change: adding readiness checks to a new pkg/readiness package and wiring them into the proposal controller, which aligns with the primary objective.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All 9 Ginkgo test titles in test/cvo/readiness.go use static, descriptive strings with no dynamic values (variables, timestamps, UUIDs, pod/node names, IP addresses) that change between runs.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The new Ginkgo test in test/cvo/readiness.go uses dynamic ground-truth values from the actual cluster rather than hardcoded multi-node assumptions, making it SNO-compatible.
Topology-Aware Scheduling Compatibility ✅ Passed PR adds readiness checks that only read cluster state; no scheduling constraints (affinity/tolerations/nodeSelectors) are introduced.
Ote Binary Stdout Contract ✅ Passed No process-level stdout writes. Test file uses standard Ginkgo patterns. Readiness package has no init() functions. Test suite follows existing conventions.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed New e2e test suite has no IPv4 assumptions or external connectivity requirements; all tests query cluster-internal Kubernetes resources only via dynamic/typed clients.
No-Weak-Crypto ✅ Passed PR contains no weak crypto (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time comparisons of secrets.
Container-Privileges ✅ Passed PR modifies only Go source code and test files, not container or K8s manifests; no privilege escalation settings are introduced or modified.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harche
Once this PR has been reviewed and has the lgtm label, please assign pratikmahajan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Nitpick comments (3)
test/cvo/readiness.go (1)

20-20: ⚡ Quick win

Add Ginkgo labels to the new e2e suite.

This new suite has no Label(...), which makes targeted execution/selection harder in CI and local runs.

As per coding guidelines, "Use Ginkgo Labels to mark test categories (e.g., TechPreview, serial)".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/cvo/readiness.go` at line 20, The new Ginkgo suite declaration using
g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness
checks`, func() {...}) is missing Labels; update that Describe invocation to
include appropriate Ginkgo Label options (e.g., g.Label("e2e"),
g.Label("serial"), g.Label("TechPreview") or other team-standard tags) by adding
g.Label(...) into the Describe call so the suite can be targeted in CI and local
runs while keeping the existing description string and test body unchanged.
pkg/proposal/controller.go (1)

292-312: 🏗️ Heavy lift

Avoid re-running cluster-wide readiness checks for every target version.

Line 292 notes this already: 7 of 9 checks are target-independent, but Line 300 and Line 311 execute all checks per target. This can make sync latency scale with update count and slow reconciliation on clusters with many targets.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/proposal/controller.go` around lines 292 - 312, The loop currently calls
runReadinessJSON for every target, re-running cluster-wide checks repeatedly;
refactor so you compute target-independent readiness once and reuse it for all
targets while still running only the target-dependent checks per target.
Concretely: call a new or adjusted function (e.g., runClusterWideReadiness or
runReadinessJSON(ctx,dynamicClient,currentVersion,"") returning
baseReadinessJSON) before iterating availableUpdates/conditionalUpdates, then
inside each loop call a lightweight runTargetedReadiness (or runReadinessJSON
with a flag) that merges target-specific checks (api_deprecations,
olm_lifecycle) into the base JSON, and pass the merged readinessJSON to
getProposal; keep collecting errs and proposals as before.
pkg/readiness/checks_test.go (1)

501-612: ⚡ Quick win

Use table-driven subtests for paired scenarios of the same check.

TestAPIDeprecationsCheck/TestAPIDeprecationsCheck_NoBlockers and TestCRDCompatCheck/TestCRDCompatCheck_NoIssues are the same test shape with different fixtures/expected values; folding them into one table-driven test per check will reduce duplication and future drift.

As per coding guidelines, "Prefer table-driven tests over multiple similar test functions. If two test functions differ only in setup values, collapse them into one function with test-case tuples".

Also applies to: 613-708

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/readiness/checks_test.go` around lines 501 - 612, Collapse the two tests
TestAPIDeprecationsCheck and TestAPIDeprecationsCheck_NoBlockers into a single
table-driven test: define a slice of test cases each containing the name, input
objects (runtime.Object slice), from/to versions, and expected counts/entries,
then iterate cases with t.Run; inside each subtest create the fake client via
newFakeDynamicClient, instantiate APIDeprecationsCheck, call check.Run and
assert blocker_apis, warning_apis and summary values for that case. Do the same
refactor for TestCRDCompatCheck/TestCRDCompatCheck_NoIssues (and the other
paired tests mentioned) so each check has one test function that loops over
subtests and performs per-case setup and assertions, preserving the existing
assertions but scoped to the subtest.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/proposal/controller_test.go`:
- Around line 1436-1441: The code currently computes jsonStart by adding
len("```json\n") before checking for a missing fence which can produce incorrect
slices; change the logic in the test to first find start :=
strings.Index(request, "```json\n") and if start < 0 call t.Fatal, then set
jsonStart := start + len("```json\n"), compute jsonEnd :=
strings.Index(request[jsonStart:], "\n```") and if jsonEnd < 0 call t.Fatal, and
finally extract readinessJSON using readinessJSON := request[jsonStart :
jsonStart+jsonEnd]; update references to jsonStart/jsonEnd accordingly.

In `@pkg/proposal/controller.go`:
- Around line 451-454: In runReadinessJSON, when dynamicClient == nil the
function currently returns "{}" silently; add a log entry (info or warning)
before returning that includes the function name and the currentVersion and
targetVersion to make the fallback visible; obtain a logger from the ctx if your
project uses context-based logging (or use the package logger used elsewhere in
this file) and emit something like "dynamicClient nil, returning empty readiness
JSON" with versions attached, then return "{}".

In `@pkg/readiness/check_test.go`:
- Around line 150-224: TestRunAllMixedResults reimplements orchestration instead
of calling RunAll; change it to exercise the real RunAll path by injecting the
fake checks via the AllChecks helper. Replace the manual loop with code that
temporarily overrides the AllChecks function (or uses any existing injection
point) to return []Check{okCheck, failCheck, partialCheck}, call
RunAll(context.Background(), nil, "4.21.5", "4.21.8"), then restore AllChecks
and assert counts and individual CheckResult entries (referencing
TestRunAllMixedResults, RunAll, AllChecks, and fakeCheck to locate the code).
Ensure the test still verifies passing, failing, and partial results and
restores the original AllChecks to avoid test pollution.

In `@pkg/readiness/check.go`:
- Around line 154-158: The SectionError function dereferences err with
err.Error() which will panic if err == nil; update SectionError to guard against
nil by checking if err == nil before calling Error(), and append a safe
representation (e.g. nil or an empty string like "") into the map under the
"error" key when err is nil so the function never calls err.Error() on a nil
error; keep the signature and the append to the errors slice intact while using
this conditional branch inside SectionError.

In `@pkg/readiness/client.go`:
- Around line 154-162: The CompareVersions function currently hides parse errors
by returning 0 on semver.ParseTolerant failures, conflating invalid input with
equal versions; change CompareVersions to return (int, error) (or (int, bool)
validity) instead of just int, and propagate parse errors from
semver.ParseTolerant for both va and vb rather than returning 0—e.g., update the
function signature and replace the two error branches so they return an
appropriate error when semver.ParseTolerant(a) or semver.ParseTolerant(b) fails,
then perform the comparison only after successful parses and return the int
comparison plus nil error (or true validity).

In `@pkg/readiness/cluster_conditions.go`:
- Around line 64-67: Update the stale comment that mentions “upstream” to
accurately describe the emitted fields by reflecting that this block populates
the result map with "channel" and "cluster_id" (i.e., keys result["channel"]
from NestedString(cv.Object, "spec", "channel") and result["cluster_id"] from
NestedString(cv.Object, "spec", "clusterID")); edit the nearby comment in
pkg/readiness/cluster_conditions.go so it no longer references upstream and
instead describes that the code extracts channel and cluster_id from the spec.

In `@pkg/readiness/etcd_health.go`:
- Around line 36-38: The current readiness logic uses phase :=
NestedString(pod.Object, "status", "phase") and treats ready := phase ==
"Running", which is wrong because a Running pod can still be NotReady; change
the check to inspect the pod's status.conditions for the condition with type
"Ready" and set ready only if that condition's status == "True". In practice,
replace the phase-based check in the function handling pod health (where
variables pod.Object, phase, and ready are used) with logic that iterates
status.conditions (or reads the Ready condition via the existing helpers) and
assigns ready = (Ready condition == "True") before updating healthy_members.

In `@pkg/readiness/network.go`:
- Around line 29-31: The current code always sets result["sdn_warning"] when
networkType == "OpenShiftSDN"; change it to only add the warning when the
requested target version crosses the deprecation boundary (e.g., targetVersion
>= 4.17). In pkg/readiness/network.go check networkType == "OpenShiftSDN" AND
compare the requested targetVersion (variable name targetVersion or equivalent)
using a semantic version compare (e.g., golang.org/x/mod/semver or a semver
package) and only set result["sdn_warning"] when the parsed/comparable
targetVersion is >= "4.17"; if targetVersion is not available in scope, add it
as a parameter to the function that builds result or obtain it from the existing
context before performing the conditional. Ensure you use explicit semver
comparison rather than string comparison to avoid incorrect matches.

In `@pkg/readiness/olm_lifecycle.go`:
- Around line 248-252: The current olmProperty type uses Value string which
makes json.Unmarshal fail if any element's "value" is non-string; change the
struct used by parseMinOCPFromProperties so Value is json.RawMessage (or
interface{}) and then in parseMinOCPFromProperties iterate each olmProperty,
attempt to decode Value as a string (falling back to converting numbers/objects
to string safely) and only consider entries where Type ==
"olm.minOpenShiftVersion" and the extracted string is valid; update references
to olmProperty and the parseMinOCPFromProperties logic to handle the new Value
type and avoid returning "" on unmarshal errors caused by non-string values.
- Around line 64-66: pkgIndex is currently built with indexByName and then
looked up by pkgName which ignores catalog/source namespace and can
mis-correlate PackageManifests; change construction to use indexByNamespacedName
and key lookups by the PackageManifest's namespaced name (use
spec.sourceNamespace) instead of plain name (replace indexByName ->
indexByNamespacedName where pkgIndex is created and any pkgIndex[pkgName]
lookups). Also harden parseMinOCPFromProperties to accept non-string
olm.properties values by unmarshaling property.Value as json.RawMessage (or
interface{}) and only treat it as a string when the property.Name ==
"olm.minOpenShiftVersion"; extract the string safely (e.g., unquote or decode
JSON string) and add a test with an object-valued property to ensure non-string
values do not cause json.Unmarshal to fail or return an empty minOCP.

In `@pkg/readiness/pdb_drain.go`:
- Around line 26-27: The code currently uses NestedString(pdb.Object, "spec",
"maxUnavailable") / "minAvailable" which loses IntOrString forms; replace these
calls with unstructured.NestedFieldNoCopy(pdb.Object, "spec", "maxUnavailable")
and unstructured.NestedFieldNoCopy(pdb.Object, "spec", "minAvailable") to fetch
the raw interface{}, then convert that raw value into a preserved string
representation (e.g., if it's already a string keep it, if it's numeric or a
JSON number use fmt.Sprintf("%v", v), or parse into
k8s.io/apimachinery/pkg/util/intstr.IntOrString when needed) so the original
IntOrString value is preserved when reporting; reference the symbols
NestedFieldNoCopy, pdb.Object, maxUnavailable, minAvailable, and
intstr.IntOrString when making the change.

In `@test/cvo/readiness.go`:
- Around line 25-54: Replace the use of context.TODO() (the ctx variable
declared at top and used in BeforeEach and later calls such as readiness.RunAll)
with a timeout-bound context created via context.WithTimeout inside BeforeEach
(e.g., ctx, cancel := context.WithTimeout(context.Background(),
<sensibleDuration>)) and ensure you defer cancel() in the same scope so all
cluster API calls (configClient.ClusterVersions().Get, dynamic/kube client
calls, and readiness.RunAll) use this cancelable context to avoid hanging tests;
update any other test helpers that reference the package-level ctx to accept or
use this new timeout context.

---

Nitpick comments:
In `@pkg/proposal/controller.go`:
- Around line 292-312: The loop currently calls runReadinessJSON for every
target, re-running cluster-wide checks repeatedly; refactor so you compute
target-independent readiness once and reuse it for all targets while still
running only the target-dependent checks per target. Concretely: call a new or
adjusted function (e.g., runClusterWideReadiness or
runReadinessJSON(ctx,dynamicClient,currentVersion,"") returning
baseReadinessJSON) before iterating availableUpdates/conditionalUpdates, then
inside each loop call a lightweight runTargetedReadiness (or runReadinessJSON
with a flag) that merges target-specific checks (api_deprecations,
olm_lifecycle) into the base JSON, and pass the merged readinessJSON to
getProposal; keep collecting errs and proposals as before.

In `@pkg/readiness/checks_test.go`:
- Around line 501-612: Collapse the two tests TestAPIDeprecationsCheck and
TestAPIDeprecationsCheck_NoBlockers into a single table-driven test: define a
slice of test cases each containing the name, input objects (runtime.Object
slice), from/to versions, and expected counts/entries, then iterate cases with
t.Run; inside each subtest create the fake client via newFakeDynamicClient,
instantiate APIDeprecationsCheck, call check.Run and assert blocker_apis,
warning_apis and summary values for that case. Do the same refactor for
TestCRDCompatCheck/TestCRDCompatCheck_NoIssues (and the other paired tests
mentioned) so each check has one test function that loops over subtests and
performs per-case setup and assertions, preserving the existing assertions but
scoped to the subtest.

In `@test/cvo/readiness.go`:
- Line 20: The new Ginkgo suite declaration using g.Describe(`[Jira:"Cluster
Version Operator"] cluster-version-operator readiness checks`, func() {...}) is
missing Labels; update that Describe invocation to include appropriate Ginkgo
Label options (e.g., g.Label("e2e"), g.Label("serial"), g.Label("TechPreview")
or other team-standard tags) by adding g.Label(...) into the Describe call so
the suite can be targeted in CI and local runs while keeping the existing
description string and test body unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0f81fe59-c00a-4053-8218-57d36ea17f93

📥 Commits

Reviewing files that changed from the base of the PR and between 52cc8e3 and e98b557.

📒 Files selected for processing (22)
  • pkg/cvo/availableupdates_test.go
  • pkg/cvo/cvo.go
  • pkg/cvo/cvo_test.go
  • pkg/proposal/controller.go
  • pkg/proposal/controller_test.go
  • pkg/readiness/api_deprecations.go
  • pkg/readiness/check.go
  • pkg/readiness/check_test.go
  • pkg/readiness/checks_test.go
  • pkg/readiness/client.go
  • pkg/readiness/client_test.go
  • pkg/readiness/cluster_conditions.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/etcd_health.go
  • pkg/readiness/network.go
  • pkg/readiness/node_capacity.go
  • pkg/readiness/olm_lifecycle.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/operator_health.go
  • pkg/readiness/pdb_drain.go
  • pkg/start/start.go
  • test/cvo/readiness.go

Comment thread pkg/proposal/controller_test.go Outdated
Comment thread pkg/proposal/controller.go
Comment thread pkg/readiness/check_test.go
Comment thread pkg/readiness/check.go
Comment thread pkg/readiness/client.go Outdated
Comment thread pkg/readiness/network.go
Comment thread pkg/readiness/olm_lifecycle.go
Comment thread pkg/readiness/olm_lifecycle.go
Comment thread pkg/readiness/pdb_drain.go Outdated
Comment thread test/cvo/readiness.go Outdated
@harche
Copy link
Copy Markdown
Contributor Author

harche commented May 28, 2026

/hold until tested against the CVO skill in agentic-skills repo.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 28, 2026
@harche harche marked this pull request as ready for review May 28, 2026 03:46
@harche harche force-pushed the readiness_pkg branch 2 times, most recently from 8be9908 to a3f0869 Compare May 28, 2026 11:30
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/readiness/check.go`:
- Around line 92-125: The goroutine launched in RunAll that calls Check.Run can
panic and crash the process; wrap the goroutine body with a defer recover to
catch panics, convert the panic value (and optionally debug.Stack()) into an
error string, set result.Status = StatusError, set result.Error to that string
(and ensure result.Data is a non-nil map), and then continue to store the result
into results[ch.Name()] under the existing mu lock so the panic is recorded
instead of crashing; implement this recovery logic inside the anonymous func
passed to go (the closure that invokes ch.Run) and ensure the defer runs before
wg.Done().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 97741758-f119-4be3-b58c-be64deb6dbfe

📥 Commits

Reviewing files that changed from the base of the PR and between e98b557 and a3f0869.

📒 Files selected for processing (23)
  • .openshift-tests-extension/openshift_payload_cluster-version-operator.json
  • pkg/cvo/availableupdates_test.go
  • pkg/cvo/cvo.go
  • pkg/cvo/cvo_test.go
  • pkg/proposal/controller.go
  • pkg/proposal/controller_test.go
  • pkg/readiness/api_deprecations.go
  • pkg/readiness/check.go
  • pkg/readiness/check_test.go
  • pkg/readiness/checks_test.go
  • pkg/readiness/client.go
  • pkg/readiness/client_test.go
  • pkg/readiness/cluster_conditions.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/etcd_health.go
  • pkg/readiness/network.go
  • pkg/readiness/node_capacity.go
  • pkg/readiness/olm_lifecycle.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/operator_health.go
  • pkg/readiness/pdb_drain.go
  • pkg/start/start.go
  • test/cvo/readiness.go
🚧 Files skipped from review as they are similar to previous changes (19)
  • pkg/readiness/node_capacity.go
  • pkg/cvo/availableupdates_test.go
  • pkg/readiness/cluster_conditions.go
  • pkg/readiness/etcd_health.go
  • pkg/readiness/client_test.go
  • pkg/start/start.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/client.go
  • pkg/readiness/network.go
  • pkg/cvo/cvo.go
  • pkg/readiness/check_test.go
  • pkg/proposal/controller.go
  • pkg/readiness/pdb_drain.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/olm_lifecycle.go
  • test/cvo/readiness.go
  • pkg/readiness/operator_health.go
  • pkg/readiness/checks_test.go
  • pkg/proposal/controller_test.go

Comment thread pkg/readiness/check.go
@harche
Copy link
Copy Markdown
Contributor Author

harche commented May 28, 2026

Manual Test Report — Readiness Checks on Live Cluster

Environment

Cluster OCP 4.21.5 on GCP
Nodes 6 (3 master + 3 worker)
Feature set Default (proposal controller feature gate bypassed locally for testing)
Lightspeed operator Installed, ApprovalPolicy cluster configured (all stages Manual)
Binary Cross-compiled on macOS, built on-cluster via OpenShift Build, deployed via oc adm release new with internal registry
Binary size match 101,271,851 bytes (pod == local) ✅

Results

Proposals created: 11 (one per available update: 4.21.6 → 4.21.16) ✅

ProposalApprovals created: 11 (by lightspeed operator, all stages Manual — no action taken without human approval) ✅

Readiness checks (sample from ota-4-21-5-to-4-21-16):

Check Status Key data
cluster_conditions ✅ ok (0.015s) channel=stable-4.21, Upgradeable=False (reconciling), history present
operator_health ✅ ok (0.607s) 34 COs healthy, 0 degraded, 0 not-upgradeable, 2 MCPs (master 3/3, worker 3/3)
etcd_health ✅ ok (0.223s) 3/3 members healthy and Ready on all masters
node_capacity ✅ ok (0.028s) 6 nodes, 6 ready, 0 unschedulable
pdb_drain ✅ ok (0.030s) 21 PDBs, 0 blocking
crd_compat ✅ ok (0.571s) 160 CRDs, 0 version issues
api_deprecations ✅ ok (0.028s) 0 blockers, 0 warnings
network ✅ ok (0.406s) OVNKubernetes, TLS Intermediate, no proxy
olm_operator_lifecycle ✅ ok (2.248s) 0 OLM subscriptions (none on this cluster)

Total readiness time: 2.25s per target | All 9 checks OK, 0 errored

What was verified

  1. CVO binary with readiness wiring deploys and starts successfully
  2. Proposal controller syncs and creates proposals for all available updates
  3. Readiness data (real cluster state) is embedded in each proposal's spec.request as JSON
  4. Readiness JSON contains accurate cluster data (verified node count, operator count, etcd members, network type against ground truth)
  5. ApprovalPolicy gates are respected — proposals created but no agent action without manual approval
  6. All 9 readiness checks run in parallel with per-check timeouts and produce structured output

Not tested

  • Agent actually processing a proposal (would require approving the Analysis stage)
  • Conditional updates with risk data (cluster had none)
  • Target-dependent check accuracy (api_deprecations blocker detection — cluster had no deprecated API usage)

🤖 Generated with Claude Code

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/cvo/readiness.go`:
- Around line 197-199: The direct assertion
result.Data["blocking_pdbs"].([]map[string]any) can panic; change to a safe
check: retrieve value from result.Data, use a comma-ok type assertion for
[]map[string]any (or []any then convert), assert ok with o.Expect (e.g.,
o.Expect(ok).To(BeTrue(), "...")) and only then compute len(blockingPDBs) and
compare to expectedBlocking using the existing o.Expect call; reference symbols:
blockingPDBs, result.Data["blocking_pdbs"], o.Expect, expectedBlocking.
- Line 130: The direct type assertion result.Data["summary"].(map[string]any)
can panic; change the code in test/cvo/readiness.go to use a safe two-value
assertion (e.g., v, ok := result.Data["summary"].(map[string]any) and fail the
test with a clear message if !ok) or replace the assertion with a Gomega matcher
(e.g., Expect(result.Data["summary"]).To(BeAssignableToTypeOf(map[string]any{}))
/ Expect(result.Data["summary"]).To(BeAKindOf(map[string]any))) before using the
variable `summary` so the test reports a failure instead of panicking.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7ff63e7c-fd65-4b18-a2ac-95a18bff8ad1

📥 Commits

Reviewing files that changed from the base of the PR and between a3f0869 and 4bf4b22.

📒 Files selected for processing (23)
  • .openshift-tests-extension/openshift_payload_cluster-version-operator.json
  • pkg/cvo/availableupdates_test.go
  • pkg/cvo/cvo.go
  • pkg/cvo/cvo_test.go
  • pkg/proposal/controller.go
  • pkg/proposal/controller_test.go
  • pkg/readiness/api_deprecations.go
  • pkg/readiness/check.go
  • pkg/readiness/check_test.go
  • pkg/readiness/checks_test.go
  • pkg/readiness/client.go
  • pkg/readiness/client_test.go
  • pkg/readiness/cluster_conditions.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/etcd_health.go
  • pkg/readiness/network.go
  • pkg/readiness/node_capacity.go
  • pkg/readiness/olm_lifecycle.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/operator_health.go
  • pkg/readiness/pdb_drain.go
  • pkg/start/start.go
  • test/cvo/readiness.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/cvo/cvo_test.go
🚧 Files skipped from review as they are similar to previous changes (21)
  • pkg/cvo/availableupdates_test.go
  • pkg/readiness/cluster_conditions.go
  • pkg/readiness/pdb_drain.go
  • pkg/readiness/api_deprecations.go
  • pkg/readiness/network.go
  • .openshift-tests-extension/openshift_payload_cluster-version-operator.json
  • pkg/readiness/operator_health.go
  • pkg/readiness/client.go
  • pkg/proposal/controller.go
  • pkg/readiness/client_test.go
  • pkg/readiness/node_capacity.go
  • pkg/readiness/check_test.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/check.go
  • pkg/readiness/olm_lifecycle.go
  • pkg/readiness/crd_compat.go
  • pkg/cvo/cvo.go
  • pkg/start/start.go
  • pkg/readiness/checks_test.go
  • pkg/proposal/controller_test.go
  • pkg/readiness/etcd_health.go

Comment thread test/cvo/readiness.go Outdated
Comment thread test/cvo/readiness.go Outdated
Add pkg/readiness package with 9 cluster readiness checks that gather
pre-upgrade health data: cluster conditions, operator health, API
deprecations, node capacity, PDB drain blockers, etcd health, network
config, CRD compatibility, and OLM operator lifecycle.

Wire readiness.RunAll() into the proposal controller, replacing the
hardcoded readinessJSON placeholder with real per-target readiness data
that gets embedded in each proposal's request body.

Plumb dynamic.Interface from pkg/start through cvo.New() to the proposal
controller to support the readiness checks' cluster queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 28, 2026

@harche: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-hypershift-conformance cdba27c link true /test e2e-hypershift-conformance
ci/prow/e2e-agnostic-ovn-upgrade-into-change cdba27c link true /test e2e-agnostic-ovn-upgrade-into-change
ci/prow/e2e-agnostic-ovn-techpreview-serial-1of3 cdba27c link true /test e2e-agnostic-ovn-techpreview-serial-1of3

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant