
NO-JIRA: OpenShiftUpdateRiskMightApply: bump pending to 15m from 10m#1372

Open
hongkailiu wants to merge 1 commit into openshift:main from hongkailiu:ignore-testing-alert

Conversation

@hongkailiu
Member

@hongkailiu hongkailiu commented Apr 10, 2026

This is to follow up a recent finding [1].

Further digging shows that the related risks to our e2e tests are TestAlertFeatureE2ETestOTA1813, SyntheticRiskA.

They are in the pending state. This pull bumps the alert's pending
duration from 10m to 15m, giving the e2e run a longer window to finish
before the alert fires. This could avoid disruption from production alerts.
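
For readers less familiar with the Prometheus `for` clause, here is a minimal sketch of the pending-versus-firing distinction this change relies on. It is not code from this repository. The 12-minute figure is a made-up assumption for how long a test risk stays active, and the sketch ignores evaluation-interval granularity.

```python
from datetime import timedelta


def alert_state(active_for: timedelta, for_duration: timedelta) -> str:
    """Classify an alert whose expression has been continuously true for `active_for`.

    Mirrors the Prometheus rule semantics: the alert is "pending" until the
    expression has held for the whole `for` duration, then it is "firing".
    """
    if active_for <= timedelta(0):
        return "inactive"
    return "firing" if active_for >= for_duration else "pending"


# Hypothetical: an e2e test risk stays active for 12 minutes (assumed value).
risk_active = timedelta(minutes=12)

print(alert_state(risk_active, timedelta(minutes=10)))  # with the old `for: 10m`
print(alert_state(risk_active, timedelta(minutes=15)))  # with the new `for: 15m`
```

Under that assumption the alert would fire with the old 10m setting and stay pending with 15m. Whether real e2e runs clear their test risks within 15 minutes is not something this sketch can confirm.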

We could instead use `max by (namespace, risk, reason) (last_over_time(cluster_version_risk_conditions{job="cluster-version-operator", condition="Applies", risk!~"TestAlertFeatureE2ETest.*"}[5m]) != 0)` to ignore the testing alerts,
but it does not look good to have production code handle a special case that exists only for testing.
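
As a rough illustration of what the `risk!~"TestAlertFeatureE2ETest.*"` matcher would and would not exclude, the sketch below approximates it in Python. Prometheus regex matchers are fully anchored, so `re.fullmatch` is used as a stand-in for the RE2 behavior. `SomeProductionRisk` is a hypothetical name added only for contrast.

```python
import re

# Pattern taken from the matcher in the expression above.
TEST_RISK = re.compile(r"TestAlertFeatureE2ETest.*")


def kept_by_matcher(risk: str) -> bool:
    """Return True if a series with this `risk` label survives the `!~` matcher.

    Prometheus anchors regex matchers at both ends, hence fullmatch.
    """
    return TEST_RISK.fullmatch(risk) is None


risks = [
    "TestAlertFeatureE2ETestOTA1813",  # from the PR description
    "SyntheticRiskA",                  # from the PR description
    "SomeProductionRisk",              # hypothetical, for contrast
]
kept = [r for r in risks if kept_by_matcher(r)]
print(kept)
```

Note that `SyntheticRiskA` is not excluded by this pattern, so the filter alone would not cover both test risks named above.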

This will recover the health of the TP-enabled jobs in CI.

[1]. #1367 (comment)

[2]. openshift/origin#30929

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced Prometheus alert expression filtering to improve the accuracy of cluster update risk monitoring.

@coderabbitai

coderabbitai bot commented Apr 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 0591f330-3d0d-4aa2-8f9c-ff1cb1d59c88

📥 Commits

Reviewing files that changed from the base of the PR and between c8406b6 and c43f2cd.

📒 Files selected for processing (1)
  • install/0000_90_cluster-version-operator_02_prometheusrule_servicemonitor.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • install/0000_90_cluster-version-operator_02_prometheusrule_servicemonitor.yaml

Walkthrough

Updated the OpenShiftUpdateRiskMightApply Prometheus alert in the cluster-version-operator configuration: the annotations.summary text was changed from "10 minutes" to "15 minutes", and the alert's `for` duration was increased from 10m to 15m.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Prometheus Alert Timing: install/0000_90_cluster-version-operator_02_prometheusrule_servicemonitor.yaml | Updated OpenShiftUpdateRiskMightApply alert: changed annotations.summary text from "10 minutes" → "15 minutes" and updated `for:` from 10m → 15m (no other fields changed). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 10, 2026
@hongkailiu hongkailiu changed the title Ignore risks from e2e testing NO-JIRA: Ignore risks from e2e testing Apr 10, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 10, 2026
@openshift-ci-robot
Contributor

@hongkailiu: This pull request explicitly references no jira issue.

Details

In response to this:

This is to follow up a recent finding [1].

Further digging shows that the related risks to our e2e tests are TestAlertFeatureE2ETestOTA1813, SyntheticRiskA.

This pull ignores the risks with prefix TestAlertFeatureE2ETest similar to [2].

I will fix SyntheticRiskA by modifying "fauxinnati" so that the risks coming from it have the prefix too.

This will recover the health of the TP-enabled jobs in CI.

[1]. #1367 (comment)

[2]. openshift/origin#30929

Summary by CodeRabbit

  • Bug Fixes
  • Enhanced Prometheus alert expression filtering to improve the accuracy of cluster update risk monitoring.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@hongkailiu hongkailiu force-pushed the ignore-testing-alert branch from c8406b6 to c43f2cd Compare April 10, 2026 22:33
@hongkailiu hongkailiu changed the title NO-JIRA: Ignore risks from e2e testing NO-JIRA: OpenShiftUpdateRiskMightApply: bump pending to 15m from 10m Apr 10, 2026
Member

@wking wking left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 10, 2026
@openshift-ci
Contributor

openshift-ci bot commented Apr 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongkailiu, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Apr 11, 2026

@hongkailiu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-aws-ovn-techpreview | c43f2cd | link | true | /test e2e-aws-ovn-techpreview |
| ci/prow/e2e-hypershift-conformance | c43f2cd | link | true | /test e2e-hypershift-conformance |
| ci/prow/e2e-agnostic-operator | c43f2cd | link | true | /test e2e-agnostic-operator |
| ci/prow/e2e-agnostic-ovn-upgrade-into-change | c43f2cd | link | true | /test e2e-agnostic-ovn-upgrade-into-change |
| ci/prow/e2e-agnostic-ovn | c43f2cd | link | true | /test e2e-agnostic-ovn |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.


3 participants