Skip to content

DNM/TEST: checking bastion hosts for gcp/azure#78883

Open
jluhrsen wants to merge 1 commit intoopenshift:mainfrom
jluhrsen:CORENET-7068
Open

DNM/TEST: checking bastion hosts for gcp/azure#78883
jluhrsen wants to merge 1 commit intoopenshift:mainfrom
jluhrsen:CORENET-7068

Conversation

@jluhrsen
Copy link
Copy Markdown
Contributor

@jluhrsen jluhrsen commented May 6, 2026

Summary

Add automated bastion host validation steps for GCP and Azure to verify:

  1. SSH access to bastion VM
  2. Container runtime (podman/docker) is installed and functional
  3. Network connectivity from bastion to cluster

Results are automatically saved to artifacts (bastion-validation.log and bastion-validation-results.txt) instead of requiring manual validation via long sleep periods.

Testing

Integrated bastion provisioning and validation into standard OVN e2e workflows:

  • GCP: openshift-e2e-gcp-ovn-workflow
  • Azure: openshift-e2e-azure-ovn-workflow

Rehearse with:

  • /pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.14-e2e-gcp-ovn
  • /pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.8-e2e-azure-ovn

Jira

https://redhat.atlassian.net/browse/CORENET-7068

Summary by CodeRabbit

  • New Features
    • Added Azure Bastion host validation step to CI/CD workflows to verify network connectivity, SSH access, and container runtime availability.
    • Added GCP Bastion host validation step to CI/CD workflows for similar infrastructure readiness checks.
    • Integrated validation checks into Azure OVN, Azure CSI, GCP OVN, and GCP custom endpoints test workflows.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

Walkthrough

Adds Azure and GCP bastion-host validation steps: new validation scripts, step-ref YAML and metadata, OWNERS, and inserts the validation steps into several OpenShift E2E workflows; scripts perform SSH, container runtime, and cluster-network checks and emit artifacts with results.

Changes

Azure Bastion Validation

Layer / File(s) Summary
Data / Metadata
ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-ref.metadata.json, ci-operator/step-registry/azure/validate/bastionhost/OWNERS
Adds metadata referencing the YAML and an OWNERS file listing five approvers.
Step Definition
ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-ref.yaml
Introduces azure-validate-bastionhost step-ref with image, grace_period, command, resource requests, credentials mount, and documentation.
Core Implementation
ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh
New Bash script implementing SSH connectivity checks to the bastion, container runtime detection and test (podman/docker), cluster API reachability checks, optional internal network checks, system information collection, artifact logging, and overall pass/fail exit.
Workflow Wiring
ci-operator/step-registry/openshift/e2e/azure/csi/file/.../openshift-e2e-azure-csi-file-cross-subscription-workflow.yaml, ci-operator/step-registry/openshift/e2e/azure/ovn/.../openshift-e2e-azure-ovn-workflow.yaml
Inserts azure-validate-bastionhost as a pre-step after bastion provisioning in listed workflows.

GCP Bastion Validation

Layer / File(s) Summary
Data / Metadata
ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-ref.metadata.json, ci-operator/step-registry/gcp/validate/bastionhost/OWNERS
Adds metadata referencing the YAML and an OWNERS file listing three approvers.
Step Definition
ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-ref.yaml
Introduces gcp-validate-bastionhost step-ref with image, grace_period, command, resource requests, credentials mount, and documentation.
Core Implementation
ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh
New Bash script implementing UID/SSH validation, SSH connectivity checks to the bastion, container runtime detection and test (podman/docker), cluster API reachability checks, system information collection, artifact logging, and overall pass/fail exit.
Workflow Wiring
ci-operator/step-registry/openshift/e2e/gcp/custom-endpoints/.../openshift-e2e-gcp-custom-endpoints-workflow.yaml, ci-operator/step-registry/openshift/e2e/gcp/ovn/.../openshift-e2e-gcp-ovn-workflow.yaml
Inserts gcp-validate-bastionhost as a pre-step after bastion provisioning and adds related pre/post bastion steps in the GCP OVN workflow.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CI as CI Job
    participant Shared as Shared Data (bastion creds)
    participant Bastion as Bastion Host (SSH)
    participant Container as Container Runtime (podman/docker)
    participant Cluster as Cluster API
    rect rgba(0,128,255,0.5)
    CI->>Shared: read bastion host/user/key
    end
    CI->>Bastion: SSH connect and run checks
    Bastion->>Container: detect & run test container
    CI->>Cluster: validate API reachability via Bastion (proxy/ssh)
    CI->>CI: collect artifacts (logs, results)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title uses 'DNM/TEST' prefix indicating do-not-merge/test status, which is vague about the actual technical changes and does not clearly describe the primary implementation. Consider revising the title to clearly describe the main change, such as 'Add bastion host validation steps for GCP and Azure' or similar, removing vague prefixes that obscure the actual change.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed No Ginkgo test files present in this PR. PR adds CI/CD scripts and configs (bash, YAML, JSON). Custom check for stable test names is not applicable.
Test Structure And Quality ✅ Passed Not applicable. This PR adds only CI/CD infrastructure (shell scripts, YAML workflows, OWNERS files, JSON metadata). No Ginkgo test code is present.
Microshift Test Compatibility ✅ Passed PR adds no Ginkgo e2e tests. Only CI/CD infrastructure (bash scripts, YAML step definitions, metadata files). Custom check applies only to new Ginkgo tests, so it is inapplicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds CI infrastructure (shell scripts, YAML configs, OWNERS files) for bastion validation, not Ginkgo e2e tests. Check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed No Kubernetes workload manifests added. Changes are CI test steps and scripts without scheduling constraints affecting topology compatibility.
Ote Binary Stdout Contract ✅ Passed PR adds only shell scripts, YAML configs, and metadata files for bastion validation steps. No Go binaries, test code, or OTE implementations present. OTE Binary Stdout Contract check not applicable.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR adds only Bash CI scripts, YAML workflow configs, and OWNERS files. No Ginkgo e2e tests added. Custom check applies only when Ginkgo tests are added, so it is not applicable.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@jluhrsen
Copy link
Copy Markdown
Contributor Author

jluhrsen commented May 6, 2026

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-aws-ovn

@openshift-ci openshift-ci Bot requested review from dobsonj and neisw May 6, 2026 04:29
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

@jluhrsen: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci openshift-ci Bot added the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label May 6, 2026
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

@jluhrsen: job(s): pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-aws-ovn either don't exist or were not found to be affected, and cannot be rehearsed

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh`:
- Around line 162-169: The current check pings the bastion's own private IP
(BASTION_PRIVATE_IP) which always succeeds; change the check to ping a different
internal host to validate internal connectivity: retrieve a distinct internal
target (e.g., CLUSTER_INTERNAL_IP or MASTER_PRIVATE_IP) from the shared data,
fall back to skipping the test with a clear log if that target is absent, then
call run_ssh_cmd (using SSH_PRIV_KEY_PATH, BASTION_SSH_USER, BASTION_PUBLIC_IP)
to run "ping -c 3 <internal_target>" and treat failure as a real connectivity
error (log and return non-zero) while keeping the success log "✓ Internal
network is accessible from bastion". Ensure you reference and use the existing
variables BASTION_PRIVATE_IP only for informational logging and do not ping it
as the connectivity probe.
- Line 37: The ssh invocation is vulnerable to word-splitting and injection
because ${options} and ${user}@${host} are unquoted; change options to an array
(e.g., options=()) and expand it as "${options[@]}", and quote the identity and
remote target and command as: -i "${sshkey}" "${user}@${host}" "${remote_cmd}"
so ssh receives each argument atomically; update the assignment to output to use
the array expansion and quoted variables (output=$(ssh "${options[@]}" -i
"${sshkey}" "${user}@${host}" "${remote_cmd}" 2>&1)) and adjust any code that
builds options to push elements into the options array rather than concatenating
a string.

In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`:
- Around line 136-138: The code expands CLUSTER_NAME and BASE_DOMAIN directly
which breaks under set -o nounset; update the check to guard against unset
variables before using them — for example, first ensure CLUSTER_NAME and
BASE_DOMAIN are set (e.g. test -n "${CLUSTER_NAME:-}" && test -n
"${BASE_DOMAIN:-}") or use safe parameter expansion like
"${SHARED_DIR}/api.${CLUSTER_NAME:-}.${BASE_DOMAIN:-}" when testing the file,
then fallback to checking "${SHARED_DIR}/metadata.json"; modify the if that
references SHARED_DIR, CLUSTER_NAME and BASE_DOMAIN so it won't expand undefined
variables (symbols: CLUSTER_NAME, BASE_DOMAIN, SHARED_DIR, metadata.json).
- Around line 138-143: The script is reading an Azure-specific metadata key for
BASE_DOMAIN; update the jq path to use the GCP/installer-appropriate key (e.g.,
read '.baseDomain' from "${SHARED_DIR}/metadata.json") and keep the same
error-suppression/fallback logic so BASE_DOMAIN is correctly populated for GCP.
Modify the lines that set BASE_DOMAIN (and consequently CLUSTER_API) to use jq
-r '.baseDomain' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "" and ensure
CLUSTER_NAME, BASE_DOMAIN and CLUSTER_API variables are validated the same way
as before.
- Line 44: The SSH invocation is vulnerable to word-splitting because ${user}
and ${host} are unquoted and ${options} is a plain string; update the call in
the ssh command (where variables options, sshkey, user, host, remote_cmd are
used) to quote the remote target as "${user}@${host}" and convert options into
an array (e.g. options_array and expand as "${options_array[@]}") so each option
is preserved; ensure the identity key and remote command remain quoted as
"${sshkey}" and "${remote_cmd}" respectively and consider adding a "--" before
the remote command to avoid option parsing issues.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5a7a12ea-4b63-45b3-8e52-55447d0e6d1d

📥 Commits

Reviewing files that changed from the base of the PR and between 0fe1b46 and 82b3610.

📒 Files selected for processing (10)
  • ci-operator/step-registry/azure/validate/bastionhost/OWNERS
  • ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh
  • ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-ref.metadata.json
  • ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-ref.yaml
  • ci-operator/step-registry/gcp/validate/bastionhost/OWNERS
  • ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh
  • ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-ref.metadata.json
  • ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-ref.yaml
  • ci-operator/step-registry/openshift/e2e/azure/csi/file/cross-subscription/openshift-e2e-azure-csi-file-cross-subscription-workflow.yaml
  • ci-operator/step-registry/openshift/e2e/gcp/custom-endpoints/openshift-e2e-gcp-custom-endpoints-workflow.yaml

[[ $- == *x* ]] && WAS_TRACING=true || WAS_TRACING=false
set +x

output=$(ssh ${options} -i "${sshkey}" ${user}@${host} "${remote_cmd}" 2>&1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# Find and read the file
fd -type f -name "*azure-validate-bastionhost-commands.sh"

Repository: openshift/release

Length of output: 232


🏁 Script executed:

# Find the file with correct syntax
fd --type f "azure-validate-bastionhost-commands.sh"

Repository: openshift/release

Length of output: 153


🏁 Script executed:

# Read the file to examine the SSH command and surrounding context
cat -n "ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh"

Repository: openshift/release

Length of output: 9943


Quote SSH arguments safely to avoid splitting/injection.

Line 37 has unquoted ${user}@${host}, which is error-prone with unexpected input values. Additionally, refactor options into an array for idiomatic bash and safer argument handling.

Suggested fix
-    options="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ServerAliveInterval=30 -o ServerAliveCountMax=5 -o ConnectTimeout=10"
+    local -a options=(
+        -o StrictHostKeyChecking=no
+        -o UserKnownHostsFile=/dev/null
+        -o ServerAliveInterval=30
+        -o ServerAliveCountMax=5
+        -o ConnectTimeout=10
+    )
@@
-    output=$(ssh ${options} -i "${sshkey}" ${user}@${host} "${remote_cmd}" 2>&1)
+    output=$(ssh "${options[@]}" -i "${sshkey}" "${user}@${host}" "${remote_cmd}" 2>&1)
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 37-37: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 37-37: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 37-37: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh`
at line 37, The ssh invocation is vulnerable to word-splitting and injection
because ${options} and ${user}@${host} are unquoted; change options to an array
(e.g., options=()) and expand it as "${options[@]}", and quote the identity and
remote target and command as: -i "${sshkey}" "${user}@${host}" "${remote_cmd}"
so ssh receives each argument atomically; update the assignment to output to use
the array expansion and quoted variables (output=$(ssh "${options[@]}" -i
"${sshkey}" "${user}@${host}" "${remote_cmd}" 2>&1)) and adjust any code that
builds options to push elements into the options array rather than concatenating
a string.

Comment on lines +162 to +169
if [[ -f "${SHARED_DIR}/bastion_private_address" ]]; then
BASTION_PRIVATE_IP=$(cat "${SHARED_DIR}/bastion_private_address")
log "Testing internal network by pinging private IP: ${BASTION_PRIVATE_IP}"

if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
"ping -c 3 ${BASTION_PRIVATE_IP}" "Test internal network"; then
log "✓ Internal network is accessible from bastion"
fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Current “internal network” check is not meaningful.

Line 167 pings the bastion’s own private IP from the bastion, which can pass even when broader internal connectivity is broken. This can mislead triage output.

Suggested adjustment
-        log "Testing internal network by pinging private IP: ${BASTION_PRIVATE_IP}"
+        log "Internal network fallback check should target a distinct internal endpoint (not bastion self IP)"
@@
-            "ping -c 3 ${BASTION_PRIVATE_IP}" "Test internal network"; then
+            "ping -c 3 168.63.129.16" "Test Azure fabric endpoint reachability"; then
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [[ -f "${SHARED_DIR}/bastion_private_address" ]]; then
BASTION_PRIVATE_IP=$(cat "${SHARED_DIR}/bastion_private_address")
log "Testing internal network by pinging private IP: ${BASTION_PRIVATE_IP}"
if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
"ping -c 3 ${BASTION_PRIVATE_IP}" "Test internal network"; then
log "✓ Internal network is accessible from bastion"
fi
if [[ -f "${SHARED_DIR}/bastion_private_address" ]]; then
BASTION_PRIVATE_IP=$(cat "${SHARED_DIR}/bastion_private_address")
log "Internal network fallback check should target a distinct internal endpoint (not bastion self IP)"
if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
"ping -c 3 168.63.129.16" "Test Azure fabric endpoint reachability"; then
log "✓ Internal network is accessible from bastion"
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh`
around lines 162 - 169, The current check pings the bastion's own private IP
(BASTION_PRIVATE_IP) which always succeeds; change the check to ping a different
internal host to validate internal connectivity: retrieve a distinct internal
target (e.g., CLUSTER_INTERNAL_IP or MASTER_PRIVATE_IP) from the shared data,
fall back to skipping the test with a clear log if that target is absent, then
call run_ssh_cmd (using SSH_PRIV_KEY_PATH, BASTION_SSH_USER, BASTION_PUBLIC_IP)
to run "ping -c 3 <internal_target>" and treat failure as a real connectivity
error (log and return non-zero) while keeping the success log "✓ Internal
network is accessible from bastion". Ensure you reference and use the existing
variables BASTION_PRIVATE_IP only for informational logging and do not ping it
as the connectivity probe.

[[ $- == *x* ]] && WAS_TRACING=true || WAS_TRACING=false
set +x

output=$(ssh ${options} -i "${sshkey}" ${user}@${host} "${remote_cmd}" 2>&1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

head -60 ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh | tail -40

Repository: openshift/release

Length of output: 1265


🏁 Script executed:

cat ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh

Repository: openshift/release

Length of output: 7733


Quote SSH arguments safely to prevent word splitting and injection.

Line 44 uses unquoted ${user}@${host} in the SSH command. Since these values are sourced from external files (${SHARED_DIR}/bastion_ssh_user and ${SHARED_DIR}/bastion_public_address), any spaces or special characters in those files will cause word splitting, breaking the SSH connection string. Apply the suggested fix to quote the host parameter and convert options to an array for defensive programming:

Suggested fix
-    options="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ServerAliveInterval=30 -o ServerAliveCountMax=5 -o ConnectTimeout=10"
+    local -a options=(
+        -o StrictHostKeyChecking=no
+        -o UserKnownHostsFile=/dev/null
+        -o ServerAliveInterval=30
+        -o ServerAliveCountMax=5
+        -o ConnectTimeout=10
+    )
@@
-    output=$(ssh ${options} -i "${sshkey}" ${user}@${host} "${remote_cmd}" 2>&1)
+    output=$(ssh "${options[@]}" -i "${sshkey}" "${user}@${host}" "${remote_cmd}" 2>&1)
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 44-44: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 44-44: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 44-44: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`
at line 44, The SSH invocation is vulnerable to word-splitting because ${user}
and ${host} are unquoted and ${options} is a plain string; update the call in
the ssh command (where variables options, sshkey, user, host, remote_cmd are
used) to quote the remote target as "${user}@${host}" and convert options into
an array (e.g. options_array and expand as "${options_array[@]}") so each option
is preserved; ensure the identity key and remote command remain quoted as
"${sshkey}" and "${remote_cmd}" respectively and consider adding a "--" before
the remote command to avoid option parsing issues.

Comment on lines +136 to +138
if [[ -f "${SHARED_DIR}/api.${CLUSTER_NAME}.${BASE_DOMAIN}" ]]; then
CLUSTER_API=$(cat "${SHARED_DIR}/api.${CLUSTER_NAME}.${BASE_DOMAIN}")
elif [[ -f "${SHARED_DIR}/metadata.json" ]]; then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard against unbound variables with nounset.

Line 136 expands CLUSTER_NAME/BASE_DOMAIN before they are guaranteed to exist. With set -o nounset, this can terminate the script early.

Suggested fix
-if [[ -f "${SHARED_DIR}/api.${CLUSTER_NAME}.${BASE_DOMAIN}" ]]; then
+if [[ -n "${CLUSTER_NAME:-}" && -n "${BASE_DOMAIN:-}" && -f "${SHARED_DIR}/api.${CLUSTER_NAME}.${BASE_DOMAIN}" ]]; then
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`
around lines 136 - 138, The code expands CLUSTER_NAME and BASE_DOMAIN directly
which breaks under set -o nounset; update the check to guard against unset
variables before using them — for example, first ensure CLUSTER_NAME and
BASE_DOMAIN are set (e.g. test -n "${CLUSTER_NAME:-}" && test -n
"${BASE_DOMAIN:-}") or use safe parameter expansion like
"${SHARED_DIR}/api.${CLUSTER_NAME:-}.${BASE_DOMAIN:-}" when testing the file,
then fallback to checking "${SHARED_DIR}/metadata.json"; modify the if that
references SHARED_DIR, CLUSTER_NAME and BASE_DOMAIN so it won't expand undefined
variables (symbols: CLUSTER_NAME, BASE_DOMAIN, SHARED_DIR, metadata.json).

Comment on lines +138 to +143
elif [[ -f "${SHARED_DIR}/metadata.json" ]]; then
# Extract from metadata if available
CLUSTER_NAME=$(jq -r '.infraID' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
BASE_DOMAIN=$(jq -r '.azure.baseDomainResourceGroupName' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
if [[ -n "${CLUSTER_NAME}" ]]; then
CLUSTER_API="api.${CLUSTER_NAME}.${BASE_DOMAIN}:6443"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a GCP-appropriate source for BASE_DOMAIN.

Line 141 reads an Azure-specific metadata key in the GCP validator, which can produce an invalid CLUSTER_API and skip/misreport connectivity validation.

Suggested fix
-    BASE_DOMAIN=$(jq -r '.azure.baseDomainResourceGroupName' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
+    # For GCP, prefer install-config baseDomain when available
+    if [[ -f "${SHARED_DIR}/install-config.yaml" ]]; then
+        BASE_DOMAIN=$(yq-go r "${SHARED_DIR}/install-config.yaml" 'baseDomain' 2>/dev/null || echo "")
+    else
+        BASE_DOMAIN=""
+    fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
elif [[ -f "${SHARED_DIR}/metadata.json" ]]; then
# Extract from metadata if available
CLUSTER_NAME=$(jq -r '.infraID' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
BASE_DOMAIN=$(jq -r '.azure.baseDomainResourceGroupName' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
if [[ -n "${CLUSTER_NAME}" ]]; then
CLUSTER_API="api.${CLUSTER_NAME}.${BASE_DOMAIN}:6443"
elif [[ -f "${SHARED_DIR}/metadata.json" ]]; then
# Extract from metadata if available
CLUSTER_NAME=$(jq -r '.infraID' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
# For GCP, prefer install-config baseDomain when available
if [[ -f "${SHARED_DIR}/install-config.yaml" ]]; then
BASE_DOMAIN=$(yq-go r "${SHARED_DIR}/install-config.yaml" 'baseDomain' 2>/dev/null || echo "")
else
BASE_DOMAIN=""
fi
if [[ -n "${CLUSTER_NAME}" ]]; then
CLUSTER_API="api.${CLUSTER_NAME}.${BASE_DOMAIN}:6443"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`
around lines 138 - 143, The script is reading an Azure-specific metadata key for
BASE_DOMAIN; update the jq path to use the GCP/installer-appropriate key (e.g.,
read '.baseDomain' from "${SHARED_DIR}/metadata.json") and keep the same
error-suppression/fallback logic so BASE_DOMAIN is correctly populated for GCP.
Modify the lines that set BASE_DOMAIN (and consequently CLUSTER_API) to use jq
-r '.baseDomain' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "" and ensure
CLUSTER_NAME, BASE_DOMAIN and CLUSTER_API variables are validated the same way
as before.

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jluhrsen
Once this PR has been reviewed and has the lgtm label, please assign patrickdillon for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jluhrsen
Copy link
Copy Markdown
Contributor Author

jluhrsen commented May 6, 2026

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-gcp-ovn
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-azure-ovn

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

@jluhrsen: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 6, 2026

The OWNERS file contains untrusted users, which makes it INVALID. The following users are mentioned in OWNERS file(s) but are untrusted for the following reasons. One way to make the user trusted is to add them as members of the openshift org. You can then trigger verification by writing /verify-owners in a comment.

  • MayXuQQ
    • User is not a member of the org. User is not a collaborator. Satisfy at least one of these conditions to make the user trusted.
  • jianlinliu
    • User is not a member of the org. User is not a collaborator. Satisfy at least one of these conditions to make the user trusted.
    • ci-operator/step-registry/gcp/validate/bastionhost/OWNERS
  • jinyunma
    • User is not a member of the org. User is not a collaborator. Satisfy at least one of these conditions to make the user trusted.
  • gpei
    • User is not a member of the org. User is not a collaborator. Satisfy at least one of these conditions to make the user trusted.
  • jianli-wei
    • User is not a member of the org. User is not a collaborator. Satisfy at least one of these conditions to make the user trusted.
  • yunjiang29
    • User is not a member of the org. User is not a collaborator. Satisfy at least one of these conditions to make the user trusted.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

[REHEARSALNOTIFIER]
@jluhrsen: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
pull-ci-openshift-cert-manager-operator-master-e2e-operator-azure-ovn openshift/cert-manager-operator presubmit Registry content changed
pull-ci-openshift-cert-manager-operator-cert-manager-1.19-e2e-operator-azure-ovn openshift/cert-manager-operator presubmit Registry content changed
pull-ci-openshift-cert-manager-operator-master-e2e-operator-gcp-ovn openshift/cert-manager-operator presubmit Registry content changed
pull-ci-openshift-cert-manager-operator-cert-manager-1.19-e2e-operator-gcp-ovn openshift/cert-manager-operator presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-master-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-5.1-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-5.0-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.23-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.21-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.20-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.19-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.18-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.17-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.16-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.15-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.14-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.13-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.12-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.11-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.9-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.8-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.7-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed
pull-ci-openshift-ovn-kubernetes-release-4.6-e2e-azure-ovn openshift/ovn-kubernetes presubmit Registry content changed

A total of 361 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

@jluhrsen: requesting more than one rehearsal in one comment is not supported. If you would like to rehearse multiple specific jobs, please separate the job names by a space in a single command.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh`:
- Around line 148-150: The remote curl pipeline can return the exit status of
head instead of curl; update the run_ssh_cmd invocation (the call that uses
SSH_PRIV_KEY_PATH, BASTION_SSH_USER, BASTION_PUBLIC_IP and CLUSTER_API) so the
remote shell captures curl's exit code before piping and then exits with that
code — for example, wrap the remote command in sh -c and run curl with
redirection to a temporary file or capture its exit code into a variable
(rc=$?), emit the first 20 lines (head) from that output, clean up, and finally
exit $rc so the SSH command reflects the real curl result.

In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`:
- Around line 151-153: The pipeline that probes the cluster API using curl
inside run_ssh_cmd ("curl -k ... | head -20") loses curl's exit status to head;
modify the probe invoked in run_ssh_cmd so curl's failure is propagated — either
enable pipefail in the remote shell before running curl (set -o pipefail) or
have curl write to a temp file, cat the first 20 lines for logging, then exit
with curl's exit code; update the command string passed to run_ssh_cmd (the
cluster API probe invocation) accordingly so connectivity failures are not
masked.

In
`@ci-operator/step-registry/openshift/e2e/azure/ovn/openshift-e2e-azure-ovn-workflow.yaml`:
- Around line 7-8: The azure-validate-bastionhost ref currently runs before
cluster installation (it appears right after chain:
azure-provision-bastionhost), which causes connectivity checks from the bastion
to the cluster to run too early; edit the workflow so that the
azure-validate-bastionhost step is moved to run after the ipi-install step
(i.e., place the ref to azure-validate-bastionhost after the ipi-install entry)
ensuring validation executes against a deployed cluster. Make sure to preserve
existing dependencies and ordering around azure-provision-bastionhost and
ipi-install when relocating the validation step.

In
`@ci-operator/step-registry/openshift/e2e/gcp/custom-endpoints/openshift-e2e-gcp-custom-endpoints-workflow.yaml`:
- Line 10: The step referencing gcp-validate-bastionhost is scheduled too early
(before cluster install) so its Check 4 (bastion-to-cluster connectivity) cannot
run correctly; either move the ref to run after the install completes or split
the validation into two steps—keep SSH/runtime checks here
(gcp-validate-bastionhost-runtime) and add a new post-install validation step
(e.g., gcp-validate-bastionhost-postinstall) that runs after the cluster install
and performs Check 4/cluster API probes—update the workflow to call the
appropriate validator refs in the correct sequence.

In
`@ci-operator/step-registry/openshift/e2e/gcp/ovn/openshift-e2e-gcp-ovn-workflow.yaml`:
- Around line 7-9: The pre-install validator step (ref:
gcp-validate-bastionhost) currently runs before chain: ipi-install so its
cluster API connectivity check cannot reach an installed cluster; update the
workflow to either move gcp-validate-bastionhost to run after chain: ipi-install
or split gcp-validate-bastionhost into two steps (e.g.,
gcp-validate-bastionhost-pre for SSH/runtime checks kept in pre and
gcp-validate-bastionhost-post for cluster API/cluster connectivity checks placed
after ipi-install) and adjust references to ignition-bastionhost and
gcp-provision-bastionhost accordingly so only runtime/SSH validation remains in
pre-install.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6ad07ec6-fe58-40f2-9ee4-9c3b9c85ff24

📥 Commits

Reviewing files that changed from the base of the PR and between 82b3610 and 6cb5615.

📒 Files selected for processing (12)
  • ci-operator/step-registry/azure/validate/bastionhost/OWNERS
  • ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh
  • ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-ref.metadata.json
  • ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-ref.yaml
  • ci-operator/step-registry/gcp/validate/bastionhost/OWNERS
  • ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh
  • ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-ref.metadata.json
  • ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-ref.yaml
  • ci-operator/step-registry/openshift/e2e/azure/csi/file/cross-subscription/openshift-e2e-azure-csi-file-cross-subscription-workflow.yaml
  • ci-operator/step-registry/openshift/e2e/azure/ovn/openshift-e2e-azure-ovn-workflow.yaml
  • ci-operator/step-registry/openshift/e2e/gcp/custom-endpoints/openshift-e2e-gcp-custom-endpoints-workflow.yaml
  • ci-operator/step-registry/openshift/e2e/gcp/ovn/openshift-e2e-gcp-ovn-workflow.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • ci-operator/step-registry/openshift/e2e/azure/csi/file/cross-subscription/openshift-e2e-azure-csi-file-cross-subscription-workflow.yaml

Comment on lines +148 to +150
if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
"curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz 2>&1 | head -20" \
"Test cluster API connectivity"; then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
sed -n '144,151p' ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh
printf '\nBash pipeline status demo:\n'
bash -lc 'false 2>&1 | head -20 >/dev/null; printf "pipeline_exit=%s\n" "$?"'

Repository: openshift/release

Length of output: 728


🏁 Script executed:

# Check if the script uses set -o pipefail globally
head -50 ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh

# Find the run_ssh_cmd function definition to understand how it handles exit codes
rg "run_ssh_cmd" ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh -A 10 -B 5

Repository: openshift/release

Length of output: 7050


The Azure API connectivity probe's remote pipeline can falsely pass.

Lines 148-150 execute a remote command via SSH: curl ... | head -20. Although the local script has set -o pipefail, this setting does not apply to the remote shell on the bastion host where the pipeline actually runs. Without pipefail enabled on the remote side, the SSH command returns the exit status of head (the last command), not curl. A failed curl connection that produces some output will still exit 0 here, falsely logging successful network connectivity.

The proposed fix correctly captures curl's exit code before piping:

Possible fix
-    if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
-        "curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz 2>&1 | head -20" \
-        "Test cluster API connectivity"; then
+    if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
+        "tmp=\$(mktemp) && curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz >\"\${tmp}\" 2>&1; rc=\$?; head -20 \"\${tmp}\"; rm -f \"\${tmp}\"; exit \${rc}" \
+        "Test cluster API connectivity"; then
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
"curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz 2>&1 | head -20" \
"Test cluster API connectivity"; then
if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
"tmp=\$(mktemp) && curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz >\"\${tmp}\" 2>&1; rc=\$?; head -20 \"\${tmp}\"; rm -f \"\${tmp}\"; exit \${rc}" \
"Test cluster API connectivity"; then
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh`
around lines 148 - 150, The remote curl pipeline can return the exit status of
head instead of curl; update the run_ssh_cmd invocation (the call that uses
SSH_PRIV_KEY_PATH, BASTION_SSH_USER, BASTION_PUBLIC_IP and CLUSTER_API) so the
remote shell captures curl's exit code before piping and then exits with that
code — for example, wrap the remote command in sh -c and run curl with
redirection to a temporary file or capture its exit code into a variable
(rc=$?), emit the first 20 lines (head) from that output, clean up, and finally
exit $rc so the SSH command reflects the real curl result.

Comment on lines +151 to +153
if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
"curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz 2>&1 | head -20" \
"Test cluster API connectivity"; then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
sed -n '147,154p' ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh
printf '\nBash pipeline status demo:\n'
bash -lc 'false 2>&1 | head -20 >/dev/null; printf "pipeline_exit=%s\n" "$?"'

Repository: openshift/release

Length of output: 730


Preserve curl's exit status in the cluster API probe.

Lines 151-153 pipe curl into head -20, so the pipeline status is determined by head's exit code rather than curl's. This means a curl failure (DNS resolution, TCP timeout, TLS error) that produces output will be masked—head reads the lines successfully and the pipeline returns 0, falsely reporting connectivity success.

Consider using set -o pipefail (if available in the remote shell context) or capturing curl output to a temp file, displaying the first 20 lines, then explicitly exiting with curl's status.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`
around lines 151 - 153, The pipeline that probes the cluster API using curl
inside run_ssh_cmd ("curl -k ... | head -20") loses curl's exit status to head;
modify the probe invoked in run_ssh_cmd so curl's failure is propagated — either
enable pipefail in the remote shell before running curl (set -o pipefail) or
have curl write to a temp file, cat the first 20 lines for logging, then exit
with curl's exit code; update the command string passed to run_ssh_cmd (the
cluster API probe invocation) accordingly so connectivity failures are not
masked.

Comment on lines +7 to +8
- chain: azure-provision-bastionhost
- ref: azure-validate-bastionhost
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move bastion validation after cluster install.

On Line 8, azure-validate-bastionhost runs before ipi-install (Line 11), but the validation scope includes connectivity from bastion to the installed cluster. This makes that check invalid (or flaky/failing) at this point in the workflow.

Suggested workflow ordering
     pre:
     - chain: ipi-conf-azure
     - chain: azure-provision-bastionhost
-    - ref: azure-validate-bastionhost
     - ref: ovn-conf
     - ref: rhcos-conf-osstream
     - chain: ipi-install
     - ref: ipi-azure-rbac
+    - ref: azure-validate-bastionhost
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift/e2e/azure/ovn/openshift-e2e-azure-ovn-workflow.yaml`
around lines 7 - 8, The azure-validate-bastionhost ref currently runs before
cluster installation (it appears right after chain:
azure-provision-bastionhost), which causes connectivity checks from the bastion
to the cluster to run too early; edit the workflow so that the
azure-validate-bastionhost step is moved to run after the ipi-install step
(i.e., place the ref to azure-validate-bastionhost after the ipi-install entry)
ensuring validation executes against a deployed cluster. Make sure to preserve
existing dependencies and ordering around azure-provision-bastionhost and
ipi-install when relocating the validation step.

- ref: gcp-provision-custom-endpoints
- ref: ignition-bastionhost
- ref: gcp-provision-bastionhost
- ref: gcp-validate-bastionhost
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This runs too early to validate bastion-to-cluster connectivity.

At this point in the workflow the cluster has not been configured/installed yet, so the new validator can only exercise SSH/runtime checks and will skip or warn on the cluster API probe. If Check 4 is part of the contract for this step, it needs to run after install, or be split into a later post-install validation step.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift/e2e/gcp/custom-endpoints/openshift-e2e-gcp-custom-endpoints-workflow.yaml`
at line 10, The step referencing gcp-validate-bastionhost is scheduled too early
(before cluster install) so its Check 4 (bastion-to-cluster connectivity) cannot
run correctly; either move the ref to run after the install completes or split
the validation into two steps—keep SSH/runtime checks here
(gcp-validate-bastionhost-runtime) and add a new post-install validation step
(e.g., gcp-validate-bastionhost-postinstall) that runs after the cluster install
and performs Check 4/cluster API probes—update the workflow to call the
appropriate validator refs in the correct sequence.

Comment on lines +7 to +9
- ref: ignition-bastionhost
- ref: gcp-provision-bastionhost
- ref: gcp-validate-bastionhost
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Pre-install placement prevents the cluster connectivity check from doing real validation.

This validator now runs before chain: ipi-install, so its cluster API check cannot verify reachability to the installed cluster in this workflow. If that check is intended to gate bastion readiness, move it later or split the step so only SSH/runtime validation stays in pre.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift/e2e/gcp/ovn/openshift-e2e-gcp-ovn-workflow.yaml`
around lines 7 - 9, The pre-install validator step (ref:
gcp-validate-bastionhost) currently runs before chain: ipi-install so its
cluster API connectivity check cannot reach an installed cluster; update the
workflow to either move gcp-validate-bastionhost to run after chain: ipi-install
or split gcp-validate-bastionhost into two steps (e.g.,
gcp-validate-bastionhost-pre for SSH/runtime checks kept in pre and
gcp-validate-bastionhost-post for cluster API/cluster connectivity checks placed
after ipi-install) and adjust references to ignition-bastionhost and
gcp-provision-bastionhost accordingly so only runtime/SSH validation remains in
pre-install.

@jluhrsen
Copy link
Copy Markdown
Contributor Author

jluhrsen commented May 6, 2026

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-azure-ovn

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

@jluhrsen: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 6, 2026

@jluhrsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/rehearse/openshift/ovn-kubernetes/release-4.22/e2e-gcp-ovn 6cb5615 link unknown /pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-gcp-ovn
ci/prow/owners 6cb5615 link true /test owners
ci/rehearse/openshift/ovn-kubernetes/release-4.22/e2e-azure-ovn 6cb5615 link unknown /pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-azure-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant