DNM/TEST: checking bastion hosts for gcp/azure#78883
jluhrsen wants to merge 1 commit into openshift:main from
Conversation
Walkthrough
Adds Azure and GCP bastion-host validation steps: new validation scripts, step-ref YAML and metadata, OWNERS files, and insertion of the validation steps into several OpenShift E2E workflows. The scripts perform SSH, container-runtime, and cluster-network checks and emit artifacts with results.

Changes
- Azure Bastion Validation
- GCP Bastion Validation
Sequence Diagram(s)
sequenceDiagram
autonumber
participant CI as CI Job
participant Shared as Shared Data (bastion creds)
participant Bastion as Bastion Host (SSH)
participant Container as Container Runtime (podman/docker)
participant Cluster as Cluster API
rect rgba(0,128,255,0.5)
CI->>Shared: read bastion host/user/key
end
CI->>Bastion: SSH connect and run checks
Bastion->>Container: detect & run test container
CI->>Cluster: validate API reachability via Bastion (proxy/ssh)
CI->>CI: collect artifacts (logs, results)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 10 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-aws-ovn

@jluhrsen: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@jluhrsen: job(s): pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-aws-ovn either don't exist or were not found to be affected, and cannot be rehearsed
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh`:
- Around line 162-169: The current check pings the bastion's own private IP
(BASTION_PRIVATE_IP) which always succeeds; change the check to ping a different
internal host to validate internal connectivity: retrieve a distinct internal
target (e.g., CLUSTER_INTERNAL_IP or MASTER_PRIVATE_IP) from the shared data,
fall back to skipping the test with a clear log if that target is absent, then
call run_ssh_cmd (using SSH_PRIV_KEY_PATH, BASTION_SSH_USER, BASTION_PUBLIC_IP)
to run "ping -c 3 <internal_target>" and treat failure as a real connectivity
error (log and return non-zero) while keeping the success log "✓ Internal
network is accessible from bastion". Ensure you reference and use the existing
variables BASTION_PRIVATE_IP only for informational logging and do not ping it
as the connectivity probe.
- Line 37: The ssh invocation is vulnerable to word-splitting and injection
because ${options} and ${user}@${host} are unquoted; change options to an array
(e.g., options=()) and expand it as "${options[@]}", and quote the identity and
remote target and command as: -i "${sshkey}" "${user}@${host}" "${remote_cmd}"
so ssh receives each argument atomically; update the assignment to output to use
the array expansion and quoted variables (output=$(ssh "${options[@]}" -i
"${sshkey}" "${user}@${host}" "${remote_cmd}" 2>&1)) and adjust any code that
builds options to push elements into the options array rather than concatenating
a string.
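The array refactor the prompt describes can be seen in isolation. The sketch below is illustrative only (the `count_args` helper and the two sample SSH flags are not taken from the PR's scripts); it shows how an unquoted string expansion splits on whitespace while `"${options[@]}"` preserves each flag as one argument:

```shell
# Illustrative helper: prints how many arguments it received.
count_args() { echo "$#"; }

# String form: one variable, but unquoted expansion word-splits it.
options_str="-o StrictHostKeyChecking=no -o ConnectTimeout=10"
count_args ${options_str}        # unquoted: splits into 4 words
count_args "${options_str}"      # quoted: one useless blob of text

# Array form: each element survives expansion as exactly one argument.
options=(-o StrictHostKeyChecking=no -o ConnectTimeout=10)
count_args "${options[@]}"       # 4 arguments, each preserved intact
```

Applying the same pattern to the validator means building `options=(-o …)` once and expanding it as `"${options[@]}"` at every `ssh` call site, alongside quoting `"${user}@${host}"`.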
In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`:
- Around line 136-138: The code expands CLUSTER_NAME and BASE_DOMAIN directly
which breaks under set -o nounset; update the check to guard against unset
variables before using them — for example, first ensure CLUSTER_NAME and
BASE_DOMAIN are set (e.g. test -n "${CLUSTER_NAME:-}" && test -n
"${BASE_DOMAIN:-}") or use safe parameter expansion like
"${SHARED_DIR}/api.${CLUSTER_NAME:-}.${BASE_DOMAIN:-}" when testing the file,
then fallback to checking "${SHARED_DIR}/metadata.json"; modify the if that
references SHARED_DIR, CLUSTER_NAME and BASE_DOMAIN so it won't expand undefined
variables (symbols: CLUSTER_NAME, BASE_DOMAIN, SHARED_DIR, metadata.json).
- Around line 138-143: The script is reading an Azure-specific metadata key for
BASE_DOMAIN; update the jq path to use the GCP/installer-appropriate key (e.g.,
read '.baseDomain' from "${SHARED_DIR}/metadata.json") and keep the same
error-suppression/fallback logic so BASE_DOMAIN is correctly populated for GCP.
Modify the lines that set BASE_DOMAIN (and consequently CLUSTER_API) to use jq
-r '.baseDomain' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "" and ensure
CLUSTER_NAME, BASE_DOMAIN and CLUSTER_API variables are validated the same way
as before.
- Line 44: The SSH invocation is vulnerable to word-splitting because ${user}
and ${host} are unquoted and ${options} is a plain string; update the call in
the ssh command (where variables options, sshkey, user, host, remote_cmd are
used) to quote the remote target as "${user}@${host}" and convert options into
an array (e.g. options_array and expand as "${options_array[@]}") so each option
is preserved; ensure the identity key and remote command remain quoted as
"${sshkey}" and "${remote_cmd}" respectively and consider adding a "--" before
the remote command to avoid option parsing issues.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5a7a12ea-4b63-45b3-8e52-55447d0e6d1d
📒 Files selected for processing (10)
- ci-operator/step-registry/azure/validate/bastionhost/OWNERS
- ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh
- ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-ref.metadata.json
- ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-ref.yaml
- ci-operator/step-registry/gcp/validate/bastionhost/OWNERS
- ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh
- ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-ref.metadata.json
- ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-ref.yaml
- ci-operator/step-registry/openshift/e2e/azure/csi/file/cross-subscription/openshift-e2e-azure-csi-file-cross-subscription-workflow.yaml
- ci-operator/step-registry/openshift/e2e/gcp/custom-endpoints/openshift-e2e-gcp-custom-endpoints-workflow.yaml
[[ $- == *x* ]] && WAS_TRACING=true || WAS_TRACING=false
set +x

output=$(ssh ${options} -i "${sshkey}" ${user}@${host} "${remote_cmd}" 2>&1)
🧩 Analysis chain
🏁 Script executed:
# Find and read the file
fd -type f -name "*azure-validate-bastionhost-commands.sh"
Repository: openshift/release
Length of output: 232
🏁 Script executed:
# Find the file with correct syntax
fd --type f "azure-validate-bastionhost-commands.sh"
Repository: openshift/release
Length of output: 153
🏁 Script executed:
# Read the file to examine the SSH command and surrounding context
cat -n "ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh"
Repository: openshift/release
Length of output: 9943
Quote SSH arguments safely to avoid splitting/injection.
Line 37 has unquoted ${user}@${host}, which is error-prone with unexpected input values. Additionally, refactor options into an array for idiomatic bash and safer argument handling.
Suggested fix
- options="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ServerAliveInterval=30 -o ServerAliveCountMax=5 -o ConnectTimeout=10"
+ local -a options=(
+ -o StrictHostKeyChecking=no
+ -o UserKnownHostsFile=/dev/null
+ -o ServerAliveInterval=30
+ -o ServerAliveCountMax=5
+ -o ConnectTimeout=10
+ )
@@
- output=$(ssh ${options} -i "${sshkey}" ${user}@${host} "${remote_cmd}" 2>&1)
+ output=$(ssh "${options[@]}" -i "${sshkey}" "${user}@${host}" "${remote_cmd}" 2>&1)
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 37-37: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 37-37: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 37-37: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh`
at line 37, The ssh invocation is vulnerable to word-splitting and injection
because ${options} and ${user}@${host} are unquoted; change options to an array
(e.g., options=()) and expand it as "${options[@]}", and quote the identity and
remote target and command as: -i "${sshkey}" "${user}@${host}" "${remote_cmd}"
so ssh receives each argument atomically; update the assignment to output to use
the array expansion and quoted variables (output=$(ssh "${options[@]}" -i
"${sshkey}" "${user}@${host}" "${remote_cmd}" 2>&1)) and adjust any code that
builds options to push elements into the options array rather than concatenating
a string.
if [[ -f "${SHARED_DIR}/bastion_private_address" ]]; then
    BASTION_PRIVATE_IP=$(cat "${SHARED_DIR}/bastion_private_address")
    log "Testing internal network by pinging private IP: ${BASTION_PRIVATE_IP}"

    if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
        "ping -c 3 ${BASTION_PRIVATE_IP}" "Test internal network"; then
        log "✓ Internal network is accessible from bastion"
    fi
Current “internal network” check is not meaningful.
Line 167 pings the bastion’s own private IP from the bastion, which can pass even when broader internal connectivity is broken. This can mislead triage output.
Suggested adjustment
- log "Testing internal network by pinging private IP: ${BASTION_PRIVATE_IP}"
+ log "Internal network fallback check should target a distinct internal endpoint (not bastion self IP)"
@@
- "ping -c 3 ${BASTION_PRIVATE_IP}" "Test internal network"; then
+ "ping -c 3 168.63.129.16" "Test Azure fabric endpoint reachability"; then
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
if [[ -f "${SHARED_DIR}/bastion_private_address" ]]; then
    BASTION_PRIVATE_IP=$(cat "${SHARED_DIR}/bastion_private_address")
    log "Testing internal network by pinging private IP: ${BASTION_PRIVATE_IP}"
    if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
        "ping -c 3 ${BASTION_PRIVATE_IP}" "Test internal network"; then
        log "✓ Internal network is accessible from bastion"
    fi

After:
if [[ -f "${SHARED_DIR}/bastion_private_address" ]]; then
    BASTION_PRIVATE_IP=$(cat "${SHARED_DIR}/bastion_private_address")
    log "Internal network fallback check should target a distinct internal endpoint (not bastion self IP)"
    if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
        "ping -c 3 168.63.129.16" "Test Azure fabric endpoint reachability"; then
        log "✓ Internal network is accessible from bastion"
    fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh`
around lines 162 - 169, The current check pings the bastion's own private IP
(BASTION_PRIVATE_IP) which always succeeds; change the check to ping a different
internal host to validate internal connectivity: retrieve a distinct internal
target (e.g., CLUSTER_INTERNAL_IP or MASTER_PRIVATE_IP) from the shared data,
fall back to skipping the test with a clear log if that target is absent, then
call run_ssh_cmd (using SSH_PRIV_KEY_PATH, BASTION_SSH_USER, BASTION_PUBLIC_IP)
to run "ping -c 3 <internal_target>" and treat failure as a real connectivity
error (log and return non-zero) while keeping the success log "✓ Internal
network is accessible from bastion". Ensure you reference and use the existing
variables BASTION_PRIVATE_IP only for informational logging and do not ping it
as the connectivity probe.
[[ $- == *x* ]] && WAS_TRACING=true || WAS_TRACING=false
set +x

output=$(ssh ${options} -i "${sshkey}" ${user}@${host} "${remote_cmd}" 2>&1)
🧩 Analysis chain
🏁 Script executed:
head -60 ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh | tail -40
Repository: openshift/release
Length of output: 1265
🏁 Script executed:
cat ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh
Repository: openshift/release
Length of output: 7733
Quote SSH arguments safely to prevent word splitting and injection.
Line 44 uses unquoted ${user}@${host} in the SSH command. Since these values are sourced from external files (${SHARED_DIR}/bastion_ssh_user and ${SHARED_DIR}/bastion_public_address), any spaces or special characters in those files will cause word splitting, breaking the SSH connection string. Apply the suggested fix to quote the host parameter and convert options to an array for defensive programming:
Suggested fix
- options="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ServerAliveInterval=30 -o ServerAliveCountMax=5 -o ConnectTimeout=10"
+ local -a options=(
+ -o StrictHostKeyChecking=no
+ -o UserKnownHostsFile=/dev/null
+ -o ServerAliveInterval=30
+ -o ServerAliveCountMax=5
+ -o ConnectTimeout=10
+ )
@@
- output=$(ssh ${options} -i "${sshkey}" ${user}@${host} "${remote_cmd}" 2>&1)
+ output=$(ssh "${options[@]}" -i "${sshkey}" "${user}@${host}" "${remote_cmd}" 2>&1)
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 44-44: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 44-44: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 44-44: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`
at line 44, The SSH invocation is vulnerable to word-splitting because ${user}
and ${host} are unquoted and ${options} is a plain string; update the call in
the ssh command (where variables options, sshkey, user, host, remote_cmd are
used) to quote the remote target as "${user}@${host}" and convert options into
an array (e.g. options_array and expand as "${options_array[@]}") so each option
is preserved; ensure the identity key and remote command remain quoted as
"${sshkey}" and "${remote_cmd}" respectively and consider adding a "--" before
the remote command to avoid option parsing issues.
if [[ -f "${SHARED_DIR}/api.${CLUSTER_NAME}.${BASE_DOMAIN}" ]]; then
    CLUSTER_API=$(cat "${SHARED_DIR}/api.${CLUSTER_NAME}.${BASE_DOMAIN}")
elif [[ -f "${SHARED_DIR}/metadata.json" ]]; then
Guard against unbound variables with nounset.
Line 136 expands CLUSTER_NAME/BASE_DOMAIN before they are guaranteed to exist. With set -o nounset, this can terminate the script early.
Suggested fix
-if [[ -f "${SHARED_DIR}/api.${CLUSTER_NAME}.${BASE_DOMAIN}" ]]; then
+if [[ -n "${CLUSTER_NAME:-}" && -n "${BASE_DOMAIN:-}" && -f "${SHARED_DIR}/api.${CLUSTER_NAME}.${BASE_DOMAIN}" ]]; then🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`
around lines 136 - 138, The code expands CLUSTER_NAME and BASE_DOMAIN directly
which breaks under set -o nounset; update the check to guard against unset
variables before using them — for example, first ensure CLUSTER_NAME and
BASE_DOMAIN are set (e.g. test -n "${CLUSTER_NAME:-}" && test -n
"${BASE_DOMAIN:-}") or use safe parameter expansion like
"${SHARED_DIR}/api.${CLUSTER_NAME:-}.${BASE_DOMAIN:-}" when testing the file,
then fallback to checking "${SHARED_DIR}/metadata.json"; modify the if that
references SHARED_DIR, CLUSTER_NAME and BASE_DOMAIN so it won't expand undefined
variables (symbols: CLUSTER_NAME, BASE_DOMAIN, SHARED_DIR, metadata.json).
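To see why the guard matters, here is a minimal standalone sketch of `set -o nounset` behavior; `DEMO_UNSET_VAR` is a hypothetical name standing in for `CLUSTER_NAME`/`BASE_DOMAIN`:

```shell
# Illustrative only: DEMO_UNSET_VAR stands in for CLUSTER_NAME/BASE_DOMAIN.
set -o nounset

# echo "${DEMO_UNSET_VAR}" would abort here with "unbound variable";
# the :- expansion substitutes an empty string instead.
echo "guarded: '${DEMO_UNSET_VAR:-}'"

# The same idiom keeps the conditional safe under nounset.
if [[ -n "${DEMO_UNSET_VAR:-}" ]]; then
  echo "set"
else
  echo "unset or empty"
fi
```

This is exactly the shape of the suggested `-n "${CLUSTER_NAME:-}"` guard: the script can branch on presence without ever expanding an undefined name.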
elif [[ -f "${SHARED_DIR}/metadata.json" ]]; then
    # Extract from metadata if available
    CLUSTER_NAME=$(jq -r '.infraID' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
    BASE_DOMAIN=$(jq -r '.azure.baseDomainResourceGroupName' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
    if [[ -n "${CLUSTER_NAME}" ]]; then
        CLUSTER_API="api.${CLUSTER_NAME}.${BASE_DOMAIN}:6443"
Use a GCP-appropriate source for BASE_DOMAIN.
Line 141 reads an Azure-specific metadata key in the GCP validator, which can produce an invalid CLUSTER_API and skip/misreport connectivity validation.
Suggested fix
- BASE_DOMAIN=$(jq -r '.azure.baseDomainResourceGroupName' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
+ # For GCP, prefer install-config baseDomain when available
+ if [[ -f "${SHARED_DIR}/install-config.yaml" ]]; then
+ BASE_DOMAIN=$(yq-go r "${SHARED_DIR}/install-config.yaml" 'baseDomain' 2>/dev/null || echo "")
+ else
+ BASE_DOMAIN=""
+ fi
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
elif [[ -f "${SHARED_DIR}/metadata.json" ]]; then
    # Extract from metadata if available
    CLUSTER_NAME=$(jq -r '.infraID' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
    BASE_DOMAIN=$(jq -r '.azure.baseDomainResourceGroupName' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
    if [[ -n "${CLUSTER_NAME}" ]]; then
        CLUSTER_API="api.${CLUSTER_NAME}.${BASE_DOMAIN}:6443"

After:
elif [[ -f "${SHARED_DIR}/metadata.json" ]]; then
    # Extract from metadata if available
    CLUSTER_NAME=$(jq -r '.infraID' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "")
    # For GCP, prefer install-config baseDomain when available
    if [[ -f "${SHARED_DIR}/install-config.yaml" ]]; then
        BASE_DOMAIN=$(yq-go r "${SHARED_DIR}/install-config.yaml" 'baseDomain' 2>/dev/null || echo "")
    else
        BASE_DOMAIN=""
    fi
    if [[ -n "${CLUSTER_NAME}" ]]; then
        CLUSTER_API="api.${CLUSTER_NAME}.${BASE_DOMAIN}:6443"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`
around lines 138 - 143, The script is reading an Azure-specific metadata key for
BASE_DOMAIN; update the jq path to use the GCP/installer-appropriate key (e.g.,
read '.baseDomain' from "${SHARED_DIR}/metadata.json") and keep the same
error-suppression/fallback logic so BASE_DOMAIN is correctly populated for GCP.
Modify the lines that set BASE_DOMAIN (and consequently CLUSTER_API) to use jq
-r '.baseDomain' "${SHARED_DIR}/metadata.json" 2>/dev/null || echo "" and ensure
CLUSTER_NAME, BASE_DOMAIN and CLUSTER_API variables are validated the same way
as before.
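The suppress-and-fallback idiom the prompt asks to preserve can be sketched on its own. Assumptions in this sketch: `/nonexistent/metadata.json` is a deliberately missing file standing in for `${SHARED_DIR}/metadata.json`, and `cat` stands in for the `jq -r '.baseDomain'` extraction:

```shell
# On any read failure, stderr is suppressed and the variable falls
# back to "" instead of aborting the script.
BASE_DOMAIN=$(cat /nonexistent/metadata.json 2>/dev/null || echo "")

# Downstream checks then branch on emptiness rather than on exit codes.
if [[ -n "${BASE_DOMAIN}" ]]; then
  echo "probing api.<cluster>.${BASE_DOMAIN}:6443"
else
  echo "BASE_DOMAIN unavailable; skipping cluster API probe"
fi
```

Keeping this shape while swapping in the GCP-appropriate `.baseDomain` key means the only change is the jq path, not the error handling.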
Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: jluhrsen
The full list of commands accepted by this bot can be found here.
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-gcp-ovn

@jluhrsen: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
The OWNERS file contains untrusted users, which makes it INVALID. The following users are mentioned in OWNERS file(s) but are untrusted for the following reasons. One way to make the user trusted is to add them as members of the openshift org. You can then trigger verification by writing
[REHEARSALNOTIFIER]
A total of 361 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here.
Interacting with pj-rehearse
Comment: Once you are satisfied with the results of the rehearsals, comment:

@jluhrsen: requesting more than one rehearsal in one comment is not supported. If you would like to rehearse multiple specific jobs, please separate the job names by a space in a single command.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh`:
- Around line 148-150: The remote curl pipeline can return the exit status of
head instead of curl; update the run_ssh_cmd invocation (the call that uses
SSH_PRIV_KEY_PATH, BASTION_SSH_USER, BASTION_PUBLIC_IP and CLUSTER_API) so the
remote shell captures curl's exit code before piping and then exits with that
code — for example, wrap the remote command in sh -c and run curl with
redirection to a temporary file or capture its exit code into a variable
(rc=$?), emit the first 20 lines (head) from that output, clean up, and finally
exit $rc so the SSH command reflects the real curl result.
In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`:
- Around line 151-153: The pipeline that probes the cluster API using curl
inside run_ssh_cmd ("curl -k ... | head -20") loses curl's exit status to head;
modify the probe invoked in run_ssh_cmd so curl's failure is propagated — either
enable pipefail in the remote shell before running curl (set -o pipefail) or
have curl write to a temp file, cat the first 20 lines for logging, then exit
with curl's exit code; update the command string passed to run_ssh_cmd (the
cluster API probe invocation) accordingly so connectivity failures are not
masked.
In
`@ci-operator/step-registry/openshift/e2e/azure/ovn/openshift-e2e-azure-ovn-workflow.yaml`:
- Around line 7-8: The azure-validate-bastionhost ref currently runs before
cluster installation (it appears right after chain:
azure-provision-bastionhost), which causes connectivity checks from the bastion
to the cluster to run too early; edit the workflow so that the
azure-validate-bastionhost step is moved to run after the ipi-install step
(i.e., place the ref to azure-validate-bastionhost after the ipi-install entry)
ensuring validation executes against a deployed cluster. Make sure to preserve
existing dependencies and ordering around azure-provision-bastionhost and
ipi-install when relocating the validation step.
In
`@ci-operator/step-registry/openshift/e2e/gcp/custom-endpoints/openshift-e2e-gcp-custom-endpoints-workflow.yaml`:
- Line 10: The step referencing gcp-validate-bastionhost is scheduled too early
(before cluster install) so its Check 4 (bastion-to-cluster connectivity) cannot
run correctly; either move the ref to run after the install completes or split
the validation into two steps—keep SSH/runtime checks here
(gcp-validate-bastionhost-runtime) and add a new post-install validation step
(e.g., gcp-validate-bastionhost-postinstall) that runs after the cluster install
and performs Check 4/cluster API probes—update the workflow to call the
appropriate validator refs in the correct sequence.
In
`@ci-operator/step-registry/openshift/e2e/gcp/ovn/openshift-e2e-gcp-ovn-workflow.yaml`:
- Around line 7-9: The pre-install validator step (ref:
gcp-validate-bastionhost) currently runs before chain: ipi-install so its
cluster API connectivity check cannot reach an installed cluster; update the
workflow to either move gcp-validate-bastionhost to run after chain: ipi-install
or split gcp-validate-bastionhost into two steps (e.g.,
gcp-validate-bastionhost-pre for SSH/runtime checks kept in pre and
gcp-validate-bastionhost-post for cluster API/cluster connectivity checks placed
after ipi-install) and adjust references to ignition-bastionhost and
gcp-provision-bastionhost accordingly so only runtime/SSH validation remains in
pre-install.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6ad07ec6-fe58-40f2-9ee4-9c3b9c85ff24
📒 Files selected for processing (12)
- ci-operator/step-registry/azure/validate/bastionhost/OWNERS
- ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh
- ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-ref.metadata.json
- ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-ref.yaml
- ci-operator/step-registry/gcp/validate/bastionhost/OWNERS
- ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh
- ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-ref.metadata.json
- ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-ref.yaml
- ci-operator/step-registry/openshift/e2e/azure/csi/file/cross-subscription/openshift-e2e-azure-csi-file-cross-subscription-workflow.yaml
- ci-operator/step-registry/openshift/e2e/azure/ovn/openshift-e2e-azure-ovn-workflow.yaml
- ci-operator/step-registry/openshift/e2e/gcp/custom-endpoints/openshift-e2e-gcp-custom-endpoints-workflow.yaml
- ci-operator/step-registry/openshift/e2e/gcp/ovn/openshift-e2e-gcp-ovn-workflow.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- ci-operator/step-registry/openshift/e2e/azure/csi/file/cross-subscription/openshift-e2e-azure-csi-file-cross-subscription-workflow.yaml
if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
    "curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz 2>&1 | head -20" \
    "Test cluster API connectivity"; then
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
sed -n '144,151p' ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh
printf '\nBash pipeline status demo:\n'
bash -lc 'false 2>&1 | head -20 >/dev/null; printf "pipeline_exit=%s\n" "$?"'
Repository: openshift/release
Length of output: 728
🏁 Script executed:
# Check if the script uses set -o pipefail globally
head -50 ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh
# Find the run_ssh_cmd function definition to understand how it handles exit codes
rg "run_ssh_cmd" ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh -A 10 -B 5
Repository: openshift/release
Length of output: 7050
The Azure API connectivity probe's remote pipeline can falsely pass.
Lines 148-150 execute a remote command via SSH: curl ... | head -20. Although the local script has set -o pipefail, this setting does not apply to the remote shell on the bastion host where the pipeline actually runs. Without pipefail enabled on the remote side, the SSH command returns the exit status of head (the last command), not curl. A failed curl connection that produces some output will still exit 0 here, falsely logging successful network connectivity.
The proposed fix correctly captures curl's exit code before piping:
Possible fix
- if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
- "curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz 2>&1 | head -20" \
- "Test cluster API connectivity"; then
+ if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
+ "tmp=\$(mktemp) && curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz >\"\${tmp}\" 2>&1; rc=\$?; head -20 \"\${tmp}\"; rm -f \"\${tmp}\"; exit \${rc}" \
+ "Test cluster API connectivity"; then
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
    "curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz 2>&1 | head -20" \
    "Test cluster API connectivity"; then

After:
if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
    "tmp=\$(mktemp) && curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz >\"\${tmp}\" 2>&1; rc=\$?; head -20 \"\${tmp}\"; rm -f \"\${tmp}\"; exit \${rc}" \
    "Test cluster API connectivity"; then
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/azure/validate/bastionhost/azure-validate-bastionhost-commands.sh`
around lines 148 - 150, The remote curl pipeline can return the exit status of
head instead of curl; update the run_ssh_cmd invocation (the call that uses
SSH_PRIV_KEY_PATH, BASTION_SSH_USER, BASTION_PUBLIC_IP and CLUSTER_API) so the
remote shell captures curl's exit code before piping and then exits with that
code — for example, wrap the remote command in sh -c and run curl with
redirection to a temporary file or capture its exit code into a variable
(rc=$?), emit the first 20 lines (head) from that output, clean up, and finally
exit $rc so the SSH command reflects the real curl result.
if run_ssh_cmd "${SSH_PRIV_KEY_PATH}" "${BASTION_SSH_USER}" "${BASTION_PUBLIC_IP}" \
    "curl -k --connect-timeout 10 --max-time 20 -v https://${CLUSTER_API}/healthz 2>&1 | head -20" \
    "Test cluster API connectivity"; then
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
sed -n '147,154p' ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh
printf '\nBash pipeline status demo:\n'
bash -lc 'false 2>&1 | head -20 >/dev/null; printf "pipeline_exit=%s\n" "$?"'
Repository: openshift/release
Length of output: 730
Preserve curl's exit status in the cluster API probe.
Lines 151-153 pipe curl into head -20, so the pipeline status is determined by head's exit code rather than curl's. This means a curl failure (DNS resolution, TCP timeout, TLS error) that produces output will be masked—head reads the lines successfully and the pipeline returns 0, falsely reporting connectivity success.
Consider using set -o pipefail (if available in the remote shell context) or capturing curl output to a temp file, displaying the first 20 lines, then explicitly exiting with curl's status.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/gcp/validate/bastionhost/gcp-validate-bastionhost-commands.sh`
around lines 151 - 153, The pipeline that probes the cluster API using curl
inside run_ssh_cmd ("curl -k ... | head -20") loses curl's exit status to head;
modify the probe invoked in run_ssh_cmd so curl's failure is propagated — either
enable pipefail in the remote shell before running curl (set -o pipefail) or
have curl write to a temp file, cat the first 20 lines for logging, then exit
with curl's exit code; update the command string passed to run_ssh_cmd (the
cluster API probe invocation) accordingly so connectivity failures are not
masked.
```yaml
- chain: azure-provision-bastionhost
- ref: azure-validate-bastionhost
```
Move bastion validation after cluster install.
On Line 8, `azure-validate-bastionhost` runs before `ipi-install` (Line 11), but the validation scope includes connectivity from the bastion to the installed cluster. That makes the cluster check invalid (or flaky/failing) at this point in the workflow.
Suggested workflow ordering:

```diff
 pre:
 - chain: ipi-conf-azure
 - chain: azure-provision-bastionhost
-- ref: azure-validate-bastionhost
 - ref: ovn-conf
 - ref: rhcos-conf-osstream
 - chain: ipi-install
 - ref: ipi-azure-rbac
+- ref: azure-validate-bastionhost
```

🤖 Prompt for AI Agents
+ - ref: azure-validate-bastionhost🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/openshift/e2e/azure/ovn/openshift-e2e-azure-ovn-workflow.yaml`
around lines 7 - 8, The azure-validate-bastionhost ref currently runs before
cluster installation (it appears right after chain:
azure-provision-bastionhost), which causes connectivity checks from the bastion
to the cluster to run too early; edit the workflow so that the
azure-validate-bastionhost step is moved to run after the ipi-install step
(i.e., place the ref to azure-validate-bastionhost after the ipi-install entry)
ensuring validation executes against a deployed cluster. Make sure to preserve
existing dependencies and ordering around azure-provision-bastionhost and
ipi-install when relocating the validation step.
```yaml
- ref: gcp-provision-custom-endpoints
- ref: ignition-bastionhost
- ref: gcp-provision-bastionhost
- ref: gcp-validate-bastionhost
```
This runs too early to validate bastion-to-cluster connectivity.
At this point in the workflow the cluster has not been configured/installed yet, so the new validator can only exercise SSH/runtime checks and will skip or warn on the cluster API probe. If Check 4 is part of the contract for this step, it needs to run after install, or be split into a later post-install validation step.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/openshift/e2e/gcp/custom-endpoints/openshift-e2e-gcp-custom-endpoints-workflow.yaml`
at line 10, The step referencing gcp-validate-bastionhost is scheduled too early
(before cluster install) so its Check 4 (bastion-to-cluster connectivity) cannot
run correctly; either move the ref to run after the install completes or split
the validation into two steps—keep SSH/runtime checks here
(gcp-validate-bastionhost-runtime) and add a new post-install validation step
(e.g., gcp-validate-bastionhost-postinstall) that runs after the cluster install
and performs Check 4/cluster API probes—update the workflow to call the
appropriate validator refs in the correct sequence.
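One possible shape for the fix, sketched as an abbreviated `pre` section (other refs in the real workflow are omitted; the only change is that the validator moves after the install chain so its cluster API probe has a cluster to reach):

```yaml
pre:
- ref: gcp-provision-custom-endpoints
- ref: ignition-bastionhost
- ref: gcp-provision-bastionhost
- chain: ipi-install             # cluster exists from here on
- ref: gcp-validate-bastionhost  # Check 4 can now probe the cluster API
```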
```yaml
- ref: ignition-bastionhost
- ref: gcp-provision-bastionhost
- ref: gcp-validate-bastionhost
```
Pre-install placement prevents the cluster connectivity check from doing real validation.
This validator now runs before `chain: ipi-install`, so its cluster API check cannot verify reachability to the installed cluster in this workflow. If that check is intended to gate bastion readiness, move it later or split the step so only SSH/runtime validation stays in `pre`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/openshift/e2e/gcp/ovn/openshift-e2e-gcp-ovn-workflow.yaml`
around lines 7 - 9, The pre-install validator step (ref:
gcp-validate-bastionhost) currently runs before chain: ipi-install so its
cluster API connectivity check cannot reach an installed cluster; update the
workflow to either move gcp-validate-bastionhost to run after chain: ipi-install
or split gcp-validate-bastionhost into two steps (e.g.,
gcp-validate-bastionhost-pre for SSH/runtime checks kept in pre and
gcp-validate-bastionhost-post for cluster API/cluster connectivity checks placed
after ipi-install) and adjust references to ignition-bastionhost and
gcp-provision-bastionhost accordingly so only runtime/SSH validation remains in
pre-install.
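The split variant could look like the abbreviated sketch below. The `-pre`/`-post` step names are hypothetical (they do not exist in the step registry yet) and other refs in the real workflow are omitted:

```yaml
pre:
- ref: ignition-bastionhost
- ref: gcp-provision-bastionhost
- ref: gcp-validate-bastionhost-pre   # hypothetical: SSH/runtime checks only
- chain: ipi-install
- ref: gcp-validate-bastionhost-post  # hypothetical: cluster API reachability
```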
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-azure-ovn

@jluhrsen: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@jluhrsen: The following tests failed, say

Full PR test history. Your PR dashboard.

Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Summary
Add automated bastion host validation steps for GCP and Azure to verify:
- SSH connectivity to the bastion host
- a working container runtime (podman/docker) on the bastion
- cluster network/API reachability from the bastion

Results are automatically saved to artifacts (bastion-validation.log and bastion-validation-results.txt) instead of requiring manual validation via long sleep periods.
Testing
Integrated bastion provisioning and validation into standard OVN e2e workflows:
- openshift-e2e-azure-ovn
- openshift-e2e-gcp-ovn
- openshift-e2e-gcp-custom-endpoints

Rehearse with:
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.14-e2e-gcp-ovn
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.8-e2e-azure-ovn
Jira
https://redhat.atlassian.net/browse/CORENET-7068