MULTIARCH-6023: PowerVS IPI install signal handling to prevent deprovision failures#80917
MULTIARCH-6023: PowerVS IPI install signal handling to prevent deprovision failures#80917Neha-dot-Yadav wants to merge 2 commits into
Conversation
|
Caution Review failedAn error occurred during the review process. Please try again later. WalkthroughThe script's signal handling is split from a single combined trap into separate TERM and EXIT traps. The TERM trap now kills background child processes and exits with status 143, while EXIT runs cleanup on normal termination. Both ChangesSignal handling and process execution refactor
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (13 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh (1)
972-976: 🔒 Security & Privacy | 🔴 Critical | ⚡ Quick winSame redirection-order bug as
create cluster— stderr bypasses the filter here too.🔒 Proposed fix
- openshift-install wait-for install-complete --dir="${dir}" 2>&1 > >(grep --line-buffered -v 'password\|X-Auth-Token\|UserData:') & + openshift-install wait-for install-complete --dir="${dir}" > >(grep --line-buffered -v 'password\|X-Auth-Token\|UserData:') 2>&1 & INSTALL_PID=$! wait $INSTALL_PID ret=$?🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh` around lines 972 - 976, The redirection order in the openshift-install wait-for install-complete command is incorrect, causing stderr to bypass the grep filter. The current order `2>&1 > >(grep ...)` redirects stderr before stdout is piped to grep, so stderr goes to the original stdout instead of through the filter. Move the `2>&1` redirection to come after the process substitution `> >(grep ...)` so that both stdout and stderr are properly filtered through the grep command that removes sensitive information like passwords and authentication tokens.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh`:
- Around line 940-944: The redirection order in the openshift-install command is
incorrect, causing stderr to bypass the secret-scrubbing grep filter. Currently,
stderr is redirected to stdout before stdout is redirected to the grep process
substitution, allowing sensitive information like passwords and tokens to leak
into CI logs unredacted. Fix this by swapping the redirection order in the
openshift-install invocation so that stdout is first redirected to the grep
process substitution using > >(...), and then stderr is merged into it using
2>&1, ensuring both output streams pass through the password, X-Auth-Token, and
UserData filtering.
- Around line 707-710: Split the combined trap statement handling both EXIT and
TERM into two separate trap handlers to prevent prepare_next_steps from
executing twice and to preserve the correct exit code. Create a TERM trap that
kills background children and exits the script immediately, and a separate EXIT
trap that captures the exit code before running prepare_next_steps, ensuring the
function reads the actual installation result rather than the status from the
wait command.
---
Duplicate comments:
In
`@ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh`:
- Around line 972-976: The redirection order in the openshift-install wait-for
install-complete command is incorrect, causing stderr to bypass the grep filter.
The current order `2>&1 > >(grep ...)` redirects stderr before stdout is piped
to grep, so stderr goes to the original stdout instead of through the filter.
Move the `2>&1` redirection to come after the process substitution `> >(grep
...)` so that both stdout and stderr are properly filtered through the grep
command that removes sensitive information like passwords and authentication
tokens.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b04622d5-638c-4b9d-86ce-8f97b938dad7
📒 Files selected for processing (1)
ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh
|
[REHEARSALNOTIFIER]
A total of 43 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
@Neha-dot-Yadav: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/pj-rehearse periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-powervc-multi-p-p |
|
@hamzy: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@hamzy: job(s): periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-powervc-multi-p-p either don't exist or were not found to be affected, and cannot be rehearsed |
|
/pj-rehearse periodic-ci-openshift-multiarch-main-nightly-4.22-ocp-e2e-ovn-powervs-capi-multi-p-p |
|
@hamzy: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Neha-dot-Yadav, prb112 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/retitle MULTIARCH-6023: PowerVS IPI install signal handling to prevent deprovision failures |
|
@Neha-dot-Yadav: This pull request references MULTIARCH-6023 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/pj-rehearse periodic-ci-openshift-multiarch-main-nightly-4.22-ocp-e2e-ovn-powervs-capi-multi-p-p |
|
@hamzy: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
Problem
When PowerVS IPI installations timeout, the deprovision step fails with:
Skipping: /tmp/secret/metadata.json not found.This prevents cleanup of cloud resources (VMs, networks, DNS records), leading to stale resource and subsequent job failures.
Root Cause
The script has two issues with signal handling:
openshift-install ... | grep ... # Bash is blocked hereWhen SIGTERM arrives, it's queued but cannot be processed until the pipeline completes. If the pipeline never completes (stuck installation), SIGKILL arrives after the grace period and terminates everything without running cleanup.
Solution
Summary by CodeRabbit
This PR improves the reliability of PowerVS IPI provisioning cleanup in OpenShift CI by fixing signal-handling behavior during install timeouts—preventing failures that previously left stale infrastructure behind (VMs, networks, and DNS records) and caused subsequent jobs to fail.
What changed (practical impact)
In
ci-operator/step-registry/ipi/install/powervs/install/ipi-install-powervs-install-commands.sh, the install flow was adjusted so that when an install is interrupted (e.g., due to timeout), the script can still run its cleanup/“next steps” logic instead of exiting prematurely with errors likeSkipping: /tmp/secret/metadata.json not found.How it fixes the timeout/deprovision failure
openshift-install ... | grep ...as a foreground pipeline (which can delay trap execution until the pipeline completes),openshift-installis now run in the background with output filtered via process substitution.waitfor interruptible lifecycle control: The script captures the installer PID, waits on it, and propagates the resulting status—so SIGTERM/SIGKILL timing doesn’t prevent the script from executing its termination path.These adjustments apply to both phases:
openshift-install --dir=... create clusteropenshift-install wait-for install-completeOverall, PowerVS IPI jobs are less likely to strand cloud resources when interrupted, reducing downstream retries and cascading failures.