
[CP 1268] anr - fixes #502

Merged
sajmera-pensando merged 1 commit into ROCm:main from
ci-penbot-01:CP.O2O.pensando.gpu-operator.1268.rocm.gpu-operator.main
Apr 3, 2026

@ci-penbot-01 (Contributor)

Cherry-pick of pensando/gpu-operator#1268.


Source PR Description (pensando/gpu-operator#1268):

This PR addresses several issues in the remediation workflow handling:

  • GPUOP-603: The applyLabels step previously showed as "Succeeded" even when no custom labels were provided. It is now skipped when the label list is empty, giving clearer visibility into what the workflow actually executed.

  • GPUOP-604: The applyLabels and removeLabels steps previously treated label application failures as best-effort and always reported success. They now fail the workflow if any user-provided label operation fails, ensuring errors are surfaced rather than silently ignored.

  • GPUOP-605: Fixed a corner case where workload resume was not triggered when the recoveryPolicy limit was reached.

  • GPUOP-609: During helm uninstall, the operator was unconditionally deleting the remediation ConfigMap, even if it was user-created. The cleanup now only removes operator-created ConfigMaps, leaving user-provided ones intact.

  • GPUOP-610: Fixed autoStartWorkflow having no effect when set to false via --set during helm install. The Helm template rendered the field with {{- with }}, which treats false as an empty value and skips the block entirely. It now uses {{- if hasKey }}, which checks whether the key is present rather than whether its value is truthy, so boolean values are rendered correctly.

  • GPUOP-611: Fixed the testRunnerImageSecret parameter not being passed down to the workflow step's script, causing the secret to be ignored during workflow execution.
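The GPUOP-609 cleanup rule (delete only operator-created ConfigMaps) can be sketched in Go as a filter over ConfigMaps. This assumes the operator marks the ConfigMaps it creates with a marker label; the `ConfigMap` struct, the label key and value, and the ConfigMap names below are all hypothetical, and the real operator may use a different label, annotation, or owner reference.

```go
package main

import "fmt"

// ConfigMap is a pared-down stand-in for a Kubernetes ConfigMap;
// only the fields this sketch needs are modelled.
type ConfigMap struct {
	Name   string
	Labels map[string]string
}

// Hypothetical marker the operator would stamp on ConfigMaps it creates.
// The real operator may identify its ConfigMaps differently.
const (
	createdByKey   = "example.com/created-by"
	createdByValue = "gpu-operator"
)

// configMapsToDelete returns the names of operator-created ConfigMaps
// only, so user-provided ConfigMaps survive uninstall cleanup.
func configMapsToDelete(cms []ConfigMap) []string {
	var names []string
	for _, cm := range cms {
		// Indexing a nil Labels map is safe and yields "".
		if cm.Labels[createdByKey] == createdByValue {
			names = append(names, cm.Name)
		}
	}
	return names
}

func main() {
	cms := []ConfigMap{
		{Name: "default-remediation-config", Labels: map[string]string{createdByKey: createdByValue}},
		{Name: "my-remediation-config"}, // user-created: no marker label
	}
	fmt.Println(configMapsToDelete(cms)) // [default-remediation-config]
}
```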
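The GPUOP-610 template bug can be reproduced with Go's text/template package, on which Helm templates are built. `with` skips its block when the value is empty, and false counts as empty, whereas `if hasKey` only checks that the key exists. Helm gets `hasKey` from the Sprig library; this sketch registers a minimal stand-in with the same signature, and the flat `.autoStartWorkflow` value path is a simplification (the chart's real values path is not shown here).

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes tpl against values. The "hasKey" entry is a minimal
// stand-in for Sprig's hasKey, which Helm provides in real charts.
func render(tpl string, values map[string]interface{}) string {
	funcs := template.FuncMap{
		"hasKey": func(m map[string]interface{}, key string) bool {
			_, ok := m[key]
			return ok
		},
	}
	t := template.Must(template.New("t").Funcs(funcs).Parse(tpl))
	var b strings.Builder
	if err := t.Execute(&b, values); err != nil {
		panic(err)
	}
	return b.String()
}

func main() {
	values := map[string]interface{}{"autoStartWorkflow": false}

	// Old style: false is "empty", so the whole block is dropped.
	withTpl := `{{- with .autoStartWorkflow }}autoStartWorkflow: {{ . }}{{- end }}`
	// New style: render whenever the key is present, whatever its value.
	hasKeyTpl := `{{- if hasKey . "autoStartWorkflow" }}autoStartWorkflow: {{ .autoStartWorkflow }}{{- end }}`

	fmt.Printf("with:   %q\n", render(withTpl, values))   // with:   ""
	fmt.Printf("hasKey: %q\n", render(hasKeyTpl, values)) // hasKey: "autoStartWorkflow: false"
}
```

With `true`, both templates render the line; only `false` exposes the difference, which is why the bug showed up when disabling the option.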

Cherry-pick triggered by: ACP-Automation

* anr - fixes for applylabels step

* multiple anr fixes

(cherry picked from commit b33e4c9cc14ac69c6eab868e86bc20b295414c03)
sajmera-pensando merged commit b18f660 into ROCm:main on Apr 3, 2026
3 checks passed