ensure the machine state snapshot capturing on build pods too #5052
openshift-merge-bot[bot] merged 1 commit into openshift:main
Conversation
Signed-off-by: Nikolaos Moraitis <nmoraiti@redhat.com>
Pipeline controller notification: For optional jobs, comment. This repository is configured in: automatic mode.
Walkthrough: A new StoreMachinesSnapshotForBuildPod method was added to the metrics agent and is now invoked for build pods.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
/test e2e
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/steps/source.go`:
- Line 601: The call to
client.MetricsAgent().StoreMachinesSnapshotForBuildPod(...) is being started
inside the retry loop and thus spawns a new 60-minute watcher on every retry;
move this call so the snapshot watcher is started only once per build attempt
set—either relocate the invocation outside the retry loop that contains the pod
start/retry logic or protect it with a sync.Once (or a boolean on the build
attempt struct) so client.MetricsAgent().StoreMachinesSnapshotForBuildPod(ctx,
ns, fmt.Sprintf("%s-build", name), podClient) is invoked a single time using the
established podClient/ns/name for that attempt.
ℹ️ Review info
⚙️ Run configuration
- Configuration used: Organization UI
- Review profile: CHILL
- Plan: Pro
- Run ID: 0d50279f-43d2-4bd7-865b-e5a3129f78dc
📒 Files selected for processing (2)
- pkg/metrics/metrics.go
- pkg/steps/source.go
	}

	client.MetricsAgent().AddNodeWorkload(ctx, ns, fmt.Sprintf("%s-build", name), name, podClient)
	client.MetricsAgent().StoreMachinesSnapshotForBuildPod(ctx, ns, fmt.Sprintf("%s-build", name), podClient)
Start the build-pod snapshot watcher only once per build attempt set.
Line 601 is inside the retry loop, so each retry starts another 60-minute polling goroutine for the same pod. That can inflate API calls and duplicate machine snapshot events.
Suggested fix
func handleBuild(ctx context.Context, client BuildClient, podClient kubernetes.PodClient, build buildapi.Build, needsMultiArchWorkaround func() bool) error {
const attempts = 5
ns, name := build.Namespace, build.Name
var errs []error
+ snapshotWatcherStarted := false
if err := wait.ExponentialBackoff(wait.Backoff{Duration: time.Minute, Factor: 1.5, Steps: attempts}, func() (bool, error) {
var attempt buildapi.Build
@@
client.MetricsAgent().AddNodeWorkload(ctx, ns, fmt.Sprintf("%s-build", name), name, podClient)
- client.MetricsAgent().StoreMachinesSnapshotForBuildPod(ctx, ns, fmt.Sprintf("%s-build", name), podClient)
+ if !snapshotWatcherStarted {
+ client.MetricsAgent().StoreMachinesSnapshotForBuildPod(ctx, ns, fmt.Sprintf("%s-build", name), podClient)
+ snapshotWatcherStarted = true
+ }
if err := waitForBuildOrTimeout(ctx, client, podClient, ns, name); err != nil {
errs = append(errs, err)
return false, handleFailedBuild(ctx, client, ns, name, err)
}
Tests from the second stage were triggered manually. The pipeline can be controlled only manually until HEAD changes. Use the command to trigger the second stage.
/test e2e
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: droslean, liangxia
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@droslean: The following test failed, say
Full PR test history. Your PR dashboard.
Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/test images
Scheduling tests matching the
621c8b7 merged into openshift:main
No description provided.