ensure the machine state snapshot capturing on build pods too by droslean · Pull Request #5052 · openshift/ci-tools

droslean · 2026-03-26T14:27:49Z

No description provided.

Signed-off-by: Nikolaos Moraitis <nmoraiti@redhat.com>

openshift-ci-robot · 2026-03-26T14:27:52Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

coderabbitai · 2026-03-26T14:28:04Z

Walkthrough

Added a new method to MetricsAgent that asynchronously polls for a Kubernetes Pod and captures a machine snapshot when a ci-workload label is detected. The method is integrated into the build pod handling flow in the source step processor.

Changes

Cohort / File(s)	Summary
Metrics snapshot polling for build pods `pkg/metrics/metrics.go`, `pkg/steps/source.go`	Added `StoreMachinesSnapshotForBuildPod()` method with goroutine-based polling logic (5s intervals, 60-minute timeout) that waits for a Pod's `ci-workload` label before capturing metrics. Integrated the new method into `handleBuild()` to trigger snapshot capture for build pods.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

droslean · 2026-03-26T14:28:49Z

/test e2e

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/steps/source.go`:
- Line 601: The call to
client.MetricsAgent().StoreMachinesSnapshotForBuildPod(...) is being started
inside the retry loop and thus spawns a new 60-minute watcher on every retry;
move this call so the snapshot watcher is started only once per build attempt
set—either relocate the invocation outside the retry loop that contains the pod
start/retry logic or protect it with a sync.Once (or a boolean on the build
attempt struct) so client.MetricsAgent().StoreMachinesSnapshotForBuildPod(ctx,
ns, fmt.Sprintf("%s-build", name), podClient) is invoked a single time using the
established podClient/ns/name for that attempt.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0d50279f-43d2-4bd7-865b-e5a3129f78dc

📥 Commits

Reviewing files that changed from the base of the PR and between 5357e13 and 9273294.

📒 Files selected for processing (2)

pkg/metrics/metrics.go
pkg/steps/source.go

coderabbitai · 2026-03-26T14:38:29Z

pkg/steps/source.go

 		}

 		client.MetricsAgent().AddNodeWorkload(ctx, ns, fmt.Sprintf("%s-build", name), name, podClient)
+		client.MetricsAgent().StoreMachinesSnapshotForBuildPod(ctx, ns, fmt.Sprintf("%s-build", name), podClient)


⚠️ Potential issue | 🟠 Major

Start the build-pod snapshot watcher only once per build attempt set.

Line 601 is inside the retry loop, so each retry starts another 60-minute polling goroutine for the same pod. That can inflate API calls and duplicate machine snapshot events.

Suggested fix

func handleBuild(ctx context.Context, client BuildClient, podClient kubernetes.PodClient, build buildapi.Build, needsMultiArchWorkaround func() bool) error { const attempts = 5 ns, name := build.Namespace, build.Name var errs []error + snapshotWatcherStarted := false if err := wait.ExponentialBackoff(wait.Backoff{Duration: time.Minute, Factor: 1.5, Steps: attempts}, func() (bool, error) { var attempt buildapi.Build @@ client.MetricsAgent().AddNodeWorkload(ctx, ns, fmt.Sprintf("%s-build", name), name, podClient) - client.MetricsAgent().StoreMachinesSnapshotForBuildPod(ctx, ns, fmt.Sprintf("%s-build", name), podClient) + if !snapshotWatcherStarted { + client.MetricsAgent().StoreMachinesSnapshotForBuildPod(ctx, ns, fmt.Sprintf("%s-build", name), podClient) + snapshotWatcherStarted = true + } if err := waitForBuildOrTimeout(ctx, client, podClient, ns, name); err != nil { errs = append(errs, err) return false, handleFailedBuild(ctx, client, ns, name, err) }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/steps/source.go` at line 601, The call to client.MetricsAgent().StoreMachinesSnapshotForBuildPod(...) is being started inside the retry loop and thus spawns a new 60-minute watcher on every retry; move this call so the snapshot watcher is started only once per build attempt set—either relocate the invocation outside the retry loop that contains the pod start/retry logic or protect it with a sync.Once (or a boolean on the build attempt struct) so client.MetricsAgent().StoreMachinesSnapshotForBuildPod(ctx, ns, fmt.Sprintf("%s-build", name), podClient) is invoked a single time using the established podClient/ns/name for that attempt.

openshift-ci-robot · 2026-03-26T16:06:46Z

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

droslean · 2026-03-28T13:33:53Z

/test e2e

openshift-ci · 2026-03-31T10:06:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: droslean, liangxia

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [droslean,liangxia]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2026-03-31T13:23:22Z

@droslean: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/breaking-changes	`9273294`	link	false	`/test breaking-changes`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

droslean · 2026-03-31T14:02:37Z

/test images

openshift-ci-robot · 2026-03-31T15:26:08Z

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e

ensure the machine state snapshot capturing on build pods too

9273294

Signed-off-by: Nikolaos Moraitis <nmoraiti@redhat.com>

openshift-ci bot requested review from deepsm007 and hector-vido March 26, 2026 14:28

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 26, 2026

coderabbitai bot reviewed Mar 26, 2026

View reviewed changes

liangxia approved these changes Mar 31, 2026

View reviewed changes

openshift-ci bot assigned liangxia Mar 31, 2026

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2026

openshift-merge-bot bot merged commit 621c8b7 into openshift:main Mar 31, 2026
14 of 15 checks passed

droslean deleted the machines-2 branch April 1, 2026 01:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ensure the machine state snapshot capturing on build pods too#5052

ensure the machine state snapshot capturing on build pods too#5052
openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
droslean:machines-2

droslean commented Mar 26, 2026

Uh oh!

openshift-ci-robot commented Mar 26, 2026

Uh oh!

coderabbitai bot commented Mar 26, 2026 •

edited

Loading

Uh oh!

droslean commented Mar 26, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 26, 2026

Uh oh!

openshift-ci-robot commented Mar 26, 2026

Uh oh!

droslean commented Mar 28, 2026

Uh oh!

openshift-ci bot commented Mar 31, 2026

Uh oh!

openshift-ci bot commented Mar 31, 2026 •

edited

Loading

Uh oh!

droslean commented Mar 31, 2026

Uh oh!

openshift-ci-robot commented Mar 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

droslean commented Mar 26, 2026

Uh oh!

openshift-ci-robot commented Mar 26, 2026

Uh oh!

coderabbitai bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

droslean commented Mar 26, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

openshift-ci-robot commented Mar 26, 2026

Uh oh!

droslean commented Mar 28, 2026

Uh oh!

openshift-ci bot commented Mar 31, 2026

Uh oh!

openshift-ci bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

droslean commented Mar 31, 2026

Uh oh!

openshift-ci-robot commented Mar 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai bot commented Mar 26, 2026 •

edited

Loading

openshift-ci bot commented Mar 31, 2026 •

edited

Loading