[DO NOT MERGE] Revert "Only apply 5-minute sleep workaround for known multi-arch repos"#5055
[DO NOT MERGE] Revert "Only apply 5-minute sleep workaround for known multi-arch repos"#5055kunalmemane wants to merge 4 commits intoopenshift:mainfrom
Conversation
This reverts commit 1d12ac7.
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
/hold |
WalkthroughRemoved the predicate-based multi-arch workaround: call sites stopped passing a function-valued workaround predicate to Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Tools execution failed with the following error: Failed to run tools: 14 UNAVAILABLE: read ECONNRESET Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kunalmemane The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/steps/source.go`:
- Line 493: The call to handleBuild(ctx, buildClient, podClient, b, true)
unconditionally enables the 5-minute multi-arch delay for all builds; revert
this by restoring the original gating logic (e.g., check the repository
allowlist or the predicate that previously determined if the multi-arch
workaround is needed) and only pass true when that check returns true. Locate
the callers in handleBuilds/handleBuild and replace the hardcoded true with the
original condition (for example needsMultiArchWorkaround(b.Repo) or
repoAllowlistContains(b.Repo)) or a feature-flag variable so only affected
repositories sleep 5 minutes. Ensure the symbolic names handleBuild,
handleBuilds and the repo allowlist/predicate are used to find and update the
code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1dacf753-4d04-4642-b731-b8c57fef3213
📒 Files selected for processing (3)
pkg/steps/bundle_source.gopkg/steps/project_image.gopkg/steps/source.go
pkg/steps/source.go
Outdated
| defer wg.Done() | ||
| metricsAgent.AddNodeWorkload(ctx, b.Namespace, fmt.Sprintf("%s-build", b.Name), b.Name, podClient) | ||
| if err := handleBuild(ctx, buildClient, podClient, b, needsMultiArchWorkaround); err != nil { | ||
| if err := handleBuild(ctx, buildClient, podClient, b, true); err != nil { |
There was a problem hiding this comment.
Unconditional 5-minute delay applied to all multi-arch builds.
Passing true here means every build routed through handleBuilds will now sleep 5 minutes, regardless of whether the repository actually needs the multi-arch workaround. Previously, this was gated by a repository allowlist check.
Given the PR title indicates "DO NOT MERGE," this appears intentional for testing/investigation, but merging this would introduce a significant build-time regression.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/steps/source.go` at line 493, The call to handleBuild(ctx, buildClient,
podClient, b, true) unconditionally enables the 5-minute multi-arch delay for
all builds; revert this by restoring the original gating logic (e.g., check the
repository allowlist or the predicate that previously determined if the
multi-arch workaround is needed) and only pass true when that check returns
true. Locate the callers in handleBuilds/handleBuild and replace the hardcoded
true with the original condition (for example needsMultiArchWorkaround(b.Repo)
or repoAllowlistContains(b.Repo)) or a feature-flag variable so only affected
repositories sleep 5 minutes. Ensure the symbolic names handleBuild,
handleBuilds and the repo allowlist/predicate are used to find and update the
code.
|
/retest images |
|
/test images |
This reverts commit 8cf9cb7.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/steps/source.go (1)
469-475:⚠️ Potential issue | 🟠 MajorRestore the per-build gate for the 5-minute workaround.
Removing
NeedsMultiArchWorkaroundand sleeping unconditionally on Line 558 puts everyhandleBuildcaller on the slow path, including the bundle-only callers inpkg/steps/bundle_source.go(Lines 82-85) andpkg/steps/project_image.go(Lines 73-79) that previously opted out. Because this sleep sits inside the retry loop, a flaky build can now burn 25 minutes before retries even recreate the build. Please thread the old predicate/allowlist back throughhandleBuilds/handleBuildand keep bundle callers on a constantfalse.Minimal fix direction
type ImageBuildOptions struct { Architectures []string + NeedsMultiArchWorkaround func() bool }- if err := handleBuild(ctx, buildClient, podClient, b); err != nil { + if err := handleBuild(ctx, buildClient, podClient, b, o.NeedsMultiArchWorkaround); err != nil { errChan <- fmt.Errorf("error occurred handling build %s: %w", b.Name, err) }-func handleBuild(ctx context.Context, client BuildClient, podClient kubernetes.PodClient, build buildapi.Build) error { +func handleBuild(ctx context.Context, client BuildClient, podClient kubernetes.PodClient, build buildapi.Build, needsMultiArchWorkaround func() bool) error { const attempts = 5 ns, name := build.Namespace, build.Name var errs []error if err := wait.ExponentialBackoff(wait.Backoff{Duration: time.Minute, Factor: 1.5, Steps: attempts}, func() (bool, error) { var attempt buildapi.Build - // builds are using older src image, adding wait to avoid race condition - time.Sleep(5 * time.Minute) + if needsMultiArchWorkaround != nil && needsMultiArchWorkaround() { + // builds are using older src image, adding wait to avoid race condition + time.Sleep(5 * time.Minute) + } build.DeepCopyInto(&attempt)As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
Also applies to: 493-493, 551-558
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/steps/source.go` around lines 469 - 475, The unconditional 5-minute sleep was reintroduced and affects all handleBuild callers; restore a per-build gate by reintroducing the NeedsMultiArchWorkaround predicate and thread it through handleBuilds -> handleBuild (and any build option structs like ImageBuildOptions) so only builds that need the multi-arch workaround sleep; ensure bundle-only callers (the callers in bundle_source.go and project_image.go) pass/propagate a false value (or opt out) so they remain on the fast path and the retry loop no longer incurs the long sleep for them.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/steps/source.go`:
- Around line 469-475: The unconditional 5-minute sleep was reintroduced and
affects all handleBuild callers; restore a per-build gate by reintroducing the
NeedsMultiArchWorkaround predicate and thread it through handleBuilds ->
handleBuild (and any build option structs like ImageBuildOptions) so only builds
that need the multi-arch workaround sleep; ensure bundle-only callers (the
callers in bundle_source.go and project_image.go) pass/propagate a false value
(or opt out) so they remain on the fast path and the retry loop no longer incurs
the long sleep for them.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f5e9dac1-7a50-452a-8991-e39200dea768
📒 Files selected for processing (3)
pkg/steps/bundle_source.gopkg/steps/project_image.gopkg/steps/source.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/steps/project_image.go
This reverts commit a3864a1.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/steps/source.go (1)
551-558:⚠️ Potential issue | 🟠 MajorRestore conditional gating for the pre-create sleep
Line 558 now sleeps unconditionally for every build attempt, and with the signature change at Line 551 there is no longer a way to scope this workaround to affected repos/builds. This adds avoidable latency to all builds and can materially slow CI throughput.
Proposed direction
type ImageBuildOptions struct { Architectures []string + NeedsMultiArchWorkaround bool } -func handleBuild(ctx context.Context, client BuildClient, podClient kubernetes.PodClient, build buildapi.Build) error { +func handleBuild(ctx context.Context, client BuildClient, podClient kubernetes.PodClient, build buildapi.Build, needsMultiArchWorkaround bool) error { const attempts = 5 ... - // builds are using older src image, adding wait to avoid race condition - time.Sleep(1 * time.Minute) + if needsMultiArchWorkaround { + // builds are using older src image, adding wait to avoid race condition + time.Sleep(1 * time.Minute) + } ... }As per coding guidelines, "-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/steps/source.go` around lines 551 - 558, The unconditional time.Sleep(1 * time.Minute) inside handleBuild (within wait.ExponentialBackoff) causes every build attempt to pause and must be gated; change handleBuild to only perform the pre-create sleep for builds that actually need the older-src-image workaround (e.g., inspect build.Metadata/Annotations/Labels or build.Spec.SourceImage/version) and only before the first attempt rather than on every retry. Locate the sleep in handleBuild and replace it with a conditional check (or a helper like needsOldSrcImage(build)) so the delay runs only for affected builds; ensure the logic uses existing build fields (labels/annotations/spec) and does not add global latency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/steps/source.go`:
- Around line 551-558: The unconditional time.Sleep(1 * time.Minute) inside
handleBuild (within wait.ExponentialBackoff) causes every build attempt to pause
and must be gated; change handleBuild to only perform the pre-create sleep for
builds that actually need the older-src-image workaround (e.g., inspect
build.Metadata/Annotations/Labels or build.Spec.SourceImage/version) and only
before the first attempt rather than on every retry. Locate the sleep in
handleBuild and replace it with a conditional check (or a helper like
needsOldSrcImage(build)) so the delay runs only for affected builds; ensure the
logic uses existing build fields (labels/annotations/spec) and does not add
global latency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 805e612d-abb1-4645-8564-0070ed445643
📒 Files selected for processing (1)
pkg/steps/source.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/steps/source.go`:
- Around line 551-559: In handleBuild, remove the commented-out sleep and its
explanatory comment (the dead code around the "builds are using older src image"
sleep) or replace it with a deterministic wait: implement a proper retry/wait
that checks the build/pod image readiness before proceeding (e.g., use
wait.PollImmediate or extend the existing ExponentialBackoff to poll the
relevant image/version field on the attempt/build or associated pod via
podClient until it reflects the expected src image), ensuring you update the
logic around build.DeepCopyInto(&attempt) accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c8023165-a9f4-4c49-b930-d6d27164d2c2
📒 Files selected for processing (1)
pkg/steps/source.go
| func handleBuild(ctx context.Context, client BuildClient, podClient kubernetes.PodClient, build buildapi.Build) error { | ||
| const attempts = 5 | ||
| ns, name := build.Namespace, build.Name | ||
| var errs []error | ||
| if err := wait.ExponentialBackoff(wait.Backoff{Duration: time.Minute, Factor: 1.5, Steps: attempts}, func() (bool, error) { | ||
| var attempt buildapi.Build | ||
| // TODO: This is a workaround to avoid race condition with multiple ci-operator instances building different architectures | ||
| // See https://issues.redhat.com/browse/OCPBUGS-65845 | ||
| if needsMultiArchWorkaround != nil && needsMultiArchWorkaround() { | ||
| time.Sleep(5 * time.Minute) | ||
| } | ||
| // builds are using older src image, adding wait to avoid race condition | ||
| //time.Sleep(1 * time.Minute) | ||
| build.DeepCopyInto(&attempt) |
There was a problem hiding this comment.
Commented-out race condition fix is a concern if merged.
Lines 557-558 contain dead code with a comment acknowledging a race condition ("builds are using older src image"). The sleep that previously mitigated this is now commented out.
Given the PR title "DO NOT MERGE" and /hold label, this appears intentional for investigation. However, if this were to be merged:
- The race condition described in the comment would remain unaddressed
- The commented-out code should be removed entirely or replaced with a proper fix
Consider removing the dead code and comment, or implementing a proper solution before any future merge.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/steps/source.go` around lines 551 - 559, In handleBuild, remove the
commented-out sleep and its explanatory comment (the dead code around the
"builds are using older src image" sleep) or replace it with a deterministic
wait: implement a proper retry/wait that checks the build/pod image readiness
before proceeding (e.g., use wait.PollImmediate or extend the existing
ExponentialBackoff to poll the relevant image/version field on the attempt/build
or associated pod via podClient until it reflects the expected src image),
ensuring you update the logic around build.DeepCopyInto(&attempt) accordingly.
|
@kunalmemane: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This reverts commit 1d12ac7.