OCPBUGS-77056: Make external cert validation asynchronous#745
Conversation
|
Skipping CI for Draft Pull Request. |
|
@bentito: This pull request references Jira Issue OCPBUGS-77056, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
325f313 to
214b603
Compare
|
/hold |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (4)
WalkthroughSemaphore-throttled, cached SAR validation for external TLS; new SharedSecretManager with per-namespace informers; SAR-completion route status events; condition-equality fixes; WriterLease multi-worker support; synchronous test refactors and small build/client wiring tweaks. ChangesExternal Certificate SAR Validation and Route Status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/router/controller/route_secret_manager.go`:
- Around line 288-298: The current Add handler unconditionally calls
RecordRouteRejection when a secret is replayed; change it to mirror the
UpdateFunc/SAR-completion logic: check the route's admission (via
populateRouteTLSFromSecret result or route.Status.Conditions) and if the route
is already admitted emit a normal update event instead of calling
RecordRouteRejection, otherwise record the rejection; extract the string
"ExternalCertificateSecretLoaded" into a shared constant (e.g.,
ExtCrtStatusReasonSecretLoaded) and add that constant to the
ignoreIngressConditionReason set in contention.go so the admission-ignoring
logic treats this replay reason as benign. Ensure you update references to
RecordRouteRejection, populateRouteTLSFromSecret, UpdateFunc,
ExternalCertificateSecretLoaded, and ignoreIngressConditionReason accordingly.
In `@pkg/router/routeapihelpers/validation.go`:
- Around line 503-556: The async SAR flow stores only a single pending error
(pendingErr) in asyncSARCache for a cacheKey, so subsequent calls (in validate
flow) that hit the cache drop their onComplete callback and never get
re-enqueued; change the cache value to include a list of waiting callbacks (or a
small struct with errs + []onComplete callbacks) keyed by asyncSARCache so that
when you detect a pending entry you append the current onComplete for
route.Namespace/secretName, and in the goroutine after
asyncSARCache.Store(cacheKey, errs) iterate and invoke all stored callbacks (not
just the first) and then replace the cache entry with errs only; update code
paths that read the cache (the cache-hit branch and the goroutine completion) to
handle this new struct and ensure onComplete is invoked for every waiting route.
- Around line 485-490: The global asyncSARCache currently stores completed
validation results forever; change it to avoid permanent sticky entries by (a)
replacing the raw sync.Map value with a small struct that includes the cached
field.ErrorList plus an expiration timestamp or a source type flag, (b) only
writing non-transient successful validation results (or results with a short
TTL) into asyncSARCache, and (c) adding explicit invalidation helpers (e.g.,
InvalidateAsyncSARCache(namespace, secretName) and
InvalidateAllAsyncSARCacheForSecret(namespace, secretName)) and call them from
the secret add/update/delete/recreate paths in the secret manager
(route_secret_manager.go) so secret changes force revalidation; also update
ClearAsyncSARCacheForTest to clear the new structure. Ensure references to
asyncSARCache and ClearAsyncSARCacheForTest (and the code paths mentioned around
the 503-556 region) are updated to use the new value type and invalidation APIs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e07f375c-945a-4e58-b5ec-3347eaff109c
📒 Files selected for processing (5)
pkg/router/controller/contention.gopkg/router/controller/route_secret_manager.gopkg/router/controller/route_secret_manager_test.gopkg/router/routeapihelpers/validation.gopkg/router/routeapihelpers/validation_test.go
There was a problem hiding this comment.
♻️ Duplicate comments (2)
pkg/router/routeapihelpers/validation.go (1)
485-497:⚠️ Potential issue | 🟠 MajorCompleted SAR entries are still permanent.
Only secret add/update/delete clears this cache. A finished allow/deny result for
namespace/secretNamesurvives route deletion, later RBAC grants/revocations, and transient SAR/API failures, so a new route that reuses the same secret name can inherit a stale decision without performing a fresh check. Please put completed entries behind a TTL/generation and avoid caching transient internal errors indefinitely.Also applies to: 593-605
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/routeapihelpers/validation.go` around lines 485 - 497, The cache currently stores completed asyncSARResult entries forever; update asyncSARResult (and uses of asyncSARCache) to include a timestamp or generation counter and enforce a TTL/generation check before returning cached results so that completed allow/deny decisions expire after a short interval or when generation changes. Ensure only stable allow/deny outcomes are cached long-term; if asyncSARResult.errs contains transient/internal errors, set a much shorter TTL (or do not mark done) so they trigger fresh SARs. Update the cache read path to validate timestamp/generation and the write path to record it, and keep InvalidateAsyncSARCache(namespace, secretName) as-is to support manual invalidation.pkg/router/controller/route_secret_manager.go (1)
262-304:⚠️ Potential issue | 🟠 MajorDon't clear the fresh SAR result on the initial
SecretLoadedreplay.
RegisterRoute()is only reached aftervalidate()has already completed successfully, so this non-deletedAddFuncpath is replaying an already-validated secret. ClearingasyncSARCachehere throws away that fresh result, defeats the per-secret cache for shared secrets, and can bounce the route back into a second"authorization check pending"/ExternalCertificateValidationFailedcycle on theSecretLoadedrequeue. Keep invalidation on actual secret changes (recreated/updated/deleted), but not on the initial load replay.Suggested shape
AddFunc: func(obj interface{}) { secret := obj.(*kapi.Secret) log.V(4).Info("Secret added for route", "namespace", namespace, "secret", secret.Name, "route", routeName) - routeapihelpers.InvalidateAsyncSARCache(namespace, secret.Name) // Secret re-creation scenario // Check if the route key exists in the deletedSecrets map, indicating that the secret was previously deleted for this route. // If it exists, it means the secret is being recreated. Remove the key from the map and proceed with handling the route. @@ key := generateKey(namespace, routeName) if _, deleted := p.deletedSecrets.LoadAndDelete(key); deleted { + routeapihelpers.InvalidateAsyncSARCache(namespace, secret.Name) log.V(4).Info("Secret recreated for route", "namespace", namespace, "secret", secret.Name, "route", routeName)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/controller/route_secret_manager.go` around lines 262 - 304, The AddFunc for secret events currently calls routeapihelpers.InvalidateAsyncSARCache unconditionally, which discards fresh per-secret async SAR results during the initial cache-sync replay; to fix, remove the unconditional call to routeapihelpers.InvalidateAsyncSARCache from the top of the AddFunc in RegisterRoute's secret handler and only invoke routeapihelpers.InvalidateAsyncSARCache when the secret is actually changed/recreated/deleted (e.g., inside the p.deletedSecrets.LoadAndDelete "recreated" branch and in the Update/Delete handlers), using generateKey(namespace, routeName) and p.deletedSecrets to determine real recreation so you don't invalidate the SAR cache on the initial load.
🧹 Nitpick comments (1)
pkg/router/controller/route_secret_manager_test.go (1)
1355-1372: Put a timeout around these async waits.These bare channel receives will hang the whole suite on any missed informer callback. Now that the tests intentionally depend on async completion, a tiny helper with
select+time.Afterwill turn those deadlocks into normal test failures.Suggested helper
+func waitForRecorderEvent(t *testing.T, ch <-chan struct{}) { + t.Helper() + select { + case <-ch: + case <-time.After(2 * time.Second): + t.Fatal("timeout waiting for recorder event") + } +}Then replace each bare receive with:
-<-recorder.doneCh +waitForRecorderEvent(t, recorder.doneCh)Also applies to: 1434-1444, 1496-1516
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/controller/route_secret_manager_test.go` around lines 1355 - 1372, Replace bare channel receives on recorder.doneCh with a timed wait helper to avoid test hangs: add a small helper (e.g., waitForDone(ctxTimeout, ch) or waitWithTimeout(t, ch, duration)) that uses select with the channel and time.After to fail the test on timeout, then update every occurrence of "<-recorder.doneCh" in route_secret_manager_test.go (including the occurrences around the Secret Update flow that use kubeClient.CoreV1().Secrets(...).Update and the initial Add wait) to call that helper instead so missed informer callbacks produce a deterministic test failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/router/controller/route_secret_manager.go`:
- Around line 262-304: The AddFunc for secret events currently calls
routeapihelpers.InvalidateAsyncSARCache unconditionally, which discards fresh
per-secret async SAR results during the initial cache-sync replay; to fix,
remove the unconditional call to routeapihelpers.InvalidateAsyncSARCache from
the top of the AddFunc in RegisterRoute's secret handler and only invoke
routeapihelpers.InvalidateAsyncSARCache when the secret is actually
changed/recreated/deleted (e.g., inside the p.deletedSecrets.LoadAndDelete
"recreated" branch and in the Update/Delete handlers), using
generateKey(namespace, routeName) and p.deletedSecrets to determine real
recreation so you don't invalidate the SAR cache on the initial load.
In `@pkg/router/routeapihelpers/validation.go`:
- Around line 485-497: The cache currently stores completed asyncSARResult
entries forever; update asyncSARResult (and uses of asyncSARCache) to include a
timestamp or generation counter and enforce a TTL/generation check before
returning cached results so that completed allow/deny decisions expire after a
short interval or when generation changes. Ensure only stable allow/deny
outcomes are cached long-term; if asyncSARResult.errs contains
transient/internal errors, set a much shorter TTL (or do not mark done) so they
trigger fresh SARs. Update the cache read path to validate timestamp/generation
and the write path to record it, and keep InvalidateAsyncSARCache(namespace,
secretName) as-is to support manual invalidation.
---
Nitpick comments:
In `@pkg/router/controller/route_secret_manager_test.go`:
- Around line 1355-1372: Replace bare channel receives on recorder.doneCh with a
timed wait helper to avoid test hangs: add a small helper (e.g.,
waitForDone(ctxTimeout, ch) or waitWithTimeout(t, ch, duration)) that uses
select with the channel and time.After to fail the test on timeout, then update
every occurrence of "<-recorder.doneCh" in route_secret_manager_test.go
(including the occurrences around the Secret Update flow that use
kubeClient.CoreV1().Secrets(...).Update and the initial Add wait) to call that
helper instead so missed informer callbacks produce a deterministic test
failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c800f527-b5e1-4386-b6cf-e20eb4726a31
📒 Files selected for processing (4)
pkg/router/controller/contention.gopkg/router/controller/route_secret_manager.gopkg/router/controller/route_secret_manager_test.gopkg/router/routeapihelpers/validation.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/router/controller/contention.go
|
/retest-required |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/router/routeapihelpers/validation.go`:
- Around line 561-568: The semaphore (asyncSARSemaphore) is being acquired on
the caller path before launching the goroutine, causing the caller to block when
the limit is reached; move the acquire into the goroutine so the caller can
spawn all per-secret goroutines without blocking, then release the token inside
that goroutine (use defer to ensure release). Concretely: start the goroutine
immediately, perform asyncSARSemaphore <- struct{}{} at the top of the goroutine
body, and keep the existing defer func() { <-asyncSARSemaphore }() to guarantee
release; ensure any early returns in the goroutine still release the token.
- Around line 532-535: The current guard that returns nil when sarc or
secretsGetter is nil hides wiring bugs and skips RBAC/secret validation;
instead, change the conditional in validation.go so that if sarc == nil ||
secretsGetter == nil you return an internal error (e.g., fmt.Errorf or the
package's internal/server error type) describing "missing validation clients"
rather than nil, or alternatively gate this behind a test-only seam/flag so
production never silently succeeds; update callers to propagate/handle the
returned error as needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 31462f9b-48bb-466f-8d90-5c399b5fab8b
📒 Files selected for processing (1)
pkg/router/routeapihelpers/validation.go
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/router/routeapihelpers/validation.go`:
- Around line 526-529: The code currently returns a synthetic pending error on
cache miss without registering a callback if onComplete is nil; change the logic
in the validation path (the block that manipulates cached.callbacks and returns
cached.errs) so that onComplete must never be nil: either (A) make onComplete a
required parameter at the API boundary, or (B) when onComplete == nil execute
the async validation synchronously (drive the same validation routine on the
current goroutine, wait for completion, populate cached.errs, and then return
the real errors) instead of just returning the pending placeholder; if you
choose (B) ensure you reuse the same validation function that enqueues callbacks
(so cached state is updated) and that cached.callbacks is appended only when
onComplete is non-nil or after synchronous completion you return the finalized
errors.
- Around line 572-599: Wrap each SAR and secret GET with a per-request timeout
context (use context.WithTimeout and defer cancel) instead of context.TODO();
for the SAR checks stop using authorizationutil.Authorize and call the SAR
client Create() directly (e.g., sarClient.Create) with that timeout context so
you can inspect both (response, err) separately, treat API/transport errors as
retryable/internal (append field.InternalError or a distinct cached error) and
only append field.Forbidden when the SAR response explicitly denies, and call
secretsGetter.Secrets(route.Namespace).Get with the same bounded context and map
Get errors to NotFound vs InternalError accordingly. Ensure you still reference
routerServiceAccount, secretName, fldPath and preserve existing
field.Forbidden/field.InternalError/field.NotFound usages when classifying the
outcome.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bbebc97e-5cff-40e0-80d7-bff7bf16a68c
📒 Files selected for processing (1)
pkg/router/routeapihelpers/validation.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/router/routeapihelpers/validation.go (1)
637-639: Consider recovering from callback panics to ensure all callbacks are invoked.If a callback panics, the remaining callbacks in the slice won't be invoked. This is unlikely but could leave some routes stuck in a pending state if one route's callback handler has a bug.
🛡️ Proposed defensive callback invocation
for _, cb := range callbacks { - cb(route.Namespace, secretName) + func() { + defer func() { + if r := recover(); r != nil { + // Log panic but continue invoking remaining callbacks + } + }() + cb(route.Namespace, secretName) + }() }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/routeapihelpers/validation.go` around lines 637 - 639, Wrap each callback invocation in a deferred recover to prevent a single panic from aborting the loop: inside the loop over callbacks (callbacks, cb) call each cb via a small closure that uses defer + recover to catch any panic, log or report the panic along with the callback context (e.g., route.Namespace and secretName) and continue to the next cb; ensure the recovered panic does not re-panic so all callbacks are invoked.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/router/routeapihelpers/validation.go`:
- Around line 637-639: Wrap each callback invocation in a deferred recover to
prevent a single panic from aborting the loop: inside the loop over callbacks
(callbacks, cb) call each cb via a small closure that uses defer + recover to
catch any panic, log or report the panic along with the callback context (e.g.,
route.Namespace and secretName) and continue to the next cb; ensure the
recovered panic does not re-panic so all callbacks are invoked.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c9c78ea5-dba5-483e-8d75-13dc6f9905d7
📒 Files selected for processing (1)
pkg/router/routeapihelpers/validation.go
716e042 to
0a99ee0
Compare
Signed-off-by: Brett Tofel <btofel@redhat.com>
|
/retest |
1 similar comment
|
/retest |
|
/retest |
…prevent status flapping
|
/retest |
…us from async goroutines
|
/retest |
…to clear rejections
|
/retest |
1 similar comment
|
/retest |
|
/retest |
…mparison bug The synchronous logic refactoring missed explicit status updates for ExternalCertificateSARCompleted, which are required by E2E tests. Additionally, conditionsEqual in contention.go had a bug where it would incorrectly ignore status changes (e.g., from Rejected to Admitted) if an ignored reason was present. This prevented routes from becoming active after successful validation. - Corrected conditionsEqual to only ignore reasons if Type and Status match. - Restored RecordRouteUpdate calls with SARCompleted reason. - Added SARCompleted to ignored reasons in contention tracker. - Updated unit tests to verify status updates.
|
/retest |
1 similar comment
|
/retest |
… comparison The StatusAdmitter was inadvertently stripping meaningful reasons (like SARCompleted) from route status during standard reconciliation. This occurred because recordIngressCondition used literal struct comparison (!=), bypassing the 'ignored reasons' logic. This resulted in a status flapping loop where RouteSecretManager would set a reason, and HandleRoute would immediately clear it, causing high API churn and Server-Side Apply conflicts in CI. - Updated recordIngressCondition to use conditionsEqual helper. - Added pre-checks to RecordRouteUpdate/Rejection to skip redundant updates. - Added a unit test to verify that ignored reasons are preserved.
…essions The StatusAdmitter was inadvertently stripping meaningful reasons (like SARCompleted) from route status during standard reconciliation. This occurred because recordIngressCondition used literal struct comparison (!=), bypassing the 'ignored reasons' logic. Additionally, the previous optimization attempt was too aggressive, as it only checked conditions and ignored changes to other ingress fields like Host or WildcardPolicy, which broke standard tests. - Updated recordIngressCondition to use conditionsEqual helper. - Removed redundant top-level checks in RecordRouteUpdate/Rejection to ensure all ingress fields are correctly synced. - Added a unit test to verify that ignored reasons are preserved.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/router/routeapihelpers/validation.go (1)
236-257:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject public/certificate PEM blocks in
spec.tls.keybeforetls.X509KeyPair
ExtendedValidateRoutesanitizestlsConfig.KeywithsanitizePEM(which preservesPUBLIC KEY/CERTIFICATEblocks) and then appends the entire sanitizedtlsConfig.KeyintokeyBytespassed totls.X509KeyPair(whentlsConfig.Certificateis set). This allowstls.keythat contains a private key plus certificate/public PEM blocks (see test case"A key field containing Public Key/certificate attributes should be rejected"withexpectedErrors: 0).
Strip/rejectPUBLIC KEY/CERTIFICATEblocks fromtlsConfig.Key(only allow private-key PEM) before callingtls.X509KeyPair.Suggested fix
if len(tlsConfig.Certificate) > 0 { if certBytes, keyBytes, err := splitCertKey([]byte(tlsConfig.Certificate)); err != nil { result = append(result, field.Invalid(tlsFieldPath.Child("certificate"), "redacted key data", err.Error())) } else { - // Use any private key that was found in either - // tlsConfig.Certificate or tlsConfig.Key. - keyBytes = append(keyBytes, []byte(tlsConfig.Key)...) + // Use any private key that was found in either + // tlsConfig.Certificate or tlsConfig.Key, but reject + // public/certificate PEM blocks in the key field. + keyPublicBytes, keyOnlyBytes, err := splitCertKey([]byte(tlsConfig.Key)) + if err != nil { + result = append(result, field.Invalid(tlsFieldPath.Child("key"), "redacted key data", err.Error())) + return result + } + if len(keyPublicBytes) > 0 { + result = append(result, field.Invalid(tlsFieldPath.Child("key"), "redacted key data", "key must not contain certificate/public PEM blocks")) + return result + } + keyBytes = append(keyBytes, keyOnlyBytes...) if len(keyBytes) == 0 { result = append(result, field.Invalid(tlsFieldPath.Child("key"), "", "no key specified")) } else {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/router/routeapihelpers/validation.go` around lines 236 - 257, ExtendedValidateRoute currently allows tlsConfig.Key that contains PUBLIC KEY or CERTIFICATE PEM blocks because sanitizePEM preserves them and they get appended to keyBytes before tls.X509KeyPair; modify the validation so that after sanitizing/parsing tlsConfig.Key you scan its PEM blocks and either strip non-private-key blocks or (preferably) reject the key if any PEM block has Type "PUBLIC KEY" or "CERTIFICATE". Concretely, in the branch handling tlsConfig.Certificate (the code that calls splitCertKey and then appends tlsConfig.Key into keyBytes before tls.X509KeyPair), parse tlsConfig.Key into PEM blocks, and if any block.Type equals "PUBLIC KEY" or "CERTIFICATE" append a field.Invalid on tlsFieldPath.Child("key") (similar to other errors) and do not call tls.X509KeyPair; otherwise proceed with only private-key PEM blocks (or the sanitized key) as now. Use the existing symbols tlsConfig.Key, tlsFieldPath, sanitizePEM, splitCertKey, tls.X509KeyPair and ExtendedValidateRoute to locate and implement the change.
🧹 Nitpick comments (2)
pkg/router/template/configmanager/haproxy/testing/haproxy.go (1)
49-49: ⚡ Quick winConsider using
os.TempDir()instead of hardcoded/tmp.The change from
os.TempDir()to"/tmp"reduces portability. While/tmpis available on Unix-like systems,os.TempDir()is more portable and respects system-specific temp directory configuration (e.g.,TMPDIRenvironment variable).♻️ Revert to os.TempDir() for better portability
- f, err := os.CreateTemp("/tmp", prefix) + f, err := os.CreateTemp(os.TempDir(), prefix)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/router/template/configmanager/haproxy/testing/haproxy.go` at line 49, Replace the hard-coded "/tmp" argument passed to os.CreateTemp with os.TempDir() to restore portability; locate the os.CreateTemp call (the line creating f, err := os.CreateTemp("/tmp", prefix) in haproxy.go) and change the directory argument to os.TempDir() so the system-configured temp directory (and TMPDIR) is respected.pkg/cmd/infra/router/clientcmd.go (1)
64-68: ⚡ Quick winVerify rate limit values are appropriate for production workloads.
The client-side rate limits have been increased from defaults (typically QPS=5, Burst=10) to QPS=50 and Burst=100 to support concurrent SAR validation during router startup. While these values seem reasonable for the use case, please confirm:
- These limits are appropriate for typical OpenShift cluster API server capacity
- The 10x increase in QPS won't contribute to API server overload during router startup storms (e.g., multiple routers restarting simultaneously)
- These values have been tested under realistic load conditions with many external certificate routes
Consider testing or confirming via documentation/SRE input that these rate limits align with API server capacity planning.
Based on learnings, the context snippet from
pkg/cmd/infra/router/template.go:645shows thisKubeConfig()result is used to create theSubjectAccessReviewClientfor external certificate SAR validation, so the increased rate limits directly affect the SAR request throughput path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/cmd/infra/router/clientcmd.go` around lines 64 - 68, The hardcoded increase of clientConfig.QPS to 50 and clientConfig.Burst to 100 may be too aggressive for production; make these values configurable and document/validate them: replace the fixed assignments around clientConfig.QPS and clientConfig.Burst with configurable parameters (e.g., flags or env vars) surfaced where KubeConfig() is used to build the SubjectAccessReviewClient, add reasonable defaults and validation (min/max), and add a short comment referencing SAR validation path so SRE can test; also update tests or add a unit/integration test that exercises creating the SubjectAccessReviewClient with custom QPS/Burst values and include guidance to verify against API server capacity.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/router/controller/route_secret_manager.go`:
- Around line 119-120: The SARCompleted status is being emitted too early by
calling p.recorder.RecordRouteUpdate with ExtCrtStatusReasonSARCompleted (e.g.,
the calls that use route.Spec.TLS.ExternalCertificate.Name), which sets
RouteAdmitted=True before the downstream TLS validation (ExtendedValidator)
runs; move these RecordRouteUpdate calls so they execute only after the
downstream plugin chain (ExtendedValidator) has successfully validated the
tls.crt/tls.key, or alternatively check the current route admission and only
emit SARCompleted when the route is already admitted; update all similar sites
(the calls around the shown lines and the other occurrences you noted) to use
this gating or relocation so SARCompleted is only recorded post-validation.
In `@pkg/router/controller/status.go`:
- Around line 145-155: The status-update paths (the
isIngressConditionUpdateRequired check and related callers like the one invoking
performIngressConditionUpdate with routev1.RouteIngressCondition) currently rely
on the contention-oriented comparator that ignores the Reason field; change
these checks so they use a strict equality comparator that only ignores
LastTransitionTime (i.e., compare Type, Status, Reason, Message exactly) when
deciding to skip status writes, while leaving the existing
ignored-reason/conditionsEqual logic intact for contention detection only;
update the calls to isIngressConditionUpdateRequired (and the equivalent checks
at the other mentioned locations) to use the new strict comparator so
transitions like ExternalCertificateSecretUpdated →
ExternalCertificateSARCompleted are persisted.
In `@pkg/router/writerlease/writerlease.go`:
- Around line 156-174: Run() defers l.queue.ShutDown(), which delays shutting
the queue until after wg.Wait(), causing workers blocked in l.queue.Get() to
deadlock; change this by removing the defer and explicitly calling
l.queue.ShutDown() immediately after <-stopCh (before wg.Wait()) so workers
receive shutdown=true and can exit, then call wg.Wait() to join them; update the
Run function (the goroutine loop and shutdown ordering) to reflect this explicit
shutdown ordering.
---
Outside diff comments:
In `@pkg/router/routeapihelpers/validation.go`:
- Around line 236-257: ExtendedValidateRoute currently allows tlsConfig.Key that
contains PUBLIC KEY or CERTIFICATE PEM blocks because sanitizePEM preserves them
and they get appended to keyBytes before tls.X509KeyPair; modify the validation
so that after sanitizing/parsing tlsConfig.Key you scan its PEM blocks and
either strip non-private-key blocks or (preferably) reject the key if any PEM
block has Type "PUBLIC KEY" or "CERTIFICATE". Concretely, in the branch handling
tlsConfig.Certificate (the code that calls splitCertKey and then appends
tlsConfig.Key into keyBytes before tls.X509KeyPair), parse tlsConfig.Key into
PEM blocks, and if any block.Type equals "PUBLIC KEY" or "CERTIFICATE" append a
field.Invalid on tlsFieldPath.Child("key") (similar to other errors) and do not
call tls.X509KeyPair; otherwise proceed with only private-key PEM blocks (or the
sanitized key) as now. Use the existing symbols tlsConfig.Key, tlsFieldPath,
sanitizePEM, splitCertKey, tls.X509KeyPair and ExtendedValidateRoute to locate
and implement the change.
---
Nitpick comments:
In `@pkg/cmd/infra/router/clientcmd.go`:
- Around line 64-68: The hardcoded increase of clientConfig.QPS to 50 and
clientConfig.Burst to 100 may be too aggressive for production; make these
values configurable and document/validate them: replace the fixed assignments
around clientConfig.QPS and clientConfig.Burst with configurable parameters
(e.g., flags or env vars) surfaced where KubeConfig() is used to build the
SubjectAccessReviewClient, add reasonable defaults and validation (min/max), and
add a short comment referencing SAR validation path so SRE can test; also update
tests or add a unit/integration test that exercises creating the
SubjectAccessReviewClient with custom QPS/Burst values and include guidance to
verify against API server capacity.
In `@pkg/router/template/configmanager/haproxy/testing/haproxy.go`:
- Line 49: Replace the hard-coded "/tmp" argument passed to os.CreateTemp with
os.TempDir() to restore portability; locate the os.CreateTemp call (the line
creating f, err := os.CreateTemp("/tmp", prefix) in haproxy.go) and change the
directory argument to os.TempDir() so the system-configured temp directory (and
TMPDIR) is respected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e0491b73-90e9-45d9-a4cd-53ee9310aca8
⛔ Files ignored due to path filters (6)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/library-go/pkg/authorization/authorizationutil/subject.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/library-go/pkg/authorization/authorizationutil/util.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/library-go/pkg/route/secretmanager/manager.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/library-go/pkg/secret/secret_monitor.gois excluded by!**/vendor/**,!vendor/**vendor/modules.txtis excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (16)
go.modhack/Makefile.debugpkg/cmd/infra/router/clientcmd.gopkg/cmd/infra/router/template.gopkg/router/controller/contention.gopkg/router/controller/route_secret_manager.gopkg/router/controller/route_secret_manager_test.gopkg/router/controller/shared_secret_manager.gopkg/router/controller/status.gopkg/router/controller/status_test.gopkg/router/routeapihelpers/validation.gopkg/router/routeapihelpers/validation_test.gopkg/router/router_test.gopkg/router/template/configmanager/haproxy/testing/haproxy.gopkg/router/writerlease/writerlease.gopkg/router/writerlease/writerlease_test.go
|
@bentito: This pull request references Jira Issue OCPBUGS-77056, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
pkg/router/controller/status.go (1)
145-150:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStatus write path can still drop SAR reason transitions.
These paths now always call
performIngressConditionUpdate, but Line 342 still usesconditionsEqual(contention comparator) to decide whether to mutate status. Because that comparator ignores reasons inignoreIngressConditionReason, transitions likeExternalCertificateSecretUpdated→ExternalCertificateSARCompletedcan still be treated as unchanged and not persisted.💡 Minimal fix
+func conditionsEqualForStatusWrite(a, b *routev1.RouteIngressCondition) bool { + return a.Type == b.Type && + a.Status == b.Status && + a.Reason == b.Reason && + a.Message == b.Message +} + func recordIngressCondition(route *routev1.Route, name, hostName string, condition routev1.RouteIngressCondition) (changed, created bool, at time.Time, latest, original *routev1.RouteIngress) { @@ if existingCondition != nil { condition.LastTransitionTime = existingCondition.LastTransitionTime - if !conditionsEqual(existingCondition, &condition) { + if !conditionsEqualForStatusWrite(existingCondition, &condition) { changed = true } } else {Also applies to: 157-162
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/router/controller/status.go` around lines 145 - 150, The status write path is still skipping updates because conditionsEqual ignores condition Reason via ignoreIngressConditionReason; update the comparator so Reason changes that matter (e.g., transitions between ExternalCertificateSecretUpdated and ExternalCertificateSARCompleted) cause a mutation. Concretely, modify conditionsEqual (or add a small wrapper used before calling performIngressConditionUpdate) to return false when the existing Condition.Reason != new Condition.Reason for RouteAdmitted SAR-related reasons (such as "ExternalCertificateSecretUpdated" and "ExternalCertificateSARCompleted"), so performIngressConditionUpdate is invoked and the new reason is persisted; keep ignoreIngressConditionReason behavior for other benign cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@pkg/router/controller/status.go`:
- Around line 145-150: The status write path is still skipping updates because
conditionsEqual ignores condition Reason via ignoreIngressConditionReason;
update the comparator so Reason changes that matter (e.g., transitions between
ExternalCertificateSecretUpdated and ExternalCertificateSARCompleted) cause a
mutation. Concretely, modify conditionsEqual (or add a small wrapper used before
calling performIngressConditionUpdate) to return false when the existing
Condition.Reason != new Condition.Reason for RouteAdmitted SAR-related reasons
(such as "ExternalCertificateSecretUpdated" and
"ExternalCertificateSARCompleted"), so performIngressConditionUpdate is invoked
and the new reason is persisted; keep ignoreIngressConditionReason behavior for
other benign cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 647baa90-9f64-40e5-8c98-c7599fdfaa7a
📒 Files selected for processing (1)
pkg/router/controller/status.go
The StatusAdmitter was holding a non-reentrant sync.Mutex across the entire plugin chain execution (HandleRoute). When a downstream plugin (like RouteSecretManager) needed to update the route status synchronously via RecordRouteUpdate, it attempted to reacquire the same lock, causing a deadlock that halted the event loop. This deadlock was masked in previous versions because secret validation occurred asynchronously. With the transition to synchronous validation, the deadlock became deterministic on cache sync, leading to pervasive E2E timeouts. The StatusAdmitter does not mutate any fields after initialization and delegates state updates to the thread-safe writerlease queue, rendering the lock unnecessary. - Removed the sync.Mutex from StatusAdmitter. - Stripped lock acquisition from HandleRoute and status update methods.
…hronous The SharedSecretManager previously used a blocking WaitForCacheSync during RegisterRoute. Because RegisterRoute was called synchronously from the router's main event loop (HandleRoute), the entire router could hang indefinitely if the API server was slow or if RBAC permissions prevented the informer from ever syncing. This caused timeouts in core router E2E tests (like 'HAProxy should run even if it has no routes'). This commit removes the blocking wait from RegisterRoute, allowing the main event loop to proceed immediately. It also updates GetSecret to use a fast-path cache check and fall back to a direct API call if the cache is not yet synced or the secret is missing. This maintains the performance benefits of the cache while guaranteeing that the router will never block on informer initialization.
- route_secret_manager: Gate SARCompleted status emission on route admission to prevent marking structurally invalid routes as active before ExtendedValidator runs. - status: Use strict equality for explicit status writes while protecting meaningful ignored reasons from empty overwrites. This ensures valid transitions like SecretUpdated -> SARCompleted are persisted without flapping. - writerlease: Fix shutdown deadlock by explicitly calling queue.ShutDown() before waiting on the worker WaitGroup. - Makefile.debug: Pass GOARCH to image builder for cross-platform support.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/router/controller/shared_secret_manager.go (1)
98-103: Per-namespace informers are never stopped; the passedctxis ignored.The informer goroutine is launched with
context.Background().Done()and is never torn down.UnregisterRouteonly removes the registry entry, so the goroutine and its namespace-wide secrets watch persist for the router's lifetime even after the last route in a namespace is unregistered. This accumulates one liveLIST/WATCHover all secrets per namespace ever seen — relevant given this PR targets large-scale behavior and clusters with namespace churn. Note also that thectxargument toRegisterRouteis unused here; a stored per-namespace stop function (or deriving fromctx) would letUnregisterRoutereclaim empty-namespace informers.Since informers cache every secret in a watched namespace, consider tracking a per-namespace route count and a
stopCh/cancel func soUnregisterRoutecan stop the informer once the namespace has no remaining registrations. This bounds memory and watch connections over time.Want me to draft the per-namespace refcount + stop-channel cleanup in
UnregisterRoute, or open a tracking issue for the deferred "unregister empty namespaces" optimization noted in the comment?🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/router/controller/shared_secret_manager.go` around lines 98 - 103, The informer goroutine currently uses context.Background().Done() and is never stopped; modify RegisterRoute to use the provided ctx (or create a per-namespace context with cancel) and store a stop/cancel function and a refcount alongside the informer in m.informers (e.g., replace the raw inf entry with a small struct containing inf, refCount, cancelFunc/stopCh). Increment the refcount when RegisterRoute is called for an existing namespace and start the informer only when first created using ctx.Done(); in UnregisterRoute, decrement the namespace refcount and when it reaches zero call the stored cancelFunc/close stopCh and remove the informer entry so the informer goroutine and LIST/WATCH are stopped and resources reclaimed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/router/controller/shared_secret_manager.go`:
- Around line 182-196: SharedSecretManager.GetSecret returns informer-store
object directly on cache hits; change the cache-hit path in GetSecret to return
a caller-owned copy (use secret.DeepCopy()) to avoid mutability of shared cache
objects, and keep the existing API-call fallback when the informer isn’t synced
or the object is missing; also update the comment in populateRouteTLSFromSecret
to no longer claim RegisterRoute blocks until cache sync and to note that
GetSecret will fallback to a direct API call when the cache isn't synced or the
secret is absent; finally, review RegisterRoute/UnregisterRoute to either honor
the input ctx for informer lifetime or implement informer
shutdown/Garbage-collection when no routes remain (ensure UnregisterRoute stops
the informer for that namespace or uses ctx.Done()).
---
Nitpick comments:
In `@pkg/router/controller/shared_secret_manager.go`:
- Around line 98-103: The informer goroutine currently uses
context.Background().Done() and is never stopped; modify RegisterRoute to use
the provided ctx (or create a per-namespace context with cancel) and store a
stop/cancel function and a refcount alongside the informer in m.informers (e.g.,
replace the raw inf entry with a small struct containing inf, refCount,
cancelFunc/stopCh). Increment the refcount when RegisterRoute is called for an
existing namespace and start the informer only when first created using
ctx.Done(); in UnregisterRoute, decrement the namespace refcount and when it
reaches zero call the stored cancelFunc/close stopCh and remove the informer
entry so the informer goroutine and LIST/WATCH are stopped and resources
reclaimed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f848a726-608b-4bc3-8193-4c9f52a6e2e7
📒 Files selected for processing (2)
pkg/router/controller/shared_secret_manager.gopkg/router/controller/status.go
💤 Files with no reviewable changes (1)
- pkg/router/controller/status.go
| // Try to get from cache first | ||
| if inf.HasSynced() { | ||
| obj, exists, err := inf.GetStore().GetByKey(namespace + "/" + ref.secretName) | ||
| if err == nil && exists { | ||
| return obj.(*corev1.Secret), nil | ||
| } | ||
| } | ||
|
|
||
| // Fallback to direct API call if cache is not synced or object is missing | ||
| secret, err := m.kubeClient.CoreV1().Secrets(namespace).Get(ctx, ref.secretName, metav1.GetOptions{}) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| return secret, nil | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find GetSecret consumers and inspect how the returned secret is used.
rg -nP --type=go -C5 '\.GetSecret\s*\('
# Locate the stale "blocks until cache sync" comment.
rg -nP --type=go -C2 'blocks until cache sync'Repository: openshift/router
Length of output: 3066
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show full body usage around the known call site
sed -n '340,470p' pkg/router/controller/route_secret_manager.go | nl -ba | sed -n '1,140p'
# Find all GetSecret call sites (method call form) in the non-vendor code
rg -n --type go '\.GetSecret\s*\(' pkg/router | sed -n '1,200p'
# Find any places that mutate the returned secret variable in populateRouteTLSFromSecret
# (heuristic: after 'secret, err :=' look for 'secret.' writes)
rg -n --type go 'populateRouteTLSFromSecret|secret\.' pkg/router/controller/route_secret_manager.go
# Find any direct writes to secret.Data in the repo
rg -n --type go 'secret\.Data\s*\[' pkg/router | sed -n '1,200p'Repository: openshift/router
Length of output: 103
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="pkg/router/controller/route_secret_manager.go"
echo "== populateRouteTLSFromSecret context =="
# Print the whole function region conservatively
sed -n '350,460p' "$FILE" | cat -n
echo "== All GetSecret call sites under pkg/router (non-vendor) =="
rg -n --type go '\.GetSecret\s*\(' pkg/router | head -n 200
echo "== Mutations in populateRouteTLSFromSecret (secret.* writes) =="
rg -n --type go 'populateRouteTLSFromSecret|secret\.' "$FILE"
echo "== Any secret.Data[...] writes under pkg/router (non-vendor) =="
rg -n --type go 'secret\.Data\s*\[' pkg/router | head -n 200Repository: openshift/router
Length of output: 7132
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Global GetSecret call sites (non-vendor) =="
rg -n --type go '\.GetSecret\s*\(' . --glob '!vendor/**' | head -n 200
echo "== Any secret.Data[...] writes (non-vendor) =="
rg -n --type go 'secret\.Data\s*\[' . --glob '!vendor/**' | head -n 200
echo "== Inspect shared_secret_manager.go RegisterRoute and informer lifecycle =="
sed -n '1,260p' pkg/router/controller/shared_secret_manager.go | cat -nRepository: openshift/router
Length of output: 7757
Harden GetSecret cache-hit mutability and update the stale “cache sync” comment
-
SharedSecretManager.GetSecretreturns the shared informer-store object on the cache-hit path; returnDeepCopy()there to keep a consistent “caller-owned” mutability contract (currentpopulateRouteTLSFromSecretonly readssecret.Data["tls.crt"]/["tls.key"], so this is defensive now).♻️ Return a copy from the cache hit
// Try to get from cache first if inf.HasSynced() { obj, exists, err := inf.GetStore().GetByKey(namespace + "/" + ref.secretName) if err == nil && exists { - return obj.(*corev1.Secret), nil + return obj.(*corev1.Secret).DeepCopy(), nil } } -
Update the stale comment in
populateRouteTLSFromSecret: it saysRegisterRouteblocks until cache sync, butGetSecretcan fall back to a direct API call when cache isn’t synced or the secret is missing. -
RegisterRouteruns the informer withcontext.Background().Done()andUnregisterRoutedoesn’t stop/GC the informer when a namespace becomes empty; consider honoringctxor implementing informer shutdown when no routes remain.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Try to get from cache first | |
| if inf.HasSynced() { | |
| obj, exists, err := inf.GetStore().GetByKey(namespace + "/" + ref.secretName) | |
| if err == nil && exists { | |
| return obj.(*corev1.Secret), nil | |
| } | |
| } | |
| // Fallback to direct API call if cache is not synced or object is missing | |
| secret, err := m.kubeClient.CoreV1().Secrets(namespace).Get(ctx, ref.secretName, metav1.GetOptions{}) | |
| if err != nil { | |
| return nil, err | |
| } | |
| return secret, nil | |
| } | |
| // Try to get from cache first | |
| if inf.HasSynced() { | |
| obj, exists, err := inf.GetStore().GetByKey(namespace + "/" + ref.secretName) | |
| if err == nil && exists { | |
| return obj.(*corev1.Secret).DeepCopy(), nil | |
| } | |
| } | |
| // Fallback to direct API call if cache is not synced or object is missing | |
| secret, err := m.kubeClient.CoreV1().Secrets(namespace).Get(ctx, ref.secretName, metav1.GetOptions{}) | |
| if err != nil { | |
| return nil, err | |
| } | |
| return secret, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/router/controller/shared_secret_manager.go` around lines 182 - 196,
SharedSecretManager.GetSecret returns informer-store object directly on cache
hits; change the cache-hit path in GetSecret to return a caller-owned copy (use
secret.DeepCopy()) to avoid mutability of shared cache objects, and keep the
existing API-call fallback when the informer isn’t synced or the object is
missing; also update the comment in populateRouteTLSFromSecret to no longer
claim RegisterRoute blocks until cache sync and to note that GetSecret will
fallback to a direct API call when the cache isn't synced or the secret is
absent; finally, review RegisterRoute/UnregisterRoute to either honor the input
ctx for informer lifetime or implement informer shutdown/Garbage-collection when
no routes remain (ensure UnregisterRoute stops the informer for that namespace
or uses ctx.Done()).
|
@bentito: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This PR implements asynchronous and cached
SubjectAccessReviewchecks forspec.tls.externalCertificateduring router startup.Previously, each external certificate triggered synchronous API validations that blocked the main thread. When scaling to thousands of external cert routes, the router hit
O(N * API_latency)delays and would hitCrashLoopBackOfffrom failing liveness probes.This fix maintains full RBAC security checks but executes them asynchronously in the background. A global
sync.Mapacts as anasyncSARCacheto drastically speed up repeated checks for the same underlying secret.Fixes: OCPBUGS-77056
Summary by CodeRabbit
New Features
Improvements
Bug Fixes