OCPBUGS-86246: Clean up old ovnkube-client certificates#4120
Conversation
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Excluded labels (none allowed) (2)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughAdds a best-effort certificate cleanup to the Windows controller: after reconcileServices and before waitUntilNodeReady the controller calls sc.cleanupOldCertificates(), which globs ovnkube-client-*.pem in windows.CniConfDir, sorts matches newest-first by filename, keeps the two most recent files, and deletes older matches. Individual deletion errors are logged as warnings and do not fail reconciliation. Tests create timestamped cert files in temp dirs, run cleanupOldCertificatesInDir, and verify the expected remaining files and ordering. 🚥 Pre-merge checks | ✅ 18 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (18 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @ranjithrajaram. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/daemon/controller/controller.go (1)
315-318: ⚡ Quick winAlign cleanup logging with project's logr standard.
The cleanup paths use
klog.*fcalls, but other controllers (csr.go,nodeconfig.go) uselogr.Loggerwith structured logging. Switch lines 315–318 and 450–458 to logr-stylelog.Info(),log.Error(), andlog.V(1).Info()for consistency. The manager is already configured withklogr(line 191), so logr is available.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/daemon/controller/controller.go` around lines 315 - 318, Replace the klog.*f calls around the certificate cleanup with the package's logr Logger calls: when calling sc.cleanupOldCertificates() use log.Error(err, "failed to cleanup old certificates") on error and log.V(1).Info("completed old certificate cleanup") (or log.Info for an important informational message) on success; do the same for the other certificate cleanup block currently using klog (the later cleanup that removes old cert files) by converting its klog.Infof/klog.Warningf calls to log.Info / log.Error / log.V(1).Info and pass the error as the first argument to log.Error plus short structured fields (e.g., "path" or "count") as needed so the logging matches csr.go/nodeconfig.go style.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/daemon/controller/controller_test.go`:
- Around line 1090-1094: The test assumes fileTimes are in newest-first order
but filepath.Glob yields lexical order, so sort the slice before asserting:
apply a descending-time sort to the fileTimes slice (e.g., using sort.Slice or
sort.SliceStable comparing fileTimes[i].After(fileTimes[j])) right before the
loop that checks ordering; keep the variable name fileTimes and the existing
assertion loop unchanged so the subsequent asserts verify the sorted, descending
list.
---
Nitpick comments:
In `@pkg/daemon/controller/controller.go`:
- Around line 315-318: Replace the klog.*f calls around the certificate cleanup
with the package's logr Logger calls: when calling sc.cleanupOldCertificates()
use log.Error(err, "failed to cleanup old certificates") on error and
log.V(1).Info("completed old certificate cleanup") (or log.Info for an important
informational message) on success; do the same for the other certificate cleanup
block currently using klog (the later cleanup that removes old cert files) by
converting its klog.Infof/klog.Warningf calls to log.Info / log.Error /
log.V(1).Info and pass the error as the first argument to log.Error plus short
structured fields (e.g., "path" or "count") as needed so the logging matches
csr.go/nodeconfig.go style.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: ad07803a-0f28-4c7b-a9b7-0ee4d2be7958
📒 Files selected for processing (2)
pkg/daemon/controller/controller.gopkg/daemon/controller/controller_test.go
|
/test ? |
jrvaldes
left a comment
There was a problem hiding this comment.
thanks @ranjithrajaram for working on this, PTAL at the comments
|
/test unit |
|
@ranjithrajaram add the jira id |
|
/retitle OCPBUGS-86246: Clean up old ovnkube-client certificates |
|
@ranjithrajaram: This pull request references Jira Issue OCPBUGS-86246, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Thanks for working on this @ranjithrajaram |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mansikulkarni96, ranjithrajaram The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/jira refresh |
|
@jrvaldes: This pull request references Jira Issue OCPBUGS-86246, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@ranjithrajaram: This pull request references Jira Issue OCPBUGS-86246, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/daemon/controller/controller_test.go`:
- Around line 1063-1100: The generated cert filenames in the test loop use (i+1)
directly so for large i (e.g. 150) the date portion becomes three digits and
breaks lexicographic ordering; change the filename generation in the test (where
fileName := fmt.Sprintf("ovnkube-client-2026-05-%02d-22-00-00.pem", i+1)) to
compute a bounded, fixed-width day value (e.g. day := (i % 28) + 1) and then use
that day with %02d when creating the file via createCertFile; no other test
logic (including cleanupOldCertificatesInDir checks and the filename-based
assertions) needs to change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 5d2b46f9-5a41-4c48-a872-9e3c3f19eff1
📒 Files selected for processing (2)
pkg/daemon/controller/controller.gopkg/daemon/controller/controller_test.go
|
/hold |
WICD now periodically removes old ovnkube-client certificate files from the CNI config directory to prevent disk space exhaustion. The cleanup runs during the normal reconciliation loop (every 2 minutes) and keeps only the 2 most recent certificates, deleting older ones. Without this cleanup, certificate files accumulate indefinitely as the hybrid-overlay service generates a new timestamped certificate daily. This can lead to hundreds of certificate files consuming disk space. The cleanup uses filename-based sorting instead of expensive os.Stat() calls, leveraging the ISO-8601-like timestamp format in the filenames (ovnkube-client-YYYY-MM-DD-HH-MM-SS.pem) for efficient lexicographical comparison. The cleanup logic is implemented in a separate function for testability, with comprehensive unit tests covering various scenarios including the production case of 150+ accumulated files.
|
/test wicd-unit-vsphere |
1 similar comment
|
/test wicd-unit-vsphere |
|
/ok-to-test |
|
/test wicd-unit-vsphere |
|
/retest-required |
|
/override ci/prow/vsphere-disconnected-e2e-operator |
|
@jrvaldes: Overrode contexts on behalf of jrvaldes: ci/prow/vsphere-disconnected-e2e-operator DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@ranjithrajaram: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/hold cancel |
|
/lgtm |
|
@ranjithrajaram: Jira Issue OCPBUGS-86246: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-86246 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Trying to address OCPBUGS-86246
WICD now periodically removes old ovnkube-client certificate files from the CNI config directory to prevent disk space exhaustion. The cleanup runs during the normal reconciliation loop (every 2 minutes) and keeps only the 3 most recent certificates, deleting older ones.
Without this cleanup, certificate files accumulate indefinitely as the hybrid-overlay service generates a new timestamped certificate daily. This can lead to hundreds of certificate files consuming disk space.
The cleanup logic is implemented in a separate function for testability, with comprehensive unit tests covering various scenarios including the production case of 150+ accumulated files.
Summary by CodeRabbit
New Features
Tests