Skip to content

OCPBUGS-86246: Clean up old ovnkube-client certificates#4120

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:masterfrom
ranjithrajaram:master
May 29, 2026
Merged

OCPBUGS-86246: Clean up old ovnkube-client certificates#4120
openshift-merge-bot[bot] merged 1 commit into
openshift:masterfrom
ranjithrajaram:master

Conversation

@ranjithrajaram
Copy link
Copy Markdown

@ranjithrajaram ranjithrajaram commented May 20, 2026

Trying to address OCPBUGS-86246

WICD now periodically removes old ovnkube-client certificate files from the CNI config directory to prevent disk space exhaustion. The cleanup runs during the normal reconciliation loop (every 2 minutes) and keeps only the 3 most recent certificates, deleting older ones.

Without this cleanup, certificate files accumulate indefinitely as the hybrid-overlay service generates a new timestamped certificate daily. This can lead to hundreds of certificate files consuming disk space.

The cleanup logic is implemented in a separate function for testability, with comprehensive unit tests covering various scenarios including the production case of 150+ accumulated files.

Summary by CodeRabbit

  • New Features

    • Automatic cleanup of stale certificate files during Windows service reconciliation; keeps the 2 most recent files and removes older ones. Cleanup errors are logged as warnings and do not stop reconciliation.
  • Tests

    • Added unit tests covering the certificate cleanup behavior and retention logic.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Excluded labels (none allowed) (2)
  • do-not-merge/work-in-progress
  • do-not-merge/hold

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a0346926-36f3-4615-92bf-179f0a9e7cdf

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a best-effort certificate cleanup to the Windows controller: after reconcileServices and before waitUntilNodeReady the controller calls sc.cleanupOldCertificates(), which globs ovnkube-client-*.pem in windows.CniConfDir, sorts matches newest-first by filename, keeps the two most recent files, and deletes older matches. Individual deletion errors are logged as warnings and do not fail reconciliation. Tests create timestamped cert files in temp dirs, run cleanupOldCertificatesInDir, and verify the expected remaining files and ordering.

🚥 Pre-merge checks | ✅ 18 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Test creates invalid timestamps (day 150) instead of realistic dates, breaking proper validation of lexicographic sorting logic; flagged review comment fix not applied. Apply suggested fix from review: use baseTime.AddDate(0, 0, i) and proper 2006-01-02-15-04-05 format for realistic test data.
✅ Passed checks (18 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the primary change: implementing cleanup of old ovnkube-client certificate files, which directly matches the changeset additions in controller.go and controller_test.go.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Go Best Practices & Build Tags ✅ Passed Both files have correct Windows build tags; cleanup functions wrap errors with %w, handle removal errors with logging, and don't ignore errors or panic.
Security: Secrets, Ssh & Csr ✅ Passed Certificate cleanup safely handles ovnkube-client-*.pem files, logs only counts/paths/filenames without content, no secret exposure in errors, no SSH/SFTP/CSR logic present.
Kubernetes Controller Patterns ✅ Passed All controller patterns properly implemented: critical errors requeue, cleanup is idempotent, watch predicates filter correctly, best-effort cleanup doesn't block reconciliation.
Windows Service Management ✅ Passed This PR is not about Windows service management. Changes add certificate cleanup logic only; existing service handling (dependencies, descriptions, SCM interactions) remains unmodified.
Platform-Specific Requirements ✅ Passed PR is Windows-exclusive code (//go:build windows). Cleanup operates only on windows.CniConfDir. Cloud platforms don't apply to this Windows service controller logic.
Stable And Deterministic Test Names ✅ Passed Test file uses standard Go testing.T, not Ginkgo. Test function TestCleanupOldCertificates uses static, descriptive test case names with no dynamic values.
Microshift Test Compatibility ✅ Passed PR adds a standard Go unit test, not Ginkgo e2e tests. The custom check applies only to new Ginkgo tests, not unit tests.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds unit tests only (TestCleanupOldCertificates using Go's testing package), not Ginkgo e2e tests. SNO compatibility check not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies Windows daemon controller to cleanup stale certificates; no deployment manifests or scheduling constraints introduced.
Ote Binary Stdout Contract ✅ Passed No stdout violations. Cleanup functions called from controller reconciliation (not process-level). All logging via klog, not fmt.Print or similar in process-level code.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR adds only a standard Go unit test (TestCleanupOldCertificates using testing.T), not Ginkgo e2e tests. The check is not applicable.
No-Weak-Crypto ✅ Passed PR adds file cleanup functions using only standard library operations (filepath.Glob, os.Remove, sort). No cryptographic functions, weak crypto, or secret comparisons present.
Container-Privileges ✅ Passed PR contains only Go source code changes for certificate cleanup logic; no container manifests, security contexts, or privilege configurations are introduced or modified.
No-Sensitive-Data-In-Logs ✅ Passed Logging statements contain only non-sensitive data: file counts, system paths, timestamped filenames, and OS errors. No credentials or PII exposed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 20, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

Hi @ranjithrajaram. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/daemon/controller/controller.go (1)

315-318: ⚡ Quick win

Align cleanup logging with project's logr standard.

The cleanup paths use klog.*f calls, but other controllers (csr.go, nodeconfig.go) use logr.Logger with structured logging. Switch lines 315–318 and 450–458 to logr-style log.Info(), log.Error(), and log.V(1).Info() for consistency. The manager is already configured with klogr (line 191), so logr is available.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/daemon/controller/controller.go` around lines 315 - 318, Replace the
klog.*f calls around the certificate cleanup with the package's logr Logger
calls: when calling sc.cleanupOldCertificates() use log.Error(err, "failed to
cleanup old certificates") on error and log.V(1).Info("completed old certificate
cleanup") (or log.Info for an important informational message) on success; do
the same for the other certificate cleanup block currently using klog (the later
cleanup that removes old cert files) by converting its klog.Infof/klog.Warningf
calls to log.Info / log.Error / log.V(1).Info and pass the error as the first
argument to log.Error plus short structured fields (e.g., "path" or "count") as
needed so the logging matches csr.go/nodeconfig.go style.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/daemon/controller/controller_test.go`:
- Around line 1090-1094: The test assumes fileTimes are in newest-first order
but filepath.Glob yields lexical order, so sort the slice before asserting:
apply a descending-time sort to the fileTimes slice (e.g., using sort.Slice or
sort.SliceStable comparing fileTimes[i].After(fileTimes[j])) right before the
loop that checks ordering; keep the variable name fileTimes and the existing
assertion loop unchanged so the subsequent asserts verify the sorted, descending
list.

---

Nitpick comments:
In `@pkg/daemon/controller/controller.go`:
- Around line 315-318: Replace the klog.*f calls around the certificate cleanup
with the package's logr Logger calls: when calling sc.cleanupOldCertificates()
use log.Error(err, "failed to cleanup old certificates") on error and
log.V(1).Info("completed old certificate cleanup") (or log.Info for an important
informational message) on success; do the same for the other certificate cleanup
block currently using klog (the later cleanup that removes old cert files) by
converting its klog.Infof/klog.Warningf calls to log.Info / log.Error /
log.V(1).Info and pass the error as the first argument to log.Error plus short
structured fields (e.g., "path" or "count") as needed so the logging matches
csr.go/nodeconfig.go style.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ad07803a-0f28-4c7b-a9b7-0ee4d2be7958

📥 Commits

Reviewing files that changed from the base of the PR and between 8640d4e and 0bbced8.

📒 Files selected for processing (2)
  • pkg/daemon/controller/controller.go
  • pkg/daemon/controller/controller_test.go

Comment thread pkg/daemon/controller/controller_test.go Outdated
@jrvaldes
Copy link
Copy Markdown
Contributor

/test ?

Copy link
Copy Markdown
Contributor

@jrvaldes jrvaldes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @ranjithrajaram for working on this, PTAL at the comments

Comment thread pkg/daemon/controller/controller.go Outdated
Comment thread pkg/daemon/controller/controller.go Outdated
Comment thread pkg/daemon/controller/controller.go Outdated
Comment thread pkg/daemon/controller/controller.go Outdated
Comment thread pkg/daemon/controller/controller.go Outdated
@jrvaldes
Copy link
Copy Markdown
Contributor

/test unit

@jrvaldes
Copy link
Copy Markdown
Contributor

@jrvaldes
Copy link
Copy Markdown
Contributor

/retitle OCPBUGS-86246: Clean up old ovnkube-client certificates

@openshift-ci openshift-ci Bot changed the title Clean up old ovnkube-client certificates OCPBUGS-86246: Clean up old ovnkube-client certificates May 20, 2026
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 20, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@ranjithrajaram: This pull request references Jira Issue OCPBUGS-86246, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Trying to address OCPBUGS-86246

WICD now periodically removes old ovnkube-client certificate files from the CNI config directory to prevent disk space exhaustion. The cleanup runs during the normal reconciliation loop (every 2 minutes) and keeps only the 3 most recent certificates, deleting older ones.

Without this cleanup, certificate files accumulate indefinitely as the hybrid-overlay service generates a new timestamped certificate daily. This can lead to hundreds of certificate files consuming disk space.

The cleanup logic is implemented in a separate function for testability, with comprehensive unit tests covering various scenarios including the production case of 150+ accumulated files.

Summary by CodeRabbit

  • New Features

  • Added automatic cleanup of old certificate files during service reconciliation. The system retains the 3 most recent certificates and removes older ones. Cleanup failures emit warnings but do not halt reconciliation.

  • Tests

  • Added test coverage for certificate cleanup functionality.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@mansikulkarni96
Copy link
Copy Markdown
Member

Thanks for working on this @ranjithrajaram
/approve

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mansikulkarni96, ranjithrajaram

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 26, 2026
Copy link
Copy Markdown
Contributor

@jrvaldes jrvaldes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

Comment thread pkg/daemon/controller/controller.go
Comment thread pkg/daemon/controller/controller.go Outdated
@openshift-ci openshift-ci Bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 26, 2026
@jrvaldes
Copy link
Copy Markdown
Contributor

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 26, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@jrvaldes: This pull request references Jira Issue OCPBUGS-86246, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jrvaldes
Copy link
Copy Markdown
Contributor

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 27, 2026

✅ Actions performed

Full review triggered.

@openshift-ci-robot
Copy link
Copy Markdown

@ranjithrajaram: This pull request references Jira Issue OCPBUGS-86246, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

Trying to address OCPBUGS-86246

WICD now periodically removes old ovnkube-client certificate files from the CNI config directory to prevent disk space exhaustion. The cleanup runs during the normal reconciliation loop (every 2 minutes) and keeps only the 3 most recent certificates, deleting older ones.

Without this cleanup, certificate files accumulate indefinitely as the hybrid-overlay service generates a new timestamped certificate daily. This can lead to hundreds of certificate files consuming disk space.

The cleanup logic is implemented in a separate function for testability, with comprehensive unit tests covering various scenarios including the production case of 150+ accumulated files.

Summary by CodeRabbit

  • New Features

  • Automatic cleanup of stale certificate files during Windows service reconciliation; keeps the 2 most recent files and removes older ones. Cleanup errors are logged as warnings and do not stop reconciliation.

  • Tests

  • Added unit tests covering the certificate cleanup behavior and retention logic.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/daemon/controller/controller_test.go`:
- Around line 1063-1100: The generated cert filenames in the test loop use (i+1)
directly so for large i (e.g. 150) the date portion becomes three digits and
breaks lexicographic ordering; change the filename generation in the test (where
fileName := fmt.Sprintf("ovnkube-client-2026-05-%02d-22-00-00.pem", i+1)) to
compute a bounded, fixed-width day value (e.g. day := (i % 28) + 1) and then use
that day with %02d when creating the file via createCertFile; no other test
logic (including cleanupOldCertificatesInDir checks and the filename-based
assertions) needs to change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5d2b46f9-5a41-4c48-a872-9e3c3f19eff1

📥 Commits

Reviewing files that changed from the base of the PR and between da71e0c and 86e8a59.

📒 Files selected for processing (2)
  • pkg/daemon/controller/controller.go
  • pkg/daemon/controller/controller_test.go

Comment thread pkg/daemon/controller/controller_test.go Outdated
@jrvaldes
Copy link
Copy Markdown
Contributor

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2026
WICD now periodically removes old ovnkube-client certificate files from
the CNI config directory to prevent disk space exhaustion. The cleanup
runs during the normal reconciliation loop (every 2 minutes) and keeps
only the 2 most recent certificates, deleting older ones.

Without this cleanup, certificate files accumulate indefinitely as the
hybrid-overlay service generates a new timestamped certificate daily.
This can lead to hundreds of certificate files consuming disk space.

The cleanup uses filename-based sorting instead of expensive os.Stat()
calls, leveraging the ISO-8601-like timestamp format in the filenames
(ovnkube-client-YYYY-MM-DD-HH-MM-SS.pem) for efficient lexicographical
comparison.

The cleanup logic is implemented in a separate function for testability,
with comprehensive unit tests covering various scenarios including the
production case of 150+ accumulated files.
@jrvaldes
Copy link
Copy Markdown
Contributor

/test wicd-unit-vsphere

1 similar comment
@jrvaldes
Copy link
Copy Markdown
Contributor

/test wicd-unit-vsphere

@jrvaldes
Copy link
Copy Markdown
Contributor

/ok-to-test

@jrvaldes
Copy link
Copy Markdown
Contributor

/test wicd-unit-vsphere

@mansikulkarni96
Copy link
Copy Markdown
Member

/retest-required

@jrvaldes
Copy link
Copy Markdown
Contributor

/override ci/prow/vsphere-disconnected-e2e-operator

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

@jrvaldes: Overrode contexts on behalf of jrvaldes: ci/prow/vsphere-disconnected-e2e-operator

Details

In response to this:

/override ci/prow/vsphere-disconnected-e2e-operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

@ranjithrajaram: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jrvaldes
Copy link
Copy Markdown
Contributor

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 29, 2026
@jrvaldes
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 29, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit a3f7352 into openshift:master May 29, 2026
20 checks passed
@openshift-ci-robot
Copy link
Copy Markdown

@ranjithrajaram: Jira Issue OCPBUGS-86246: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-86246 has been moved to the MODIFIED state.

Details

In response to this:

Trying to address OCPBUGS-86246

WICD now periodically removes old ovnkube-client certificate files from the CNI config directory to prevent disk space exhaustion. The cleanup runs during the normal reconciliation loop (every 2 minutes) and keeps only the 3 most recent certificates, deleting older ones.

Without this cleanup, certificate files accumulate indefinitely as the hybrid-overlay service generates a new timestamped certificate daily. This can lead to hundreds of certificate files consuming disk space.

The cleanup logic is implemented in a separate function for testability, with comprehensive unit tests covering various scenarios including the production case of 150+ accumulated files.

Summary by CodeRabbit

  • New Features

  • Automatic cleanup of stale certificate files during Windows service reconciliation; keeps the 2 most recent files and removes older ones. Cleanup errors are logged as warnings and do not stop reconciliation.

  • Tests

  • Added unit tests covering the certificate cleanup behavior and retention logic.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants