feat: extend enable/disable/teardown pattern to Workbench, Package Manager, and Chronicle#99
feat: extend enable/disable/teardown pattern to Workbench, Package Manager, and Chronicle#99ian-flores wants to merge 13 commits intomainfrom
Conversation
…ut data loss Extend the enable/disable/teardown pattern from Connect (PR #93) to Workbench, Package Manager, and Chronicle. - Add spec.{workbench,packageManager,chronicle}.enabled and .teardown fields to the Site CRD - Add Suspended field to Workbench, PackageManager, and Chronicle CRDs - Site controller uses three-way branching: enabled → reconcile, disabled → suspend (preserve data), teardown → delete CR - Product controllers skip serving resources when Suspended=true - Network policies cleaned up when products are disabled - Fix Workbench cleanupDeployedService (was a no-op) - Fix Chronicle CleanupChronicle (was a TODO no-op) - Add 9 tests (3 per product: disable-never-enabled, suspend, teardown) Closes #95
- Remove unused getChronicle test helper - Correct "CRD" to "CR" in cleanup function godoc comments - Fix double blank line in product-team-site-management.md
|
Claude finished @ian-flores's task —— View job Review: extend enable/disable/teardown patternThis PR cleanly extends the existing Connect enable/disable/teardown pattern to Workbench, Package Manager, and Chronicle. The three-way branching (enabled → reconcile, disabled → suspend, teardown → delete CR) is consistent across all products, the API types follow established conventions, and the test coverage is solid with 9 new tests. What looks good
Findings1. The old no-op has been replaced with ~130 lines cleaning up Ingress, Service, Deployment, PVCs, ServiceAccounts, RBAC, ConfigMaps, Secrets, Traefik Middlewares, and SecretProviderClass. Called by the finalizer path ( 2. Chronicle The new 3. Duplicated enabled/teardown computation — The func isProductEnabled(enabled *bool) bool {
return enabled == nil || *enabled
}Non-blocking — also applies to the existing Connect code but would improve maintainability. 4. Minor: The suspend implementations are nearly identical across all four products (differing only in resource types: StatefulSet vs Deployment). A shared helper could reduce future maintenance, but this mirrors the existing Connect pattern so fine to leave as-is. Checklist
|
Lytol
left a comment
There was a problem hiding this comment.
I added some nits and questions, but use what's useful and ignore what isn't. One general observation as well: the code assumes that someone will only set teardown: true if they've also set enabled: false, but we may want to assert or at least gracefully handle when someone does enabled: true + teardown: true. 🤷
Overall, looks good!
| key := client.ObjectKey{Name: c.ComponentName(), Namespace: req.Namespace} | ||
|
|
||
| // SERVICE | ||
| if err := internal.BasicDelete(ctx, r, l, key, &corev1.Service{}); err != nil { | ||
| return ctrl.Result{}, err | ||
| } | ||
|
|
||
| // STATEFULSET | ||
| if err := internal.BasicDelete(ctx, r, l, key, &v1.StatefulSet{}); err != nil { | ||
| return ctrl.Result{}, err | ||
| } | ||
|
|
||
| // CONFIGMAP | ||
| if err := internal.BasicDelete(ctx, r, l, key, &corev1.ConfigMap{}); err != nil { | ||
| return ctrl.Result{}, err | ||
| } | ||
|
|
||
| // SERVICE ACCOUNTS | ||
| if err := internal.BasicDelete(ctx, r, l, key, &corev1.ServiceAccount{}); err != nil { | ||
| return ctrl.Result{}, err | ||
| } |
There was a problem hiding this comment.
nit: This pattern is duplicated so often that it might be worth DRYing it up a bit by creating an internal.BatchDelete or similar.
There was a problem hiding this comment.
Done — added internal.BatchDelete and applied it here, then expanded it to the suspend/cleanup functions in Connect and Package Manager, and to deleteServingResources in Workbench. Workbench's full cleanupDeployedService has too many different keys to batch cleanly so left that as-is.
| } | ||
|
|
||
| l.Info("Chronicle cleanup complete") | ||
| return ctrl.Result{}, nil |
There was a problem hiding this comment.
outside the scope of this PR, but... what's the point of returning a ctrl.Result if it's always an empty object?
There was a problem hiding this comment.
Agreed, it's a holdover from the controller-runtime scaffold pattern where everything returns (ctrl.Result, error). Leaving it for a separate cleanup since it'd affect all the product controllers.
| ) | ||
|
|
||
| // isProductEnabled returns true if the product is enabled (nil defaults to enabled). | ||
| func isProductEnabled(b *bool) bool { |
There was a problem hiding this comment.
nit: the naming feels a little weird given that it's more of a safe dereference helper. Also, if you make the naming a little more generic, you could probably use this everywhere that you are referencing boolean pointers. Something like:
func checkBool(ptr *bool, default bool) bool {
if ptr == nil {
return default
}
return *ptr
}
There was a problem hiding this comment.
Good call — renamed to checkBool(ptr \*bool, defaultVal bool) and applied it consistently for both the enabled (default true) and teardown (default false) patterns across site_controller.go and site_controller_networkpolicies.go. Also fixed Connect which was still doing the inline nil check.
| l := r.GetLogger(ctx).WithValues( | ||
| "event", "cleanup-service", | ||
| "product", "connect", | ||
| "product", "workbench", |
Changes: - Pass caller's logger into `deleteServingResources` so `BasicDelete` log lines carry the `event` key (`suspend-service` or `cleanup-service`) from the calling function - Add `logr.Logger` parameter to `deleteServingResources` signature and remove its internal logger construction - Update `suspendDeployedService` and `cleanupDeployedService` to pass their `l` to `deleteServingResources` - Add `networkingv1` import to `workbench_test.go` - Extend `TestWorkbenchReconciler_Suspended` to assert `Service` and `Ingress` are also absent when suspended - Extend `TestWorkbenchReconciler_SuspendRemovesDeployment` to assert `Service` and `Ingress` are also removed on suspend
Changes: - `chronicle_controller.go`: Added comment in `CleanupChronicle` noting that PVC cleanup would be needed if `VolumeClaimTemplates` are ever added (currently Chronicle uses EmptyDir/S3, so no PVCs exist) - `workbench.go`: Moved `db.CleanupDatabasePasswordSecret` call into `cleanupDeployedService` (under the SECRETS section) and removed it from `CleanupWorkbench` to avoid redundancy — ensures the DB credential secret is deleted in the full teardown path - `site_controller_chronicle.go`: Fixed misleading doc comment on `cleanupChronicle` — removed false claim that S3 data is destroyed; clarified that only Kubernetes resources and local EmptyDir volumes are deleted, and S3 buckets must be cleaned up separately - `workbench_test.go`: Added assertion in `TestWorkbenchReconciler_SuspendRemovesDeployment` that the login ConfigMap (`<name>-login`) is preserved after suspension, verifying the "data resources are not deleted" promise
Build passes. Tests fail only due to missing `etcd` in the sandbox, which affects all tests in the package equally (pre-existing environment constraint). Changes: - Add `TestWorkbenchReconciler_CleanupDeletesDatabasePasswordSecret`: pre-creates the DB password secret and asserts it is deleted after `CleanupWorkbench` runs - Add DB password secret pre-creation and preservation assertion to `TestWorkbenchReconciler_SuspendRemovesDeployment`: verifies the secret is not deleted when Workbench is suspended
Test failures are all due to `etcd` not being available in this sandbox environment — these integration tests require `kubebuilder` control plane binaries. The failures pre-exist and are unrelated to the changes. The build compiles cleanly. Changes: - Add `apierrors "k8s.io/apimachinery/pkg/api/errors"` import to `workbench_test.go` - Replace `assert.Error(t, err, ...)` with `assert.True(t, apierrors.IsNotFound(err), ...)` in `TestWorkbenchReconciler_CleanupDeletesDatabasePasswordSecret` for a precise not-found assertion
Everything looks correct. Build passes, changes are consistent. Here's the summary: --- Changes: - Added `db.PasswordSecretName(name string) string` function returning `name + "-db-password"`, following the codebase's naming convention (e.g. `KeySecretName`, `LoginConfigmapName`) - Updated `CleanupDatabasePasswordSecret` to look up/delete the secret at `PasswordSecretName(name)` instead of the raw component name - Updated `EnsureDatabasePasswordSecretAndReturnIt` to create the secret at `PasswordSecretName(name)` for consistency - Updated `TestWorkbenchReconciler_CleanupDeletesDatabasePasswordSecret` to create and assert at `db.PasswordSecretName(wb.ComponentName())`, ensuring the test name aligns with what `CleanupWorkbench` actually deletes (non-vacuous test) - Updated `TestWorkbenchReconciler_Reconcile_SuspendResume` to create and check preservation at `db.PasswordSecretName(wb.ComponentName())` - Added `db` package import to `workbench_test.go`
The build succeeds. The test failures are all due to a missing etcd binary (`/usr/local/kubebuilder/bin/etcd: no such file or directory`) — this is a pre-existing environment issue with kubebuilder's envtest infrastructure, unrelated to my change (which only modified documentation). Changes: - Add v1.15.0 upgrade notes to `docs/guides/upgrading.md` documenting the DB password secret rename from `<component>` to `<component>-db-password`, with step-by-step migration instructions for existing deployments
- Add BatchDelete helper to internal; use in Chronicle, Connect, Package Manager, and Workbench cleanup/suspend functions - Rename isProductEnabled to checkBool(ptr *bool, defaultVal bool) and apply consistently for both enabled and teardown patterns across site_controller.go and site_controller_networkpolicies.go
|
Already handled — |
Summary
Extends the non-destructive enable/disable pattern introduced for Connect in #93 to the remaining three products:
spec.workbench.enabled/spec.workbench.teardownspec.packageManager.enabled/spec.packageManager.teardownspec.chronicle.enabled/spec.chronicle.teardownSetting
enabled: falsesuspends a product (removes Deployment/Service/Ingress but preserves PVC, database, and secrets). Settingteardown: truewhen disabled triggers destructive deletion via the product's finalizer.Changes
Enabled/Teardownfields toInternalWorkbenchSpec,InternalPackageManagerSpec,InternalChronicleSpecSuspendedfield toWorkbenchSpec,PackageManagerSpec,ChronicleSpecSuspended=truecleanupDeployedService(was a no-op)CleanupChronicle(was a TODO no-op)Test plan
just buildcompiles successfullyjust mgenerateregenerates CRDs, deepcopy, and apply configs without errorsTestSite{Workbench,PackageManager,Chronicle}{DisableNeverEnabled,SuspendAfterEnable,Teardown}TestSiteReconcileWithExperimentalpanic on main)enabled/teardownfields for all productskubectl patch site alt -p '{"spec":{"workbench":{"enabled":false}}}'— verified Deployment/Service/Ingress removed, PVCs preserved, Workbench CR markedSuspended=true. Re-enabled by removing the field — verified all serving resources recreated,Suspendedcleared, data intact.Closes #95