Skip to content

feat: extend enable/disable/teardown pattern to Workbench, Package Manager, and Chronicle#99

Open
ian-flores wants to merge 13 commits intomainfrom
extend-enable-disable-teardown
Open

feat: extend enable/disable/teardown pattern to Workbench, Package Manager, and Chronicle#99
ian-flores wants to merge 13 commits intomainfrom
extend-enable-disable-teardown

Conversation

@ian-flores
Copy link
Collaborator

@ian-flores ian-flores commented Feb 23, 2026

Summary

Extends the non-destructive enable/disable pattern introduced for Connect in #93 to the remaining three products:

  • Workbench: spec.workbench.enabled / spec.workbench.teardown
  • Package Manager: spec.packageManager.enabled / spec.packageManager.teardown
  • Chronicle: spec.chronicle.enabled / spec.chronicle.teardown

Setting enabled: false suspends a product (removes Deployment/Service/Ingress but preserves PVC, database, and secrets). Setting teardown: true when disabled triggers destructive deletion via the product's finalizer.

Changes

  • Add Enabled/Teardown fields to InternalWorkbenchSpec, InternalPackageManagerSpec, InternalChronicleSpec
  • Add Suspended field to WorkbenchSpec, PackageManagerSpec, ChronicleSpec
  • Site controller uses three-way branching per product: enabled → reconcile, disabled → suspend, teardown → delete CR
  • Product controllers skip serving resources when Suspended=true
  • Network policies cleaned up when products are disabled
  • Fix Workbench cleanupDeployedService (was a no-op)
  • Fix Chronicle CleanupChronicle (was a TODO no-op)
  • Add 9 tests (3 per product: disable-never-enabled, suspend-after-enable, teardown)
  • Update API reference, configuration guides, and site management docs

Test plan

  • just build compiles successfully
  • just mgenerate regenerates CRDs, deepcopy, and apply configs without errors
  • 9 new tests pass: TestSite{Workbench,PackageManager,Chronicle}{DisableNeverEnabled,SuspendAfterEnable,Teardown}
  • All existing tests pass (except pre-existing TestSiteReconcileWithExperimental panic on main)
  • CRDs include new enabled/teardown fields for all products
  • Tested on ganso01-staging: deployed adhoc image, disabled Workbench via kubectl patch site alt -p '{"spec":{"workbench":{"enabled":false}}}' — verified Deployment/Service/Ingress removed, PVCs preserved, Workbench CR marked Suspended=true. Re-enabled by removing the field — verified all serving resources recreated, Suspended cleared, data intact.
  • Verify teardown fully cleans up resources (not tested live — destructive)

Closes #95

…ut data loss

Extend the enable/disable/teardown pattern from Connect (PR #93) to
Workbench, Package Manager, and Chronicle.

- Add spec.{workbench,packageManager,chronicle}.enabled and .teardown
  fields to the Site CRD
- Add Suspended field to Workbench, PackageManager, and Chronicle CRDs
- Site controller uses three-way branching: enabled → reconcile,
  disabled → suspend (preserve data), teardown → delete CR
- Product controllers skip serving resources when Suspended=true
- Network policies cleaned up when products are disabled
- Fix Workbench cleanupDeployedService (was a no-op)
- Fix Chronicle CleanupChronicle (was a TODO no-op)
- Add 9 tests (3 per product: disable-never-enabled, suspend, teardown)

Closes #95
- Remove unused getChronicle test helper
- Correct "CRD" to "CR" in cleanup function godoc comments
- Fix double blank line in product-team-site-management.md
@claude
Copy link

claude bot commented Feb 23, 2026

Claude finished @ian-flores's task —— View job


Review: extend enable/disable/teardown pattern

This PR cleanly extends the existing Connect enable/disable/teardown pattern to Workbench, Package Manager, and Chronicle. The three-way branching (enabled → reconcile, disabled → suspend, teardown → delete CR) is consistent across all products, the API types follow established conventions, and the test coverage is solid with 9 new tests.

What looks good

  • Pattern is consistent with the Connect reference implementation from feat: allow disabling Connect without data loss, add explicit teardown field #93
  • Suspended is correctly cleared on re-enable — the reconcile* functions build fresh Spec structs that omit Suspended, defaulting to nil
  • Network policy cleanup when products are disabled
  • Tests cover the three key scenarios per product (never-enabled, suspend-after-enable, teardown)
  • Fixing the Workbench cleanupDeployedService no-op and Chronicle CleanupChronicle TODO

Findings

1. cleanupDeployedService naming is now misleadinginternal/controller/core/workbench.go:1060

The old no-op has been replaced with ~130 lines cleaning up Ingress, Service, Deployment, PVCs, ServiceAccounts, RBAC, ConfigMaps, Secrets, Traefik Middlewares, and SecretProviderClass. Called by the finalizer path (CleanupWorkbench at line 1022), which then also cleans up the database. The name cleanupDeployedService understates its scope — it deletes everything except the DB. Consider renaming to cleanupAllResources or cleanupNonDatabaseResources to distinguish from suspendDeployedService which truly only removes serving resources.

2. Chronicle CleanupChronicle may not clean up local storageinternal/controller/core/chronicle_controller.go:335

The new CleanupChronicle deletes Service, StatefulSet, ConfigMap, and ServiceAccounts — but not PVCs or local storage volumes. When LocalStorage.Enabled: true is configured, Chronicle uses a PVC for /var/lib/posit-chronicle/data. If this is mounted via a PVC managed outside the StatefulSet's volumeClaimTemplates, data could be orphaned after teardown. Worth confirming this is intentional or whether a PVC cleanup step is needed for the local-storage case.

3. Duplicated enabled/teardown computationinternal/controller/core/site_controller.go:165-181 and site_controller_networkpolicies.go:42-56

The enabled/teardown boolean checks are computed identically in both reconcileResources and reconcileNetworkPolicies. Consider extracting helpers:

func isProductEnabled(enabled *bool) bool {
    return enabled == nil || *enabled
}

Non-blocking — also applies to the existing Connect code but would improve maintainability.

4. Minor: suspendDeployedService duplication across products

The suspend implementations are nearly identical across all four products (differing only in resource types: StatefulSet vs Deployment). A shared helper could reduce future maintenance, but this mirrors the existing Connect pattern so fine to leave as-is.


Checklist

  • Kubebuilder annotations are correct (+kubebuilder:default, +optional)
  • New fields have sensible defaults (enabled=true, teardown=false)
  • Reconciliation is idempotent (disable is a no-op when already suspended)
  • Config flows from Site → Product correctly (Suspended cleared on re-enable)
  • Tests exist for all three scenarios per product
  • CRDs include new fields
  • DeepCopy generated code is correct
  • No security concerns (RBAC cleanup is appropriate, network policies cleaned up)

Note: I was unable to submit a formal GitHub review with inline comments due to tool permissions (add_comment_to_pending_review and create_and_submit_pull_request_review are not in my allowed tools list). To enable inline reviews, add these to --allowedTools.

@ian-flores ian-flores marked this pull request as ready for review February 24, 2026 19:55
Copy link

@Lytol Lytol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some nits and questions, but use what's useful and ignore what isn't. One general observation as well: the code assumes that someone will only set teardown: true if they've also set enabled: false, but we may want to assert or at least gracefully handle when someone does enabled: true + teardown: true. 🤷

Overall, looks good!

Comment on lines +341 to +361
key := client.ObjectKey{Name: c.ComponentName(), Namespace: req.Namespace}

// SERVICE
if err := internal.BasicDelete(ctx, r, l, key, &corev1.Service{}); err != nil {
return ctrl.Result{}, err
}

// STATEFULSET
if err := internal.BasicDelete(ctx, r, l, key, &v1.StatefulSet{}); err != nil {
return ctrl.Result{}, err
}

// CONFIGMAP
if err := internal.BasicDelete(ctx, r, l, key, &corev1.ConfigMap{}); err != nil {
return ctrl.Result{}, err
}

// SERVICE ACCOUNTS
if err := internal.BasicDelete(ctx, r, l, key, &corev1.ServiceAccount{}); err != nil {
return ctrl.Result{}, err
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This pattern is duplicated so often that it might be worth DRYing it up a bit by creating an internal.BatchDelete or similar.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — added internal.BatchDelete and applied it here, then expanded it to the suspend/cleanup functions in Connect and Package Manager, and to deleteServingResources in Workbench. Workbench's full cleanupDeployedService has too many different keys to batch cleanly so left that as-is.

}

l.Info("Chronicle cleanup complete")
return ctrl.Result{}, nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outside the scope of this PR, but... what's the point of returning a ctrl.Result if it's always an empty object?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it's a holdover from the controller-runtime scaffold pattern where everything returns (ctrl.Result, error). Leaving it for a separate cleanup since it'd affect all the product controllers.

)

// isProductEnabled returns true if the product is enabled (nil defaults to enabled).
func isProductEnabled(b *bool) bool {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the naming feels a little weird given that it's more of a safe dereference helper. Also, if you make the naming a little more generic, you could probably use this everywhere that you are referencing boolean pointers. Something like:

func checkBool(ptr *bool, default bool) bool {
  if ptr == nil {
    return default
  }
  return *ptr
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — renamed to checkBool(ptr \*bool, defaultVal bool) and applied it consistently for both the enabled (default true) and teardown (default false) patterns across site_controller.go and site_controller_networkpolicies.go. Also fixed Connect which was still doing the inline nil check.

l := r.GetLogger(ctx).WithValues(
"event", "cleanup-service",
"product", "connect",
"product", "workbench",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅

@Lytol Lytol self-requested a review February 26, 2026 22:38
Lytol
Lytol previously approved these changes Feb 26, 2026
Copy link

@Lytol Lytol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to approve, my bad!

Changes:
- Pass caller's logger into `deleteServingResources` so `BasicDelete` log lines carry the `event` key (`suspend-service` or `cleanup-service`) from the calling function
- Add `logr.Logger` parameter to `deleteServingResources` signature and remove its internal logger construction
- Update `suspendDeployedService` and `cleanupDeployedService` to pass their `l` to `deleteServingResources`
- Add `networkingv1` import to `workbench_test.go`
- Extend `TestWorkbenchReconciler_Suspended` to assert `Service` and `Ingress` are also absent when suspended
- Extend `TestWorkbenchReconciler_SuspendRemovesDeployment` to assert `Service` and `Ingress` are also removed on suspend
Changes:
- `chronicle_controller.go`: Added comment in `CleanupChronicle` noting that PVC cleanup would be needed if `VolumeClaimTemplates` are ever added (currently Chronicle uses EmptyDir/S3, so no PVCs exist)
- `workbench.go`: Moved `db.CleanupDatabasePasswordSecret` call into `cleanupDeployedService` (under the SECRETS section) and removed it from `CleanupWorkbench` to avoid redundancy — ensures the DB credential secret is deleted in the full teardown path
- `site_controller_chronicle.go`: Fixed misleading doc comment on `cleanupChronicle` — removed false claim that S3 data is destroyed; clarified that only Kubernetes resources and local EmptyDir volumes are deleted, and S3 buckets must be cleaned up separately
- `workbench_test.go`: Added assertion in `TestWorkbenchReconciler_SuspendRemovesDeployment` that the login ConfigMap (`<name>-login`) is preserved after suspension, verifying the "data resources are not deleted" promise
Build passes. Tests fail only due to missing `etcd` in the sandbox, which affects all tests in the package equally (pre-existing environment constraint).
Changes:
- Add `TestWorkbenchReconciler_CleanupDeletesDatabasePasswordSecret`: pre-creates the DB password secret and asserts it is deleted after `CleanupWorkbench` runs
- Add DB password secret pre-creation and preservation assertion to `TestWorkbenchReconciler_SuspendRemovesDeployment`: verifies the secret is not deleted when Workbench is suspended
Test failures are all due to `etcd` not being available in this sandbox environment — these integration tests require `kubebuilder` control plane binaries. The failures pre-exist and are unrelated to the changes. The build compiles cleanly.
Changes:
- Add `apierrors "k8s.io/apimachinery/pkg/api/errors"` import to `workbench_test.go`
- Replace `assert.Error(t, err, ...)` with `assert.True(t, apierrors.IsNotFound(err), ...)` in `TestWorkbenchReconciler_CleanupDeletesDatabasePasswordSecret` for a precise not-found assertion
Everything looks correct. Build passes, changes are consistent. Here's the summary:
---
Changes:
- Added `db.PasswordSecretName(name string) string` function returning `name + "-db-password"`, following the codebase's naming convention (e.g. `KeySecretName`, `LoginConfigmapName`)
- Updated `CleanupDatabasePasswordSecret` to look up/delete the secret at `PasswordSecretName(name)` instead of the raw component name
- Updated `EnsureDatabasePasswordSecretAndReturnIt` to create the secret at `PasswordSecretName(name)` for consistency
- Updated `TestWorkbenchReconciler_CleanupDeletesDatabasePasswordSecret` to create and assert at `db.PasswordSecretName(wb.ComponentName())`, ensuring the test name aligns with what `CleanupWorkbench` actually deletes (non-vacuous test)
- Updated `TestWorkbenchReconciler_Reconcile_SuspendResume` to create and check preservation at `db.PasswordSecretName(wb.ComponentName())`
- Added `db` package import to `workbench_test.go`
The build succeeds. The test failures are all due to a missing etcd binary (`/usr/local/kubebuilder/bin/etcd: no such file or directory`) — this is a pre-existing environment issue with kubebuilder's envtest infrastructure, unrelated to my change (which only modified documentation).
Changes:
- Add v1.15.0 upgrade notes to `docs/guides/upgrading.md` documenting the DB password secret rename from `<component>` to `<component>-db-password`, with step-by-step migration instructions for existing deployments
- Add BatchDelete helper to internal; use in Chronicle, Connect,
  Package Manager, and Workbench cleanup/suspend functions
- Rename isProductEnabled to checkBool(ptr *bool, defaultVal bool)
  and apply consistently for both enabled and teardown patterns
  across site_controller.go and site_controller_networkpolicies.go
@ian-flores
Copy link
Collaborator Author

Already handled — site_controller.go lines 169–188 check for teardown: true alongside enabled: true (or unset) and log a warning that teardown has no effect until enabled: false. This was already in place for Connect from #93; this PR adds the same guard for Workbench (line 175), Package Manager (line 181), and Chronicle (line 187).

@ian-flores ian-flores requested a review from Lytol February 27, 2026 01:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: extend enable/disable/teardown pattern to Workbench, Package Manager, and Chronicle

2 participants