30 commits
46e0cc5
feat: operator applies own CRDs at startup to ensure schema matches b…
ian-flores Feb 23, 2026
163ee01
Address review findings (job 54)
ian-flores Feb 23, 2026
89bc275
chore: wire verify-crds into verify-all to catch embedded CRD drift i…
ian-flores Feb 23, 2026
e004f96
Address review findings (job 56)
ian-flores Feb 23, 2026
c744011
Address review findings (job 59)
ian-flores Feb 23, 2026
d494e1a
fix: use kubebuilder RBAC marker for CRD permissions instead of manua…
ian-flores Feb 23, 2026
f795a3b
Merge branch 'main' into feat-self-managed-crds
ian-flores Feb 24, 2026
b983d81
Address review findings (job 61)
ian-flores Feb 23, 2026
4e1514f
Address review findings (job 67)
ian-flores Feb 23, 2026
d5e9e6c
Address review findings (job 71)
ian-flores Feb 23, 2026
3713c15
Address review findings (job 72)
ian-flores Feb 23, 2026
9c659b1
Address review findings (job 76)
ian-flores Feb 23, 2026
79a2132
Address review findings (job 83)
ian-flores Feb 23, 2026
65316c0
Address review findings (job 86)
ian-flores Feb 23, 2026
0224372
fix: address PR review findings for self-managed CRDs
ian-flores Feb 24, 2026
342a375
fix: remove create verb from CRD RBAC and sync generated files
ian-flores Feb 24, 2026
11b7c27
Address review findings (job 133)
ian-flores Feb 24, 2026
ea66625
Address review findings (job 79)
ian-flores Feb 24, 2026
52e2860
Address review findings (job 142)
ian-flores Feb 24, 2026
fb07eb5
Address review findings (job 140)
ian-flores Feb 24, 2026
e1929e6
fix: revert init() panic to newScheme() to fix CrashLoopBackOff
ian-flores Feb 24, 2026
0ae5483
Address review findings (job 141)
ian-flores Feb 24, 2026
ef93f29
Address review findings (job 146)
ian-flores Feb 24, 2026
780df4f
Address review findings (job 144)
ian-flores Feb 24, 2026
7432524
fix: remove hook-injected comments blocking kubebuilder RBAC marker
ian-flores Feb 24, 2026
3965505
fix: improve CRD apply circuit breaker visibility and error guidance
ian-flores Feb 25, 2026
9f946fc
fix: address Lytol review nits
ian-flores Feb 27, 2026
021a683
Merge branch 'main' into feat-self-managed-crds
ian-flores Feb 27, 2026
6dfeb34
fix: update TestManageCRDsFlag to use flag.BoolVar directly
ian-flores Feb 27, 2026
9b81ff4
fix: remove tracked .claude/tsc-cache files from git
ian-flores Feb 27, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -33,3 +33,6 @@ go.work.sum
# Editor/IDE
# .idea/
# .vscode/

# Claude Code
.claude/tsc-cache/
14 changes: 12 additions & 2 deletions Makefile
@@ -94,11 +94,21 @@ help: ## Display this help.
manifests: controller-gen ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
$(CONTROLLER_GEN) rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases

.PHONY: copy-crds
copy-crds: manifests ## Copy generated CRDs to internal/crdapply/bases for embedding.
rm -f internal/crdapply/bases/*.yaml
cp config/crd/bases/*.yaml internal/crdapply/bases/

.PHONY: verify-crds
verify-crds: ## Verify that internal/crdapply/bases is in sync with config/crd/bases (fails if stale).
@diff -r --exclude='.*' config/crd/bases/ internal/crdapply/bases/ || \
(echo "internal/crdapply/bases/ is out of sync — run 'make copy-crds'" && exit 1)

.PHONY: generate-all
generate-all: generate generate-client generate-openapi

.PHONY: verify-all
-verify-all: verify-apply verify-list verify-inform verify-client
+verify-all: verify-apply verify-list verify-inform verify-client verify-crds

.PHONY: generate
generate: controller-gen ## Generate code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations.
@@ -177,7 +187,7 @@ test-integration: go-test test-kind ## Run all tests (unit + integration).
##@ Build

.PHONY: build
-build: manifests generate-all fmt vet ## Build manager binary.
+build: copy-crds generate-all fmt vet ## Build manager binary.
go build -o bin/team-operator ./cmd/team-operator/main.go

.PHONY: docker-build
34 changes: 33 additions & 1 deletion cmd/team-operator/main.go
@@ -4,9 +4,12 @@
package main

import (
"context"
"flag"
"fmt"
"os"
"strconv"
"time"

"github.com/posit-dev/team-operator/api/keycloak/v2alpha1"
"github.com/posit-dev/team-operator/api/product"
@@ -18,6 +21,7 @@ import (
// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
_ "k8s.io/client-go/plugin/pkg/client/auth"
"k8s.io/client-go/rest"
"k8s.io/klog/v2"

"k8s.io/apimachinery/pkg/runtime"
@@ -29,6 +33,7 @@

positcov1beta1 "github.com/posit-dev/team-operator/api/core/v1beta1"
"github.com/posit-dev/team-operator/internal"
"github.com/posit-dev/team-operator/internal/crdapply"

corecontroller "github.com/posit-dev/team-operator/internal/controller/core"

@@ -73,11 +78,19 @@ func init() {
LoadSchemes(scheme)
}

func applyCRDs(ctx context.Context, timeout time.Duration, cfg *rest.Config) error {
crdCtx, crdCancel := context.WithTimeout(ctx, timeout)
defer crdCancel()
return crdapply.ApplyCRDs(crdCtx, cfg, setupLog)
}

func main() {
var (
metricsAddr string
enableLeaderElection bool
probeAddr string
manageCRDs bool
crdApplyTimeout time.Duration
)

flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
@@ -87,6 +100,11 @@ func main() {
"Enable leader election for team-operator. "+
"Enabling this will ensure there is only one active team-operator.")

flag.BoolVar(&manageCRDs, "manage-crds", true,
"Apply CRDs on startup to ensure schema is in sync with operator version")
flag.DurationVar(&crdApplyTimeout, "crd-apply-timeout", 60*time.Second,
"Timeout for applying CRDs at startup")

opts := zap.Options{Development: true}

opts.BindFlags(flag.CommandLine)
@@ -132,6 +150,20 @@
os.Exit(1)
}

ctx := ctrl.SetupSignalHandler()

if manageCRDs {
if crdApplyTimeout <= 0 {
setupLog.Error(fmt.Errorf("--crd-apply-timeout must be positive, got %v", crdApplyTimeout), "invalid flag value")
os.Exit(1)
}
if err := applyCRDs(ctx, crdApplyTimeout, mgr.GetConfig()); err != nil {
setupLog.Error(err, "CRD apply failed after timeout; pod will exit and Kubernetes will restart with backoff",
"hint", "verify RBAC grants get/update/patch on customresourcedefinitions, or set --manage-crds=false to disable")
os.Exit(1)
}
Comment on lines +156 to +164

nit: alternatively, this seems like a wonderful candidate to encapsulate the retry logic inside a function.

}

if err = (&corecontroller.SiteReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
@@ -207,7 +239,7 @@ func main() {
}

setupLog.Info("starting team-operator")
-if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
+if err := mgr.Start(ctx); err != nil {
setupLog.Error(err, "problem running team-operator")
os.Exit(1)
}
17 changes: 17 additions & 0 deletions cmd/team-operator/main_test.go
@@ -1,6 +1,7 @@
package main

import (
"flag"
"testing"

"github.com/stretchr/testify/require"
@@ -16,3 +17,19 @@ func TestThings(t *testing.T) {
// this should probably include a comment...
require.Contains(t, gengo.StdGeneratedBy, "//")
}

func TestManageCRDsFlag(t *testing.T) {
fs := flag.NewFlagSet("test", flag.ContinueOnError)
var manageCRDs bool
fs.BoolVar(&manageCRDs, "manage-crds", true,
"Apply CRDs on startup to ensure schema is in sync with operator version")

// Default is true: CRD management is enabled out of the box.
require.Equal(t, "true", fs.Lookup("manage-crds").DefValue)
require.NoError(t, fs.Parse([]string{}))
require.True(t, manageCRDs)

// --manage-crds=false opts out of CRD management (e.g. for GitOps environments).
require.NoError(t, fs.Parse([]string{"--manage-crds=false"}))
require.False(t, manageCRDs)
}
8 changes: 8 additions & 0 deletions config/rbac/role.yaml
@@ -16,6 +16,14 @@ rules:
- patch
- update
- watch
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- get
- patch
- update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
8 changes: 8 additions & 0 deletions dist/chart/templates/rbac/role.yaml
@@ -19,6 +19,14 @@ rules:
- patch
- update
- watch
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- get
- patch
- update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
7 changes: 7 additions & 0 deletions dist/chart/values.yaml
@@ -13,6 +13,13 @@ controllerManager:
- "--leader-elect"
- "--metrics-bind-address=:8443"
- "--health-probe-bind-address=:8081"
# Operator applies its own CRDs at startup to ensure schema matches the binary version.
# This solves CRD drift when using ad hoc images or emergency patches.
# Set to false if you manage CRDs externally (Flux, ArgoCD, or Helm with GitOps).
# - "--manage-crds=false"
# After this deadline the operator exits and Kubernetes restarts it with exponential backoff.
# Increase only if CRD apply is consistently slow (e.g., large schemas, slow API server).
# - "--crd-apply-timeout=60s"
env:
WATCH_NAMESPACES: "posit-team"
# AWS_REGION: "us-east-1" # Set if deploying on AWS
47 changes: 47 additions & 0 deletions docs/guides/upgrading.md
@@ -2,6 +2,53 @@

This guide provides comprehensive instructions for upgrading the Team Operator, including pre-upgrade preparation, upgrade procedures, version-specific migrations, and troubleshooting.

## CRD Management (v1.15+)

Starting with v1.15.0, the operator automatically applies its own CRDs at startup using server-side apply. This ensures the CRD schema always matches the running operator binary, even when only the container image is updated without a full Helm chart upgrade (e.g., ad hoc images for testing).

The operator uses the `--manage-crds` flag (default: `true`) to control this behavior. To opt out (for example, if you manage CRDs via Flux or ArgoCD), set:

```yaml
controllerManager:
container:
args:
- "--manage-crds=false"
```

When `--manage-crds=false`, the operator starts without touching CRDs, and you are responsible for keeping them in sync with the operator version.

**Benefits of automatic CRD management:**
- CRDs are always in sync with the operator version
- Works with ad hoc images (e.g., PR branches) without requiring Helm chart changes
- Uses server-side apply (SSA), which is idempotent and only updates the CRD when the schema differs
- No manual CRD management needed for most deployments

**When to disable:**
- GitOps workflows (Flux, ArgoCD) that manage CRDs separately
- Security policies requiring explicit CRD review before application
- Multi-tenant clusters where CRD updates require approval

**RBAC Permissions:**
The operator requires the following RBAC permissions on its own CRDs:
- `get` - to check if CRDs exist
- `patch` - to apply schema updates via server-side apply
- `update` - to modify CRD metadata

The Helm chart automatically grants these permissions. The operator intentionally omits the `delete` verb to prevent accidental data loss.

**Note on CRD deletion:** Because the operator's RBAC omits the `delete` verb for CRDs, if a future operator version removes a resource type, the now-orphaned CRD will remain in the cluster and must be removed manually:

```bash
kubectl delete crd <crd-name>.core.posit.team
```

Before deleting an orphaned CRD, ensure all custom resources of that type have been removed to avoid losing data:

```bash
kubectl get <resource-plural> -A # verify no instances remain
kubectl delete crd <crd-name>.core.posit.team
```

## Before Upgrading

### Backup Procedures