Add templates for automatic scaling and pod disruption budgets, and improve default template values

jc · jc · commit ea5103542756 · 2025-12-03T16:25:43.000-05:00
diff --git a/docs/fastapi-kubernetes-deployment.md b/docs/fastapi-kubernetes-deployment.md
@@ -696,29 +696,109 @@ API Forge provides comprehensive health endpoints:
 
 ### Resource Requests and Limits
 
-Set appropriate resource requests and limits:
+Resource requests and limits are set in `values.yaml`. The app and worker deployment templates read these values at render time and fall back to the defaults shown here when a key is unset:
 
 ```yaml
-resources:
-  requests:
-    cpu: 250m
-    memory: 256Mi
-  limits:
-    cpu: 1000m
-    memory: 512Mi
+# infra/helm/api-forge/values.yaml
+app:
+  resources:
+    requests:
+      cpu: 250m
+      memory: 256Mi
+    limits:
+      cpu: 1000m
+      memory: 1Gi
+```
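+
+The worker deployment template reads the same keys under `worker.resources`. As a sketch, an environment-specific override file could raise only the worker memory limit; Helm merges it over the chart's `values.yaml`, so keys not listed keep their chart values. The file name `values-prod.yaml` and the `2Gi` figure are assumptions for illustration, not part of the chart:
+
+```yaml
+# values-prod.yaml (hypothetical override file, not shipped with the chart)
+worker:
+  resources:
+    limits:
+      memory: 2Gi   # assumed value; size to your workflow workload
+```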
+
+**Production Sizing Guidelines**:
+
+| Component | Requests (CPU/Mem) | Limits (CPU/Mem) | Notes |
+|-----------|-------------------|------------------|-------|
+| App | 250m / 256Mi | 1000m / 1Gi | Scale horizontally with HPA |
+| Worker | 250m / 256Mi | 1000m / 1Gi | Conservative scale-down for workflows |
+| PostgreSQL | 500m / 1Gi | 2000m / 4Gi | Consider managed DB for HA |
+| Redis | 250m / 256Mi | 1000m / 1Gi | Keep the memory limit above Redis `maxmemory` |
+| Temporal | 500m / 1Gi | 2000m / 4Gi | Single instance sufficient for most loads |
+
+### Horizontal Pod Autoscaling (HPA)
+
+The Helm chart includes built-in HPA support for the app and worker deployments. Enable autoscaling in `values.yaml`:
+
+```yaml
+# infra/helm/api-forge/values.yaml
+app:
+  replicas: 1  # Base replicas when HPA is disabled
+  autoscaling:
+    enabled: true
+    minReplicas: 1
+    maxReplicas: 5
+    targetCPUUtilizationPercentage: 70
+    targetMemoryUtilizationPercentage: 80
+    behavior:
+      scaleDown:
+        stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
+        percentValue: 10                  # Scale down 10% at a time
+        periodSeconds: 60
+      scaleUp:
+        stabilizationWindowSeconds: 0    # Scale up immediately
+        percentValue: 100
+        podsValue: 4                      # Add up to 4 pods at once
+        periodSeconds: 15
+
+worker:
+  autoscaling:
+    enabled: true
+    minReplicas: 1
+    maxReplicas: 5
+    behavior:
+      scaleDown:
+        stabilizationWindowSeconds: 600  # Workers scale down more conservatively
+        periodSeconds: 120               # to avoid disrupting running workflows
 ```
 
-**Guidelines**:
-- **Requests**: Minimum resources guaranteed
-- **Limits**: Maximum resources allowed
-- **FastAPI App**: 250m CPU, 256-512Mi memory
-- **Worker**: 250m CPU, 256-512Mi memory
-- **PostgreSQL**: 500m CPU, 1Gi memory
-- **Redis**: 100m CPU, 128Mi memory
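+When `autoscaling.enabled: true`, the HPA controller adjusts the replica count between `minReplicas` and `maxReplicas` based on CPU and memory utilization. Utilization is measured against each container's resource requests, so the requests above must be set and a metrics source such as metrics-server must be running in the cluster.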
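+
+The worker HPA template reads the same `targetCPUUtilizationPercentage` and `targetMemoryUtilizationPercentage` keys as the app, and renders a metric only when its key is set. A minimal sketch of explicit worker targets, with assumed percentages:
+
+```yaml
+# infra/helm/api-forge/values.yaml
+worker:
+  autoscaling:
+    enabled: true
+    targetCPUUtilizationPercentage: 70     # assumed target, mirrors the app example
+    targetMemoryUtilizationPercentage: 80  # assumed target, mirrors the app example
+```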
+
+**Check HPA status:**
+```bash
+kubectl get hpa -n api-forge-prod
+kubectl describe hpa app -n api-forge-prod
+```
+
+### Pod Disruption Budgets (PDB)
+
+PDBs limit how many pods can be evicted at once during voluntary disruptions (node drains, upgrades), which keeps services available during maintenance. The chart includes PDBs for all services:
+
+```yaml
+# infra/helm/api-forge/values.yaml
+app:
+  podDisruptionBudget:
+    enabled: true
+    maxUnavailable: 1   # Allow 1 pod to be unavailable (works with any replica count)
+    # Or use minAvailable (but blocks eviction when replicas=1):
+    # minAvailable: 1
+
+postgres:
+  podDisruptionBudget:
+    enabled: true
+    maxUnavailable: 1
+
+redis:
+  podDisruptionBudget:
+    enabled: true
+    maxUnavailable: 1
+```
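+
+The chart also ships a PDB template for Temporal, gated on `temporal.podDisruptionBudget.enabled`. A sketch of the matching values, assuming the same `maxUnavailable` choice as the services above:
+
+```yaml
+# infra/helm/api-forge/values.yaml
+temporal:
+  podDisruptionBudget:
+    enabled: true
+    maxUnavailable: 1   # set explicitly; the template falls back to minAvailable: 1 when neither key is set
+```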
+
+> **Note:** Use `maxUnavailable` instead of `minAvailable` when running single-replica deployments. With `minAvailable: 1` and only 1 replica, Kubernetes cannot evict the pod during voluntary disruptions, so a node drain blocks until the PDB is relaxed or the pod is deleted manually.
+
+**Check PDB status:**
+```bash
+kubectl get pdb -n api-forge-prod
+kubectl describe pdb app -n api-forge-prod
+```
 
-### Horizontal Pod Autoscaling
+### Manual Horizontal Pod Autoscaling
 
-Scale based on CPU/memory utilization:
+If you prefer manual HPA configuration or need custom metrics:
 
 ```yaml
 apiVersion: autoscaling/v2
@@ -1246,18 +1326,19 @@ kubectl apply -f argocd-application.yaml
 
 1. **Use Helm for deployments** - Provides templating, versioning, and rollback capabilities
 2. **Sync config.yaml settings** - Let the CLI handle redis.enabled and temporal.enabled synchronization
-3. **Set resource requests and limits** - Define appropriate limits for all containers
-4. **Implement health checks** - Configure liveness and readiness probes
-5. **Use secrets properly** - Never store sensitive data in ConfigMaps or values.yaml
-6. **Enable NetworkPolicies** - Restrict pod-to-pod communication
-7. **Use Ingress with TLS** - Secure external access with TLS certificates
-8. **Implement HPA** - Enable Horizontal Pod Autoscaling for dynamic scaling
-9. **Use PersistentVolumes** - Ensure data persistence for stateful services
-10. **Tag images with versions** - Avoid using `latest` in production
-11. **Monitor and log** - Implement comprehensive monitoring and logging
-12. **Test locally first** - Use Minikube to test deployments before production
-13. **Use External Secrets Operator** - For production secret management
-14. **Leverage Helm rollbacks** - Easy rollback to previous releases if issues arise
+3. **Set resource requests and limits** - Configure in `values.yaml` for all containers
+4. **Enable HPA for production** - Set `app.autoscaling.enabled: true` for automatic scaling
+5. **Enable PDBs** - Ensure `podDisruptionBudget.enabled: true` for service availability during maintenance
+6. **Implement health checks** - Configure liveness and readiness probes
+7. **Use secrets properly** - Never store sensitive data in ConfigMaps or values.yaml
+8. **Enable NetworkPolicies** - Restrict pod-to-pod communication
+9. **Use Ingress with TLS** - Secure external access with TLS certificates
+10. **Use PersistentVolumes** - Ensure data persistence for stateful services
+11. **Tag images with versions** - Avoid using `latest` in production
+12. **Monitor and log** - Implement comprehensive monitoring and logging
+13. **Test locally first** - Use Minikube to test deployments before production
+14. **Use External Secrets Operator** - For production secret management
+15. **Leverage Helm rollbacks** - Use the `deploy rollback` CLI command if issues arise
 
 ## Helm-Specific Tips
 
diff --git a/infra/helm/api-forge/templates/autoscaling/app-hpa.yaml b/infra/helm/api-forge/templates/autoscaling/app-hpa.yaml
@@ -0,0 +1,52 @@
+{{- if and .Values.app.enabled .Values.app.autoscaling.enabled }}
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: app
+  namespace: {{ .Values.global.namespace }}
+  labels:
+    app.kubernetes.io/name: app
+    app.kubernetes.io/component: application
+    app.kubernetes.io/part-of: api-forge
+spec:
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: app
+  minReplicas: {{ .Values.app.autoscaling.minReplicas | default 1 }}
+  maxReplicas: {{ .Values.app.autoscaling.maxReplicas | default 5 }}
+  metrics:
+    {{- if .Values.app.autoscaling.targetCPUUtilizationPercentage }}
+    - type: Resource
+      resource:
+        name: cpu
+        target:
+          type: Utilization
+          averageUtilization: {{ .Values.app.autoscaling.targetCPUUtilizationPercentage }}
+    {{- end }}
+    {{- if .Values.app.autoscaling.targetMemoryUtilizationPercentage }}
+    - type: Resource
+      resource:
+        name: memory
+        target:
+          type: Utilization
+          averageUtilization: {{ .Values.app.autoscaling.targetMemoryUtilizationPercentage }}
+    {{- end }}
+  behavior:
+    scaleDown:
+      stabilizationWindowSeconds: {{ .Values.app.autoscaling.behavior.scaleDown.stabilizationWindowSeconds | default 300 }}
+      policies:
+        - type: Percent
+          value: {{ .Values.app.autoscaling.behavior.scaleDown.percentValue | default 10 }}
+          periodSeconds: {{ .Values.app.autoscaling.behavior.scaleDown.periodSeconds | default 60 }}
+    scaleUp:
+      stabilizationWindowSeconds: {{ .Values.app.autoscaling.behavior.scaleUp.stabilizationWindowSeconds | default 0 }}
+      policies:
+        - type: Percent
+          value: {{ .Values.app.autoscaling.behavior.scaleUp.percentValue | default 100 }}
+          periodSeconds: {{ .Values.app.autoscaling.behavior.scaleUp.periodSeconds | default 15 }}
+        - type: Pods
+          value: {{ .Values.app.autoscaling.behavior.scaleUp.podsValue | default 4 }}
+          periodSeconds: {{ .Values.app.autoscaling.behavior.scaleUp.periodSeconds | default 15 }}
+      selectPolicy: Max
+{{- end }}
diff --git a/infra/helm/api-forge/templates/autoscaling/worker-hpa.yaml b/infra/helm/api-forge/templates/autoscaling/worker-hpa.yaml
@@ -0,0 +1,53 @@
+{{- if and .Values.worker.enabled .Values.temporal.enabled .Values.worker.autoscaling.enabled }}
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: worker
+  namespace: {{ .Values.global.namespace }}
+  labels:
+    app.kubernetes.io/name: worker
+    app.kubernetes.io/component: temporal-worker
+    app.kubernetes.io/part-of: api-forge
+spec:
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: worker
+  minReplicas: {{ .Values.worker.autoscaling.minReplicas | default 1 }}
+  maxReplicas: {{ .Values.worker.autoscaling.maxReplicas | default 5 }}
+  metrics:
+    {{- if .Values.worker.autoscaling.targetCPUUtilizationPercentage }}
+    - type: Resource
+      resource:
+        name: cpu
+        target:
+          type: Utilization
+          averageUtilization: {{ .Values.worker.autoscaling.targetCPUUtilizationPercentage }}
+    {{- end }}
+    {{- if .Values.worker.autoscaling.targetMemoryUtilizationPercentage }}
+    - type: Resource
+      resource:
+        name: memory
+        target:
+          type: Utilization
+          averageUtilization: {{ .Values.worker.autoscaling.targetMemoryUtilizationPercentage }}
+    {{- end }}
+  behavior:
+    scaleDown:
+      # Worker scale-down should be more conservative to avoid disrupting running workflows
+      stabilizationWindowSeconds: {{ .Values.worker.autoscaling.behavior.scaleDown.stabilizationWindowSeconds | default 600 }}
+      policies:
+        - type: Percent
+          value: {{ .Values.worker.autoscaling.behavior.scaleDown.percentValue | default 10 }}
+          periodSeconds: {{ .Values.worker.autoscaling.behavior.scaleDown.periodSeconds | default 120 }}
+    scaleUp:
+      stabilizationWindowSeconds: {{ .Values.worker.autoscaling.behavior.scaleUp.stabilizationWindowSeconds | default 0 }}
+      policies:
+        - type: Percent
+          value: {{ .Values.worker.autoscaling.behavior.scaleUp.percentValue | default 100 }}
+          periodSeconds: {{ .Values.worker.autoscaling.behavior.scaleUp.periodSeconds | default 15 }}
+        - type: Pods
+          value: {{ .Values.worker.autoscaling.behavior.scaleUp.podsValue | default 2 }}
+          periodSeconds: {{ .Values.worker.autoscaling.behavior.scaleUp.periodSeconds | default 15 }}
+      selectPolicy: Max
+{{- end }}
diff --git a/infra/helm/api-forge/templates/deployments/app.yaml b/infra/helm/api-forge/templates/deployments/app.yaml
@@ -106,11 +106,11 @@ spec:
             readOnlyRootFilesystem: false
           resources:
             requests:
-              cpu: 250m
-              memory: 128Mi
+              cpu: {{ .Values.app.resources.requests.cpu | default "250m" }}
+              memory: {{ .Values.app.resources.requests.memory | default "256Mi" }}
             limits:
-              cpu: 1000m
-              memory: 512Mi
+              cpu: {{ .Values.app.resources.limits.cpu | default "1000m" }}
+              memory: {{ .Values.app.resources.limits.memory | default "1Gi" }}
           volumeMounts:
             # Secrets - mounted as individual files
             - name: postgres-secrets
diff --git a/infra/helm/api-forge/templates/deployments/worker.yaml b/infra/helm/api-forge/templates/deployments/worker.yaml
@@ -105,11 +105,11 @@ spec:
             failureThreshold: 3
           resources:
             requests:
-              cpu: 250m
-              memory: 128Mi
+              cpu: {{ .Values.worker.resources.requests.cpu | default "250m" }}
+              memory: {{ .Values.worker.resources.requests.memory | default "256Mi" }}
             limits:
-              cpu: 1000m
-              memory: 512Mi
+              cpu: {{ .Values.worker.resources.limits.cpu | default "1000m" }}
+              memory: {{ .Values.worker.resources.limits.memory | default "1Gi" }}
           volumeMounts:
             # Secrets - mounted as individual files
             - name: postgres-secrets
diff --git a/infra/helm/api-forge/templates/poddisruptionbudgets/app-pdb.yaml b/infra/helm/api-forge/templates/poddisruptionbudgets/app-pdb.yaml
@@ -0,0 +1,23 @@
+{{- if and .Values.app.enabled .Values.app.podDisruptionBudget.enabled }}
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: app
+  namespace: {{ .Values.global.namespace }}
+  labels:
+    app.kubernetes.io/name: app
+    app.kubernetes.io/component: application
+    app.kubernetes.io/part-of: api-forge
+spec:
+  {{- if .Values.app.podDisruptionBudget.minAvailable }}
+  minAvailable: {{ .Values.app.podDisruptionBudget.minAvailable }}
+  {{- else if .Values.app.podDisruptionBudget.maxUnavailable }}
+  maxUnavailable: {{ .Values.app.podDisruptionBudget.maxUnavailable }}
+  {{- else }}
+  # Default: allow 1 pod to be unavailable (works with any replica count)
+  maxUnavailable: 1
+  {{- end }}
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: app
+{{- end }}
diff --git a/infra/helm/api-forge/templates/poddisruptionbudgets/postgres-pdb.yaml b/infra/helm/api-forge/templates/poddisruptionbudgets/postgres-pdb.yaml
@@ -0,0 +1,23 @@
+{{- if and .Values.postgres.enabled .Values.postgres.podDisruptionBudget.enabled }}
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: postgres
+  namespace: {{ .Values.global.namespace }}
+  labels:
+    app.kubernetes.io/name: postgres
+    app.kubernetes.io/component: database
+    app.kubernetes.io/part-of: api-forge
+spec:
+  {{- if .Values.postgres.podDisruptionBudget.minAvailable }}
+  minAvailable: {{ .Values.postgres.podDisruptionBudget.minAvailable }}
+  {{- else if .Values.postgres.podDisruptionBudget.maxUnavailable }}
+  maxUnavailable: {{ .Values.postgres.podDisruptionBudget.maxUnavailable }}
+  {{- else }}
+  # Default: keep the single PostgreSQL replica available; with 1 replica this blocks voluntary eviction (e.g. node drains)
+  minAvailable: 1
+  {{- end }}
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: postgres
+{{- end }}
diff --git a/infra/helm/api-forge/templates/poddisruptionbudgets/redis-pdb.yaml b/infra/helm/api-forge/templates/poddisruptionbudgets/redis-pdb.yaml
@@ -0,0 +1,23 @@
+{{- if and .Values.redis.enabled .Values.redis.podDisruptionBudget.enabled }}
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: redis
+  namespace: {{ .Values.global.namespace }}
+  labels:
+    app.kubernetes.io/name: redis
+    app.kubernetes.io/component: cache
+    app.kubernetes.io/part-of: api-forge
+spec:
+  {{- if .Values.redis.podDisruptionBudget.minAvailable }}
+  minAvailable: {{ .Values.redis.podDisruptionBudget.minAvailable }}
+  {{- else if .Values.redis.podDisruptionBudget.maxUnavailable }}
+  maxUnavailable: {{ .Values.redis.podDisruptionBudget.maxUnavailable }}
+  {{- else }}
+  # Default: keep the single Redis replica available; with 1 replica this blocks voluntary eviction (e.g. node drains)
+  minAvailable: 1
+  {{- end }}
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: redis
+{{- end }}
diff --git a/infra/helm/api-forge/templates/poddisruptionbudgets/temporal-pdb.yaml b/infra/helm/api-forge/templates/poddisruptionbudgets/temporal-pdb.yaml
@@ -0,0 +1,23 @@
+{{- if and .Values.temporal.enabled .Values.temporal.podDisruptionBudget.enabled }}
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: temporal
+  namespace: {{ .Values.global.namespace }}
+  labels:
+    app.kubernetes.io/name: temporal
+    app.kubernetes.io/component: workflow-engine
+    app.kubernetes.io/part-of: api-forge
+spec:
+  {{- if .Values.temporal.podDisruptionBudget.minAvailable }}
+  minAvailable: {{ .Values.temporal.podDisruptionBudget.minAvailable }}
+  {{- else if .Values.temporal.podDisruptionBudget.maxUnavailable }}
+  maxUnavailable: {{ .Values.temporal.podDisruptionBudget.maxUnavailable }}
+  {{- else }}
+  # Default: keep the single Temporal replica available; with 1 replica this blocks voluntary eviction (e.g. node drains)
+  minAvailable: 1
+  {{- end }}
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: temporal
+{{- end }}
diff --git a/infra/helm/api-forge/templates/poddisruptionbudgets/worker-pdb.yaml b/infra/helm/api-forge/templates/poddisruptionbudgets/worker-pdb.yaml
diff --git a/infra/helm/api-forge/values.yaml b/infra/helm/api-forge/values.yaml