Commit ea51035

jc committed
Adding templates for automatic scaling, and improving default template values
1 parent e744cad commit ea51035

11 files changed

Lines changed: 450 additions & 48 deletions

docs/fastapi-kubernetes-deployment.md

Lines changed: 110 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -696,29 +696,109 @@ API Forge provides comprehensive health endpoints:
696696

697697
### Resource Requests and Limits
698698

699-
Set appropriate resource requests and limits:
699+
Resource configuration is managed through `values.yaml`. The app and worker deployment templates read these values at render time, with built-in defaults for unset fields:
700700

701701
```yaml
702-
resources:
703-
requests:
704-
cpu: 250m
705-
memory: 256Mi
706-
limits:
707-
cpu: 1000m
708-
memory: 512Mi
702+
# infra/helm/api-forge/values.yaml
703+
app:
704+
resources:
705+
requests:
706+
cpu: 250m
707+
memory: 256Mi
708+
limits:
709+
cpu: 1000m
710+
memory: 1Gi
711+
```
712+
713+
**Production Sizing Guidelines**:
714+
715+
| Component | Requests (CPU/Mem) | Limits (CPU/Mem) | Notes |
716+
|-----------|-------------------|------------------|-------|
717+
| App | 250m / 256Mi | 1000m / 1Gi | Scale horizontally with HPA |
718+
| Worker | 250m / 256Mi | 1000m / 1Gi | Conservative scale-down for workflows |
719+
| PostgreSQL | 500m / 1Gi | 2000m / 4Gi | Consider managed DB for HA |
720+
| Redis | 250m / 256Mi | 1000m / 1Gi | Match maxMemory config |
721+
| Temporal | 500m / 1Gi | 2000m / 4Gi | Single instance sufficient for most loads |
722+
723+
### Horizontal Pod Autoscaling (HPA)
724+
725+
The Helm chart includes built-in HPA support for the app and worker deployments. Enable autoscaling in `values.yaml`:
726+
727+
```yaml
728+
# infra/helm/api-forge/values.yaml
729+
app:
730+
replicas: 1 # Base replicas when HPA is disabled
731+
autoscaling:
732+
enabled: true
733+
minReplicas: 1
734+
maxReplicas: 5
735+
targetCPUUtilizationPercentage: 70
736+
targetMemoryUtilizationPercentage: 80
737+
behavior:
738+
scaleDown:
739+
stabilizationWindowSeconds: 300 # Wait 5 min before scaling down
740+
percentValue: 10 # Scale down 10% at a time
741+
periodSeconds: 60
742+
scaleUp:
743+
stabilizationWindowSeconds: 0 # Scale up immediately
744+
percentValue: 100
745+
podsValue: 4 # Add up to 4 pods at once
746+
periodSeconds: 15
747+
748+
worker:
749+
autoscaling:
750+
enabled: true
751+
minReplicas: 1
752+
maxReplicas: 5
753+
behavior:
754+
scaleDown:
755+
stabilizationWindowSeconds: 600 # Workers scale down more conservatively
756+
periodSeconds: 120 # to avoid disrupting running workflows
709757
```
710758

711-
**Guidelines**:
712-
- **Requests**: Minimum resources guaranteed
713-
- **Limits**: Maximum resources allowed
714-
- **FastAPI App**: 250m CPU, 256-512Mi memory
715-
- **Worker**: 250m CPU, 256-512Mi memory
716-
- **PostgreSQL**: 500m CPU, 1Gi memory
717-
- **Redis**: 100m CPU, 128Mi memory
759+
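When `autoscaling.enabled` is `true`, the HPA controller adjusts the replica count between `minReplicas` and `maxReplicas` based on CPU and memory utilization, measured as a percentage of each pod's resource requests.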
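As a rough illustration of how the controller turns those targets into a replica count, the sketch below applies the replica formula from the Kubernetes HPA documentation to the example values above (70% CPU target, 1 to 5 replicas). It is a simplified model, not the chart's code: the function names are hypothetical, the 0.1 tolerance is the controller's documented default, and pod readiness, missing metrics, and the `behavior` rate limits are ignored.

```python
import math


def utilization_percent(usage_millicores: int, request_millicores: int) -> float:
    """Utilization is measured against the pod's *request*, not its limit."""
    return 100 * usage_millicores / request_millicores


def desired_replicas(current: int, utilization: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 5,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA formula: ceil(current * utilization / target), then clamp."""
    ratio = utilization / target
    # Inside the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# With a 250m CPU request, the 70% target corresponds to 175m average usage.
cpu_now = utilization_percent(350, 250)                   # pods averaging 350m
by_cpu = desired_replicas(current=2, utilization=cpu_now, target=70)
by_memory = desired_replicas(current=2, utilization=60, target=80)

# When several metrics are configured, the largest proposal wins.
print(max(by_cpu, by_memory))
```

Because the largest per-metric proposal wins, a memory target that sits permanently above its threshold can keep the deployment scaled up even when CPU is idle.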
760+
761+
**Check HPA status:**
762+
```bash
763+
kubectl get hpa -n api-forge-prod
764+
kubectl describe hpa app -n api-forge-prod
765+
```
766+
767+
### Pod Disruption Budgets (PDB)
768+
769+
PDBs ensure service availability during voluntary disruptions (node drains, upgrades). The chart includes PDBs for all services:
770+
771+
```yaml
772+
# infra/helm/api-forge/values.yaml
773+
app:
774+
podDisruptionBudget:
775+
enabled: true
776+
maxUnavailable: 1 # Allow 1 pod to be unavailable (works with any replica count)
777+
# Or use minAvailable (but blocks eviction when replicas=1):
778+
# minAvailable: 1
779+
780+
postgres:
781+
podDisruptionBudget:
782+
enabled: true
783+
maxUnavailable: 1
784+
785+
redis:
786+
podDisruptionBudget:
787+
enabled: true
788+
maxUnavailable: 1
789+
```
790+
791+
> **Note:** Prefer `maxUnavailable` over `minAvailable` for single-replica deployments. With `minAvailable: 1` and only 1 replica, the eviction API refuses every voluntary eviction, so node drains and cluster upgrades stall until the budget is relaxed or the pod is deleted manually.
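The arithmetic behind this note can be sketched in a few lines. The helper below is hypothetical and handles integer budgets only (percentage values are not modeled); it mirrors how a PDB derives the number of pods that may be evicted right now.

```python
from typing import Optional


def allowed_disruptions(healthy: int, desired: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Return how many pods may be voluntarily evicted at this moment."""
    if min_available is not None:
        required_healthy = min_available
    elif max_unavailable is not None:
        required_healthy = desired - max_unavailable
    else:
        raise ValueError("set either min_available or max_unavailable")
    return max(0, healthy - required_healthy)


# Single replica: minAvailable=1 leaves no room, maxUnavailable=1 allows the drain.
print(allowed_disruptions(healthy=1, desired=1, min_available=1))    # 0
print(allowed_disruptions(healthy=1, desired=1, max_unavailable=1))  # 1
```

With three healthy replicas, `minAvailable: 1` permits two evictions, so the restriction only bites at low replica counts.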
792+
793+
**Check PDB status:**
794+
```bash
795+
kubectl get pdb -n api-forge-prod
796+
kubectl describe pdb app -n api-forge-prod
797+
```
718798

719-
### Horizontal Pod Autoscaling
799+
### Manual Horizontal Pod Autoscaling
720800

721-
Scale based on CPU/memory utilization:
801+
If you prefer manual HPA configuration or need custom metrics:
722802

723803
```yaml
724804
apiVersion: autoscaling/v2
@@ -1246,18 +1326,19 @@ kubectl apply -f argocd-application.yaml
12461326

12471327
1. **Use Helm for deployments** - Provides templating, versioning, and rollback capabilities
12481328
2. **Sync config.yaml settings** - Let the CLI handle redis.enabled and temporal.enabled synchronization
1249-
3. **Set resource requests and limits** - Define appropriate limits for all containers
1250-
4. **Implement health checks** - Configure liveness and readiness probes
1251-
5. **Use secrets properly** - Never store sensitive data in ConfigMaps or values.yaml
1252-
6. **Enable NetworkPolicies** - Restrict pod-to-pod communication
1253-
7. **Use Ingress with TLS** - Secure external access with TLS certificates
1254-
8. **Implement HPA** - Enable Horizontal Pod Autoscaling for dynamic scaling
1255-
9. **Use PersistentVolumes** - Ensure data persistence for stateful services
1256-
10. **Tag images with versions** - Avoid using `latest` in production
1257-
11. **Monitor and log** - Implement comprehensive monitoring and logging
1258-
12. **Test locally first** - Use Minikube to test deployments before production
1259-
13. **Use External Secrets Operator** - For production secret management
1260-
14. **Leverage Helm rollbacks** - Easy rollback to previous releases if issues arise
1329+
3. **Set resource requests and limits** - Configure in `values.yaml` for all containers
1330+
4. **Enable HPA for production** - Set `app.autoscaling.enabled: true` for automatic scaling
1331+
5. **Enable PDBs** - Ensure `podDisruptionBudget.enabled: true` for service availability during maintenance
1332+
6. **Implement health checks** - Configure liveness and readiness probes
1333+
7. **Use secrets properly** - Never store sensitive data in ConfigMaps or values.yaml
1334+
8. **Enable NetworkPolicies** - Restrict pod-to-pod communication
1335+
9. **Use Ingress with TLS** - Secure external access with TLS certificates
1336+
10. **Use PersistentVolumes** - Ensure data persistence for stateful services
1337+
11. **Tag images with versions** - Avoid using `latest` in production
1338+
12. **Monitor and log** - Implement comprehensive monitoring and logging
1339+
13. **Test locally first** - Use Minikube to test deployments before production
1340+
14. **Use External Secrets Operator** - For production secret management
1341+
15. **Leverage Helm rollbacks** - Use the `deploy rollback` CLI command if issues arise
12611342

12621343
## Helm-Specific Tips
12631344

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
{{- if and .Values.app.enabled .Values.app.autoscaling.enabled }}
2+
apiVersion: autoscaling/v2
3+
kind: HorizontalPodAutoscaler
4+
metadata:
5+
name: app
6+
namespace: {{ .Values.global.namespace }}
7+
labels:
8+
app.kubernetes.io/name: app
9+
app.kubernetes.io/component: application
10+
app.kubernetes.io/part-of: api-forge
11+
spec:
12+
scaleTargetRef:
13+
apiVersion: apps/v1
14+
kind: Deployment
15+
name: app
16+
minReplicas: {{ .Values.app.autoscaling.minReplicas | default 1 }}
17+
maxReplicas: {{ .Values.app.autoscaling.maxReplicas | default 5 }}
18+
metrics:
19+
{{- if .Values.app.autoscaling.targetCPUUtilizationPercentage }}
20+
- type: Resource
21+
resource:
22+
name: cpu
23+
target:
24+
type: Utilization
25+
averageUtilization: {{ .Values.app.autoscaling.targetCPUUtilizationPercentage }}
26+
{{- end }}
27+
{{- if .Values.app.autoscaling.targetMemoryUtilizationPercentage }}
28+
- type: Resource
29+
resource:
30+
name: memory
31+
target:
32+
type: Utilization
33+
averageUtilization: {{ .Values.app.autoscaling.targetMemoryUtilizationPercentage }}
34+
{{- end }}
35+
behavior:
36+
scaleDown:
37+
stabilizationWindowSeconds: {{ .Values.app.autoscaling.behavior.scaleDown.stabilizationWindowSeconds | default 300 }}
38+
policies:
39+
- type: Percent
40+
value: {{ .Values.app.autoscaling.behavior.scaleDown.percentValue | default 10 }}
41+
periodSeconds: {{ .Values.app.autoscaling.behavior.scaleDown.periodSeconds | default 60 }}
42+
scaleUp:
43+
stabilizationWindowSeconds: {{ .Values.app.autoscaling.behavior.scaleUp.stabilizationWindowSeconds | default 0 }}
44+
policies:
45+
- type: Percent
46+
value: {{ .Values.app.autoscaling.behavior.scaleUp.percentValue | default 100 }}
47+
periodSeconds: {{ .Values.app.autoscaling.behavior.scaleUp.periodSeconds | default 15 }}
48+
- type: Pods
49+
value: {{ .Values.app.autoscaling.behavior.scaleUp.podsValue | default 4 }}
50+
periodSeconds: {{ .Values.app.autoscaling.behavior.scaleUp.periodSeconds | default 15 }}
51+
selectPolicy: Max
52+
{{- end }}
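To see what the `behavior` policies in this template permit, the following sketch computes the per-period bounds using the template defaults (scale up by 100% or 4 pods, whichever is larger, every 15 seconds; scale down by 10% every 60 seconds). The function names are hypothetical, and the rounding follows the worked example in the Kubernetes HPA documentation (10% of 72 replicas removes 8). Scale events already recorded in the current period and the `minReplicas`/`maxReplicas` clamp are left out.

```python
def scale_up_ceiling(current: int, percent: int = 100, pods: int = 4) -> int:
    """Highest replica count reachable in one period with selectPolicy: Max."""
    by_percent = -(-current * (100 + percent) // 100)  # ceiling division
    by_pods = current + pods
    return max(by_percent, by_pods)


def scale_down_floor(current: int, percent: int = 10) -> int:
    """Lowest replica count reachable in one period under a Percent policy."""
    # Remaining replicas are rounded down, i.e. removed pods are rounded up.
    return current * (100 - percent) // 100


print(scale_up_ceiling(1))    # 5: the Pods policy dominates at small sizes
print(scale_up_ceiling(10))   # 20: the Percent policy dominates at larger sizes
print(scale_down_floor(5))    # 4: at most one pod removed per 60 s period
```

Under these assumptions, an app at 1 replica can reach the default `maxReplicas` of 5 within a single 15-second period, while scaling back down from 5 takes at least four periods.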
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
{{- if and .Values.worker.enabled .Values.temporal.enabled .Values.worker.autoscaling.enabled }}
2+
apiVersion: autoscaling/v2
3+
kind: HorizontalPodAutoscaler
4+
metadata:
5+
name: worker
6+
namespace: {{ .Values.global.namespace }}
7+
labels:
8+
app.kubernetes.io/name: worker
9+
app.kubernetes.io/component: temporal-worker
10+
app.kubernetes.io/part-of: api-forge
11+
spec:
12+
scaleTargetRef:
13+
apiVersion: apps/v1
14+
kind: Deployment
15+
name: worker
16+
minReplicas: {{ .Values.worker.autoscaling.minReplicas | default 1 }}
17+
maxReplicas: {{ .Values.worker.autoscaling.maxReplicas | default 5 }}
18+
metrics:
19+
{{- if .Values.worker.autoscaling.targetCPUUtilizationPercentage }}
20+
- type: Resource
21+
resource:
22+
name: cpu
23+
target:
24+
type: Utilization
25+
averageUtilization: {{ .Values.worker.autoscaling.targetCPUUtilizationPercentage }}
26+
{{- end }}
27+
{{- if .Values.worker.autoscaling.targetMemoryUtilizationPercentage }}
28+
- type: Resource
29+
resource:
30+
name: memory
31+
target:
32+
type: Utilization
33+
averageUtilization: {{ .Values.worker.autoscaling.targetMemoryUtilizationPercentage }}
34+
{{- end }}
35+
behavior:
36+
scaleDown:
37+
# Worker scale-down should be more conservative to avoid disrupting running workflows
38+
stabilizationWindowSeconds: {{ .Values.worker.autoscaling.behavior.scaleDown.stabilizationWindowSeconds | default 600 }}
39+
policies:
40+
- type: Percent
41+
value: {{ .Values.worker.autoscaling.behavior.scaleDown.percentValue | default 10 }}
42+
periodSeconds: {{ .Values.worker.autoscaling.behavior.scaleDown.periodSeconds | default 120 }}
43+
scaleUp:
44+
stabilizationWindowSeconds: {{ .Values.worker.autoscaling.behavior.scaleUp.stabilizationWindowSeconds | default 0 }}
45+
policies:
46+
- type: Percent
47+
value: {{ .Values.worker.autoscaling.behavior.scaleUp.percentValue | default 100 }}
48+
periodSeconds: {{ .Values.worker.autoscaling.behavior.scaleUp.periodSeconds | default 15 }}
49+
- type: Pods
50+
value: {{ .Values.worker.autoscaling.behavior.scaleUp.podsValue | default 2 }}
51+
periodSeconds: {{ .Values.worker.autoscaling.behavior.scaleUp.periodSeconds | default 15 }}
52+
selectPolicy: Max
53+
{{- end }}
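One behavior of the `| default N` pipelines in both HPA templates is worth keeping in mind: Helm's `default` function substitutes the fallback whenever the value is "empty", and empty includes the number `0`. The Python emulation below is only a model of that rule (the `helm_default` name is hypothetical), but it shows why an explicit `0` in `values.yaml` may not render as `0`.

```python
def helm_default(fallback, value):
    """Model of `{{ value | default fallback }}` in Helm/Sprig.

    Sprig treats nil, false, 0, "", empty lists and empty dicts as empty,
    which lines up with Python's notion of falsiness for these types.
    """
    return fallback if not value else value


# An explicit 0 for the worker scale-down window renders as the 600 s default.
print(helm_default(600, 0))     # 600
print(helm_default(600, 120))   # 120
# For scaleUp the fallback is itself 0, so nothing is lost there.
print(helm_default(0, None))    # 0
```

If a zero scale-down window is ever needed, the template would have to test for key presence (for example with `hasKey`) rather than rely on `default`.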

infra/helm/api-forge/templates/deployments/app.yaml

Lines changed: 4 additions & 4 deletions
@@ -106,11 +106,11 @@ spec:
106106
readOnlyRootFilesystem: false
107107
resources:
108108
requests:
109-
cpu: 250m
110-
memory: 128Mi
109+
cpu: {{ .Values.app.resources.requests.cpu | default "250m" }}
110+
memory: {{ .Values.app.resources.requests.memory | default "256Mi" }}
111111
limits:
112-
cpu: 1000m
113-
memory: 512Mi
112+
cpu: {{ .Values.app.resources.limits.cpu | default "1000m" }}
113+
memory: {{ .Values.app.resources.limits.memory | default "1Gi" }}
114114
volumeMounts:
115115
# Secrets - mounted as individual files
116116
- name: postgres-secrets

infra/helm/api-forge/templates/deployments/worker.yaml

Lines changed: 4 additions & 4 deletions
@@ -105,11 +105,11 @@ spec:
105105
failureThreshold: 3
106106
resources:
107107
requests:
108-
cpu: 250m
109-
memory: 128Mi
108+
cpu: {{ .Values.worker.resources.requests.cpu | default "250m" }}
109+
memory: {{ .Values.worker.resources.requests.memory | default "256Mi" }}
110110
limits:
111-
cpu: 1000m
112-
memory: 512Mi
111+
cpu: {{ .Values.worker.resources.limits.cpu | default "1000m" }}
112+
memory: {{ .Values.worker.resources.limits.memory | default "1Gi" }}
113113
volumeMounts:
114114
# Secrets - mounted as individual files
115115
- name: postgres-secrets
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
{{- if and .Values.app.enabled .Values.app.podDisruptionBudget.enabled }}
2+
apiVersion: policy/v1
3+
kind: PodDisruptionBudget
4+
metadata:
5+
name: app
6+
namespace: {{ .Values.global.namespace }}
7+
labels:
8+
app.kubernetes.io/name: app
9+
app.kubernetes.io/component: application
10+
app.kubernetes.io/part-of: api-forge
11+
spec:
12+
{{- if .Values.app.podDisruptionBudget.minAvailable }}
13+
minAvailable: {{ .Values.app.podDisruptionBudget.minAvailable }}
14+
{{- else if .Values.app.podDisruptionBudget.maxUnavailable }}
15+
maxUnavailable: {{ .Values.app.podDisruptionBudget.maxUnavailable }}
16+
{{- else }}
17+
# Default: allow 1 pod to be unavailable (works with any replica count)
18+
maxUnavailable: 1
19+
{{- end }}
20+
selector:
21+
matchLabels:
22+
app.kubernetes.io/name: app
23+
{{- end }}
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
{{- if and .Values.postgres.enabled .Values.postgres.podDisruptionBudget.enabled }}
2+
apiVersion: policy/v1
3+
kind: PodDisruptionBudget
4+
metadata:
5+
name: postgres
6+
namespace: {{ .Values.global.namespace }}
7+
labels:
8+
app.kubernetes.io/name: postgres
9+
app.kubernetes.io/component: database
10+
app.kubernetes.io/part-of: api-forge
11+
spec:
12+
{{- if .Values.postgres.podDisruptionBudget.minAvailable }}
13+
minAvailable: {{ .Values.postgres.podDisruptionBudget.minAvailable }}
14+
{{- else if .Values.postgres.podDisruptionBudget.maxUnavailable }}
15+
maxUnavailable: {{ .Values.postgres.podDisruptionBudget.maxUnavailable }}
16+
{{- else }}
17+
# Default: PostgreSQL must always be available (single replica setup)
18+
minAvailable: 1
19+
{{- end }}
20+
selector:
21+
matchLabels:
22+
app.kubernetes.io/name: postgres
23+
{{- end }}
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
{{- if and .Values.redis.enabled .Values.redis.podDisruptionBudget.enabled }}
2+
apiVersion: policy/v1
3+
kind: PodDisruptionBudget
4+
metadata:
5+
name: redis
6+
namespace: {{ .Values.global.namespace }}
7+
labels:
8+
app.kubernetes.io/name: redis
9+
app.kubernetes.io/component: cache
10+
app.kubernetes.io/part-of: api-forge
11+
spec:
12+
{{- if .Values.redis.podDisruptionBudget.minAvailable }}
13+
minAvailable: {{ .Values.redis.podDisruptionBudget.minAvailable }}
14+
{{- else if .Values.redis.podDisruptionBudget.maxUnavailable }}
15+
maxUnavailable: {{ .Values.redis.podDisruptionBudget.maxUnavailable }}
16+
{{- else }}
17+
# Default: Redis must always be available (single replica setup)
18+
minAvailable: 1
19+
{{- end }}
20+
selector:
21+
matchLabels:
22+
app.kubernetes.io/name: redis
23+
{{- end }}
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
{{- if and .Values.temporal.enabled .Values.temporal.podDisruptionBudget.enabled }}
2+
apiVersion: policy/v1
3+
kind: PodDisruptionBudget
4+
metadata:
5+
name: temporal
6+
namespace: {{ .Values.global.namespace }}
7+
labels:
8+
app.kubernetes.io/name: temporal
9+
app.kubernetes.io/component: workflow-engine
10+
app.kubernetes.io/part-of: api-forge
11+
spec:
12+
{{- if .Values.temporal.podDisruptionBudget.minAvailable }}
13+
minAvailable: {{ .Values.temporal.podDisruptionBudget.minAvailable }}
14+
{{- else if .Values.temporal.podDisruptionBudget.maxUnavailable }}
15+
maxUnavailable: {{ .Values.temporal.podDisruptionBudget.maxUnavailable }}
16+
{{- else }}
17+
# Default: Temporal must always be available (single replica setup)
18+
minAvailable: 1
19+
{{- end }}
20+
selector:
21+
matchLabels:
22+
app.kubernetes.io/name: temporal
23+
{{- end }}
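The four PDB templates resolve the budget in the same order: `minAvailable` if set, otherwise `maxUnavailable`, otherwise a per-service fallback. The sketch below models that precedence in Python; `resolve_pdb` is a hypothetical name, and truthiness stands in for the Go template `if`, which also treats `0` as unset.

```python
def resolve_pdb(pdb_values: dict, fallback: tuple) -> tuple:
    """Return the (field, value) pair a template would render into the PDB spec."""
    if pdb_values.get("minAvailable"):
        return ("minAvailable", pdb_values["minAvailable"])
    if pdb_values.get("maxUnavailable"):
        return ("maxUnavailable", pdb_values["maxUnavailable"])
    return fallback


APP_FALLBACK = ("maxUnavailable", 1)       # fallback in the app template
STATEFUL_FALLBACK = ("minAvailable", 1)    # postgres, redis, temporal templates

# values.yaml sets maxUnavailable for postgres, so the fallback is not used.
print(resolve_pdb({"enabled": True, "maxUnavailable": 1}, STATEFUL_FALLBACK))
# With neither key set, postgres renders minAvailable: 1.
print(resolve_pdb({"enabled": True}, STATEFUL_FALLBACK))
# If both keys are set, minAvailable wins and maxUnavailable is ignored.
print(resolve_pdb({"minAvailable": 2, "maxUnavailable": 1}, APP_FALLBACK))
```

One consequence is that removing `maxUnavailable` from the postgres, redis, or temporal values falls back to `minAvailable: 1`, which on a single replica blocks node drains. Keeping `maxUnavailable: 1` explicit in `values.yaml` avoids that.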
