This document describes different strategies for deploying new versions of the CarePath AI services, particularly when introducing changes like a new LLM model.
Option 1: Rolling Update. Replace pods one at a time with the new version until every pod runs it. This is Kubernetes' default deployment strategy.
- New version of the image is pushed to ECR
- Deployment manifest is updated with new image tag
- Kubernetes creates a new pod with the new version
- Once the new pod is healthy (passes readiness probes), Kubernetes terminates one old pod
- Process repeats until all pods are replaced
- At any given time, both old and new versions may be serving traffic
- Built into Kubernetes - no additional infrastructure needed
- Zero downtime deployment
- Built-in rollback via kubectl rollout undo
- Resource efficient - doesn't require double the pods
- Both old and new versions run simultaneously during rollout
- If new version has a bug, some traffic will hit it before rollback
- Limited control over traffic distribution
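The rollout steps above can be sketched as a small simulation. This is a simplified model, not Kubernetes' actual controller logic: it assumes new pods become ready immediately, and the function name simulate_rolling_update is hypothetical. It shows why old and new versions serve traffic side by side during the rollout.

```python
from typing import List, Tuple


def simulate_rolling_update(
    replicas: int, max_surge: int = 1, max_unavailable: int = 0
) -> List[Tuple[int, int]]:
    """Return the (old_pods, new_pods) counts after each step of a rollout.

    Simplified model: a new pod is assumed to pass its readiness probe
    as soon as it is created.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")

    old, new = replicas, 0
    states = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up: total pods may not exceed replicas + max_surge.
        create = min(replicas + max_surge - (old + new), replicas - new)
        if create > 0:
            new += create
            states.append((old, new))
        # Scale down: keep at least replicas - max_unavailable pods running.
        remove = min(old, (old + new) - (replicas - max_unavailable))
        if remove > 0:
            old -= remove
            states.append((old, new))
    return states


if __name__ == "__main__":
    for old, new in simulate_rolling_update(replicas=3):
        print(f"old={old} new={new}")
```

With one surge pod and no unavailable pods, every intermediate state keeps at least the desired number of pods running, and most states contain a mix of both versions.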
# Update terraform.tfvars with new image tag
db_api_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/carepath-chat-api:v2.0"
# Apply the change
cd infra/terraform/envs/demo && terraform apply
# Or use make target
make deploy-chat
# Monitor rollout
kubectl rollout status deployment/chat-api -n carepath-demo
# Rollback if needed
kubectl rollout undo deployment/chat-api -n carepath-demo
In Terraform deployment spec:
spec {
  strategy {
    type = "RollingUpdate"
    rolling_update {
      max_surge       = 1 # Max number of extra pods during update
      max_unavailable = 0 # Max number of pods that can be unavailable
    }
  }
}
Option 2: Canary Deployment. Deploy the new version alongside the old version and gradually shift traffic from old to new using percentage-based routing.
- Deploy new version as a separate deployment (e.g., chat-api-v2)
- Use service mesh (Istio, Linkerd) or ingress controller to split traffic:
- Start: 95% to old version, 5% to new version
- Monitor metrics, error rates, latency
- Gradually increase traffic to new version: 10%, 25%, 50%, 75%, 100%
- Once 100% traffic is on new version, remove old deployment
- Gradual exposure of new version to users
- Can limit blast radius of bugs
- Detailed control over traffic distribution
- Can route specific users/requests to new version for testing
- Easy to halt rollout and rollback
- Requires service mesh or advanced ingress controller
- More complex infrastructure
- Requires monitoring and manual intervention
- Higher resource usage (both versions running simultaneously)
- Need to maintain compatibility between versions
- Service Mesh: Install Istio or Linkerd on EKS cluster
- Monitoring: Prometheus + Grafana to track metrics by version
- Traffic Splitting Configuration: VirtualService (Istio) or TrafficSplit (SMI)
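The promotion schedule described above (5%, 10%, 25%, 50%, 75%, 100%) can be sketched as a small decision helper. This is a minimal sketch under stated assumptions: the function names, the tolerance value, and the rule of comparing the canary's error rate to the old version's are illustrative choices, not part of the CarePath tooling.

```python
from typing import Dict

# Traffic percentages for the new version, taken from the steps above.
CANARY_STEPS = [5, 10, 25, 50, 75, 100]


def next_canary_weight(
    current: int,
    canary_error_rate: float,
    baseline_error_rate: float,
    tolerance: float = 0.01,  # assumed acceptable gap, tune per service
) -> int:
    """Return the next traffic percentage for the new version.

    Returns 0 (send all traffic back to the old version) when the canary's
    error rate exceeds the baseline by more than the tolerance.
    """
    if canary_error_rate > baseline_error_rate + tolerance:
        return 0
    for step in CANARY_STEPS:
        if step > current:
            return step
    return 100


def route_weights(canary_weight: int) -> Dict[str, int]:
    """Map a canary percentage to per-host weights that sum to 100."""
    return {"chat-api-v1": 100 - canary_weight, "chat-api-v2": canary_weight}


if __name__ == "__main__":
    weight = next_canary_weight(5, canary_error_rate=0.002, baseline_error_rate=0.001)
    print(weight, route_weights(weight))
```

The resulting weights would then be written into whatever traffic-splitting resource the mesh uses; that step is left out here because it depends on the chosen tooling.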
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: chat-api
spec:
  hosts:
    - chat-api
  http:
    - match:
        - headers:
            x-version:
              exact: "canary"
      route:
        - destination:
            host: chat-api-v2
          weight: 100
    - route:
        - destination:
            host: chat-api-v1
          weight: 90
        - destination:
            host: chat-api-v2
          weight: 10
Option 3: Blue-Green Deployment. Maintain two complete, identical environments (staging/blue and production/green). Deploy to staging first, then switch traffic entirely when ready.
- Staging Environment: Deploy new version to staging cluster/namespace
- Testing: Run full test suite against staging
- Traffic Switch: Update DNS or load balancer to point to staging
- Monitoring: Watch metrics closely after switch
- Rollback: If issues arise, switch DNS back to production
- Cleanup: Once stable, promote staging to production, prepare for next deployment
- Complete isolation between versions
- Full testing on production-like environment before traffic switch
- Instant rollback by switching DNS/load balancer
- Zero downtime if done correctly
- Can keep old version running for safety
- Requires double the infrastructure (most expensive option)
- Database migrations can be complex
- DNS/load balancer switch may not be instant (DNS TTL)
- Need to manage state between environments
- Resource intensive
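The test, switch, monitor, and rollback sequence above can be sketched as a single control flow. This is an illustrative sketch only: blue_green_cutover and its callback parameters are hypothetical names, and the actual traffic switch (DNS or load balancer update) is represented by changing a variable.

```python
from typing import Callable, List, Tuple


def blue_green_cutover(
    tests_pass: Callable[[], bool],
    healthy_after_switch: Callable[[], bool],
    live: str = "production",
    candidate: str = "staging",
) -> Tuple[str, List[str]]:
    """Return the environment that ends up serving traffic, plus a log."""
    log: List[str] = []

    # Testing: the candidate must pass the full suite before any switch.
    if not tests_pass():
        log.append(f"tests failed; traffic stays on {live}")
        return live, log

    # Traffic switch: in practice a DNS or load balancer update.
    serving = candidate
    log.append(f"traffic switched to {candidate}")

    # Monitoring and rollback: switch back if the new environment misbehaves.
    if not healthy_after_switch():
        serving = live
        log.append(f"unhealthy after switch; rolled back to {live}")

    return serving, log


if __name__ == "__main__":
    env, events = blue_green_cutover(lambda: True, lambda: True)
    print(env, events)
```

Keeping the old environment running until the health check passes is what makes the rollback branch cheap; with DNS-based switching, the rollback is still subject to TTL delays.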
# Two complete environment configurations in Terraform
# infra/terraform/envs/staging/
# infra/terraform/envs/production/
# Deploy to staging
cd infra/terraform/envs/staging && terraform apply
# Look up the staging service endpoint to run tests against
kubectl get service chat-api-service -n carepath-staging
# Switch traffic (update Route53 or ALB target group)
# aws elbv2 modify-listener ...
# Promote staging to production after validation
For the current MVP phase with limited resources, we recommend Option 1: Rolling Update.
- No Additional Infrastructure: Already supported by Kubernetes
- Low Complexity: Terraform changes only
- Cost Effective: No extra clusters or service mesh
- Good Enough: For MVP with limited users, rolling updates provide sufficient safety
- Easy Rollback: kubectl rollout undo is simple and fast
Consider Option 2 (Canary) or Option 3 (Blue-Green) when:
- User base grows significantly (>10,000 daily users)
- Downtime becomes expensive (SLA requirements)
- Need to test with real traffic before full rollout
- Have resources to maintain more complex infrastructure
- Compliance requires full environment separation
Regardless of deployment method, always have a rollback plan:
# Rolling update: revert to the previous revision
kubectl rollout undo deployment/chat-api -n carepath-demo
# Canary: set traffic back to 100% old version
kubectl apply -f traffic-split-old-100percent.yaml
# Blue-green: update DNS or load balancer back to previous environment
aws route53 change-resource-record-sets ...
Key metrics to watch during any deployment:
- Error Rate: Track 5xx errors per service version
- Latency: p50, p95, p99 response times
- Pod Health: Readiness/liveness probe failures
- Resource Usage: CPU, memory spikes
- Business Metrics: Successful triage requests, user complaints
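As a rough illustration of how the error-rate and latency metrics above could feed a go/no-go decision, the sketch below computes both from raw samples. The thresholds (1% errors, 1 second p95) and all function names are assumptions chosen for the example, not CarePath SLOs.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sample set."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate(status_codes: Sequence[int]) -> float:
    """Fraction of responses with a 5xx status code."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def should_rollback(
    status_codes: Sequence[int],
    latencies_s: Sequence[float],
    max_error_rate: float = 0.01,  # assumed threshold
    max_p95_s: float = 1.0,        # assumed threshold
) -> bool:
    """True when either the error rate or p95 latency exceeds its limit."""
    return (
        error_rate(status_codes) > max_error_rate
        or percentile(latencies_s, 95) > max_p95_s
    )


if __name__ == "__main__":
    codes = [200] * 98 + [500, 502]
    latencies = [0.2] * 100
    print(should_rollback(codes, latencies))
```

In a real deployment these values would come from the monitoring stack rather than in-memory lists; the point here is only the shape of the check.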
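# 5xx error rate by version (assumes the metric carries a version label)
sum by (version) (rate(http_requests_total{status=~"5..", deployment="chat-api"}[5m]))
# p95 latency by version (same version-label assumption)
histogram_quantile(0.95, sum by (le, version) (rate(http_request_duration_seconds_bucket[5m])))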
# Pod restarts
rate(kube_pod_container_status_restarts_total{namespace="carepath-demo"}[5m])