Welcome to Lab 02 of my Cloud & Kubernetes DevOps portfolio! This project takes AWS Elastic Kubernetes Service (EKS) and turns it into a highly efficient, auto-scaling, and fully observable production-style environment.
## 📖 Project Overview
This lab demonstrates the orchestration of a Stateful Microservice Architecture within a highly dynamic, cost-optimized AWS EKS environment. Unlike standard stateless tutorials, this "Hard Mode" lab focuses on the complex integration of persistent data layers and hybrid-cloud security:
- Compute Intelligence: Utilizes Karpenter to bypass traditional Auto Scaling Groups, provisioning right-sized Spot instances in seconds based on real-time pod resource requests.
- Stateful Backbone: Implements Amazon RDS (PostgreSQL) for relational data and Amazon EFS via the CSI driver for shared file storage across scaling pods.
- Security & Identity: Leverages IAM Roles for Service Accounts (IRSA) to give pods least-privilege access to AWS resources, eliminating the need for hardcoded secrets.
- Hybrid Connectivity: The private data tier is hardened with security groups that exclusively allow management traffic from the Lab 01 VyOS BGP Tunnel, simulating a real-world enterprise management path.
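To make the IRSA pattern concrete, here is a minimal sketch of the ServiceAccount wiring: the EKS OIDC provider exchanges the pod's projected service-account token for temporary IAM credentials. The role name and account ID are placeholders, not values from this lab:

```yaml
# Hypothetical IRSA wiring -- the annotated ServiceAccount is all a pod
# needs to assume an IAM role; no static AWS keys are mounted anywhere.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fastapi-app
  namespace: default
  annotations:
    # Placeholder role ARN; Terraform creates this role with a trust
    # policy scoped to exactly this namespace/service-account pair.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/lab02-fastapi-irsa
```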
```mermaid
graph TD
    subgraph On-Premises
        A[Vagrant Workload] -->|192.168.0.0/24| B[Lab 01 VyOS BGP Tunnel]
    end
    subgraph AWS Cloud Architecture
        subgraph EKS Cluster
            C[Ingress / Load Balancer] --> D[FastAPI Pods]
            D -->|PVC Mount| E[(Amazon EFS)]
        end
        subgraph Private Data Tier
            D -->|IRSA Auth, Port 5432| F[(Amazon RDS PostgreSQL)]
        end
    end
    B ==>|Exclusive Management Path| F
    B -->|Data Plane Traffic| C
```
- EKS & Terraform: Bootstrapped an EKS 1.29 cluster using the official Terraform modules, configuring OIDC and IRSA (IAM Roles for Service Accounts) for secure, least-privilege pod permissions.
- Karpenter: A high-performance Kubernetes cluster autoscaler, configured to aggressively provision `t3` and `m5` Spot Instances when pods go pending, bypassing the slower native Cluster Autoscaler.
- Observability (kube-prometheus-stack): A robust Helm-deployed stack containing Prometheus and Grafana. Includes a custom `ServiceMonitor` designed to scrape deep application metrics (e.g., HTTP 404 rates, DB latency) directly from a Bun or WordPress application via the `/metrics` endpoint.
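A minimal sketch of what such a `ServiceMonitor` could look like; the label selectors, namespace, and port name here are illustrative assumptions, not the lab's actual values:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fastapi-metrics
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match the Prometheus instance's selector
spec:
  selector:
    matchLabels:
      app: fastapi                   # illustrative label on the app's Service
  namespaceSelector:
    matchNames: ["default"]
  endpoints:
    - port: http                     # named Service port exposing metrics
      path: /metrics
      interval: 15s
```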
- `terraform/`: Contains the infrastructure as code (EKS, VPC, OIDC, IRSA, and the initial Karpenter Helm release).
- `kubernetes/karpenter-nodepool.yaml`: Defines the custom Karpenter `NodePool` and `EC2NodeClass` prioritizing Spot capacity.
- `kubernetes/observability.yaml`: Helm deployment steps and the specific `ServiceMonitor` logic to integrate the monitoring loop.
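For orientation, a Karpenter `NodePool` that prioritizes Spot capacity for the `t3`/`m5` families might look roughly like this. This is a sketch against the v1beta1 Karpenter API, not the repo's exact manifest:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    spec:
      nodeClassRef:
        name: default              # matching EC2NodeClass (AMI, subnets, SGs)
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]         # prefer interruptible Spot capacity
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3", "m5"]
  limits:
    cpu: "100"                     # cap total CPU this pool may provision
```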
In this advanced configuration, we introduced Stateful workloads (Amazon RDS and Amazon EFS) to our highly dynamic, stateless Karpenter-managed EKS cluster.
- Stateful Data in a Stateless Auto-Scaling Cluster: While Karpenter rapidly provisions and scales down Spot instances based on traffic (managed seamlessly by our new FastAPI Horizontal Pod Autoscaler), our critical state is strictly decoupled. The FastAPI pods use the EFS CSI driver via PVCs for shared static assets, and IRSA (IAM Roles for Service Accounts) to securely authenticate to an external RDS PostgreSQL database without any hardcoded credentials. Karpenter is configured with a node disruption budget (`nodes: 1`) to ensure aggressive consolidation doesn't sever all stateful database connections simultaneously.
- The Exclusive Management Path: The RDS database resides deep within a private subnet without a public IP. Its Security Group restricts access to only the EKS nodes and the On-Premises `192.168.0.0/24` CIDR. The Lab 01 VyOS BGP Tunnel therefore operates as the exclusive, secure management path into our private data tier from the local Vagrant environment.
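The disruption budget mentioned above lives on the NodePool; a sketch of the relevant fields, assuming the v1beta1 Karpenter API:

```yaml
# Sketch: limit voluntary disruption to one node at a time, so
# consolidation never drains every pod holding a DB connection at once.
spec:
  disruption:
    budgets:
      - nodes: "1"   # at most one node may be disrupted voluntarily at any moment
```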
Because the RDS instance has no public IP and its Security Group rules are restrictive, the only way to manually access the database is via the Lab 01 On-Premises BGP tunnel.
1. SSH into the Vagrant Workload VM (from `Lab-01-AWS-VyOS-VPN/workload`):

   ```bash
   vagrant ssh
   ```
2. Connect to the remote database:

   ```bash
   psql -h <RDS_ENDPOINT> -U postgres -d postgres
   ```

   Expected Output: You should successfully authenticate to the PostgreSQL shell, proving the route traverses the VyOS IPsec tunnel directly into the private AWS subnet.
3. Verify the state:

   ```sql
   SELECT * FROM users;
   ```

   You should see the `users` table that was seeded by the FastAPI `initContainer`!
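The seeding step referenced above could be wired roughly like this initContainer sketch; the image tag, endpoint, and SQL file path are illustrative placeholders, not the lab's actual manifest:

```yaml
# Hypothetical initContainer that seeds the users table before the app starts.
initContainers:
  - name: db-seed
    image: postgres:16-alpine        # provides the psql client
    command: ["psql"]
    args: ["-h", "$(DB_HOST)", "-U", "postgres", "-d", "postgres",
           "-f", "/seed/users.sql"]  # idempotent seed script mounted below
    env:
      - name: DB_HOST
        value: lab02-rds.example.internal   # placeholder RDS endpoint
    volumeMounts:
      - name: seed-sql               # ConfigMap volume holding users.sql
        mountPath: /seed
```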
To truly see Karpenter in action, we can deliberately trigger a massive scale-up event.
1. Deploy the infrastructure:

   ```bash
   cd terraform/
   terraform init && terraform apply -auto-approve
   aws eks update-kubeconfig --region us-east-1 --name lab02-eks-karpenter
   ```
2. Apply the NodePools:

   ```bash
   kubectl apply -f kubernetes/karpenter-nodepool.yaml
   ```
3. The scaling demonstration: deploy a large `pause` deployment (a container that does nothing but sleep) with deliberately high CPU requests to force the cluster out of capacity.

   ```bash
   # Deploy a massive workload (zero replicas for now)
   kubectl create deployment inflate --image=mcr.microsoft.com/oss/kubernetes/pause:3.6 --replicas=0

   # Tag the pods with an environment variable (e.g., marking our dev workloads)
   kubectl set env deployment inflate NODE_ENV=dev

   # Request 1 vCPU per pod so that ten replicas exceed current capacity
   kubectl set resources deployment inflate --requests=cpu=1

   # Scale up drastically to break current capacity
   kubectl scale deployment inflate --replicas=10
   ```
4. Watch Karpenter react in seconds: immediately inspect the Karpenter logs and watch the nodes spin up:

   ```bash
   kubectl logs -l app.kubernetes.io/name=karpenter -n karpenter -f
   kubectl get nodes -w
   ```
   You will see Karpenter compute the aggregate CPU requirements of the 10 pending pods, bypass Auto Scaling Groups entirely, and call the EC2 Fleet API directly to provision the cheapest Spot instances that fit the workload.
5. Trigger scale-down (consolidation):

   ```bash
   kubectl scale deployment inflate --replicas=0
   ```

   Within seconds, Karpenter identifies that the Spot nodes are empty and terminates them, saving costs.
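This fast scale-down comes from Karpenter's consolidation settings on the NodePool; a sketch of the relevant fields, again assuming the v1beta1 API:

```yaml
# Sketch: reclaim nodes as soon as they drain completely.
spec:
  disruption:
    consolidationPolicy: WhenEmpty   # only consider nodes with no non-daemon pods
    consolidateAfter: 30s            # grace period before terminating an empty node
```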