Welcome to Lab 02 of my Cloud & Kubernetes DevOps portfolio! This project takes AWS Elastic Kubernetes Service (EKS) and turns it into a highly efficient, auto-scaling, and fully observable production-style environment.
## 📖 Project Overview
This lab demonstrates the orchestration of a Stateful Microservice Architecture within a highly dynamic, cost-optimized AWS EKS environment. Unlike standard stateless tutorials, this "Hard Mode" lab focuses on the complex integration of persistent data layers and hybrid-cloud security:
- Compute Intelligence: Utilizes Karpenter to bypass traditional Auto Scaling Groups, provisioning right-sized Spot instances in seconds based on real-time pod resource requests.
- Stateful Backbone: Implements Amazon RDS (PostgreSQL) for relational data and Amazon EFS via the CSI driver for shared file storage across scaling pods.
- Security & Identity: Leverages IAM Roles for Service Accounts (IRSA) to give pods least-privilege access to AWS resources, eliminating the need for hardcoded secrets.
- Hybrid Connectivity: The private data tier is hardened with security groups that exclusively allow management traffic from the Lab 01 VyOS BGP Tunnel, simulating a real-world enterprise management path.
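To make the IRSA pattern concrete, here is a minimal sketch of the ServiceAccount wiring: the EKS OIDC provider exchanges the pod's projected service-account token for temporary IAM credentials. The role name and account ID are placeholders, not values from this lab:

```yaml
# Hypothetical IRSA wiring -- the annotated ServiceAccount is all a pod
# needs to assume an IAM role; no static AWS keys are mounted anywhere.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fastapi-app
  namespace: default
  annotations:
    # Placeholder role ARN; Terraform creates this role with a trust
    # policy scoped to exactly this namespace/service-account pair.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/lab02-fastapi-irsa
```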
```mermaid
graph TD
    subgraph On-Premises
        A[Vagrant Workload] -->|192.168.0.0/24| B[Lab 01 VyOS BGP Tunnel]
    end
    subgraph AWS Cloud Architecture
        subgraph EKS Cluster
            C[Ingress / Load Balancer] --> D[FastAPI Pods]
            D -->|PVC Mount| E[(Amazon EFS)]
        end
        subgraph Private Data Tier
            D -->|IRSA Auth, Port 5432| F[(Amazon RDS PostgreSQL)]
        end
    end
    B ==>|Exclusive Management Path| F
    B -->|Data Plane Traffic| C
```
- EKS & Terraform: Bootstrapped an EKS 1.29 cluster using the official Terraform modules, configuring OIDC and IRSA (IAM Roles for Service Accounts) for secure, least-privilege pod permissions.
- Karpenter: A high-performance Kubernetes cluster autoscaler, configured to aggressively provision `t3` and `m5` Spot Instances when pods go pending, bypassing the slower native Cluster Autoscaler.
- Observability (kube-prometheus-stack): A robust Helm-deployed stack containing Prometheus and Grafana. Includes a custom `ServiceMonitor` designed to scrape deep application metrics (e.g., HTTP 404 rates, DB latency) directly from a Bun or WordPress application via the `/metrics` endpoint.
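A minimal sketch of what such a `ServiceMonitor` could look like; the label selectors, namespace, and port name here are illustrative assumptions, not the lab's actual values:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fastapi-metrics
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match the Prometheus instance's selector
spec:
  selector:
    matchLabels:
      app: fastapi                   # illustrative label on the app's Service
  namespaceSelector:
    matchNames: ["default"]
  endpoints:
    - port: http                     # named Service port exposing metrics
      path: /metrics
      interval: 15s
```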
- `terraform/`: Contains the infrastructure as code (EKS, VPC, OIDC, IRSA, and the initial Karpenter Helm release).
- `kubernetes/karpenter-nodepool.yaml`: Defines the custom Karpenter `NodePool` and `EC2NodeClass` prioritizing Spot capacity.
- `kubernetes/observability.yaml`: Helm deployment steps and the specific `ServiceMonitor` logic to integrate the monitoring loop.
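For orientation, a Karpenter `NodePool` that prioritizes Spot capacity for the `t3`/`m5` families might look roughly like this. This is a sketch against the v1beta1 Karpenter API, not the repo's exact manifest:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    spec:
      nodeClassRef:
        name: default              # matching EC2NodeClass (AMI, subnets, SGs)
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]         # prefer interruptible Spot capacity
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3", "m5"]
  limits:
    cpu: "100"                     # cap total CPU this pool may provision
```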
In this advanced configuration, we introduced Stateful workloads (Amazon RDS and Amazon EFS) to our highly dynamic, stateless Karpenter-managed EKS cluster.
- Stateful Data in a Stateless Auto-Scaling Cluster: While Karpenter rapidly provisions and scales down Spot instances based on traffic (managed seamlessly by our new FastAPI Horizontal Pod Autoscaler), our critical state is strictly decoupled. The FastAPI pods use the EFS CSI driver via PVCs for shared static assets, and IRSA (IAM Roles for Service Accounts) to securely authenticate to an external RDS PostgreSQL database without any hardcoded credentials. Karpenter is configured with a node disruption budget (`nodes: 1`) to ensure aggressive consolidation doesn't sever all stateful database connections simultaneously.
- The Exclusive Management Path: The RDS database resides deep within a private subnet without a public IP. Its Security Group restricts access to only the EKS nodes and the On-Premises `192.168.0.0/24` CIDR. The Lab 01 VyOS BGP Tunnel therefore operates as the exclusive, secure management path into our private data tier from the local Vagrant environment.
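The disruption budget mentioned above lives on the NodePool; a sketch of the relevant fields, assuming the v1beta1 Karpenter API:

```yaml
# Sketch: limit voluntary disruption to one node at a time, so
# consolidation never drains every pod holding a DB connection at once.
spec:
  disruption:
    budgets:
      - nodes: "1"   # at most one node may be disrupted voluntarily at any moment
```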
Because the RDS instance has no public IP and its Security Group rules are restrictive, the only way to manually access the database is via the Lab 01 On-Premises BGP tunnel.
1. SSH into the Vagrant Workload VM (from `Lab-01-AWS-VyOS-VPN/workload`):

   ```bash
   vagrant ssh
   ```
2. Connect to the remote database:

   ```bash
   psql -h <RDS_ENDPOINT> -U postgres -d postgres
   ```

   Expected Output: You should successfully authenticate to the PostgreSQL shell, proving the route traverses the VyOS IPsec tunnel directly into the private AWS subnet.
3. Verify the state:

   ```sql
   SELECT * FROM users;
   ```

   You should see the `users` table that was seeded by the FastAPI `initContainer`!
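The seeding step referenced above could be wired roughly like this initContainer sketch; the image tag, endpoint, and SQL file path are illustrative placeholders, not the lab's actual manifest:

```yaml
# Hypothetical initContainer that seeds the users table before the app starts.
initContainers:
  - name: db-seed
    image: postgres:16-alpine        # provides the psql client
    command: ["psql"]
    args: ["-h", "$(DB_HOST)", "-U", "postgres", "-d", "postgres",
           "-f", "/seed/users.sql"]  # idempotent seed script mounted below
    env:
      - name: DB_HOST
        value: lab02-rds.example.internal   # placeholder RDS endpoint
    volumeMounts:
      - name: seed-sql               # ConfigMap volume holding users.sql
        mountPath: /seed
```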
To truly see Karpenter in action, we can deliberately trigger a massive scale-up event.
1. Deploy the infrastructure:

   ```bash
   cd terraform/
   terraform init && terraform apply -auto-approve
   aws eks update-kubeconfig --region us-east-1 --name lab02-eks-karpenter
   ```
2. Apply the NodePools:

   ```bash
   kubectl apply -f kubernetes/karpenter-nodepool.yaml
   ```
3. The scaling demonstration: deploy a large `pause` deployment (a container that does nothing but sleep) with deliberately high CPU requests to force the cluster out of capacity.

   ```bash
   # Deploy a massive workload (zero replicas for now)
   kubectl create deployment inflate --image=mcr.microsoft.com/oss/kubernetes/pause:3.6 --replicas=0

   # Tag the pods with an environment variable (e.g., marking our dev workloads)
   kubectl set env deployment inflate NODE_ENV=dev

   # Request 1 vCPU per pod so that ten replicas exceed current capacity
   kubectl set resources deployment inflate --requests=cpu=1

   # Scale up drastically to break current capacity
   kubectl scale deployment inflate --replicas=10
   ```
4. Watch Karpenter react in seconds: immediately inspect the Karpenter logs and watch the nodes spin up:

   ```bash
   kubectl logs -l app.kubernetes.io/name=karpenter -n karpenter -f
   kubectl get nodes -w
   ```
   You will see Karpenter compute the aggregate CPU requirements of the 10 pending pods, bypass Auto Scaling Groups entirely, and call the EC2 Fleet API directly to provision the cheapest Spot instances that fit the workload.
5. Trigger scale-down (consolidation):

   ```bash
   kubectl scale deployment inflate --replicas=0
   ```

   Within seconds, Karpenter identifies that the Spot nodes are empty and terminates them, saving costs.
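This fast scale-down comes from Karpenter's consolidation settings on the NodePool; a sketch of the relevant fields, again assuming the v1beta1 API:

```yaml
# Sketch: reclaim nodes as soon as they drain completely.
spec:
  disruption:
    consolidationPolicy: WhenEmpty   # only consider nodes with no non-daemon pods
    consolidateAfter: 30s            # grace period before terminating an empty node
```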