A comprehensive Kubernetes-native home infrastructure platform
GitOps • Security-First • Fully Automated
This repository contains the complete infrastructure-as-code (IaC) configuration for my home operations platform. Built on modern cloud-native principles, it demonstrates enterprise-grade practices scaled down for home use, featuring:
- Kubernetes-Native Architecture: Built on Talos Linux for immutable infrastructure
- GitOps Workflow: Managed by Flux CD for declarative, Git-driven deployments
- Zero-Trust Security: Comprehensive authentication, authorization, and secrets management
- Full Automation: From hardware provisioning to application deployment
- Complete Observability: Metrics, logs, traces, and alerting across the stack
- Smart Home Integration: IoT, automation, and media management platform

Guiding principles:

- Infrastructure as Code: Everything defined declaratively in Git
- GitOps: Git as the single source of truth for cluster state
- Security by Design: Zero-trust networking, encrypted secrets, automated updates
- Cloud-Native: Kubernetes-first, microservices architecture
- Observability: Comprehensive monitoring and alerting
- Automation: Minimal manual intervention required
The platform runs on a high-availability Kubernetes cluster powered by Talos Linux:
| Component | Details |
|---|---|
| OS | Talos Linux v1.12.3 - Immutable, API-driven Linux |
| Kubernetes | v1.35.1 - Latest stable Kubernetes |
| CNI | Cilium - eBPF-based networking and security |
| Ingress | Envoy Gateway - Gateway API-based ingress |
| Nodes | 3x Control Plane + 2x Workers |
| High Availability | Virtual IP, distributed etcd, automated failover |

| Device | Count | CPU | Cores | RAM | OS Disk | Data Disk | Purpose |
|---|---|---|---|---|---|---|---|
| Intel NUC12WSHi7 | 2 | i7-1265P | 12 (16 threads) | 64GB | 1TB SSD | 1TB NVMe | Kubernetes Control Plane |
| Intel NUC11PAHi7 | 1 | i7-1165G7 | 4 (8 threads) | 64GB | 1TB SSD | 1TB NVMe | Kubernetes Control Plane |
| Intel NUC11PAHi7 | 1 | i7-1165G7 | 4 (8 threads) | 64GB | 1TB SSD | 1TB NVMe | Kubernetes Worker Node |
| Minisforum MS-01 | 1 | i9-13900H | 14 (20 threads) | 96GB | 1TB NVMe | 2TB NVMe | Kubernetes Worker Node |
| Synology RS1219+ | 1 | Atom C2538 | - | 4GB | - | 6×16TB | NAS Storage |
| Synology DVA1622 | 1 | Atom C3508 | - | 4GB | - | 2×4TB | NVR/Security Cameras |
| UniFi UXG-Pro | 1 | - | - | - | - | - | Gateway/Router |
| UniFi US-48-500W | 1 | - | - | - | - | - | 48-Port PoE Switch |
| APC SMC1000I-2UC | 1 | - | - | - | - | - | UPS Power Management |

- Management VLAN (VLAN 80): 10.0.80.0/21 - Kubernetes nodes
- Trusted VLAN (VLAN 10): 10.0.10.0/24 - Home devices, secondary k8s interfaces
- Cluster Networking:
  - Pod CIDR: 10.69.0.0/16
  - Service CIDR: 10.96.0.0/16
  - LoadBalancer VIP: 10.0.80.99
The platform hosts 70+ applications across multiple categories:
- Home Assistant - Comprehensive home automation platform
- ESPHome - ESP8266/ESP32 device management
- Zigbee2MQTT - Zigbee device bridge
- Mosquitto - MQTT message broker
- Frigate - AI-powered network video recorder
- go2rtc - Real-time streaming server
- EVCC - EV charging management
- TeslaMate - Tesla vehicle data logging and analytics
- Fernwood Booker - Custom multi-tenant appointment booking system
- Change Detection - Website monitoring
- Plex - Media server and streaming platform
- Sonarr + Sonarr 4K - TV series management
- Radarr + Radarr 4K - Movie management
- Prowlarr - Indexer aggregator
- Bazarr - Subtitle management
- SABnzbd - Usenet downloader
- qBittorrent - BitTorrent client
- Jellyseerr - Media request management
- Tautulli - Plex analytics and monitoring
- Unpackerr - Archive extraction automation
- xTeVe - IPTV proxy server
- Booklore - Book library management
- Shelfmark - Calibre-based book management
- Qui - Modern qBittorrent web UI (from the autobrr project)
- Plexovic Gatus - Media service monitoring and status page
- Atuin - Shell history sync and search
- Memos - Privacy-first note-taking
- Miniflux - Minimalist RSS reader
- Paperless-NGX - Document management system
- Baby Buddy - Baby care tracking
- CloudNative-PG - PostgreSQL operator
- PgAdmin - PostgreSQL administration
- Dragonfly - In-memory data store, drop-in Redis replacement
- Garage - Distributed S3-compatible object storage
- Authelia - Authentication and authorization server
- LLDAP - Lightweight LDAP implementation
- External Secrets - Secrets management with 1Password
- cert-manager - Automatic TLS certificate management
- Cilium - eBPF-based CNI and security
- Envoy Gateway - Gateway API-based ingress (Internal + External gateways)
- Cloudflared - Secure tunnels to Cloudflare
- External DNS - Automatic DNS record management (Cloudflare for external, UniFi for internal)
- CoreDNS - Cluster DNS with conditional forwarding for internal domains
- AdGuard Home - Network-wide ad blocking
- Multus - Multiple network interfaces
- NTPd - Network time protocol server
- SMTP Relay - Outbound email service
- Echo Server - HTTP echo/debug server
- Prometheus - Metrics collection and alerting
- Grafana - Metrics visualization and dashboards
- Loki - Log aggregation and analysis
- Vector - Log collection and routing
- InfluxDB - Time-series database
- UnPoller - UniFi metrics collection
- Silence Operator - Alertmanager silence management
- Rook-Ceph - Distributed block and object storage
- OpenEBS - Local persistent volumes
- VolSync - Volume backup and synchronization
- Snapshot Controller - Volume snapshot management
- Kopia - Backup repository web UI
- Reloader - Automatic pod restarts on config changes
- Descheduler - Pod rescheduling optimization
- Spegel - Local container registry mirror
- Intel Device Plugin - GPU and hardware acceleration
- Node Feature Discovery - Hardware feature detection
- Metrics Server - Resource usage metrics
- Tuppr - Automated Talos Linux and Kubernetes upgrades
- Actions Runner Controller - Self-hosted GitHub Actions runners
  - Enables image pre-pulling to Talos nodes via `talosctl`
  - Scales 0-3 runners dynamically based on workflow demand
  - Authenticates via GitHub App with cluster-admin and os:admin permissions
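
The runner scaling described for Actions Runner Controller could be configured roughly as below, assuming ARC's runner scale set mode (the `gha-runner-scale-set` Helm chart) rather than the legacy controller. These are chart values only, not this repository's actual file; the repository URL, Secret name, and scale set name are hypothetical placeholders. The referenced Secret would hold the GitHub App credentials under the keys `github_app_id`, `github_app_installation_id`, and `github_app_private_key`.

```yaml
# Hypothetical values for the gha-runner-scale-set chart (a sketch, not this repository's file)
githubConfigUrl: https://github.com/<owner>/<repo>   # placeholder repository URL
githubConfigSecret: home-ops-runner-secret            # hypothetical Secret with GitHub App credentials
runnerScaleSetName: home-ops-runner                   # hypothetical; workflows target it with `runs-on`
minRunners: 0                                         # no idle runners when no jobs are queued
maxRunners: 3                                         # upper bound matching the 0-3 range above
```

For the `talosctl` pre-pull step, Talos must first allow API access from Kubernetes (`machine.features.kubernetesTalosAPIAccess` in the machine config) before a pod can be granted a role such as `os:admin`.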
```mermaid
graph TD
    A[Developer] -->|Git Push| B[GitHub Repository]
    B -->|Webhook| C[Flux CD]
    C -->|Pull Changes| B
    C -->|Apply Manifests| D[Kubernetes Cluster]
    D -->|Sync Status| C
    E[Renovate Bot] -->|Dependency Updates| B
    F[External Secrets] -->|Fetch Secrets| G[1Password]
    F -->|Create K8s Secrets| D
```
Flux CD continuously monitors the Git repository and automatically applies changes to the cluster:
- Source Controller - Monitors Git repositories and Helm charts
- Kustomize Controller - Applies Kustomize configurations
- Helm Controller - Manages Helm releases
- Image Automation - Automatically updates container images
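
The Git-to-Flux path, including the webhook edge in the GitOps diagram, can be sketched with a Flux `GitRepository` (what the Source Controller watches) and a `Receiver` (which turns a GitHub push into an immediate reconcile instead of waiting for the next poll). This is a minimal sketch under assumptions: the repository URL, branch, Receiver name, webhook Secret name, and interval are placeholders, not values from this repository.

```yaml
# Hypothetical sketch: URL, branch, names, and interval are placeholders.
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1h                                # fallback polling if no webhook arrives
  url: https://github.com/<owner>/<repo>      # placeholder
  ref:
    branch: main                              # assumed default branch
---
apiVersion: notification.toolkit.fluxcd.io/v1
kind: Receiver
metadata:
  name: github-receiver                       # hypothetical name
  namespace: flux-system
spec:
  type: github
  events: ["ping", "push"]
  secretRef:
    name: github-webhook-token                # hypothetical Secret with a `token` key
  resources:
    - apiVersion: source.toolkit.fluxcd.io/v1
      kind: GitRepository
      name: flux-system                       # reconcile this source when GitHub calls in
```

Flux reports the URL path to register in GitHub in the Receiver's `status.webhookPath`; that endpoint also has to be reachable from the internet, which is an assumption about this setup. If the Flux Operator manages Flux here (the `flux-system` folder is described as holding the operator and instance), the `GitRepository` is usually generated from the `FluxInstance` sync settings, leaving only the `Receiver` to write by hand.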
```mermaid
graph TD
    A[Internet] -->|HTTPS| B[Cloudflare]
    B -->|Cloudflare Tunnel| C[Envoy Gateway]
    A -->|HTTPS| C
    C -->|Ext Auth| D[Authelia]
    D -->|LDAP Auth| E[LLDAP]
    D -->|Authorized| F[Application]
    G[External Secrets] -->|API| H[1Password Connect]
    G -->|K8s Secrets| F
```
- Zero-Trust Network: All traffic encrypted and authenticated
- Multi-Factor Authentication: TOTP, WebAuthn, and Duo support
- Secrets Management: SOPS-encrypted secrets committed to Git, plus application secrets fetched from 1Password via External Secrets
- Certificate Management: Automated TLS with Let's Encrypt
- Network Policies: Microsegmentation with Cilium
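
As a sketch of the 1Password path, the `ExternalSecret` below asks External Secrets to read one 1Password item through a `ClusterSecretStore` and render selected fields into a Kubernetes Secret. The store name `onepassword`, the item `paperless`, its field names, and the target Secret name are hypothetical; the `apiVersion` depends on the installed External Secrets release.

```yaml
# Hypothetical sketch: store, item, field, and Secret names are placeholders.
apiVersion: external-secrets.io/v1        # older External Secrets releases serve v1beta1
kind: ExternalSecret
metadata:
  name: paperless
spec:
  refreshInterval: 1h                     # re-read 1Password periodically
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword                     # assumed store backed by 1Password Connect
  target:
    name: paperless-secret                # Secret that External Secrets creates and owns
    template:
      data:
        PAPERLESS_ADMIN_USER: "{{ .admin_user }}"
        PAPERLESS_ADMIN_PASSWORD: "{{ .admin_password }}"
  dataFrom:
    - extract:
        key: paperless                    # 1Password item; its fields become template variables
```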
```mermaid
graph TD
    A[Applications] -->|RWO Volumes| B[Rook-Ceph RBD]
    A -->|RWX Volumes| C[Rook-Ceph FS]
    A -->|Local Volumes| D[OpenEBS LocalPV]
    B -->|Backup| E[VolSync]
    C -->|Backup| E
    E -->|Kopia| F[NFS Repository]
    E -->|Restic| G[Cloudflare R2]
    H[NAS] -->|NFS| A
```
- Distributed Storage: Rook-Ceph across all nodes for redundancy
- Local Storage: OpenEBS for high-performance local volumes
- Network Storage: NFS mounts from Synology NAS
- Backup Strategy: Dual destinations, with Kopia backing up hourly to NFS and Restic backing up daily to Cloudflare R2 for off-site disaster recovery
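
The daily off-site leg can be sketched with VolSync's Restic mover: a `ReplicationSource` snapshots a PVC on a schedule and pushes the data to an S3-compatible repository such as Cloudflare R2. The app name, bucket, schedule, retention, and the Rook-Ceph storage and snapshot class names are assumptions, and in practice the repository Secret would be rendered by External Secrets rather than committed. The hourly Kopia leg is not shown.

```yaml
# Hypothetical sketch: names, schedule, retention, and class names are assumptions.
---
apiVersion: v1
kind: Secret
metadata:
  name: memos-volsync-r2                       # hypothetical; never commit real values
stringData:
  RESTIC_REPOSITORY: s3:https://<account-id>.r2.cloudflarestorage.com/<bucket>/memos
  RESTIC_PASSWORD: <restic-password>
  AWS_ACCESS_KEY_ID: <r2-access-key-id>
  AWS_SECRET_ACCESS_KEY: <r2-secret-access-key>
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: memos-r2
spec:
  sourcePVC: memos                             # PVC to back up (hypothetical)
  trigger:
    schedule: "30 0 * * *"                     # once a day at 00:30
  restic:
    repository: memos-volsync-r2               # Secret defined above
    copyMethod: Snapshot                       # back up from a CSI snapshot, not the live volume
    storageClassName: ceph-block               # assumed Rook-Ceph RBD class
    volumeSnapshotClassName: csi-ceph-blockpool  # assumed Rook-Ceph snapshot class
    pruneIntervalDays: 7
    retain:
      daily: 7
      weekly: 4
```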
- CNI: Cilium with eBPF for high-performance networking (deployed without kube-proxy)
- Load Balancing: Cilium LB IPAM for bare-metal LoadBalancer services
- Ingress: Envoy Gateway with dual Gateways (internal + external) using the Kubernetes Gateway API
- DNS: AdGuard Home for network-wide filtering, UniFi for internal DNS records, Cloudflare for external DNS, CoreDNS with conditional forwarding for in-cluster resolution
- Multi-Homing: Multus CNI for additional network interfaces (IoT VLAN access)
- Tunnel: Cloudflared for secure external access through Cloudflare
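
As a sketch of how these pieces meet, the `HTTPRoute` below attaches an app to an internal Gateway, and the `CiliumLoadBalancerIPPool` gives Cilium LB IPAM a range to hand out to LoadBalancer Services (such as the Envoy Gateway proxies). The Gateway name and namespace, listener name, hostname, and IP range are assumptions, not values from this repository; the backend uses Memos' default port, 5230.

```yaml
# Hypothetical sketch: gateway, namespace, listener, hostname, and IP range are placeholders.
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: memos
  namespace: default
spec:
  parentRefs:
    - name: internal            # assumed name of the internal Gateway
      namespace: network        # assumed namespace of the Gateway
      sectionName: https        # assumed listener name
  hostnames:
    - memos.example.com
  rules:
    - backendRefs:
        - name: memos           # Service in the route's namespace
          port: 5230
---
apiVersion: cilium.io/v2alpha1  # newer Cilium releases also serve cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: pool
spec:
  blocks:
    - start: 10.0.80.100        # hypothetical range inside the management VLAN
      stop: 10.0.80.120
```

A cross-namespace route like this only attaches if the Gateway listener's `allowedRoutes.namespaces` admits the `default` namespace. On a flat L2 network, Cilium also needs L2 announcements (or BGP) enabled before addresses from the pool are reachable.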
The repository includes comprehensive Taskfile automation:
```sh
# Cluster operations
task talos:generate              # Generate Talos configuration
task talos:apply                 # Apply Talos configuration
task talos:bootstrap             # Bootstrap new cluster
task talos:fetch-kubeconfig      # Generate talos kubeconfig
task talos:upgrade               # Upgrade Talos on a node (requires: node=<ip>)
task talos:upgrade-rollout       # Rolling Talos upgrade on all nodes
task talos:upgrade-k8s           # Upgrade Kubernetes version (requires: node=<ip> to=<version>)
task talos:reboot-node           # Reboot node (requires: IP=<ip>)
task talos:nuke                  # Reset nodes to maintenance mode (DESTRUCTIVE!)

# Volume backup operations
task volsync:check               # Check volsync repo (requires: app=<name>)
task volsync:debug               # Debug restic (requires: app=<name>)
task volsync:list                # List snapshots (requires: app=<name>)
task volsync:unlock              # Unlock restic repository (requires: app=<name>)
task volsync:snapshot            # Create snapshot (requires: app=<name>)
task volsync:restore             # Restore from snapshot (requires: app=<name>)
task volsync:cleanup             # Delete volume populator PVCs

# Kubernetes operations
task k8s:delete-failed-pods      # Delete pods with failed status
```

- Talos OS: Automated rolling upgrades via Tuppr, or manually via `task talos:upgrade node=<ip>`
- Kubernetes: Automated via Tuppr, or manual coordinated upgrades following the compatibility matrix
- Applications: Automated via Renovate bot + Flux CD
- Full documentation: See docs/UPGRADE.md
Complete cluster rebuild capability:
- Hardware Reset: PXE boot into Talos maintenance mode
- Cluster Bootstrap: Automated via bootstrap scripts
- Backup Restoration: VolSync automatically restores from Kopia (NFS) or Restic (R2) snapshots
- Full documentation: See docs/BOOTSTRAP.md
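
The restore step can be sketched with VolSync's volume populator: a `ReplicationDestination` with a manual trigger restores the latest Restic snapshot once, and a PVC that names it in `dataSourceRef` is provisioned from the result. All names, the size, the storage and snapshot classes, and the repository Secret are hypothetical.

```yaml
# Hypothetical sketch: names, size, classes, and Secret are placeholders.
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: memos-dst
spec:
  trigger:
    manual: restore-once                         # run a single restore, then stop
  restic:
    repository: memos-restic-secret              # hypothetical Secret with RESTIC_* and S3 credentials
    copyMethod: Snapshot                         # expose the restored data as a VolumeSnapshot
    capacity: 5Gi
    accessModes: ["ReadWriteOnce"]
    storageClassName: ceph-block                 # assumed Rook-Ceph RBD class
    volumeSnapshotClassName: csi-ceph-blockpool  # assumed Rook-Ceph snapshot class
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: memos
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ceph-block
  resources:
    requests:
      storage: 5Gi
  dataSourceRef:                                 # VolSync's populator fills this PVC
    apiGroup: volsync.backube
    kind: ReplicationDestination
    name: memos-dst
```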
```
kubernetes/
├── apps/                         # Application deployments organized by namespace
│   ├── actions-runner-system/    # Self-hosted GitHub Actions runners
│   ├── automation/               # Home automation stack
│   ├── cert-manager/             # Certificate management
│   ├── database/                 # Database services
│   ├── default/                  # Default namespace apps (atuin, memos, etc.)
│   ├── external-secrets/         # Secrets management with 1Password
│   ├── flux-system/              # Flux operator and instance
│   ├── kube-system/              # Core cluster services (cilium, metrics, etc.)
│   ├── media/                    # Media management applications
│   ├── network/                  # Networking and DNS services
│   ├── observability/            # Monitoring and logging
│   ├── openebs-system/           # OpenEBS storage
│   ├── rook-ceph/                # Rook-Ceph distributed storage
│   ├── security/                 # Authentication and security
│   ├── storage/                  # Garage object storage
│   ├── system-upgrade/           # Automated Talos/K8s upgrades (Tuppr)
│   └── volsync-system/           # Volume backup services
├── components/                   # Reusable Kustomize components
│   ├── authelia-proxy/           # Authelia ext-auth security policy
│   ├── common/                   # Common configurations
│   └── volsync/                  # VolSync components
└── flux/                         # Flux system configuration
    └── cluster/                  # Cluster-wide configurations
talos/                            # Talos Linux configuration
├── talconfig.yaml                # Node definitions (managed by talhelper)
├── talenv.yaml                   # Talos environment vars
├── talsecret.yaml                # Talos secrets (encrypted)
├── clusterconfig/                # Generated cluster configs (do not edit)
└── patches/                      # Configuration patches
    ├── controller/               # Controller-specific patches
    └── global/                   # Global patches
bootstrap/                        # Initial cluster bootstrapping
├── helmfile.yaml                 # Helmfile for bootstrapping
└── resources.yaml.j2             # Template for resources
scripts/                          # Helper scripts
└── lib/                          # Script libraries
docs/                             # Documentation
├── BOOTSTRAP.md                  # Bootstrap procedures
├── NODE-REPLACEMENT.md           # Node replacement guide
└── UPGRADE.md                    # Upgrade procedures
.taskfiles/                       # Task automation scripts
├── Kubernetes/                   # Kubernetes tasks
├── Talos/                        # Talos tasks and scripts
└── VolSync/                      # VolSync tasks and templates
Taskfile.yaml                     # Main task definitions
```
Each application follows a consistent structure:
```
app-name/
├── app/                          # Application manifests
│   ├── helmrelease.yaml          # Helm chart configuration
│   ├── kustomization.yaml        # Kustomize configuration
│   ├── externalsecret.yaml       # Secret management (if needed)
│   └── configs/                  # Additional config files (optional)
└── ks.yaml                       # Flux Kustomization
```
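
For the `ks.yaml` in that layout, a Flux `Kustomization` for a hypothetical app `memos` in the `default` namespace could look like this. The dependency, the substitution variables, and the use of the `volsync` component are illustrative assumptions; Flux resolves `components` paths relative to `spec.path`, so four `..` steps lead from the app folder back to `kubernetes/`.

```yaml
# Hypothetical ks.yaml for kubernetes/apps/default/memos; names and variables are assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: memos
  namespace: flux-system                 # hypothetical
spec:
  targetNamespace: default
  path: ./kubernetes/apps/default/memos/app
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  interval: 1h
  retryInterval: 2m
  timeout: 5m
  prune: true                            # delete resources that were removed from Git
  wait: false
  dependsOn:
    - name: rook-ceph-cluster            # hypothetical: wait for storage first
      namespace: rook-ceph
  components:
    - ../../../../components/volsync     # resolves to kubernetes/components/volsync
  postBuild:
    substitute:
      APP: memos                         # hypothetical variables consumed by the component
      VOLSYNC_CAPACITY: 5Gi
```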
- Hardware: Minimum 3x bare-metal servers or VMs with 16GB+ RAM
- Network: VLAN-capable switch and router/firewall
- DNS: Domain name with Cloudflare DNS management (external), UniFi gateway for internal DNS
- Secrets: 1Password account for secrets management
- Tools: `talosctl`, `kubectl`, `flux`, `task`, `age` (for SOPS)
- Fork this repository and customize for your environment
- Configure secrets: Set up SOPS age key and 1Password Connect
- Prepare hardware: Install Talos Linux on your nodes
- Bootstrap cluster: `./scripts/bootstrap-cluster.sh`
- Monitor deployment: Applications will automatically deploy via GitOps
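
For the secrets step, a `.sops.yaml` at the repository root tells `sops` which files to encrypt and for which age recipient. A minimal sketch follows; the path patterns assume the layout described in this README and that `talsecret.yaml` is SOPS-encrypted, and the recipient is a placeholder for the public key that `age-keygen -o age.key` prints.

```yaml
# Hypothetical .sops.yaml; replace <your-age-public-key> with your "age1..." public key.
creation_rules:
  # Talos secrets: encrypt every value in the file (assumes talsecret.yaml uses SOPS)
  - path_regex: talos/talsecret\.ya?ml$
    age: <your-age-public-key>
  # Kubernetes Secrets: encrypt only data/stringData so metadata stays readable in diffs
  - path_regex: kubernetes/.*\.sops\.ya?ml$
    encrypted_regex: "^(data|stringData)$"
    age: <your-age-public-key>
```

Flux can then decrypt these files in-cluster if the private key is stored in a Secret (for example `sops-age` in `flux-system`, with a data key ending in `.agekey`) and the Flux `Kustomization` sets `decryption.provider: sops` with that Secret as its `secretRef`.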
Key files to customize for your environment:
- `talos/talconfig.yaml` - Hardware and network configuration
- `kubernetes/components/common/vars/cluster-settings.yaml` - Cluster-wide configuration
- `kubernetes/components/common/vars/cluster-secrets.sops.yaml` - Encrypted secrets
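
A trimmed `talos/talconfig.yaml` for talhelper might look like the sketch below. It reuses the Talos and Kubernetes versions and the pod and service CIDRs stated in this README; the cluster name, hostname, node IP, MAC address, install disk, VIP, and patch file names are hypothetical. Whether 10.0.80.99 is also the Kubernetes API endpoint is not stated, so the endpoint is left as a placeholder.

```yaml
# Hypothetical talhelper excerpt; only versions and CIDRs come from this README.
clusterName: home-ops                        # hypothetical
talosVersion: v1.12.3
kubernetesVersion: v1.35.1
endpoint: https://<control-plane-vip>:6443   # placeholder API endpoint
clusterPodNets: ["10.69.0.0/16"]
clusterSvcNets: ["10.96.0.0/16"]
cniConfig:
  name: none                                 # no built-in CNI; Cilium is installed separately
nodes:
  - hostname: k8s-0                          # hypothetical
    ipAddress: 10.0.80.10                    # hypothetical address in the management VLAN
    controlPlane: true
    installDisk: /dev/sda                    # hypothetical; match your OS disk
    networkInterfaces:
      - deviceSelector:
          hardwareAddr: "<mac-address>"      # placeholder NIC selector
        dhcp: true
        vip:
          ip: <control-plane-vip>            # shared VIP floated across control-plane nodes
patches:
  - "@./patches/global/<patch>.yaml"         # applied to every node
controlPlane:
  patches:
    - "@./patches/controller/<patch>.yaml"   # applied to control-plane nodes only
```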
| Service | Purpose | Cost |
|---|---|---|
| 1Password | Secrets management via External Secrets | ~$100/year |
| Cloudflare | DNS, CDN, R2 storage, and secure tunnels | Free |
| GitHub | Source control and CI/CD | Free |
| Total | | ~$8/month |
This repository builds upon the excellent work of the k8s-at-home community. Special thanks to:
- onedr0p/cluster-template - GitOps cluster template
- k8s-at-home/charts - Kubernetes Helm charts
- Talos Linux Community - Modern Kubernetes platform
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this repository helpful, please consider giving it a star!
Report Bug • Request Feature • Discussions