👋 Welcome to my Home Operations repository. This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Ansible, Kubernetes, Flux, Renovate and GitHub Actions.
If you like this project, please consider supporting the work of onedr0p and bjw-s.
My Kubernetes cluster is a hyper-converged cluster deployed with Talos on three bare-metal nodes. Workloads and block storage share the same available resources backed by Rook Ceph on Samsung SSDs, while I have a separate server with NFS shares for media libraries and backups.
- actions-runner-controller: Self-hosted GitHub runners.
- cert-manager: Automates SSL/TLS certificate management.
- cilium: eBPF-based Kubernetes CNI.
- cloudflared: Enables Cloudflare's Zero Trust Network Access.
- external-dns: Automatically syncs DNS records to my DNS provider.
- external-secrets: Manages Kubernetes secrets using aKeyless.
- generic-device-plugin: Allocates Linux devices to pods (squat.ai/tun).
- envoy-gateway: Envoy Proxy to manage service-to-service communication and proxying.
- nvidia-device-plugin: Provides nvidia.com/gpu resource to pods.
- openebs: Provisions ephemeral local storage.
- rook: Distributed block storage for persistent volumes.
- spegel: Stateless cluster local OCI registry mirror.
- tuppr: Automatic Talos and Kubernetes upgrades.
- unifi-dns: External-DNS Webhook to manage UniFi DNS Records.
- volsync: Backup and recovery of persistent volume claims.
- alertmanager: Handles processing and sending alerts.
- blackbox-exporter: Probe external endpoint ports for success/failure.
- fluent-bit: Log processor.
- gatus: High level status dashboard.
- grafana: Data visualization platform.
- karma: Alertmanager dashboard, based on Cloudflare's unsee.
- keda: Autoscales containers on events (e.g. blackbox reports NFS share is down).
- kromgo: Expose Prometheus metrics "safely" to GitHub.
- silence-operator: Manages Alertmanager silences via custom resources.
- unpoller: Collect UniFi Controller data for Prometheus.
- kube-prometheus-stack: Prometheus + Grafana + Alertmanager stack for metrics.
- victoriaLogs: Database for logs.
- AirVPN: VPN service.
- aKeyless: Managing secrets via external-secrets.
- Cloudflare: Tunnels for exposing services and DNS provider.
- Cloudinary: Image hosting for Plex newsletter posters.
- Backblaze B2: Daily backups from volsync and cnpg.
- Amazon SES: Sending system emails.
- Pushover: Sending push notifications to mobile.
Flux watches the clusters in my kubernetes folder (see Directories below) and makes the changes to my clusters based on the state of my Git repository.
Flux recursively searches the kubernetes/apps folder until it finds the top-most kustomization.yaml in each directory, then applies all the resources listed in it. That kustomization.yaml generally contains only a namespace resource and one or more Flux Kustomizations (ks.yaml). Each of those Flux Kustomizations in turn applies a HelmRelease or other resources related to the application.
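As a rough illustration of that layout, here is a minimal sketch of the two files for one application. The namespace (`default`), the app name (`echo`), the file paths, the interval, and the `flux-system` source name are all assumptions made for the example and are not taken from this repository; the two files are shown in one block only for brevity.

```yaml
---
# Hypothetical file: kubernetes/apps/default/kustomization.yaml
# The top-most kustomization.yaml that Flux finds for this directory.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: default
resources:
  - ./namespace.yaml   # the namespace resource
  - ./echo/ks.yaml     # one Flux Kustomization per application
---
# Hypothetical file: kubernetes/apps/default/echo/ks.yaml
# The Flux Kustomization that applies the app's HelmRelease and related resources.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: echo
spec:
  interval: 1h                               # assumed reconcile interval
  path: ./kubernetes/apps/default/echo/app   # assumed folder holding the HelmRelease
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                        # assumed name of the Git source
    namespace: flux-system
  targetNamespace: default
```

Keeping the top-level file this small means adding an application is mostly a matter of dropping in a new folder with its own ks.yaml and listing it under `resources`.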
Renovate watches my entire repository for dependency updates; when it finds one, it automatically creates a PR. When I merge those PRs, Flux applies the changes to my cluster.
Just files are used to run repetitive commands or batches of commands, grouped into recipes. The root directory has a .justfile which imports three modules (bootstrap, kube, and talos), provides shared logging utilities, and enforces bash error handling.
This Git repository contains the following directories.
📁 docker
├── 📁 unraid # docker deployments on unraid
├── 📁 truenas # docker deployments on truenas
└── 📁 ai3090 # docker deployments on RTX3090
📁 kubernetes
├── 📁 apps # applications organized by namespace
├── 📁 bootstrap # exactly what it sounds like
│ ├── 📁 cnpg # cnpg patch to run at bootstrap
│ ├── 📁 helmfile.d # helmreleases required at bootstrap
│ ├── 📁 scripts # some janky hacks
│ └── 📝 mod.just # .justfile Bootstrap module
├── 📁 components # reusable kustomize components
├── 📁 flux # flux system configuration
├── 📁 talos # node OS configurations
│ ├── 📁 nodes # Override configurations for individual nodes
│ ├── 📝 machineconfig.yaml.j2 # Base configuration template for all nodes
│ ├── 📝 mod.just # .justfile Talos module
│ └── 📝 schematic.yaml.j2 # Talos image factory schematic
└── 📝 mod.just # .justfile Kubernetes module
📝 .justfile
📝 .mise.toml
📝 kubeconfig
📝 talosconfig

| Name | Device | CPU | OS Disk | Local Disk | Rook Disk | RAM | OS | Purpose |
|---|---|---|---|---|---|---|---|---|
| k8s-1 | M70q-Gen3 | i7-12700T | 500GB NVMe | - | 1.92TB SSD | 64GB | Talos | k8s control-plane |
| k8s-2 | M70q-Gen3 | i7-12700T | 512GB NVMe | - | 1.92TB SSD | 64GB | Talos | k8s control-plane |
| k8s-3 | M70q-Gen3 | i7-12700T | 512GB NVMe | - | 1.92TB SSD | 64GB | Talos | k8s control-plane |
| ai3090 | Precision Tower 3620 | i7-7700K | 256GB NVMe | - | - | 16GB | Talos | k8s worker (LLM) |

| Name | Device | CPU | OS Disk | Data Disk | RAM | OS | Purpose |
|---|---|---|---|---|---|---|---|
| TrueNAS | X10SDV-8C + KTN-STL3 | Xeon D-1541 | 512GB NVMe | 2x18TB 2x14TB 2x10TB | 64GB | TrueNAS | NAS/NFS |
| UnRAID | Dell R510 | Xeon E5640 | 16GB USB | Mixture of 12x6TB+ | 64GB | Unraid | Backup |
This cluster draws on the work of the people who have shared their clusters using the k8s-at-home GitHub topic. Be sure to check out the awesome Kubesearch tool for examples of how to deploy applications and ideas on what you can deploy.
There is a template over at onedr0p/cluster-template if you want to try and follow along with some of the practices I use here.
See LICENSE