Decision: Use Valkey (Redis-compatible) for caching layer.
Alternatives Considered:
- No caching (query DB directly)
- Memcached
- PostgreSQL built-in buffer cache only (shared_buffers)
- Redis (original)
- KeyDB (Redis fork with multi-threading)
- DragonflyDB (modern, higher performance)
Reasons:
- Performance: In-memory cache hits are typically far faster than equivalent DB queries (often cited as 10-100x, depending on workload)
- TTL Support: Built-in expiration for cache invalidation
- Industry Standard: Well-understood, well-documented patterns
- Already Available: Included in docker-compose stack (valkey service)
- Graceful Fallback: Cache module handles an unreachable Valkey/Redis server (no errors, just slower)
- Open Source Governance: Redis moved to a dual RSALv2/SSPLv1 license in 2024 - Valkey remains fully open source (BSD 3-Clause)
- Future-Proof: Avoid potential vendor lock-in from Redis Ltd.
- API Compatibility: Compatible with the Redis OSS 7.2 API and protocol - drop-in replacement
- Active Development: Backed by cloud providers, Linux Foundation
- No Code Changes: Existing Redis clients work without modification
- Lower Cloud Cost: No license fees - cheaper to run in cloud environments
Trade-offs:
- Additional infrastructure (need to maintain Valkey)
- Cache invalidation complexity
- Memory usage
- Smaller community than Redis (but growing)
- Fewer third-party integrations
- Newer project (less battle-tested)
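
The TTL Support and Graceful Fallback reasons above can be illustrated with a minimal cache-aside sketch. This is not the project's actual cache module: `FallbackCache`, `InMemoryClient`, and `BrokenClient` are hypothetical names, and the client is only assumed to expose redis-py style `get(key)` and `setex(key, seconds, value)` methods.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class FallbackCache:
    """Cache-aside helper that degrades to direct loads when the cache is down."""

    def __init__(self, client: Optional[Any], ttl_seconds: int = 300) -> None:
        # Assumption: `client` has redis-py style get(key) and
        # setex(key, seconds, value). Pass None to run with no cache at all.
        self.client = client
        self.ttl_seconds = ttl_seconds

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        if self.client is not None:
            try:
                cached = self.client.get(key)
                if cached is not None:
                    return json.loads(cached)
            except Exception:
                # Broad on purpose: any cache failure means "slower", not "error".
                pass
        value = loader()  # e.g. the real database query
        if self.client is not None:
            try:
                # The TTL makes the entry expire by itself (simple invalidation).
                self.client.setex(key, self.ttl_seconds, json.dumps(value))
            except Exception:
                pass
        return value


class InMemoryClient:
    """Hypothetical stand-in for a Valkey client, for local experiments."""

    def __init__(self) -> None:
        self.store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        item = self.store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self.store[key]  # entry outlived its TTL
            return None
        return value

    def setex(self, key: str, seconds: int, value: str) -> None:
        self.store[key] = (value, time.monotonic() + seconds)


class BrokenClient:
    """Hypothetical client that simulates an unreachable cache server."""

    def get(self, key: str) -> Optional[str]:
        raise ConnectionError("cache unavailable")

    def setex(self, key: str, seconds: int, value: str) -> None:
        raise ConnectionError("cache unavailable")
```

With `InMemoryClient`, a second `get_or_load` call for the same key skips the loader until the TTL expires; with `BrokenClient` or `None`, every call reaches the loader and still returns a value.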
Decision: Deploy to Kubernetes (k3s) instead of Docker Compose for production.
Alternatives Considered:
- Docker Compose (simpler)
- Nomad (simpler than k8s)
- AWS ECS/Fargate (managed)
- Terraform + cloud VMs (simpler)
Reasons:
- Scaling: Horizontal pod autoscaling, load balancing built-in
- Self-Healing: Automatic restarts, health checks, node management
- Industry Standard: Most common orchestrator, widely understood
- Declarative: Infrastructure as Code - GitOps friendly
- Namespace Isolation: Better multi-environment separation (dev/staging/prod)
- Rolling Updates: Built-in rolling deployments - zero downtime when readiness probes are configured
- Ephemeral Storage: Stateless app design - pods can be replaced freely
Trade-offs:
- Steeper learning curve
- More complex setup
- Higher resource overhead
- Requires cluster management (k3s is lightweight but still adds operational work)
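
To make the Scaling reason more concrete, the sketch below reproduces the core replica calculation documented for the Horizontal Pod Autoscaler: desired = ceil(current * currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The function name and the `min_replicas`/`max_replicas` defaults are illustrative assumptions, and the real controller also applies stabilization windows and pod readiness rules that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
    min_replicas: int = 1,   # illustrative bound, set per workload
    max_replicas: int = 10,  # illustrative bound, set per workload
) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale out to 6, while 4 replicas at 63% stay at 4 because the ratio is within the tolerance.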
Decision: Use Prometheus for metrics, Grafana for visualization.
Alternatives Considered:
- Datadog (proprietary, expensive)
- CloudWatch (AWS-only)
- ELK Stack (more for logs than metrics)
- Other self-hosted options (VictoriaMetrics; Thanos for long-term Prometheus storage)
Reasons:
- Industry Standard: Most common in SRE/DevOps
- Open Source: Free, no vendor lock-in
- Integrations: Lots of exporters (DB, Redis, nginx, etc.)
- Grafana: Best-in-class visualization, works with many data sources
- Already Configured: docker-compose includes these services
Trade-offs:
- Need to understand PromQL (learning curve)
- Storage can grow large (retention policies needed)
- Pull-based model: Prometheus scrapes targets over HTTP, so short-lived jobs need a Pushgateway or similar
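
Because Prometheus pulls, each service has to expose an HTTP endpoint that returns metrics in the text exposition format. The following is a standard-library-only sketch of that idea; in practice the official `prometheus_client` package would normally be used instead. The `Counter` class, the `app_requests_total` metric, and port 8000 are hypothetical choices for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class Counter:
    """Tiny counter that renders itself in the Prometheus text format."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values: Dict[LabelKey, float] = {}

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_text = ",".join(f'{k}="{_escape(v)}"' for k, v in key)
            suffix = f"{{{label_text}}}" if label_text else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


# Hypothetical application metric.
REQUESTS = Counter("app_requests_total", "Total requests handled.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = REQUESTS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics for Prometheus to scrape."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` in the application process and adding the host and port as a scrape target is enough for Prometheus to collect the counter; nothing is pushed from the application side.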
Decision: Use GitHub Actions for continuous integration.
Alternatives Considered:
- Jenkins (self-hosted, more setup)
- CircleCI (free tier limited)
- GitLab CI (requires GitLab)
- Travis CI (not free for private repos)
Reasons:
- Already Using GitHub: No separate account needed
- Generous Free Tier: Free minutes on standard GitHub-hosted runners for public repos
- Easy Setup: YAML-based, lots of templates
- Built-in: Integrated with PRs, branches, secrets
Trade-offs:
- Limited to GitHub (lock-in)
- Can hit usage limits (concurrent jobs, API rate limits) under heavy load
- Some features only on paid plans