[REVISIT] Reliable PostgreSQL Backups & DR with CloudNativePG and Cloudflare R2 #3792

@binaryn3xus

🚀 Overview

Re-implement the database backup strategy using CloudNativePG (CNPG) and Cloudflare R2. The goal is to move away from unreliable volume-based backups and adopt a declarative, WAL-archiving approach that allows for Point-in-Time Recovery (PITR).

🎯 Goals

  • Credential Sync: Configure ExternalSecrets to pull R2 Access Keys into the database namespace.
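  • WAL Archiving: Configure CNPG to archive Write-Ahead Log (WAL) segments to R2 continuously; a segment ships when it fills or, at the latest, when archive_timeout expires (5 minutes by default in CNPG).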
  • Scheduled Backups: Set up daily base backups to R2 with a 30-day retention policy.
  • Recovery Drill: Document and test the "Bootstrap from External Cluster" method to ensure we can restore to a new cluster name.

🏗️ The "Restore" Logic (The Missing Piece)

When a disaster occurs:

  1. We do not run pg_restore.
  2. We create a new Cluster manifest.
  3. We define the dead cluster's backup location in R2 as an entry under externalClusters.
  4. We set bootstrap.recovery.source to the name of that external cluster.
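
The four steps above can be sketched as a single manifest. This is a minimal illustration, not a tested configuration: the cluster names (dummy-db, dummy-db-v2) are borrowed from the Phase 3 drill, the namespace, instance count, and storage size are assumptions, and the Secret key names follow Phase 1.

```yaml
# Sketch only: namespace, instances, and storage size are illustrative assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: dummy-db-v2              # new cluster name; must differ from the dead cluster
  namespace: database            # assumption: namespace is called "database"
spec:
  instances: 1
  storage:
    size: 5Gi
  bootstrap:
    recovery:
      source: dummy-db           # must match the externalClusters entry below
      # Optional PITR: leave recoveryTarget out to replay all archived WAL.
      # recoveryTarget:
      #   targetTime: "2025-01-01 12:00:00+00"   # hypothetical timestamp
  externalClusters:
    - name: dummy-db             # the dead cluster; by default also its folder in the bucket
      barmanObjectStore:
        destinationPath: "s3://homeops-postgres-backups/"
        endpointURL: "https://<account-id>.r2.cloudflarestorage.com"
        s3Credentials:
          accessKeyId: { name: "r2-backups-creds", key: "ACCESS_KEY_ID" }
          secretAccessKey: { name: "r2-backups-creds", key: "ACCESS_SECRET_KEY" }
```

If the dead cluster's name cannot be reused as the externalClusters entry name, the barmanObjectStore block also accepts a serverName field that points at the original cluster's folder in the bucket.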

📋 Implementation Task List

Phase 1: R2 Bucket & Credentials

  • Create a dedicated R2 bucket (e.g., homeops-postgres-backups).
  • Create an API token with Object Read & Write permissions.
  • Deploy a Kubernetes Secret r2-backups-creds (synced by ExternalSecrets) in the database namespace, with the keys ACCESS_KEY_ID and ACCESS_SECRET_KEY.
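
One way the Secret could be synced is with an ExternalSecret like the sketch below. The store name (my-secret-store), the remote item name (cloudflare-r2), its property names, and the namespace are all hypothetical placeholders; only the target Secret name and its two keys come from this issue.

```yaml
# Sketch only: store, remote key, and property names are hypothetical.
apiVersion: external-secrets.io/v1   # older ESO releases serve v1beta1 instead
kind: ExternalSecret
metadata:
  name: r2-backups-creds
  namespace: database                # assumption: namespace is called "database"
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: my-secret-store            # hypothetical store name
  target:
    name: r2-backups-creds           # Secret that the Cluster manifest references
  data:
    - secretKey: ACCESS_KEY_ID
      remoteRef:
        key: cloudflare-r2           # hypothetical item in the secret backend
        property: access_key_id      # hypothetical field name
    - secretKey: ACCESS_SECRET_KEY
      remoteRef:
        key: cloudflare-r2
        property: secret_access_key  # hypothetical field name
```

Whatever key names end up in the Secret, the key values under s3Credentials in the Cluster manifest have to match them exactly.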

Phase 2: Cluster Configuration

  • Update existing Cluster manifests to include the backup block:
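    spec:
      backup:
        barmanObjectStore:
          # Bucket created in Phase 1
          destinationPath: "s3://homeops-postgres-backups/"
          endpointURL: "https://<account-id>.r2.cloudflarestorage.com"
          s3Credentials:
            # key must match the key names stored in the r2-backups-creds Secret
            accessKeyId: { name: "r2-backups-creds", key: "ACCESS_KEY_ID" }
            secretAccessKey: { name: "r2-backups-creds", key: "ACCESS_SECRET_KEY" }
          wal:
            compression: gzip
        # 30-day retention, per the Scheduled Backups goal
        retentionPolicy: "30d"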
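
The backup block only tells CNPG where to write; the daily base backup from the goals needs its own ScheduledBackup resource. A minimal sketch follows, assuming a cluster named dummy-db in a namespace named database and a 03:00 schedule (all three are illustrative choices, not taken from this issue).

```yaml
# Sketch only: resource name, namespace, cluster name, and time of day are assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: dummy-db-daily
  namespace: database
spec:
  schedule: "0 0 3 * * *"        # six-field cron, seconds first: every day at 03:00:00
  backupOwnerReference: self     # Backup objects are owned by this ScheduledBackup
  immediate: true                # also take one backup as soon as this resource is created
  cluster:
    name: dummy-db
```

Retention is not set here: CNPG reads it from spec.backup.retentionPolicy on the Cluster (for example "30d").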
    

Phase 3: The Disaster Recovery Guide

  • [ ] Create docs/dr-postgres.md with a template for bootstrapping a new cluster from the R2 bucket.
  • [ ] Perform a "live" test: create a dummy DB, wait for a backup, delete the cluster, and restore it as dummy-db-v2.
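
For the live test, waiting for the daily schedule is not strictly necessary: an on-demand Backup resource can trigger a base backup right away. The sketch below assumes the dummy cluster is named dummy-db and lives in a namespace named database; the resource name is hypothetical.

```yaml
# Sketch only: resource name and namespace are assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: dummy-db-drill
  namespace: database
spec:
  method: barmanObjectStore      # use the object store configured on the Cluster
  cluster:
    name: dummy-db               # cluster to back up before deleting it
```

Deleting the cluster only after this Backup reports a completed phase keeps the drill from racing the upload to R2.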

📚 References

  • Official CNPG Backup & Recovery Docs
  • Bootstrap from Backup (Recovery)
  • Cloudflare R2 S3 Compatibility

Notes: Cloudflare R2 charges no egress fees, so frequent recovery tests incur no bandwidth cost; storage and request operations are still billed beyond the free tier.
