Add automated backup and restore via dedicated CRDs #2015
Open
discostur wants to merge 1 commit into
3ce23ba to 69632ec
Pushed an update extending the feature with several enhancements:
I also corrected the replicated-restore path to issue the schema
69632ec to 56b62c0
Introduces operator-managed backup and restore for ClickHouse using clickhouse-backup, exposed through three new custom resources in the clickhouse.altinity.com/v1 API group:

- ClickHouseBackup (chb): one-off backup -> Kubernetes Job
- ClickHouseBackupSchedule (chbs): recurring backup -> managed CronJob
- ClickHouseRestore (chr): one-off restore -> Kubernetes Job

The controllers follow the existing ClickHouseKeeper controller-runtime pattern. clickhouse-backup runs as a sidecar (a documented prerequisite); the generated jobs trigger it remotely through the system.backup_actions integration table, so no backup logic is reimplemented in the operator.

Cluster-aware: backs up one replica per shard for Replicated* tables (AllReplicas opt-in for non-replicated data); on restore it applies the schema on the first replica per shard via ON CLUSTER (requires the sidecar's restore_schema_on_cluster) and the data on the first replica, letting native replication synchronize the remaining replicas.

Restore safety follows the conventions of mature DB operators: preflight validation (target CHI Completed, topology reachable) and an overwrite guard that refuses a non-empty target unless overwrite=true.

Also adds: selective (tables/partitions) and incremental (--diff-from-remote) backups; remote-backup retention (keepLastRemote); optional post-backup verification; Prometheus metrics on the operator's existing :9999 endpoint plus Kubernetes Events; and annotation-driven bootstrap-from-backup for new installations. Compression and encryption are documented as clickhouse-backup sidecar settings.

Includes the CRDs, RBAC (incl. batch jobs/cronjobs), regenerated install bundles and Helm chart, documentation and examples, Go unit tests and a TestFlows e2e test.

Refs Altinity#1795, Altinity#862. Supersedes the gRPC-plugin approach of Altinity#1798.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Kilian Ries <mail@kilian-ries.de>
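For reviewers who have not used the integration tables: the commit message says the generated jobs trigger the sidecar by writing to system.backup_actions. A minimal sketch of what building such a statement could look like follows. It is not taken from this PR. `BackupRequest`, `BuildCommand`, `BuildTriggerSQL` and `quoteSQL` are hypothetical names, and the sketch assumes the `command` column and the `create_remote` subcommand with its `--tables` and `--diff-from-remote` flags as described in the clickhouse-backup documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// BackupRequest is a hypothetical, simplified view of what a backup CR might carry.
// The real CRD fields in this PR may differ.
type BackupRequest struct {
	Name           string   // backup name passed to clickhouse-backup
	Tables         []string // optional db.table patterns for a selective backup
	DiffFromRemote string   // optional base backup name for an incremental backup
}

// quoteSQL renders s as a single-quoted ClickHouse string literal,
// escaping backslashes and single quotes.
func quoteSQL(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	s = strings.ReplaceAll(s, `'`, `\'`)
	return "'" + s + "'"
}

// BuildCommand assembles the clickhouse-backup command string.
// Assumption: create_remote accepts --tables and --diff-from-remote.
func BuildCommand(r BackupRequest) string {
	parts := []string{"create_remote"}
	if len(r.Tables) > 0 {
		parts = append(parts, "--tables="+strings.Join(r.Tables, ","))
	}
	if r.DiffFromRemote != "" {
		parts = append(parts, "--diff-from-remote="+r.DiffFromRemote)
	}
	parts = append(parts, r.Name)
	return strings.Join(parts, " ")
}

// BuildTriggerSQL wraps the command in an INSERT into the integration table.
// Assumption: system.backup_actions exposes a writable `command` column.
func BuildTriggerSQL(r BackupRequest) string {
	return "INSERT INTO system.backup_actions(command) VALUES (" + quoteSQL(BuildCommand(r)) + ")"
}

func main() {
	req := BackupRequest{
		Name:           "nightly-2",
		Tables:         []string{"db.events"},
		DiffFromRemote: "nightly-1",
	}
	fmt.Println(BuildTriggerSQL(req))
}
```

Keeping the command construction in a pure function like this makes it easy to unit-test without a running cluster; the job itself would only need to execute the resulting statement against the chosen replica.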
56b62c0 to d5f2eb2
Tested locally in a dev k8s cluster ... happy to get some feedback from the maintainers ;)
Add automated backup & restore via dedicated CRDs
Closes the long-standing request for operator-managed backups (#1795, #862). This supersedes the gRPC-plugin approach of #1798 with a lighter, CRD-driven design that wraps clickhouse-backup and mirrors the existing ClickHouseKeeper controller-runtime pattern.

What it adds
Three new custom resources in the clickhouse.altinity.com/v1 group:

- ClickHouseBackup (chb) -> Job
- ClickHouseBackupSchedule (chbs) -> CronJob
- ClickHouseRestore (chr) -> Job

Design
- clickhouse-backup runs as a sidecar (documented prerequisite, API_CREATE_INTEGRATION_TABLES=true); the operator-generated jobs trigger it remotely through the system.backup_actions integration table. The new controllers stay fully decoupled from CHI reconciliation.
- [...] pkg/controller/chk. Jobs/CronJobs are owned by the CR (automatic GC + status tracking via status.conditions).
- [...] CronJob; remote retention is delegated to clickhouse-backup (BACKUPS_TO_KEEP_REMOTE).

Cluster awareness
- [...] FirstPerShard); this is correct and storage-efficient for Replicated* tables. AllReplicas is available for clusters with non-replicated/local tables.
- [...] shard, letting native replication synchronize the rest.
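To make the two replica policies concrete, here is a minimal sketch of how target hosts could be picked from plain shard and replica counts. It is an illustration only, not the PR's code: `ReplicaPolicy`, `Host`, `SelectHosts` and `HostName` are hypothetical names, and the `chi-{chi}-{cluster}-{shard}-{replica}` host name pattern is an assumption about the operator's default naming scheme.

```go
package main

import "fmt"

// ReplicaPolicy mirrors the two policy names mentioned in the description.
type ReplicaPolicy string

const (
	FirstPerShard ReplicaPolicy = "FirstPerShard"
	AllReplicas   ReplicaPolicy = "AllReplicas"
)

// Host identifies one ClickHouse replica by zero-based shard and replica index.
type Host struct{ Shard, Replica int }

// SelectHosts returns the replicas a backup should run on.
// FirstPerShard keeps only replica 0 of every shard; AllReplicas keeps all of them.
func SelectHosts(shards, replicas int, policy ReplicaPolicy) []Host {
	var hosts []Host
	for s := 0; s < shards; s++ {
		for r := 0; r < replicas; r++ {
			if policy == FirstPerShard && r > 0 {
				break // remaining replicas of this shard hold the same replicated data
			}
			hosts = append(hosts, Host{Shard: s, Replica: r})
		}
	}
	return hosts
}

// HostName builds a host name from indices.
// Assumption: the default scheme is chi-{chi}-{cluster}-{shard}-{replica}.
func HostName(chi, cluster string, h Host) string {
	return fmt.Sprintf("chi-%s-%s-%d-%d", chi, cluster, h.Shard, h.Replica)
}

func main() {
	// A 2-shard, 3-replica cluster: FirstPerShard yields one host per shard.
	for _, h := range SelectHosts(2, 3, FirstPerShard) {
		fmt.Println(HostName("demo", "main", h))
	}
}
```

With Replicated* tables every replica of a shard holds the same data, so one backup per shard is enough; AllReplicas only pays off when replicas hold different local data.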
Restore safety (per mature DB-operator conventions, e.g. CloudNativePG)
- [...] Completed, topology reachable) surfaced via conditions.
- [...] overwrite: true.
- [...] (backoffLimit: 0, restartPolicy: Never).

Included
- [...] dev/run_code_generator.sh).
- [...] thread_backup.go).
- [...] batch jobs/cronjobs); regenerated install bundles and Helm chart.
- docs/backup.md + docs/chb-examples/.
- [...] (tests/e2e/test_backup_restore.py) doing a backup→restore round-trip with replica-sync verification.
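The preflight checks and overwrite guard listed under "Restore safety" reduce to a small pure function, which is the kind of logic that unit tests can cover without a cluster. The following is a minimal sketch under stated assumptions, not the PR's implementation: `RestoreTarget`, `Preflight` and the error values are hypothetical, and "non-empty" is approximated here by a count of user tables.

```go
package main

import (
	"errors"
	"fmt"
)

// RestoreTarget is a hypothetical snapshot of what a preflight step observes.
type RestoreTarget struct {
	CHIStatus      string // status of the target CHI, e.g. "Completed"
	HostsReachable bool   // whether the target topology could be reached
	UserTables     int    // number of existing user tables on the target
}

var (
	ErrNotCompleted = errors.New("target CHI is not Completed")
	ErrUnreachable  = errors.New("target topology is not reachable")
	ErrNonEmpty     = errors.New("target is not empty and overwrite is false")
)

// Preflight returns nil when a restore may proceed.
// Checks run in order: CHI status, reachability, then the overwrite guard.
func Preflight(t RestoreTarget, overwrite bool) error {
	switch {
	case t.CHIStatus != "Completed":
		return ErrNotCompleted
	case !t.HostsReachable:
		return ErrUnreachable
	case t.UserTables > 0 && !overwrite:
		return ErrNonEmpty
	}
	return nil
}

func main() {
	target := RestoreTarget{CHIStatus: "Completed", HostsReachable: true, UserTables: 4}
	fmt.Println(Preflight(target, false)) // refused: target already has tables
	fmt.Println(Preflight(target, true))  // allowed: overwrite was requested
}
```

Returning distinct sentinel errors lets a controller map each failure to its own status condition, in line with the description's statement that preflight results are surfaced via conditions.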
Notes / follow-ups
- [...] layout counts using the default naming scheme; explicit shard/replica lists / custom host names are a planned follow-up.
- [...] {shard} macro (e.g. S3_PATH: backup/shard-{shard}).

🤖 Generated with Claude Code