From ec8cdb44d7d0231728a8222b4bb2b76aaff90386 Mon Sep 17 00:00:00 2001
From: cytong
Date: Fri, 8 May 2026 13:52:06 +0800
Subject: [PATCH 1/2] docs(harbor-dr): clarify runtime component failover scope

---
 ...How_to_perform_disaster_recovery_for_harbor.md | 14 +++++++++++++-
 ...How_to_perform_disaster_recovery_for_harbor.md | 15 ++++++++++++++-
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md b/docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md
index 7880917d..5cae6793 100644
--- a/docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md
+++ b/docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md
@@ -14,6 +14,8 @@ id: KB251000012
 
 This solution describes how to build a Harbor disaster recovery solution based on Object Storage and PostgreSQL disaster recovery capabilities. The solution primarily focuses on data disaster recovery processing, and users need to implement their own Harbor access address switching mechanism.
 
+The current design only covers disaster recovery for PostgreSQL and Object Storage. `jobservice`, `trivy`, and `redis` do not have cross-cluster data replication or hot-standby failover configured. As a result, `jobservice` job logs, `trivy` local cache/vulnerability database, and `redis` cache or session-style data may be lost during failover. This does not affect Harbor core functions such as project access, image push, or image pull, because these runtime data sets are rebuilt on demand after the secondary cluster is activated.
+
 ## Environment
 
 Harbor CE Operator: >=v2.12.4
@@ -58,12 +60,20 @@ The solution leverages two independent data synchronization mechanisms:
 1. **Database Layer**: PostgreSQL streaming replication ensures real-time transaction log synchronization between primary and secondary databases
 2. **Storage Layer**: Object storage replication maintains data consistency across primary and secondary storage systems
 
+#### Components Outside the DR Sync Scope
+
+- **jobservice**: Job execution state and historical job logs are not synchronized. After failover, in-flight jobs or logs that were not persisted elsewhere may be lost, but jobs can be retriggered and Harbor service availability is not affected.
+- **trivy**: The local vulnerability database and scan cache are not synchronized. After failover, the secondary cluster must download or rebuild them again, which may affect the latency of the first scan but does not affect image push or pull.
+- **redis**: Cache, session, and queue-like transient data are not synchronized. This in-memory state is lost after failover, but Harbor rebuilds the runtime state on the new primary cluster.
+
 #### Disaster Recovery Configuration
 
 1. **Deploy Primary Harbor**: Configure the primary instance to connect to the primary PostgreSQL database and use primary object storage as the registry backend
 2. **Deploy Secondary Harbor**: Configure the secondary instance to connect to the secondary PostgreSQL database and use secondary object storage as the registry backend
 3. **Initialize Standby State**: Set replica count of all secondary Harbor components to 0 to prevent unnecessary background operations and resource consumption
 
+With this setup, the persistent DR scope only includes Harbor metadata in PostgreSQL and image artifacts in Object Storage. `jobservice`, `trivy`, and `redis` are treated as runtime components and are reinitialized after the secondary cluster is activated.
+
 #### Failover Procedure
 
 When a disaster occurs, the following steps ensure transition to the secondary environment:
@@ -310,6 +320,7 @@ spec:
 
 5. Test image push and pull to verify that Harbor is working properly.
 6. Switch external access addresses to Secondary Harbor.
+7. Check whether `jobservice`, `trivy`, and `redis` have rebuilt the required runtime state, for example by confirming that new jobs can be queued, the Trivy database is available or updated as expected, and Redis-backed cache connections are healthy.
 
 ### Disaster Recovery
 
@@ -346,7 +357,8 @@ The RPO represents the maximum acceptable data loss during a disaster recovery s
 
 - **Database Layer**: Near-zero data loss due to PostgreSQL hot standby with streaming replication
 - **Storage Layer**: Near-zero data loss due to synchronous object storage replication
-- **Overall RPO**: Near-zero data loss due to synchronous replication of both database and object storage layers
+- **jobservice / trivy / redis**: These components are outside the cross-cluster replication scope and have a small but real runtime data loss risk
+- **Overall RPO**: Near-zero for Harbor metadata and image artifacts; non-zero for job logs, vulnerability database cache, and Redis transient data
 
 **Factors affecting RPO:**
 
diff --git a/docs/zh/solutions/How_to_perform_disaster_recovery_for_harbor.md b/docs/zh/solutions/How_to_perform_disaster_recovery_for_harbor.md
index adea6f95..0c6ee8cb 100644
--- a/docs/zh/solutions/How_to_perform_disaster_recovery_for_harbor.md
+++ b/docs/zh/solutions/How_to_perform_disaster_recovery_for_harbor.md
@@ -15,6 +15,8 @@ sourceSHA: f505b4bf1ca71fbde03bd845afe8cdb0d48f456ac817c0e5fa7d4d3045a0bcbc
 
 本解决方案描述了如何基于对象存储和 PostgreSQL 灾难恢复能力构建 Harbor 灾难恢复解决方案。该解决方案主要关注数据灾难恢复处理,用户需要实现自己的 Harbor 访问地址切换机制。
 
+当前方案仅覆盖 PostgreSQL 和对象存储的容灾。`jobservice`、`trivy` 和 `redis` 未配置跨集群数据同步或热备切换:`jobservice` 的任务日志、`trivy` 的本地缓存/漏洞数据库以及 `redis` 中的缓存与会话类数据在切换后可能丢失,但 Harbor 的核心功能(如项目访问、镜像推送和拉取)不受影响,相关运行时数据会在备集群启动后按需重建。
+
 ## 环境
 
 Harbor CE Operator: >=v2.12.4
@@ -59,12 +61,20 @@ Harbor 灾难恢复解决方案实现了 Harbor 服务的 **冷备架构** 和 *
 1. **数据库层**:PostgreSQL 流复制确保主数据库和备数据库之间的实时事务日志同步
 2. **存储层**:对象存储复制保持主存储和备存储系统之间的数据一致性
 
+#### 未纳入容灾同步的组件
+
+- **jobservice**:未同步任务执行状态与历史任务日志。故障转移后,正在执行或尚未落库的任务状态可能丢失,但可重新触发任务,不影响 Harbor 对外服务。
+- **trivy**:未同步本地漏洞数据库和扫描缓存。故障转移后需要在备集群重新下载或重建相关数据,可能影响首次扫描时延,但不影响镜像推送和拉取。
+- **redis**:未同步缓存、会话和队列类临时数据。故障转移后这部分内存数据会丢失,但 Harbor 会在新主集群重新建立运行时状态。
+
 #### 灾难恢复配置
 
 1. **部署主 Harbor**:配置主实例以连接到主 PostgreSQL 数据库,并使用主对象存储作为注册表后端
 2. **部署备 Harbor**:配置备实例以连接到备 PostgreSQL 数据库,并使用备对象存储作为注册表后端
 3. **初始化待命状态**:将所有备 Harbor 组件的副本数设置为 0,以防止不必要的后台操作和资源消耗
 
+在该配置下,Harbor 的持久化容灾范围仅包括 PostgreSQL 中的元数据和对象存储中的镜像制品。`jobservice`、`trivy` 和 `redis` 作为运行时组件在备集群激活后重新初始化。
+
 #### 故障转移程序
 
 当发生灾难时,以下步骤确保切换到备环境:
@@ -315,6 +325,8 @@ spec:
 
 6. 切换外部访问地址到备 Harbor。
 
+7. 根据业务需要检查 `jobservice`、`trivy` 和 `redis` 的重建状态,例如确认任务队列已恢复接收新任务、Trivy 数据库可正常更新或离线可用、Redis 已重新建立缓存连接。
+
 ### 灾难恢复
 
 当主集群从灾难中恢复时,您可以将原主 Harbor 恢复为备 Harbor。按照以下步骤执行恢复:
@@ -350,7 +362,8 @@ RPO 表示在灾难恢复场景中可接受的最大数据丢失量。在此 Har
 
 - **数据库层**:由于 PostgreSQL 热备和流复制,数据丢失接近零
 - **存储层**:由于同步对象存储复制,数据丢失接近零
-- **整体 RPO**:由于数据库和对象存储层的同步复制,数据丢失接近零
+- **jobservice / trivy / redis**:这些组件未做跨集群数据同步,存在少量运行时数据丢失风险
+- **整体 RPO**:对于 Harbor 元数据和镜像制品接近零;对于任务日志、漏洞库缓存和 Redis 临时数据为非零
 
 **影响 RPO 的因素:**
 

From 5e58e657871f3e47d51d1d8f17d0716a527c61b2 Mon Sep 17 00:00:00 2001
From: cytong
Date: Fri, 8 May 2026 14:03:54 +0800
Subject: [PATCH 2/2] docs(harbor-dr): remove runtime state rebuild notes

---
 .../How_to_perform_disaster_recovery_for_harbor.md | 6 ++----
 .../How_to_perform_disaster_recovery_for_harbor.md | 6 ++----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md b/docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md
index 5cae6793..a6ab7089 100644
--- a/docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md
+++ b/docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md
@@ -14,7 +14,7 @@ id: KB251000012
 
 This solution describes how to build a Harbor disaster recovery solution based on Object Storage and PostgreSQL disaster recovery capabilities. The solution primarily focuses on data disaster recovery processing, and users need to implement their own Harbor access address switching mechanism.
 
-The current design only covers disaster recovery for PostgreSQL and Object Storage. `jobservice`, `trivy`, and `redis` do not have cross-cluster data replication or hot-standby failover configured. As a result, `jobservice` job logs, `trivy` local cache/vulnerability database, and `redis` cache or session-style data may be lost during failover. This does not affect Harbor core functions such as project access, image push, or image pull, because these runtime data sets are rebuilt on demand after the secondary cluster is activated.
+The current design only covers disaster recovery for PostgreSQL and Object Storage. `jobservice`, `trivy`, and `redis` do not have cross-cluster data replication or hot-standby failover configured. As a result, `jobservice` job logs, `trivy` local cache/vulnerability database, and `redis` cache or session-style data may be lost during failover. This does not affect Harbor core functions such as project access, image push, or image pull.
 
 ## Environment
 
@@ -72,7 +72,7 @@ The solution leverages two independent data synchronization mechanisms:
 2. **Deploy Secondary Harbor**: Configure the secondary instance to connect to the secondary PostgreSQL database and use secondary object storage as the registry backend
 3. **Initialize Standby State**: Set replica count of all secondary Harbor components to 0 to prevent unnecessary background operations and resource consumption
 
-With this setup, the persistent DR scope only includes Harbor metadata in PostgreSQL and image artifacts in Object Storage. `jobservice`, `trivy`, and `redis` are treated as runtime components and are reinitialized after the secondary cluster is activated.
+With this setup, the persistent DR scope only includes Harbor metadata in PostgreSQL and image artifacts in Object Storage.
 
 #### Failover Procedure
 
@@ -320,8 +320,6 @@ spec:
 
 5. Test image push and pull to verify that Harbor is working properly.
 6. Switch external access addresses to Secondary Harbor.
-7. Check whether `jobservice`, `trivy`, and `redis` have rebuilt the required runtime state, for example by confirming that new jobs can be queued, the Trivy database is available or updated as expected, and Redis-backed cache connections are healthy.
-
 ### Disaster Recovery
 
 When the primary cluster recovers from a disaster, you can restore the original Primary Harbor to operate as a Secondary Harbor. Follow these steps to perform the recovery:
diff --git a/docs/zh/solutions/How_to_perform_disaster_recovery_for_harbor.md b/docs/zh/solutions/How_to_perform_disaster_recovery_for_harbor.md
index 0c6ee8cb..db3e92ce 100644
--- a/docs/zh/solutions/How_to_perform_disaster_recovery_for_harbor.md
+++ b/docs/zh/solutions/How_to_perform_disaster_recovery_for_harbor.md
@@ -15,7 +15,7 @@ sourceSHA: f505b4bf1ca71fbde03bd845afe8cdb0d48f456ac817c0e5fa7d4d3045a0bcbc
 
 本解决方案描述了如何基于对象存储和 PostgreSQL 灾难恢复能力构建 Harbor 灾难恢复解决方案。该解决方案主要关注数据灾难恢复处理,用户需要实现自己的 Harbor 访问地址切换机制。
 
-当前方案仅覆盖 PostgreSQL 和对象存储的容灾。`jobservice`、`trivy` 和 `redis` 未配置跨集群数据同步或热备切换:`jobservice` 的任务日志、`trivy` 的本地缓存/漏洞数据库以及 `redis` 中的缓存与会话类数据在切换后可能丢失,但 Harbor 的核心功能(如项目访问、镜像推送和拉取)不受影响,相关运行时数据会在备集群启动后按需重建。
+当前方案仅覆盖 PostgreSQL 和对象存储的容灾。`jobservice`、`trivy` 和 `redis` 未配置跨集群数据同步或热备切换:`jobservice` 的任务日志、`trivy` 的本地缓存/漏洞数据库以及 `redis` 中的缓存与会话类数据在切换后可能丢失,但 Harbor 的核心功能(如项目访问、镜像推送和拉取)不受影响。
 
 ## 环境
 
@@ -73,7 +73,7 @@ Harbor 灾难恢复解决方案实现了 Harbor 服务的 **冷备架构** 和 *
 2. **部署备 Harbor**:配置备实例以连接到备 PostgreSQL 数据库,并使用备对象存储作为注册表后端
 3. **初始化待命状态**:将所有备 Harbor 组件的副本数设置为 0,以防止不必要的后台操作和资源消耗
 
-在该配置下,Harbor 的持久化容灾范围仅包括 PostgreSQL 中的元数据和对象存储中的镜像制品。`jobservice`、`trivy` 和 `redis` 作为运行时组件在备集群激活后重新初始化。
+在该配置下,Harbor 的持久化容灾范围仅包括 PostgreSQL 中的元数据和对象存储中的镜像制品。
 
 #### 故障转移程序
 
@@ -325,8 +325,6 @@ spec:
 
 6. 切换外部访问地址到备 Harbor。
 
-7. 根据业务需要检查 `jobservice`、`trivy` 和 `redis` 的重建状态,例如确认任务队列已恢复接收新任务、Trivy 数据库可正常更新或离线可用、Redis 已重新建立缓存连接。
-
 ### 灾难恢复
 
 当主集群从灾难中恢复时,您可以将原主 Harbor 恢复为备 Harbor。按照以下步骤执行恢复: