Merged
14 changes: 12 additions & 2 deletions docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md
@@ -14,6 +14,8 @@ id: KB251000012

This solution describes how to build a Harbor disaster recovery solution based on Object Storage and PostgreSQL disaster recovery capabilities. The solution focuses on data-level disaster recovery; users must implement their own mechanism for switching the Harbor access address.

The current design only covers disaster recovery for PostgreSQL and Object Storage. `jobservice`, `trivy`, and `redis` do not have cross-cluster data replication or hot-standby failover configured. As a result, `jobservice` job logs, `trivy` local cache/vulnerability database, and `redis` cache or session-style data may be lost during failover. This does not affect Harbor core functions such as project access, image push, or image pull.

## Environment

Harbor CE Operator: >=v2.12.4
@@ -58,12 +60,20 @@ The solution leverages two independent data synchronization mechanisms:
1. **Database Layer**: PostgreSQL streaming replication ensures real-time transaction log synchronization between primary and secondary databases
2. **Storage Layer**: Object storage replication maintains data consistency across primary and secondary storage systems
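
As an illustration of the database layer, PostgreSQL streaming replication is typically enabled with settings along the following lines. This fragment is a sketch, not part of this solution's deliverables; the hostname, user, and standby name are placeholders:

```ini
# Primary: postgresql.conf
wal_level = replica            # ship enough WAL for a physical standby
max_wal_senders = 5            # allow standby replication connections
synchronous_commit = on        # wait for the standby, keeping RPO near zero
synchronous_standby_names = 'harbor_standby'

# Standby: postgresql.auto.conf (PostgreSQL 12+)
primary_conninfo = 'host=pg-primary.example.com port=5432 user=replicator application_name=harbor_standby'
hot_standby = on               # standby accepts read-only queries
```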

#### Components Outside the DR Sync Scope

- **jobservice**: Job execution state and historical job logs are not synchronized. After failover, in-flight jobs or logs that were not persisted elsewhere may be lost, but jobs can be retriggered and Harbor service availability is not affected.
- **trivy**: The local vulnerability database and scan cache are not synchronized. After failover, the secondary cluster must re-download or rebuild them, which may increase the latency of the first scan but does not affect image push or pull.
- **redis**: Cache, session, and queue-like transient data are not synchronized. This in-memory state is lost after failover, but Harbor rebuilds the runtime state on the new primary cluster.

#### Disaster Recovery Configuration

1. **Deploy Primary Harbor**: Configure the primary instance to connect to the primary PostgreSQL database and use primary object storage as the registry backend
2. **Deploy Secondary Harbor**: Configure the secondary instance to connect to the secondary PostgreSQL database and use secondary object storage as the registry backend
3. **Initialize Standby State**: Set the replica count of all secondary Harbor components to 0 to prevent unnecessary background operations and resource consumption

With this setup, the persistent DR scope includes only Harbor metadata in PostgreSQL and image artifacts in Object Storage.
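
Step 3 can be expressed declaratively in the secondary Harbor custom resource. The sketch below is hypothetical; the component and replica field names are assumptions, so check the Harbor CE Operator CRD for the actual schema:

```yaml
# Illustrative only: field names are assumed, not taken from the Harbor CE Operator CRD.
spec:
  core:
    replicas: 0        # keep the secondary cold until failover
  jobservice:
    replicas: 0
  registry:
    replicas: 0
  trivy:
    replicas: 0
```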

#### Failover Procedure

When a disaster occurs, the following steps ensure transition to the secondary environment:
@@ -310,7 +320,6 @@ spec:

5. Test image push and pull to verify that Harbor is working properly.
6. Switch external access addresses to Secondary Harbor.
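
The address switch in step 6 is left to the user; one common approach is to repoint a shared hostname at the secondary cluster, for example via DNS or a Kubernetes Ingress. A minimal Ingress sketch, in which the hostname, namespace, and service name are all placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: harbor
  namespace: harbor
spec:
  rules:
    - host: harbor.example.com          # shared DR hostname (placeholder)
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: harbor-core       # secondary cluster's core service (assumed name)
                port:
                  number: 80
```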

### Disaster Recovery

When the primary cluster recovers from a disaster, you can restore the original Primary Harbor to operate as a Secondary Harbor. Follow these steps to perform the recovery:
@@ -346,7 +355,8 @@ The RPO represents the maximum acceptable data loss during a disaster recovery s

- **Database Layer**: Near-zero data loss due to PostgreSQL hot standby with streaming replication
- **Storage Layer**: Near-zero data loss due to synchronous object storage replication
- **Overall RPO**: Near-zero data loss due to synchronous replication of both database and object storage layers
- **jobservice / trivy / redis**: These components are outside the cross-cluster replication scope and have a small but real runtime data loss risk
- **Overall RPO**: Near-zero for Harbor metadata and image artifacts; non-zero for job logs, vulnerability database cache, and Redis transient data
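
To verify that the database layer is actually holding near-zero RPO, replication lag can be inspected on the primary with the standard PostgreSQL catalog view (column names per PostgreSQL 10+; exact columns vary slightly by version):

```sql
-- Run on the primary: per-standby replication lag in bytes.
SELECT application_name,
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```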

**Factors affecting RPO:**

13 changes: 12 additions & 1 deletion docs/zh/solutions/How_to_perform_disaster_recovery_for_harbor.md
@@ -15,6 +15,8 @@ sourceSHA: f505b4bf1ca71fbde03bd845afe8cdb0d48f456ac817c0e5fa7d4d3045a0bcbc

This solution describes how to build a Harbor disaster recovery solution based on Object Storage and PostgreSQL disaster recovery capabilities. The solution focuses on data-level disaster recovery; users must implement their own mechanism for switching the Harbor access address.

The current design covers disaster recovery only for PostgreSQL and Object Storage. `jobservice`, `trivy`, and `redis` have no cross-cluster data replication or hot-standby failover configured: `jobservice` job logs, the `trivy` local cache/vulnerability database, and cache and session-style data in `redis` may be lost after failover, but Harbor's core functions (such as project access, image push, and image pull) are unaffected.

## Environment

Harbor CE Operator: >=v2.12.4
@@ -59,12 +61,20 @@ The Harbor disaster recovery solution implements a **cold-standby architecture** for the Harbor service and *
1. **Database Layer**: PostgreSQL streaming replication ensures real-time transaction log synchronization between the primary and standby databases
2. **Storage Layer**: Object storage replication maintains data consistency between the primary and secondary storage systems

#### Components Outside the DR Sync Scope

- **jobservice**: Job execution state and historical job logs are not synchronized. After failover, the state of in-flight jobs or jobs not yet persisted may be lost, but jobs can be retriggered and Harbor's external service is not affected.
- **trivy**: The local vulnerability database and scan cache are not synchronized. After failover, the secondary cluster must re-download or rebuild this data, which may increase first-scan latency but does not affect image push and pull.
- **redis**: Cache, session, and queue-style transient data are not synchronized. This in-memory state is lost after failover, but Harbor rebuilds its runtime state on the new primary cluster.

#### Disaster Recovery Configuration

1. **Deploy Primary Harbor**: Configure the primary instance to connect to the primary PostgreSQL database and use primary object storage as the registry backend
2. **Deploy Secondary Harbor**: Configure the secondary instance to connect to the secondary PostgreSQL database and use secondary object storage as the registry backend
3. **Initialize Standby State**: Set the replica count of all secondary Harbor components to 0 to prevent unnecessary background operations and resource consumption

With this setup, Harbor's persistent DR scope includes only the metadata in PostgreSQL and the image artifacts in Object Storage.

#### Failover Procedure

When a disaster occurs, the following steps ensure the transition to the secondary environment:
@@ -350,7 +360,8 @@ The RPO represents the maximum acceptable data loss in a disaster recovery scenario. In this Har

- **Database Layer**: Near-zero data loss thanks to PostgreSQL hot standby with streaming replication
- **Storage Layer**: Near-zero data loss thanks to synchronous object storage replication
- **Overall RPO**: Near-zero data loss thanks to synchronous replication of both the database and object storage layers
- **jobservice / trivy / redis**: These components have no cross-cluster data replication and carry a small but real risk of runtime data loss
- **Overall RPO**: Near-zero for Harbor metadata and image artifacts; non-zero for job logs, vulnerability database cache, and Redis transient data

**Factors affecting RPO:**
