92 changes: 18 additions & 74 deletions docs/deployments/deployment-preparation/erasure-coding-scheme.md
@@ -6,7 +6,7 @@ weight: 30100

Choosing the appropriate **erasure coding scheme** is crucial when deploying a simplyblock storage cluster, as it
directly impacts **data redundancy, storage efficiency, and overall system performance**. Simplyblock currently supports
-the following erasure coding schemes: **1+1**, **2+1**, **4+1**, **1+2**, **2+2**, and **4+2**. Understanding the
+the following erasure coding schemes: **1+0**, **1+1**, **2+1**, **4+1**, **1+2**, **2+2**, and **4+2**. Understanding the
trade-offs between redundancy and storage utilization will help determine the best option for your workload. All schemes
have been performance-optimized by specialized algorithms. There is, however, a remaining capacity-to-performance
trade-off.
@@ -22,79 +22,23 @@ allowing data recovery in case of hardware failures. The notation **k+m** repres
If you need more information on erasure coding, see the dedicated concept page for
[erasure coding](../../architecture/concepts/erasure-coding.md).

### Scheme: 1+1

- **Description:** In the _1+1 scheme_, data is mirrored, effectively creating an exact copy of every data block.
- **Redundancy Level:** Can tolerate the failure of **one** storage node.
- **Raw-to-Effective Ratio:** **200%**
- **Available Storage Capacity:** **50%**
- **Performance Considerations:** Offers **fast recovery and high read performance** due to data mirroring.
- **Best Use Cases:**
- Workloads requiring **high availability and minimal recovery time**.
- Applications where **performance is prioritized over storage efficiency**.
- Requires 3 or more nodes for full redundancy.

### Scheme: 2+1

- **Description:** In the _2+1 scheme_, data is divided into two fragments with one parity fragment, offering a
balance between performance and storage efficiency.
- **Redundancy Level:** Can tolerate the failure of **one** storage node.
- **Raw-to-Effective Ratio:** **150%**
- **Available Storage Capacity:** **66.6%**
- **Performance Considerations:** For writes of 8K or larger, write amplification is **lower** than with **1+1** because data is distributed across multiple nodes, which typically results in similar or higher IOPS. For small random writes (4K), however, write performance is worse than with **1+1**. Write latency is somewhat higher than with **1+1**. Read performance is similar to **1+1** if local node affinity is disabled; with node affinity enabled, it is slightly worse (by up to 25%). In a degraded state (one node offline or unavailable, or a failed disk), performance is worse than with **1+1**. Recovery to full redundancy after a single disk error takes slightly longer than with **1+1**.
- **Best Use Cases:**
- Deployments where **storage efficiency is relevant** without significantly compromising performance.
- Requires 4 or more nodes for full redundancy.


### Scheme: 4+1

- **Description:** In the _4+1 scheme_, data is divided into four fragments with one parity fragment, offering
optimal storage efficiency.
- **Redundancy Level:** Can tolerate the failure of **one** storage node.
- **Raw-to-Effective Ratio:** **125%**
- **Available Storage Capacity:** **80%**
- **Performance Considerations:** For writes of 16K or larger, write amplification is **lower** than with **2+1** because data is distributed across more nodes, which typically results in similar or higher write IOPS. For 4-8K random writes, however, write performance is typically worse than with **2+1**. Write latency is similar to **2+1**. Read performance is similar to **2+1** if local node affinity is disabled; with node affinity enabled, it is slightly worse (by up to 13%). In a degraded state (one node offline or unavailable, or a failed disk), performance is worse than with **2+1**. Recovery to full redundancy after a single disk error takes slightly longer than with **2+1**.
- **Best Use Cases:**
- Deployments where **storage efficiency is a priority** without significantly compromising performance.
- Requires 6 or more nodes for full redundancy.

### Scheme: 1+2

- **Description:** In the _1+2 scheme_, data is replicated twice, effectively creating multiple copies of every data block.
- **Redundancy Level:** Can tolerate the failure of **two** storage nodes.
- **Raw-to-Effective Ratio:** **300%**
- **Available Storage Capacity:** **33.3%**
- **Performance Considerations:** Offers **fast recovery and high read performance** due to data replication, but write performance is lower than with **1+1** in all cases (by roughly 33%).
- **Best Use Cases:**
- Workloads requiring **high redundancy and minimal recovery time**.
- Applications where **performance is prioritized over storage efficiency**.
- Requires 4 or more nodes for full redundancy.

### Scheme: 2+2

- **Description:** In the _2+2 scheme_, data is divided into two fragments with two parity fragments, offering a great
balance between redundancy and storage efficiency.
- **Redundancy Level:** Can tolerate the failure of **two** storage nodes.
- **Raw-to-Effective Ratio:** **200%**
- **Available Storage Capacity:** **50%**
- **Performance Considerations:** Similar to **2+1**, but with higher write latencies and lower effective write IOPS due to higher write amplification.
- **Best Use Cases:**
- Deployments where both **high redundancy and storage efficiency are important**.
- Applications that can tolerate slightly **higher recovery times** compared to **1+2**.
- Requires 6 or more nodes for full redundancy.

### Scheme: 4+2

- **Description:** In the _4+2 scheme_, data is divided into four fragments with two parity fragments, offering a great
balance between redundancy and storage efficiency.
- **Redundancy Level:** Can tolerate the failure of **two** storage nodes.
- **Raw-to-Effective Ratio:** **150%**
- **Available Storage Capacity:** **66.6%**
- **Performance Considerations:** Similar to **4+1**, but with higher write latencies and lower effective write IOPS due to higher write amplification.
- **Best Use Cases:**
- Deployments where **high redundancy and storage efficiency are a priority**.
- Requires 8 or more nodes in a cluster.
The table below gives an overview of the supported schemes:

- **FTT** (failures to tolerate) is the number of storage nodes that can fail with no data loss (`m`).
- **Data protection overhead** is the additional raw capacity required on top of the usable data (e.g., 100% means the
cluster needs 2× the usable capacity in raw storage; 0% means no protection).
- **IOPS r/w performance and latency** is a relative rating of read/write IOPS and latency.
- **Minimum nodes** is the number of storage nodes required for full redundancy.

| Scheme | FTT | Data protection overhead | IOPS r/w performance & latency | Minimum nodes |
|--------|-----|--------------------------|--------------------------------|---------------|
| 1+0 | 0 | 0% | Very good | 1 |
| 1+1 | 1 | 100% | Excellent | 3 |
| 2+1 | 1 | 50% | Very good | 4 |
| 4+1 | 1 | 25% | Very good | 6 |
| 1+2 | 2 | 200% | Very good | 5 |
| 2+2 | 2 | 100% | Very good | 6 |
| 4+2 | 2 | 50% | Very good | 8 |
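
The overhead column follows directly from the **k+m** notation: `m` parity fragments are stored for every `k` data fragments, so the overhead is `m / k` and the usable share of raw capacity is `k / (k + m)`. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical and not part of any simplyblock tooling, and the minimum node counts are copied from the table rather than derived.

```python
from fractions import Fraction

# Minimum node counts copied verbatim from the table above (not derived here).
MIN_NODES = {"1+0": 1, "1+1": 3, "2+1": 4, "4+1": 6, "1+2": 5, "2+2": 6, "4+2": 8}


def parse_scheme(scheme):
    """Split a 'k+m' string into (data fragments, parity fragments)."""
    k, m = scheme.split("+")
    return int(k), int(m)


def overhead_percent(scheme):
    """Additional raw capacity on top of usable data, in percent (m / k)."""
    k, m = parse_scheme(scheme)
    return float(Fraction(m, k) * 100)


def usable_fraction(scheme):
    """Share of raw capacity that remains usable (k / (k + m))."""
    k, m = parse_scheme(scheme)
    return k / (k + m)


def candidate_schemes(nodes, ftt):
    """Hypothetical helper: schemes that fit the node count and tolerate at least `ftt` failures."""
    return [
        scheme
        for scheme, min_nodes in MIN_NODES.items()
        if min_nodes <= nodes and parse_scheme(scheme)[1] >= ftt
    ]


if __name__ == "__main__":
    for scheme in MIN_NODES:
        print(scheme, overhead_percent(scheme), round(usable_fraction(scheme), 3))
    print(candidate_schemes(nodes=6, ftt=2))
```

For example, a six-node cluster that must tolerate two node failures is limited to the schemes whose minimum node count in the table is six or fewer and whose `m` is at least two.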

## Choosing the Scheme
