diff --git a/docs/deployments/deployment-preparation/erasure-coding-scheme.md b/docs/deployments/deployment-preparation/erasure-coding-scheme.md
index 0ac1697a..e58cc586 100644
--- a/docs/deployments/deployment-preparation/erasure-coding-scheme.md
+++ b/docs/deployments/deployment-preparation/erasure-coding-scheme.md
@@ -6,7 +6,7 @@ weight: 30100
 
 Choosing the appropriate **erasure coding scheme** is crucial when deploying a simplyblock storage cluster, as it
 directly impacts **data redundancy, storage efficiency, and overall system performance**. Simplyblock currently supports
-the following erasure coding schemes: **1+1**, **2+1**, **4+1**, **1+2**, **2+2**, and **4+2**. Understanding the
+the following erasure coding schemes: **1+0**, **1+1**, **2+1**, **4+1**, **1+2**, **2+2**, and **4+2**. Understanding the
 trade-offs between redundancy and storage utilization will help determine the best option for your workload. All schemas
 have been performance-optimized by specialized algorithms. There is, however, a remaining capacity-to-performance
 trade-off.
@@ -22,79 +22,23 @@ allowing data recovery in case of hardware failures. The notation **k+m** repres
 If you need more information on erasure coding, see the dedicated concept page for
 [erasure coding](../../architecture/concepts/erasure-coding.md).
 
-### Scheme: 1+1
-
-- **Description:** In the _1+1 scheme_, data is mirrored, effectively creating an exact copy of every data block.
-- **Redundancy Level:** Can tolerate the failure of **one** storage node.
-- **Raw-to-Effective Ratio:** **200%**
-- **Available Storage Capacity:** **50%**
-- **Performance Considerations:** Offers **fast recovery and high read performance** due to data mirroring.
-- **Best Use Cases:**
-    - Workloads requiring **high availability and minimal recovery time**.
-    - Applications where **performance is prioritized over storage efficiency**.
-    - Requires 3 or more nodes for full redundancy.
-
-### Scheme: 2+1
-
-- **Description:** In the _2+1 scheme_, data is divided into two fragments with one parity fragment, offering a
-  balance between performance and storage efficiency.
-- **Redundancy Level:** Can tolerate the failure of **one** storage node.
-- **Raw-to-Effective Ratio:** **150%**
-- **Available Storage Capacity:** **66.6%**
-- **Performance Considerations:** For writes of 8K or higher, **lower write amplification** compared to **1+1**, as data is distributed across multiple nodes. This typically results in similar or higher IOPS. However, for small random writes (4K), the write performance is worse than **1+1**. Write latency is somewhat higher than with **1+1**. Read performance is similar to **1+1**, if local node affinity is disabled. With node affinity enabled, read performance is slightly worse (up to 25%). In a degraded state (one node offline / unavailable or failed disk), the performance is worse than with **1+1**. Recovery time to full redundancy from single disk error is slightly higher than with **1+1**.
-- **Best Use Cases:**
-    - Deployments where **storage efficiency is relevant** without significantly compromising performance.
-    - Requires 4 or more nodes for full redundancy.
-
-
-### Scheme: 4+1
-
-- **Description:** In the _4+1 scheme_, data is divided into four fragments with one parity fragment, offering
-  optimal storage efficiency.
-- **Redundancy Level:** Can tolerate the failure of **one** storage node.
-- **Raw-to-Effective Ratio:** **125%**
-- **Available Storage Capacity:** **80%**
-- **Performance Considerations:** For writes of 16K or higher, **lower write amplification** compared to **2+1**, as data is distributed across more nodes. This typically results in similar or higher write IOPS. However, for 4-8K random writes, the write performance is typically worse than **2+1**. Write latency is somewhat similar to **2+1**. Read performance is similar to **2+1**, if local node affinity is disabled. With node affinity enabled, read performance is slightly worse (up to 13%). In a degraded state (one node offline / unavailable or failed disk), the performance is worse than with **2+1**. Recovery time to full redundancy from single disk error is slightly higher than with **2+1**.
-- **Best Use Cases:**
-    - Deployments where **storage efficiency is a priority** without significantly compromising performance.
-    - Requires 6 or more nodes for full redundancy.
-
-### Scheme: 1+2
-
-- **Description:** In the _1+2 scheme_, data is replicated twice, effectively creating multiple copies of every data block.
-- **Redundancy Level:** Can tolerate the failure of **two** storage nodes.
-- **Raw-to-Effective Ratio:** **300%**
-- **Available Storage Capacity:** **33.3%**
-- **Performance Considerations:** Offers **fast recovery and high read performance** due to data replication, but write performance is lower than with **1+1** in all cases (~33%).
-- **Best Use Cases:**
-    - Workloads requiring **high redundancy and minimal recovery time**.
-    - Applications where **performance is prioritized over storage efficiency**.
-    - Requires 4 or more nodes for full redundancy.
-
-### Scheme: 2+2
-
-- **Description:** In the _2+2 scheme_, data is divided into two fragments with two parity fragments, offering a great
-  balance between redundancy and storage efficiency.
-- **Redundancy Level:** Can tolerate the failure of **two** storage nodes.
-- **Raw-to-Effective Ratio:** **200%**
-- **Available Storage Capacity:** **50%**
-- **Performance Considerations:** Similar to **2+1**, but with higher write latencies and lower effective write IOPS due to higher write amplification.
-- **Best Use Cases:**
-    - Deployments where **high redundancy and storage efficiency is important** without compromising redundancy.
-    - Applications that can tolerate slightly **higher recovery times** compared to **1+2**.
-    - Requires 6 or more nodes for full redundancy.
-
-### Scheme: 4+2
-
-- **Description:** In the _4+2 scheme_, data is divided into four fragments with two parity fragments, offering a great
-  balance between redundancy and storage efficiency.
-- **Redundancy Level:** Can tolerate the failure of **two** storage nodes.
-- **Raw-to-Effective Ratio:** **150%**
-- **Available Storage Capacity:** **66.6%**
-- **Performance Considerations:** Similar to **4+1**, but with higher write latencies and lower effective write IOPS due to higher write amplification.
-- **Best Use Cases:**
-    - Deployments where **high redundancy and storage efficiency is a priority**.
-    - Requires 8 or more nodes in a cluster.
+The table below gives an overview of the supported schemes:
+
+- **FTT** (failures to tolerate) is the number of storage nodes that can fail without data loss; it equals `m`.
+- **Data protection overhead** is the additional raw capacity stored on top of the usable data (e.g. 100% means each
+  unit of usable capacity consumes two units of raw capacity; 0% means no protection).
+- **IOPS r/w performance & latency** is a relative rating of read/write IOPS and latency.
+- **Minimum nodes** is the number of storage nodes required for full redundancy.
+
+| Scheme | FTT | Data protection overhead | IOPS r/w performance & latency | Minimum nodes |
+|--------|-----|--------------------------|--------------------------------|---------------|
+| 1+0    | 0   | 0%                       | Very good                      | 1             |
+| 1+1    | 1   | 100%                     | Excellent                      | 3             |
+| 2+1    | 1   | 50%                      | Very good                      | 4             |
+| 4+1    | 1   | 25%                      | Very good                      | 6             |
+| 1+2    | 2   | 200%                     | Very good                      | 5             |
+| 2+2    | 2   | 100%                     | Very good                      | 6             |
+| 4+2    | 2   | 50%                      | Very good                      | 8             |
 
 ## Choosing the Scheme
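The new table's FTT and overhead columns can be derived from the **k+m** notation alone. FTT equals `m`, the data protection overhead is `m/k`, and the usable share of raw capacity is `k/(k+m)`. The minimal Python sketch below checks this under exactly those assumptions. The `Scheme` class, `SCHEMES` list, and `raw_needed` helper are hypothetical illustrations, not part of simplyblock. The sketch reproduces the table's FTT and overhead values and the "Available Storage Capacity" percentages from the removed per-scheme sections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Hypothetical model of a k+m erasure coding scheme (k data + m parity fragments)."""

    k: int  # data fragments per stripe
    m: int  # parity fragments per stripe

    @property
    def ftt(self) -> int:
        # Failures to tolerate: up to m storage nodes may fail without data loss.
        return self.m

    @property
    def overhead_pct(self) -> float:
        # Extra raw capacity on top of the usable data: m parity units per k data units.
        return 100.0 * self.m / self.k

    @property
    def usable_pct(self) -> float:
        # Share of raw capacity that holds user data.
        return 100.0 * self.k / (self.k + self.m)

    def raw_needed(self, usable: float) -> float:
        # Raw capacity needed to store `usable` units of user data.
        return usable * (self.k + self.m) / self.k


SCHEMES = [Scheme(1, 0), Scheme(1, 1), Scheme(2, 1), Scheme(4, 1),
           Scheme(1, 2), Scheme(2, 2), Scheme(4, 2)]

if __name__ == "__main__":
    for s in SCHEMES:
        print(f"{s.k}+{s.m}: FTT={s.ftt}  overhead={s.overhead_pct:.0f}%  "
              f"usable={s.usable_pct:.2f}%  raw for 10 TiB={s.raw_needed(10):.1f} TiB")
```

The minimum node counts are deliberately not computed here, because the diff does not state a formula for them.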