Commit e8f9db0: Add raft doc (1 parent 5fc0b6b)

1 file changed: docs/guides/raft_production.md (+98, -0)
# Raft Implementation & Production Configuration

This guide details the Raft consensus implementation in `ev-node`, used for High Availability (HA) of the Sequencer/Aggregator. It is aimed at experienced DevOps engineers and developers configuring production environments.
## Overview

`ev-node` uses the [HashiCorp Raft](https://github.com/hashicorp/raft) implementation to manage leader election and state replication when running in **Aggregator Mode**.

* **Role**: Ensures only one active Aggregator (Leader) produces blocks at a time.
* **Failover**: Automatically elects a new leader if the current leader fails.
* **Safety**: Synchronizes the block production state to prevent double-signing or fork divergence.
### Architecture

* **Transport**: TCP-based transport for inter-node communication.
* **Storage**: [BoltDB](https://github.com/etcd-io/bbolt) is used for both the Raft log (`raft-log.db`) and the stable store (`raft-stable.db`). Snapshots are stored as files.
* **FSM (Finite State Machine)**: The state machine applies `RaftBlockState` (Protobuf) entries containing the latest block height, hash, and timestamp.
* **Safety checks**:
  * **Startup**: Nodes check for divergence between the local block store and the Raft state.
  * **Leadership transfer**: Before becoming leader, a node waits for its FSM to catch up (`waitForMsgsLanded`) to avoid proposing blocks from a stale state.
  * **Shutdown**: The leader attempts to transfer leadership gracefully before shutting down to minimize downtime.
## Configuration

Raft is configured via CLI flags or in the `config.toml` file under the `[raft]` (or `[rollkit.raft]`) section.
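
For file-based configuration, the `[raft]` section might look like the following sketch. The key names mirror the `raft.*` config keys in the flag table; the values are placeholders, so verify both against your build:

```toml
# Illustrative [raft] section; keys mirror the raft.* config keys listed
# in the Essential Flags table. Values here are placeholders.
[raft]
enable = true
node_id = "node-1"
raft_addr = "0.0.0.0:5001"
raft_dir = "/data/raft"
bootstrap = true
peers = "node-1@10.0.0.1:5001,node-2@10.0.0.2:5001,node-3@10.0.0.3:5001"
```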
### Essential Flags

| Flag | Config Key | Description | Production Value |
|------|------------|-------------|------------------|
| `--evnode.raft.enable` | `raft.enable` | Enable Raft consensus. | `true` |
| `--evnode.raft.node_id` | `raft.node_id` | **Unique** identifier for the node. | e.g., `node-1` |
| `--evnode.raft.raft_addr` | `raft.raft_addr` | TCP address for the Raft transport. | `0.0.0.0:5001` (bind to a private IP) |
| `--evnode.raft.raft_dir` | `raft.raft_dir` | Directory for Raft data. | `/data/raft` (must be persistent) |
| `--evnode.raft.peers` | `raft.peers` | Comma-separated list of peer addresses in the format `nodeID@host:port`. | `node-1@10.0.0.1:5001,node-2@10.0.0.2:5001,node-3@10.0.0.3:5001` |
| `--evnode.raft.bootstrap` | `raft.bootstrap` | Bootstrap the cluster. **Required** for initial setup. | `true` (see Limitations) |
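
Because the `peers` string is easy to mistype, you can assemble it from a host list with a few lines of shell. `build_peers` is an illustrative helper, not part of `ev-node`:

```shell
#!/bin/sh
# Illustrative helper: build the raft.peers value from nodeID@host entries
# plus a shared Raft port. Not part of ev-node.
build_peers() {
  port=$1; shift
  peers=""
  for entry in "$@"; do      # each entry: "nodeID@host"
    peers="${peers:+$peers,}${entry}:${port}"
  done
  echo "$peers"
}

build_peers 5001 node-1@10.0.0.1 node-2@10.0.0.2 node-3@10.0.0.3
# -> node-1@10.0.0.1:5001,node-2@10.0.0.2:5001,node-3@10.0.0.3:5001
```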
### Timeout Tuning

Raft timeouts should be tuned relative to your **block time** (`--evnode.node.block_time`) to get fast failover without introducing instability.

| Flag | Default | Recommended Tuning |
|------|---------|--------------------|
| `--evnode.raft.heartbeat_timeout` | `1s` | **10-30% of the leader lease**. For sub-second block times, lower to `50-100ms`. |
| `--evnode.raft.leader_lease_timeout` | `500ms` | **Must be < the election timeout**. Use `500ms` for 1s block times. For slower chains (e.g., 10s blocks), increase to `1-2s` to tolerate network jitter. |
| `--evnode.raft.send_timeout` | `1s` | Should be `> 2x RTT`. |
**Relation to block time**: ideally, a failover should complete within `2 * BlockTime` to minimize user impact.

* **Fast chain (BlockTime < 1s)**: Tighten timeouts: heartbeat `50ms`, lease `250ms`.
* **Standard chain (BlockTime = 1s)**: Heartbeat `100ms`, lease `500ms`.
* **Slow chain (BlockTime > 5s)**: Defaults are usually sufficient (`1s` heartbeat).

> **Warning**: Setting timeouts too low (below RTT + jitter) causes leadership flapping and halts block production.
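
The ordering constraints above can be sanity-checked before deployment with a small script. `check_raft_timeouts` is a hypothetical helper, not an `ev-node` command:

```shell
#!/bin/sh
# Hypothetical pre-deployment check (not an ev-node command): verifies that
# planned timeout values respect the ordering described above.
# All arguments are in milliseconds.
check_raft_timeouts() {
  heartbeat_ms=$1
  lease_ms=$2
  election_ms=$3
  # The leader lease must be shorter than the election timeout.
  if [ "$lease_ms" -ge "$election_ms" ]; then
    echo "FAIL: lease (${lease_ms}ms) must be < election timeout (${election_ms}ms)"
    return 1
  fi
  # Heartbeats should fire several times within one lease window.
  if [ "$heartbeat_ms" -ge "$lease_ms" ]; then
    echo "FAIL: heartbeat (${heartbeat_ms}ms) should be well below the lease (${lease_ms}ms)"
    return 1
  fi
  echo "OK: heartbeat=${heartbeat_ms}ms lease=${lease_ms}ms election=${election_ms}ms"
}

# Standard 1s block-time profile from the table above.
check_raft_timeouts 100 500 1000
```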
## Production Deployment Principles

### 1. Static Peering & Bootstrap

The current implementation requires **bootstrap mode** (`--evnode.raft.bootstrap=true`) on all nodes participating in cluster initialization.

* **All nodes** should list the full set of peers in `--evnode.raft.peers`.
* The `peers` list format is strict: `NodeID@Host:Port`.
* **Limitation**: Dynamic addition of peers (run-time membership changes) via RPC/CLI is not currently exposed; cluster membership is static, fixed by the initial bootstrap configuration.
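
The bullets above can be sketched as per-host start commands. This is illustrative only: `start_node` is a hypothetical wrapper that echoes the command rather than running it, and the flag spellings are taken from the Essential Flags table:

```shell
#!/bin/sh
# Illustrative bootstrap sketch -- start_node is a hypothetical wrapper that
# echoes the command it would run. All nodes share the same peers list and
# bootstrap=true; only node_id (and the host it runs on) differ.
PEERS="node-1@10.0.1.1:5001,node-2@10.0.1.2:5001,node-3@10.0.1.3:5001"

start_node() {
  echo ./ev-node start \
    --node.aggregator \
    --evnode.raft.enable \
    --evnode.raft.node_id="$1" \
    --evnode.raft.raft_addr="0.0.0.0:5001" \
    --evnode.raft.raft_dir="/data/raft" \
    --evnode.raft.bootstrap=true \
    --evnode.raft.peers="$PEERS"
}

start_node node-1   # run on host 10.0.1.1
start_node node-2   # run on host 10.0.1.2
start_node node-3   # run on host 10.0.1.3
```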
### 2. Infrastructure Requirements

* **Encrypted network (CRITICAL)**: Raft traffic is **unencrypted** (plain TCP). You **MUST** run the cluster inside a private network, VPN, or encrypted mesh (e.g., WireGuard, Tailscale). **Never expose Raft ports to the public internet**; doing so allows attackers to hijack cluster consensus.
* **Cluster size**: Run an **odd number** of nodes (3 or 5) to tolerate failures (3 nodes tolerate 1 failure; 5 nodes tolerate 2).
* **Storage**: `--evnode.raft.raft_dir` **MUST** be mounted on persistent storage. Loss of this directory costs the node its identity and commit history, effectively removing it from the cluster.
* **Network**: Raft requires low-latency, reliable connectivity. Ensure firewall rules allow TCP traffic on `raft_addr`.
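
The arithmetic behind the odd-number recommendation: a cluster of `n` voters needs `floor(n/2) + 1` nodes for quorum, so it tolerates `floor((n - 1)/2)` simultaneous failures. A quick illustration:

```shell
#!/bin/sh
# Tolerated simultaneous failures for an n-node Raft cluster:
# quorum = floor(n/2) + 1, so tolerance = floor((n - 1) / 2).
fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

fault_tolerance 3   # 1 failure tolerated
fault_tolerance 4   # still 1 -- an even 4th node adds no extra tolerance
fault_tolerance 5   # 2 failures tolerated
```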
### 3. P2P Interaction & Catch-Up

Raft and P2P work in parallel to ensure reliability:

* **Hot replication (Raft)**: New blocks produced by the leader are replicated via the Raft transport (header + data) to all followers, ensuring low-latency propagation of the chain tip.
* **Catch-up (P2P)**: If a node falls behind (e.g., it was disconnected for longer than the Raft log retention), it receives a **Raft snapshot** that brings its consensus state up to the latest head. The *historical blocks* between its local state and the new head, however, are fetched via the **P2P network** (or DA).
* **Implication**: P2P connectivity (`--p2p.listen_address` and `--p2p.peers`) must be configured even on Raft nodes so they can backfill missing data from peers.
### 4. Lifecycle Management

* **Rolling restarts**: You can restart nodes one at a time. The `ev-node` implementation handles graceful shutdown (leadership transfer) to minimize impact.
* **State divergence**: If a node falls too far behind, or its local store conflicts with the Raft state (e.g., after a catastrophic disk failure), it may panic on startup to protect safety. In such cases, extensive manual recovery (wiping state and re-syncing) may be required.
### 5. Monitoring

Monitor the following metrics (exported via Prometheus, if enabled):

* **Leadership changes**: Frequent changes indicate network instability or overloaded nodes.
* **Applied index vs. commit index**: A growing lag indicates the FSM cannot keep up.
## Example Command

```bash
./ev-node start \
  --node.aggregator \
  --raft.enable \
  --raft.node_id="node-1" \
  --raft.raft_addr="0.0.0.0:5001" \
  --raft.raft_dir="/var/lib/ev-node/raft" \
  --raft.bootstrap=true \
  --raft.peers="node-1@10.0.1.1:5001,node-2@10.0.1.2:5001,node-3@10.0.1.3:5001" \
  --p2p.listen_address="/ip4/0.0.0.0/tcp/26656" \
  ...other flags
```
