Commit 28fa8aa

feat: document replica lag banning
1 parent e8a1ca9 commit 28fa8aa

5 files changed, +29 -5 lines changed

docs/enterprise_edition/control_plane.md

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ PgDog transmits the following information to the control plane:
 | [Query statistics](insights/statistics.md) | Real-time statistics on each query executed through PgDog, like duration, idle-in-transaction time, and more. |
 | [Errors](insights/errors.md) | Recent errors encountered by clients, e.g. query syntax issues. |
 | [Query plans](insights/query_plans.md) | Output of `EXPLAIN` for slow and sampled queries, collected by PgDog in the background. |
-| [Configuration](configuration.md) | Current PgDog settings and database schema. |
+| Configuration | Current PgDog settings and database schema. |

 #### High availability

docs/enterprise_edition/insights/errors.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ The following information is available in the errors view:

 ## Configuration

-Errors are collected automatically if query statistics are enabled. The in-memory view is periodically purged of old errors, configurable in [`pgdog.toml`](../configuration/pgdog.toml/general.md):
+Errors are collected automatically if query statistics are enabled. The in-memory view is periodically purged of old errors, configurable in [`pgdog.toml`](../../configuration/pgdog.toml/general.md):

 ```toml
 [query_stats]

docs/enterprise_edition/insights/query_plans.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ The following information is available in this view:

 ### Configuration

-Which queries are planned and how frequently is configurable in [`pgdog.toml`](../configuration/pgdog.toml/general.md):
+Which queries are planned and how frequently is configurable in [`pgdog.toml`](../../configuration/pgdog.toml/general.md):

 ```toml
 [query_stats]

docs/enterprise_edition/insights/statistics.md

Lines changed: 1 addition & 1 deletion
@@ -82,7 +82,7 @@ The following information is available in the query statistics view:

 ### Configuration

-Query statistics collection can be enabled/disabled and tweaked via configuration in [`pgdog.toml`](../configuration/pgdog.toml/general.md):
+Query statistics collection can be enabled/disabled and tweaked via configuration in [`pgdog.toml`](../../configuration/pgdog.toml/general.md):

 ```toml
 [query_stats]

docs/features/load-balancer/replication-failover.md

Lines changed: 25 additions & 1 deletion
@@ -4,7 +4,7 @@ icon: material/chart-timeline-variant

 # Replication and failover

-PgDog has built-in functionality for monitoring the state of Postgres replica databases. If configured, it can also automatically detect when a replica is promoted and redirect write queries to the new primary.
+PgDog has built-in functionality for monitoring the state of Postgres replica databases. If configured, it can also automatically detect when a replica is promoted and redirect write queries to the new primary, and ban replicas from serving traffic if they have fallen far behind in the replication stream.

 ## Replication

@@ -57,6 +57,30 @@ Decreasing the value of `lsn_check_interval` will produce more accurate statisti

 It's common for PgDog deployments to be serving upwards of 30,000-50,000 queries per second per pooler process, so you can run the LSN check query quite frequently without noticeable impact on system latency.

+### Replica lag ban
+
+!!! note "Experimental feature"
+    This feature is new and experimental. Please report any issues you encounter.
+
+If a replica has fallen far behind the primary, it may start serving stale data to the application. This can cause hard to debug issues, so it's often best to remove this replica from the load balancer until it's able to catch up.
+
+PgDog supports this with configurable banning thresholds:
+
+```toml
+[general]
+ban_replica_lag = 60_000 # 1 minute
+ban_replica_lag_bytes = 25_000_000 # 25 MiB
+```
+
+| Setting | Description |
+|-|-|
+| `ban_replica_lag` | How far behind in [transaction time](#replication-lag) (ms) can the replica fall before it gets removed from the load balancer. |
+| `ban_replica_lag_bytes` | Same as above, except the lag is measured in bytes. |
+
+The load balancer will use the lowest threshold to make its determination. By default, both settings are set to the maximum integer value, effectively **disabling** this feature.
+
+Unlike [health check-triggered](healthchecks.md) bans, replica lag ban is not cleared after hitting the [`ban_timeout`](../../configuration/pgdog.toml/general.md#ban_timeout) threshold and will remain in place until the replica lag falls below the configured amount(s). Incidentally, by removing traffic from the replica database, it has a better chance of catching up to the primary, since most causes of replication lag in PostgreSQL are related to query load.
+
 ## Failover

 <center>
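
Reviewer note on the "Replica lag ban" section added in this commit: the threshold logic it describes could be sketched roughly as below. This is not part of the commit and not PgDog's actual implementation. It assumes a replica is banned as soon as either measured lag exceeds its threshold, which is only one reading of "the lowest threshold". The names `LagThresholds`, `should_ban`, and `DISABLED` are hypothetical, and the "maximum integer value" default is modeled here as the largest unsigned 64-bit integer.

```python
from dataclasses import dataclass

# Assumption: the documented "maximum integer value" default is modeled
# as the largest unsigned 64-bit integer; the real type is not stated.
DISABLED = 2**64 - 1


@dataclass
class LagThresholds:
    """Hypothetical mirror of the two settings from the [general] section."""
    ban_replica_lag: int = DISABLED        # milliseconds
    ban_replica_lag_bytes: int = DISABLED  # bytes


def should_ban(lag_ms: int, lag_bytes: int, t: LagThresholds) -> bool:
    """Return True if either measured lag exceeds its configured threshold.

    Assumption: crossing one threshold is enough to trigger the ban.
    """
    return lag_ms > t.ban_replica_lag or lag_bytes > t.ban_replica_lag_bytes


# Values from the example config in the diff. Note that 25_000_000 bytes
# is 25 MB in decimal units, which is about 23.8 MiB.
thresholds = LagThresholds(ban_replica_lag=60_000, ban_replica_lag_bytes=25_000_000)

print(should_ban(61_000, 0, thresholds))      # time lag over one minute
print(should_ban(1_000, 1_000, thresholds))   # both lags under their thresholds
```

With both settings left at their defaults, `should_ban` in this sketch never returns true for realistic lag values, which matches the documented "disabled by default" behavior. Per the added documentation, such a ban would be lifted only once the lag drops back below the configured amounts, not when `ban_timeout` elapses.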
