Commit 0cd85a6

docs: Document the new health endpoints (#414)
1 parent 18482d6 commit 0cd85a6

1 file changed

Lines changed: 180 additions & 2 deletions

docs/06-concepts/13-health-checks.md

@@ -1,8 +1,186 @@
11
# Health checks
22

3-
Serverpod automatically performs health checks while running. It measures CPU and memory usage and the response time to the database. The metrics are stored in the database every minute in the serverpod_health_metric and serverpod_health_connection_info tables. However, the best way to visualize the data is through Serverpod Insights, which gives you a graphical view.
3+
Serverpod provides a health check system for monitoring your server and its dependencies through Kubernetes-style HTTP endpoints (`/livez`, `/readyz`, `/startupz`). Each endpoint has a specific purpose and helps orchestrators such as Kubernetes make informed decisions about container lifecycle and traffic routing.
44

5-
## Adding custom metrics
5+
## Endpoints
6+
7+
### Liveness Probe `/livez`
8+
9+
The liveness probe answers: "Should this container be killed and restarted?"
10+
11+
- Returns `200 OK` if the server process can respond.
12+
- Only fails if the process is fundamentally broken.
13+
- A failed liveness check triggers a pod restart.
14+
- Does not check dependencies (database, Redis, etc.).
15+
16+
This endpoint is intentionally permissive. It should only fail when the process is truly unrecoverable (deadlocks, memory corruption, infinite loops). Transient issues like slow database queries or temporary network blips should not trigger restarts.
17+
18+
**Example:**
19+
20+
```bash
curl http://localhost:8080/livez
# Returns: 200 OK
```
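
A plain `curl` prints only the response body, so when probing by hand it can help to print the status code explicitly. The following is a minimal sketch, assuming `curl` is installed and the server listens on `localhost:8080` as in the example above. The helper names `http_status` and `liveness_verdict` are illustrative and not part of Serverpod.

```bash
#!/usr/bin/env bash

# Print only the HTTP status code of a URL ("000" if no response arrives).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1" || true
}

# Classify a /livez status code: only 200 counts as a passing probe.
liveness_verdict() {
  case "$1" in
    200) echo "alive" ;;
    *)   echo "failed" ;;
  esac
}

# Usage (requires a running server):
#   liveness_verdict "$(http_status http://localhost:8080/livez)"
```

An orchestrator typically acts only after several consecutive failed probes, so a single `failed` result here does not by itself imply a restart.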
24+
25+
### Readiness Probe `/readyz`
26+
27+
The readiness probe answers: "Should traffic be routed to this container?"
28+
29+
- Returns `200 OK` if all dependencies are healthy.
30+
- Returns `503 Service Unavailable` if any critical dependency is unavailable.
31+
- Checks database connectivity, Redis connectivity (if configured), and custom health indicators.
32+
33+
A failed readiness check stops traffic routing without restarting the pod. This allows the pod to recover from temporary issues without receiving extra pressure from new traffic.
34+
35+
**Example:**
36+
37+
```bash
curl http://localhost:8080/readyz
# Returns: 200 OK or 503 Service Unavailable
```
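
Outside an orchestrator, for example in a deploy script, the same status codes can gate a later step. The sketch below polls until `/readyz` reports `200`; it assumes `curl` for the real probe, and the function names `readiness_decision` and `wait_until_ready` are hypothetical helpers, not Serverpod tooling.

```bash
#!/usr/bin/env bash

# Map a /readyz status code to a routing decision.
readiness_decision() {
  case "$1" in
    200) echo "route" ;;   # all dependencies healthy
    503) echo "hold" ;;    # a critical dependency is unavailable
    *)   echo "unknown" ;; # no response or an unexpected status
  esac
}

# Poll a command that prints a status code until it reports 200.
# $1: maximum attempts, $2: delay in seconds, remaining args: probe command.
wait_until_ready() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if [ "$(readiness_decision "$("$@")")" = "route" ]; then
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (requires curl and a running server):
#   probe() { curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/readyz || true; }
#   wait_until_ready 30 1 probe && echo "ready"
```

Passing the probe as a command keeps the polling logic independent of `curl`, which also makes it easy to exercise with a stub.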
41+
42+
### Startup Probe `/startupz`
43+
44+
The startup probe answers: "Has this container finished initializing?"
45+
46+
- Returns `200 OK` once server initialization (including migrations) is complete.
47+
- Prevents premature liveness/readiness checks during boot.
48+
- Kubernetes waits for this to pass before starting liveness/readiness probes.
49+
50+
This endpoint determines when the pod has finished booting. Until it passes, the orchestrator runs no liveness or readiness probes, so no traffic is routed to the pod.
51+
52+
**Example:**
53+
54+
```bash
curl http://localhost:8080/startupz
# Returns: 200 OK once startup is complete
```
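
To show how the three endpoints might be wired into Kubernetes, the sketch below prints a probe stanza for a container spec. The field names (`startupProbe`, `livenessProbe`, `readinessProbe`, `httpGet`, `periodSeconds`, `failureThreshold`) are standard Kubernetes probe fields; the timing values and the `print_probes` helper are illustrative assumptions and should be tuned for your deployment.

```bash
#!/usr/bin/env bash

# Print a probe stanza for a Kubernetes container spec.
# $1: container port serving the health endpoints (defaults to 8080).
print_probes() {
  local port="${1:-8080}"
  cat <<EOF
startupProbe:
  httpGet:
    path: /startupz
    port: ${port}
  periodSeconds: 2
  failureThreshold: 30
livenessProbe:
  httpGet:
    path: /livez
    port: ${port}
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: ${port}
  periodSeconds: 5
  failureThreshold: 2
EOF
}

print_probes 8080
```

With these example values, the startup probe allows roughly 60 seconds (30 attempts, 2 seconds apart) for initialization before the container is considered failed.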
58+
59+
## Response format
60+
61+
Health endpoints return JSON responses following the [RFC draft for Health Check Response Format](https://datatracker.ietf.org/doc/html/draft-inadarei-api-health-check-06).
62+
63+
- **Unauthenticated requests** receive only HTTP status codes (no body) for security.
64+
- **Authenticated requests** receive detailed JSON responses.
65+
66+
The format of the response is as follows:
67+
68+
```json
{
  "status": "pass", // or "fail"
  "checks": {
    "database:connection": [ // The name of the check.
      {
        "componentId": "primary-db", // The ID of the component.
        "componentType": "datastore", // The type of the component.
        "status": "pass", // or "fail"
        "observedValue": 12, // Optional value of the check.
        "observedUnit": "ms", // Optional unit of the check.
        "output": "Connection normal", // Optional output of the check.
        "time": "2026-01-14T10:30:00Z" // The time of the check.
      }
    ],
    "redis:latency": [
      {
        "componentId": "cache-cluster",
        "componentType": "datastore",
        "status": "pass",
        "observedValue": 3,
        "observedUnit": "ms",
        "time": "2026-01-14T10:30:00Z"
      }
    ]
  }
}
```
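
When a probe fails, the detailed response can be filtered down to the checks that caused it. The sketch below assumes `jq` is installed and that the JSON body was saved from an authenticated request (how to authenticate is not covered here). The `failing_checks` helper is hypothetical and relies only on the `checks`, `status`, and `output` fields shown above.

```bash
#!/usr/bin/env bash

# Read a health response on stdin and list each failing check
# as "<check name>: <output>".
failing_checks() {
  jq -r '
    .checks // {}
    | to_entries[]
    | .key as $name
    | .value[]
    | select(.status == "fail")
    | "\($name): \(.output // "no output")"
  '
}

# Usage, with the response body saved to a file:
#   failing_checks < readyz-response.json
```

Note that a real response contains no comments; the annotated example above would need them removed before `jq` could parse it.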
96+
97+
## Built-in health indicators
98+
99+
Serverpod automatically registers health indicators based on your configuration:
100+
101+
- **ServerpodStartupIndicator** - Tracks server initialization completion.
102+
- **DatabaseHealthIndicator** - Checks PostgreSQL connectivity (if database is configured).
103+
- **RedisHealthIndicator** - Checks Redis connectivity (if Redis is enabled).
104+
105+
## Custom health indicators
106+
107+
You can add custom health indicators to check external services, microservices, or other dependencies.
108+
109+
### Creating a custom indicator
110+
111+
Create a class that extends `HealthIndicator`:
112+
113+
```dart
import 'package:serverpod/serverpod.dart';

class StripeApiIndicator extends HealthIndicator<double> {
  @override
  String get name => 'stripe:api';

  @override
  String get componentType => HealthComponentType.component.name;

  @override
  String get observedUnit => 'ms';

  @override
  Duration get timeout => const Duration(seconds: 3);

  @override
  Future<HealthCheckResult> check() async {
    final stopwatch = Stopwatch()..start();
    try {
      // Perform your health check.
      // `stripeClient` is assumed to be defined elsewhere in your project.
      await stripeClient.ping();
      stopwatch.stop();

      return pass(
        observedValue: stopwatch.elapsedMilliseconds.toDouble(),
      );
    } catch (e) {
      return fail(output: 'Stripe API unavailable: $e');
    }
  }
}
```
146+
147+
### Registering custom indicators
148+
149+
Register your indicators when creating the Serverpod instance:
150+
151+
```dart
final pod = Serverpod(
  args,
  Protocol(),
  Endpoints(),
  healthConfig: HealthConfig(
    cacheTtl: Duration(seconds: 2),
    additionalReadinessIndicators: [
      StripeApiIndicator(),
      InventoryServiceIndicator(),
    ],
    additionalStartupIndicators: [
      CacheWarmupIndicator(),
    ],
  ),
);
```
168+
169+
### Configuration options
170+
171+
The `HealthConfig` class provides the following options:
172+
173+
- **`cacheTtl`** - How long to cache health check results (default: 1 second). Caching keeps high-frequency probing from triggering a "thundering herd" of checks against your dependencies.
174+
- **`additionalReadinessIndicators`** - Custom indicators checked by `/readyz`.
175+
- **`additionalStartupIndicators`** - Custom indicators checked by `/startupz`.
176+
177+
Each indicator can specify its own timeout via the `timeout` getter (default: 5 seconds). This prevents slow checks from blocking the entire health endpoint.
178+
179+
## Health metrics collection
180+
181+
Independently of the health check endpoints, Serverpod also collects health metrics about the server and its dependencies while running. Metrics such as CPU usage, memory usage, and database response time are written to the database every minute, in the `serverpod_health_metric` and `serverpod_health_connection_info` tables. These metrics can be visualized graphically through Serverpod Insights.
182+
183+
### Adding custom metrics
6184

7185
Sometimes it is helpful to add custom health metrics. This can be for monitoring external services or internal processes within your Serverpod. To set up your custom metrics, you must create a `HealthCheckHandler` and register it with your Serverpod.
8186
