This repository includes various Temporal Server operational artifacts.
OSS Temporal server dynamic config reference, dynamic config YAML samples, and troubleshooting info.
OSS Temporal server metrics references.
- Server Dashboards — Grafana dashboards for monitoring a self-hosted Temporal Server cluster, including:
- Server Overview Dashboard — cluster health, throughput, persistence, and service metrics
- Standby Cluster Dashboard — replication health, lag, and failover readiness for standby clusters
- SDK Dashboards — Grafana dashboards for monitoring Temporal SDK clients and workers (Java, Go, TypeScript, Python, .NET, Ruby).
- Troubleshooting Dashboards — Grafana dashboards focused on troubleshooting specific Temporal operational issues.
- Server Alerts — Grafana alerting provisioning rules for a self-hosted Temporal Server cluster. Covers the essential alert set plus dual visibility store alerts. Each alert links to a runbook with diagnosis and recovery steps.
Production-ready operational runbooks for self-hosted Temporal clusters. Each runbook has been tested against a real cluster and cross-references the specific dashboard panels and alert rules that surface its signals.
Content graduates here from tmp/ once it meets the full criteria: documented, dashboard panels in place, alerts wired up, and failure scenarios tested live.
| Runbook | Covers |
|---|---|
| Dual Visibility | Primary failure, secondary failure, both stores fail — detection via visibility_persistence_* metrics, recovery steps, write mode management. SQL-backed dual visibility only — Elasticsearch coverage TBD. |