Add docs recommending autoscaling setup #324
base: main
Changes from all commits
5c252da
fbf4c57
16d5692
26e953b
4f72016
718471b
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,172 @@ | ||
| # Scaling Recommendations | ||
|
|
||
| This document describes practical reactivity and reliability tradeoffs when scaling Temporal workers per worker deployment version on Kubernetes, and recommends which tool fits which workload pattern. | ||
|
|
||
| The `internal/demo/` example wires the HPA path described here. The KEDA path is mentioned for comparison and as a recommendation for workloads that cannot tolerate the HPA path's limits. | ||
|
|
||
| ## TL;DR | ||
|
|
||
| We recommend choosing a scaler approach that aligns with the workload pattern your application exhibits. | ||
|
|
||
| | Workload pattern | Recommendation | | ||
| |------------------|----------------| | ||
| | Continuous traffic (task queue always loaded) | HPA | | ||
| | Idle periods >5 min between work OR needs scale-from-zero | KEDA Temporal scaler | | ||
| | Required reactivity < ~60 s from first backlog | KEDA Temporal scaler | | ||
| | Required reactivity ~90 s typical, tolerant of occasional multi-minute stalls | HPA + prometheus-adapter | | ||
| | 1000s of task queues and worker deployment versions | HPA + prometheus-adapter | | ||
|
|
||
| ## HPA scaling signal | ||
|
|
||
| This section describes the signal that the HPA + prometheus-adapter path uses to adjust the number of worker replicas in a Kubernetes Deployment managed by the Temporal Worker Controller. | ||
|
|
||
| The HPA + prometheus-adapter path relies on two metrics, both scraped by Prometheus. | ||
|
|
||
| `temporal_cloud_v1_approximate_backlog_count` (or just "backlog") is a measurement of the number of pending tasks on a particular task queue that are waiting for a poller (a worker) to pull that task and process it. This is a metric provided by [Temporal Cloud's OpenMetrics aggregation service][tc-openmetrics]. | ||
|
|
||
| `temporal_slot_utilization` (or just "slot util") is emitted directly by Workers (no Temporal Cloud aggregation), scraped at the Prometheus `ServiceMonitor` interval (~10–30 s), and reflects the current state of a particular Worker. This metric rises *before* backlog accumulates. In other words, slots on the Worker saturate first, then queueing starts. | ||
|
|
||
| For a continuously-loaded task queue, important events from "backlog appears" to "HPA scales up" can be visualized like so: | ||
|
|
||
| ``` | ||
| backlog appears at T0 | ||
| └─ Temporal Cloud OpenMetrics emission cadence + ~60s worst-case (~1 sample/minute) | ||
| └─ Prometheus scrape interval + ~10s | ||
| └─ HPA poll interval + ~15s | ||
| └─ scale-up stabilization window + taken from HPA configuration | ||
| └─ first replica added | ||
| ``` | ||
| [tc-openmetrics]: https://docs.temporal.io/cloud/metrics/openmetrics | ||
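
As a rough illustration of how the stages in the timeline add up, here is a minimal Python sketch. The class name `PipelineDelays` is hypothetical, and the 30 s scale-up stabilization window is an assumption taken from the demo HPA template later in this document. The sketch sums only the per-stage worst cases listed in the diagram; it does not model stalls in metric delivery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineDelays:
    """Worst-case delay per stage, in seconds (values from the timeline)."""

    cloud_emission_s: float = 60.0          # ~1 sample/minute from Temporal Cloud
    prometheus_scrape_s: float = 10.0       # Prometheus scrape interval
    hpa_poll_s: float = 15.0                # HPA poll interval
    scale_up_stabilization_s: float = 30.0  # assumed: taken from the HPA configuration

    def worst_case_s(self) -> float:
        """Sum of the stage delays between 'backlog appears' and 'first replica added'."""
        return (
            self.cloud_emission_s
            + self.prometheus_scrape_s
            + self.hpa_poll_s
            + self.scale_up_stabilization_s
        )


if __name__ == "__main__":
    print(f"worst case: {PipelineDelays().worst_case_s():.0f}s")
```

Changing `scale_up_stabilization_s` shows how much of the total is under your control through the HPA configuration; the other three stages are fixed by the metric pipeline.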
|
|
||
| ## HPA strengths | ||
|
|
||
| Because the HPA path gathers all series for the namespace with a single OpenMetrics scrape (one HTTP request), it scales independently of namespace count. That single request is more efficient than KEDA's Temporal API-based approach and does not run into Temporal API rate limits (see [KEDA limitations](#keda-limitations) below). | ||
|
|
||
| When configured to watch both slot util and backlog, HPA + prometheus-adapter provides fast scale-up via slot util and a backlog-driven backstop that prevents overly reactive replica count adjustments. | ||
|
|
||
| ## HPA limitations | ||
|
Member
this is great! was wondering - should we also mention that our OM endpoint inherently has a 3 minute time lag, which would mean that at time=0, we are seeing a one minute aggregate of time.now() - 3 minutes
||
|
|
||
| This section describes two known limitations for HPA + prometheus adapter. | ||
|
|
||
| Temporal Cloud's OpenMetrics endpoint may sometimes return the same embedded timestamps on repeated scrapes, for every series across the account simultaneously: backlog series, action counts, error counts, every queue, every namespace. This delay in returning fresh metrics data can slow how quickly HPA + prometheus-adapter scales the replica count of a worker deployment version out or in. HPA + prometheus-adapter may therefore not be a good fit if your workload cannot tolerate occasional multi-minute scaling pauses. | ||
|
|
||
| > **Note**: This is why `metricsRelistInterval: 5m` is the recommended setting. The discovery window must comfortably exceed the longest expected delay so that the metric does not deregister; otherwise, re-registration waits up to one more relist cycle after delivery resumes. | ||
|
|
||
|
carlydf marked this conversation as resolved.
|
||
| HPA cannot scale your Worker Deployment from zero because the signal for scaling does not yet exist. The signal for scaling is the backlog metric for the task queue associated with the workers in the Worker Deployment. This metric will not exist until there is at least one worker polling the task queue. | ||
|
Member
i took about a second to understand what this really meant - at first, I thought this meant that there won't be a backlog metric emitted if you don't have workers running at all (which is not true since you do have this metric being emitted for the unversioned world without workers being present)
I know you have clearly mentioned versions in the preamble here, but do you think we can be extra clear and mention the backlog count per version is not emitted without a worker being present since that is what creates a version in temporal?
||
|
|
||
| In addition to the "first worker start" problem, for customers using Temporal Cloud, if there are no polling workers for a task queue for more than 5 minutes, Temporal Cloud will unload the task queue from memory. Unloaded task queues do not emit metrics, and therefore the signal that HPA uses to scale up will not be present. | ||
|
|
||
| Submitting a workflow does load the task queue back into memory, but the metric still won't reach the HPA until the next OpenMetrics emission cycle (~1 minute). By the time the HPA reacts, you've already had ~1+ minute of unprovisioned work. | ||
|
|
||
| ## KEDA strengths | ||
|
|
||
| KEDA's Temporal scaler calls `DescribeTaskQueue(stats=true)` (or `DescribeWorkerDeploymentVersion`), which loads the queue synchronously and returns the backlog directly. This allows KEDA to scale Temporal workers from zero. | ||
|
|
||
| ## KEDA limitations | ||
|
|
||
| KEDA bypasses the metric pipeline but uses Temporal API calls, which are subject to a per-namespace rate limit: | ||
|
||
|
|
||
| ``` | ||
| FrontendGlobalWorkerDeploymentReadRPS = 50 # per namespace, evenly distributed across frontend instances | ||
| ``` | ||
|
|
||
| A namespace with N task queues × M worker deployment versions has K = N × M HPAs, and KEDA makes ~1 API call per HPA per poll. The resulting polling budget against the 50 RPS limit: | ||
|
|
||
| | HPA count | Poll every 30s | Poll every 10s | Poll every 5s | | ||
| |-----------|----------------|----------------|---------------| | ||
| | 50 | 1.7 RPS (3%) | 5 RPS (10%) | 10 RPS (20%) | | ||
| | 250 | 8 RPS (17%) | 25 RPS (50%) | 50 RPS (100%) | | ||
| | 1500 | 50 RPS (100%) | exceeds limit | exceeds limit | | ||
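
The table entries follow from dividing the HPA count by the poll interval. A small Python sketch of that arithmetic, assuming ~1 API call per HPA per poll and the 50 RPS limit quoted above (the function names are hypothetical, not part of KEDA or Temporal):

```python
RATE_LIMIT_RPS = 50.0  # FrontendGlobalWorkerDeploymentReadRPS, per namespace (from the text)


def polling_rps(hpa_count: int, poll_interval_s: float, calls_per_poll: float = 1.0) -> float:
    """Steady-state API request rate generated by polling every HPA once per interval."""
    return hpa_count * calls_per_poll / poll_interval_s


def budget_fraction(hpa_count: int, poll_interval_s: float,
                    limit_rps: float = RATE_LIMIT_RPS) -> float:
    """Share of the per-namespace rate limit consumed by polling (1.0 means 100%)."""
    return polling_rps(hpa_count, poll_interval_s) / limit_rps


if __name__ == "__main__":
    for count in (50, 250, 1500):
        for interval in (30, 10, 5):
            rps = polling_rps(count, interval)
            share = budget_fraction(count, interval)
            note = "exceeds limit" if share > 1.0 else f"{share:.0%} of limit"
            print(f"{count:>5} HPAs, poll every {interval:>2}s: {rps:6.1f} RPS ({note})")
```

Running the same calculation with your own HPA count and polling interval gives a quick check of how much headroom remains before contacting your account team about limits.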
|
|
||
|
|
||
| If you are using KEDA with Temporal Cloud and hitting the API rate limit described above, you will need to contact your Temporal Cloud account team to discuss increasing the rate limits. | ||
|
|
||
| ## Recommended configuration for the HPA + prometheus-adapter path | ||
|
|
||
| This demo's configuration represents the recommendation, in compact form: | ||
|
|
||
| **Scrape config** (`internal/demo/k8s/prometheus-stack-values.yaml`): | ||
| ```yaml | ||
| - job_name: temporal_cloud | ||
|   scrape_interval: 10s | ||
|   honor_timestamps: true | ||
|   metrics_path: /v1/metrics | ||
|   params: | ||
|     labels: | ||
|       - temporal_worker_deployment_name | ||
|       - temporal_worker_build_id | ||
| ``` | ||
|
|
||
| **prometheus-adapter rule** (`internal/demo/k8s/prometheus-adapter-values.yaml`): | ||
| ```yaml | ||
| metricsRelistInterval: 5m # must accommodate Cloud's ~3-min embedded-timestamp lag | ||
| rules: | ||
|   external: | ||
|     - seriesQuery: 'temporal_cloud_v1_approximate_backlog_count{temporal_worker_build_id!="__unversioned__"}' | ||
|       metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>})' | ||
|       name: | ||
|         as: "temporal_cloud_v1_approximate_backlog_count" | ||
|       resources: | ||
|         namespaced: false | ||
| ``` | ||
|
|
||
| The `seriesQuery` filter excludes `__unversioned__` series. Without it, accounts with many unversioned namespaces produce 5000+ series in the discovery response, which slows or breaks adapter discovery. The filter scopes discovery to versioned workloads. | ||
|
|
||
| **HPA template** (`examples/wrt-hpa-backlog.yaml`): two metrics — slot utilization (fast leading signal, scale-up gate) and backlog count (confirming signal, AverageValue target). | ||
|
|
||
| ```yaml | ||
| apiVersion: autoscaling/v2 | ||
| kind: HorizontalPodAutoscaler | ||
| spec: | ||
|   scaleTargetRef: {} | ||
|   minReplicas: 1 | ||
|   maxReplicas: 30 | ||
|   metrics: | ||
|     - type: External | ||
|       external: | ||
|         metric: | ||
|           name: temporal_slot_utilization | ||
|           selector: | ||
|             matchLabels: | ||
|               worker_type: "ActivityWorker" | ||
|         target: | ||
|           type: Value | ||
|           value: "750m" | ||
|
|
||
|     - type: External | ||
|       external: | ||
|         metric: | ||
|           name: temporal_cloud_v1_approximate_backlog_count | ||
|           selector: | ||
|             matchLabels: | ||
|               temporal_task_queue: "default_helloworld" | ||
|               task_type: "Activity" | ||
|         target: | ||
|           type: AverageValue | ||
|           averageValue: "1" | ||
|   behavior: | ||
|     scaleUp: | ||
|       stabilizationWindowSeconds: 30 | ||
|       policies: | ||
|         - type: Percent | ||
|           value: 10 | ||
|           periodSeconds: 10 | ||
|       selectPolicy: Max | ||
|
|
||
|     scaleDown: | ||
|       stabilizationWindowSeconds: 120 | ||
|       policies: | ||
|         - type: Percent | ||
|           value: 10 | ||
|           periodSeconds: 10 | ||
|       selectPolicy: Max | ||
| ``` | ||
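
To see how the two metrics in this template interact, the following Python sketch approximates the replica calculation Kubernetes documents for the HPA: each metric proposes a replica count, and the largest proposal wins. This is a simplified illustration, not the controller's implementation. It ignores the HPA tolerance band, pod readiness, and the `behavior` policies, and it assumes the slot utilization query returns a single value between 0 and 1. The function names are hypothetical.

```python
import math


def desired_for_value(current_replicas: int, current_value: float, target_value: float) -> int:
    """`Value` target: scale the current replica count by current/target."""
    return math.ceil(current_replicas * current_value / target_value)


def desired_for_average_value(total_value: float, target_average: float) -> int:
    """`AverageValue` target: enough replicas that total/replicas <= target."""
    return math.ceil(total_value / target_average)


def desired_replicas(current_replicas: int, slot_util: float, backlog: float,
                     min_replicas: int = 1, max_replicas: int = 30) -> int:
    """Take the larger of the two proposals, clamped to min/max replicas."""
    by_slots = desired_for_value(current_replicas, slot_util, 0.75)  # target "750m"
    by_backlog = desired_for_average_value(backlog, 1.0)             # averageValue "1"
    return max(min_replicas, min(max_replicas, max(by_slots, by_backlog)))


if __name__ == "__main__":
    # Slots are saturating before any backlog is visible: slot util drives scale-up.
    print(desired_replicas(current_replicas=4, slot_util=0.9, backlog=0))
    # Backlog has arrived: the backlog metric proposes more replicas, capped at maxReplicas.
    print(desired_replicas(current_replicas=4, slot_util=0.9, backlog=100))
```

Because the HPA takes the maximum across metrics, scale-down only happens once both slot utilization and backlog propose fewer replicas.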
|
|
||
| ## References | ||
|
|
||
| - [Temporal Cloud OpenMetrics](https://docs.temporal.io/cloud/metrics/openmetrics) — endpoint and opt-in labels | ||
| - [prometheus-adapter README](https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/README.md) — `metrics-relist-interval` and discovery window semantics | ||
| - [prometheus-adapter externalmetrics.md](https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/externalmetrics.md) — external rules, `namespaced: false` for cluster-scoped metrics | ||
| - [Prometheus HTTP API: `/api/v1/series`](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers) — series discovery semantics | ||
| - [Prometheus scrape config: `honor_timestamps`](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) — preserving source timestamps | ||
| - [KEDA Temporal scaler](https://keda.sh/docs/latest/scalers/temporal/) — direct API polling alternative | ||
Recommend adding this new doc to the list here.