Is your feature request related to a problem? Please describe.
After #286 closes, Worker Controller users will be able to use the KEDA temporal trigger for per-version scaling based on approximate backlog count. However, scaling workers on backlog count (queue size) alone can cause premature scale-down, because the queue size is zero at steady state. Task scheduling latency is lowest when the backlog is zero, so zero backlog is a common target. To avoid scaling down while workers are still fully in use, users need to combine backlog count with other metrics that indicate worker utilization. One effective signal for this is the worker slot utilization metric, which is emitted locally by each worker.
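To illustrate why a second metric helps, here is a minimal sketch of how an HPA-style autoscaler combines proposals: each metric proposes ceil(currentReplicas * currentValue / targetValue), and the largest proposal wins. The function names, targets, and numbers are illustrative assumptions; a real autoscaler also applies tolerances and stabilization windows, which are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Proposal for a single metric: ceil(current_replicas * value / target)."""
    return math.ceil(current_replicas * current_value / target_value)


def combined_proposal(current_replicas, metrics, min_replicas=1):
    """Take the largest per-metric proposal, never going below min_replicas.

    `metrics` is a list of (current_value, target_value) pairs.
    """
    proposals = [desired_replicas(current_replicas, v, t) for v, t in metrics]
    return max(min_replicas, *proposals)


# Steady state: backlog is 0 (hypothetical target 5), but slots are 95% used
# against a hypothetical 70% utilization target.
backlog = (0, 5)
slot_utilization = (0.95, 0.7)

backlog_only = combined_proposal(10, [backlog])
backlog_and_slots = combined_proposal(10, [backlog, slot_utilization])
print(backlog_only, backlog_and_slots)
```

With backlog alone the proposal collapses to the minimum replica count even though the workers are busy; adding the utilization metric keeps the replica count up.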
HPA users can template the temporal_worker_build_id and temporal_worker_deployment_name tags into the HPA metrics query via the matchLabels field, which is compatible with any metrics provider that exposes the Kubernetes External Metrics API. The controller detects spec.metrics[*].external.metric.selector.matchLabels in any k8s resource that has that field and auto-injects the relevant values.
Worker Controller users can therefore scale on backlog + slot utilization using HPA, but not with KEDA.
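The matchLabels injection described above can be sketched roughly as follows. This is not the controller's actual code: the function name, the merge behavior (injected values override existing keys), and the sample values are assumptions made for illustration.

```python
import copy


def inject_match_labels(resource: dict, labels: dict) -> dict:
    """Return a copy of `resource` with `labels` merged into every
    spec.metrics[*].external.metric.selector.matchLabels that is present.

    Metrics without a matchLabels field are left untouched (assumed behavior).
    """
    out = copy.deepcopy(resource)
    for metric in out.get("spec", {}).get("metrics", []):
        external = metric.get("external")
        if not external:
            continue
        selector = external.get("metric", {}).get("selector")
        if selector is None or "matchLabels" not in selector:
            continue
        # An empty YAML mapping may deserialize as None, so fall back to {}.
        selector["matchLabels"] = {**(selector["matchLabels"] or {}), **labels}
    return out


# Hypothetical HPA fragment and per-version values.
hpa = {
    "spec": {
        "metrics": [
            {
                "type": "External",
                "external": {
                    "metric": {
                        "name": "worker_slot_utilization",  # hypothetical metric name
                        "selector": {"matchLabels": {}},
                    }
                },
            },
            {"type": "Resource", "resource": {"name": "cpu"}},
        ]
    }
}
injected = inject_match_labels(
    hpa,
    {
        "temporal_worker_deployment_name": "default_my-worker",
        "temporal_worker_build_id": "build-1",
        "temporal_namespace": "my-temporal-ns",
    },
)
```

The same walk does nothing for a KEDA ScaledObject, because its triggers carry their filters inside provider-specific metadata rather than in a matchLabels field.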
KEDA consumes metrics from various metrics providers using the "trigger" they define, such as:
The currently supported matchLabels injection pattern does not support per-version metrics filtering in KEDA. As a result, scaling versioned workers on arbitrary cluster metrics such as slot utilization is not possible with KEDA, which leaves KEDA compatibility with the Worker Controller incomplete.
Describe the solution you'd like
A templating format that lets the above triggers (and other metrics triggers) filter metrics by version. It could be restricted to the same variables that are already auto-injected into spec.metrics[*].external.metric.selector.matchLabels ({temporal_worker_deployment_name: <ns>_<wd-name>, temporal_worker_build_id: <buildID>, temporal_namespace: <temporal-ns>}).
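One possible shape for such templating, sketched under loud assumptions: the `{{ name }}` placeholder syntax, the `render_triggers` helper, the metric name, and the server address are all hypothetical and are not an existing controller or KEDA feature. The trigger uses the `query` and `threshold` metadata keys of a Prometheus-style KEDA trigger purely as an example.

```python
import re

# Hypothetical placeholder syntax: {{ variable_name }}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_triggers(triggers: list, variables: dict) -> list:
    """Substitute placeholders in every string value of each trigger's metadata.

    Unknown placeholder names raise KeyError so that typos fail loudly.
    """

    def render(value):
        if not isinstance(value, str):
            return value

        def substitute(match):
            name = match.group(1)
            if name not in variables:
                raise KeyError(f"unknown template variable: {name}")
            return variables[name]

        return PLACEHOLDER.sub(substitute, value)

    return [
        {**t, "metadata": {k: render(v) for k, v in t.get("metadata", {}).items()}}
        for t in triggers
    ]


# Restricted to the variables already auto-injected into matchLabels.
variables = {
    "temporal_worker_deployment_name": "default_my-worker",
    "temporal_worker_build_id": "build-1",
    "temporal_namespace": "my-temporal-ns",
}
triggers = [
    {
        "type": "prometheus",
        "metadata": {
            "serverAddress": "http://prometheus.example:9090",  # placeholder address
            "query": 'avg(worker_slot_utilization{temporal_worker_build_id="{{ temporal_worker_build_id }}"})',
            "threshold": "0.7",
        },
    }
]
rendered = render_triggers(triggers, variables)
print(rendered[0]["metadata"]["query"])
```

Substituting into opaque metadata strings keeps the mechanism independent of any particular metrics provider, at the cost of the controller not validating the resulting query.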
Additional context