---

copyright:
  years: 2014, 2025

lastupdated: "2025-11-07"

keywords: openshift, {{site.data.keyword.openshiftlong_notm}}, kubernetes, node scaling, ca, autoscaler, gpu, hpa

subcollection: openshift

---

{{site.data.keyword.attribute-definition-list}}

# Setting up horizontal pod autoscaling on GPU worker nodes
{: #cluster-scaling-hpa-gpu}

Review the following steps to enable horizontal pod autoscaling on your GPU worker nodes.

Why horizontal pod autoscaling?
:   You might want to configure horizontal pod autoscaling (HPA) to scale the number of pods when a workload consumes more or less than a certain amount of GPU. Because GPUs are an expensive resource, you might not want workloads to run at full capacity for long periods of time. Instead, you can scale pods up or down based on the running workload in the cluster.

## Prerequisites
{: #hpa-gpu-prereqs}

To configure HPA, the following components must be installed on your cluster.

- NVIDIA Data Center GPU Manager (DCGM) exporter to gather GPU metrics in Kubernetes. The DCGM exporter exposes GPU metrics for Prometheus, which can be visualized by using Grafana.
- Prometheus and the Prometheus adapter to generate custom metrics.
  1. Install the NVIDIA GPU Operator{: external}.

  2. Install Prometheus.

    helm install prom-stack prometheus-community/kube-prometheus-stack -f ~/ca-prom-val.yaml

    {: pre}

    cat ~/ca-prom-val.yaml

    {: pre}

    prometheus:
      prometheusSpec:
        additionalScrapeConfigs:
        - job_name: gpu-metrics
          scrape_interval: 1s
          metrics_path: /metrics
          scheme: http
          kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
              - nvidia-gpu-operator
          relabel_configs:
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: drop
            regex: .*-node-feature-discovery-master
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node
    {: screen}
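With this scrape configuration in place, Prometheus collects the metrics that the DCGM exporter serves in plain-text Prometheus format. The following sketch shows what that output looks like and how to pick out the utilization metric; the service name, namespace, and sample values are assumptions for illustration, not output from a live cluster.

```shell
# In a live cluster you might port-forward the exporter and curl it directly:
#   oc -n nvidia-gpu-operator port-forward svc/nvidia-dcgm-exporter 9400:9400
#   curl -s localhost:9400/metrics | grep '^DCGM_FI_DEV_GPU_UTIL'
# Below, a hypothetical two-line sample of exporter output is filtered the
# same way to isolate the GPU utilization metric that the HPA later consumes.
sample='DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-0000",device="nvidia0"} 87
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="0",UUID="GPU-0000"} 0.85'

# Keep only the DCGM_FI_DEV_GPU_UTIL series.
gpu_util=$(printf '%s\n' "$sample" | grep '^DCGM_FI_DEV_GPU_UTIL')
echo "$gpu_util"
```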

  3. Get the Prometheus service details.

    oc get svc

    {: pre}

  4. Install the Prometheus adapter.

    helm upgrade --install prometheus-adapter prometheus-community/prometheus-adapter --set prometheus.url="http://prom-stack-kube-prometheus-prometheus.default.svc.cluster.local"

    {: pre}
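After the adapter is installed, it translates the Prometheus series into resources on the Kubernetes custom metrics API, which is what the HPA controller queries. The following sketch shows one way to confirm the GPU metric is served; the `oc get --raw` path is the standard custom metrics endpoint, but the response fragment is a hypothetical, abbreviated example, not captured output.

```shell
# In a live cluster you could list the metrics the adapter serves:
#   oc get --raw /apis/custom.metrics.k8s.io/v1beta1
# A hypothetical, abbreviated response fragment, filtered for the DCGM metric:
response='{"kind":"APIResourceList","resources":[{"name":"pods/DCGM_FI_DEV_GPU_UTIL","namespaced":true,"kind":"MetricValueList"}]}'

# Extract the per-pod GPU utilization metric name if the adapter exposes it.
metric=$(printf '%s' "$response" | grep -o 'pods/DCGM_FI_DEV_GPU_UTIL')
echo "$metric"
```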

## Setting up HPA
{: #hpa-gpu-setup}

Complete the following steps to create a deployment that uses HPA.

  1. Create a deployment.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cuda-test
      labels:
        app: cuda-test
    spec:
      selector:
        matchLabels:
          app: cuda-test
      template:
        metadata:
          labels:
            app: cuda-test
        spec:
          containers:
          - name: cuda-test-main
            image: "k8s.gcr.io/cuda-vector-add:v0.1"
            command: ["bash", "-c", "for (( c=1; c<=5000; c++ )); do ./vectorAdd; done"]
            resources:
              limits:
                nvidia.com/gpu: 1

    {: codeblock}

  2. Create a HorizontalPodAutoscaler resource.

    kind: HorizontalPodAutoscaler
    apiVersion: autoscaling/v2
    metadata:
      name: cuda-hpa
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: cuda-test
      minReplicas: 1
      maxReplicas: 3
      metrics:
      - type: Pods
        pods:
          metric:
            name: DCGM_FI_DEV_GPU_UTIL     # the metric to use for autoscaling
          target:
            type: AverageValue
            averageValue: '5'

    {: codeblock}
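With a `Pods` metric, the HPA controller compares the average metric value across the deployment's pods to the `averageValue` target and computes `desiredReplicas = ceil(currentReplicas * currentValue / targetValue)`, clamped to the `minReplicas`/`maxReplicas` range. The following sketch walks through that arithmetic; the utilization value of 87 is an assumed example, not a measured reading.

```shell
# Sketch of the HPA scaling rule for this example's spec (values assumed):
#   desiredReplicas = ceil(currentReplicas * currentValue / targetValue),
#   then clamped to [minReplicas, maxReplicas].
current_replicas=1
current_value=87   # assumed average DCGM_FI_DEV_GPU_UTIL across pods
target_value=5     # averageValue from the HPA spec
min_replicas=1
max_replicas=3

# Integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_value + target_value - 1) / target_value ))
if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi
if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
echo "desired replicas: $desired"
```

Here the raw result, `ceil(1 * 87 / 5) = 18`, exceeds `maxReplicas`, so the HPA scales the deployment to 3 replicas, matching the events shown in the next step.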

  3. Run the following commands to review the results.

    oc get pods | grep cuda

    {: pre}

    cuda-test-d987464bf-brd48                                1/1     Running   0          4m19s
    cuda-test-d987464bf-gsx82                                0/1     Pending   0          4m19s
    cuda-test-d987464bf-zstzs                                1/1     Running   0          7m35s

    {: screen}

    The deployment was scaled from 1 replica up to 3 as the GPU workload increased. The scaling event also appears in the HPA details.

    Min replicas:       1
    Max replicas:       3
    Deployment pods:    3 current / 3 desired
    Events:
    Type    Reason             Age   From                       Message
    ----    ------             ----  ----                       -------
    Normal  SuccessfulRescale  50s   horizontal-pod-autoscaler  New size: 3; reason: pods metric DCGM_FI_DEV_GPU_UTIL above target

    {: screen}