Run inference-perf as a Job in a Kubernetes cluster

This guide explains how to deploy inference-perf to a Kubernetes cluster as a Job.

via Helm Chart

Refer to the guide in /deploy/inference-perf.
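For reference, a release can typically be installed straight from that chart directory. This is a minimal sketch, assuming the chart lives at /deploy/inference-perf in this repo and using an illustrative release name; see the chart's own documentation for supported values:

helm install inference-perf ./deploy/inference-perf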

via Manual Deployment

Setup

inference-perf requires all configuration to be provided in a single YAML file passed via the -c flag. When deploying as a Job, the most straightforward way to supply this file is to create a ConfigMap and mount it in the Job. Update config.yml as needed, then create the ConfigMap by running the following at the root of this repo:

kubectl create configmap inference-perf-config --from-file=config.yml
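The Job then mounts this ConfigMap as a volume so the benchmark can read config.yml. The repo's manifests.yaml already handles this wiring; the snippet below is only a sketch of the standard Kubernetes pattern, and the mount path and container args shown are assumptions:

spec:
  containers:
    - name: inference-perf
      # Path is illustrative; point -c at wherever the ConfigMap is mounted.
      args: ["-c", "/etc/inference-perf/config.yml"]
      volumeMounts:
        - name: config
          mountPath: /etc/inference-perf
  volumes:
    - name: config
      configMap:
        name: inference-perf-config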

Optional: Create a Kubernetes Secret that contains the Hugging Face token:

Note: this step is required for gated models only

kubectl create secret generic hf-secret \
    --from-literal=hf_api_token=${HF_TOKEN} \
    --dry-run=client -o yaml | kubectl apply -f -
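Inside the Job, the token can then be exposed to the container as an environment variable via a secretKeyRef. The snippet below sketches that standard pattern; the environment variable name expected by inference-perf is an assumption here:

env:
  # HF_TOKEN is the conventional Hugging Face variable; confirm the name
  # inference-perf actually reads.
  - name: HF_TOKEN
    valueFrom:
      secretKeyRef:
        name: hf-secret
        key: hf_api_token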

* For Hugging Face authentication, refer to “Hugging Face Authentication” in the Run locally section.

Instructions

Apply the job by running the following:

kubectl apply -f manifests.yaml
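To confirm the Job's pod has started, you can list pods by the job-name label that Kubernetes adds to pods created by a Job:

kubectl get pods -l job-name=inference-perf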

Viewing Results

Currently, inference-perf outputs benchmark results to standard output only. To view the results after the job completes, run:

kubectl wait --for=condition=complete job/inference-perf && kubectl logs job/inference-perf
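Note that kubectl wait uses a 30-second timeout by default, so for longer benchmark runs you may want to pass a larger --timeout value. To keep a copy of the results, the logs can also be redirected to a file (the filename below is illustrative):

kubectl logs job/inference-perf > inference-perf-results.txt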