Pod Label not visible in DCGM Exporter Metrics

_**Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case [here](https://enterprise-support.nvidia.com/s/create-case)**._

**Describe the bug**
The GPU Operator is the recommended way to install DCGM Exporter; however, it does not seem to support enabling pod labels in the exported metrics.

**To Reproduce**
Deploy the GPU Operator and enable DCGM Exporter with the following extra environment variables:

- `DCGM_EXPORTER_KUBERNETES_ENABLE_POD_LABELS`
- `DCGM_EXPORTER_KUBERNETES_ENABLE_POD_UID`
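
For illustration, the reproduction step might look like the sketch below. The report does not say how the variables were set, so the `dcgmExporter.env` Helm values key, the value `"true"`, the file name `dcgm-values.yaml`, and the release, chart, and namespace names in the commented command are all assumptions.

```shell
# Write a Helm values override that passes both variables to DCGM Exporter.
# Assumption: the chart exposes exporter environment variables under dcgmExporter.env.
cat > dcgm-values.yaml <<'EOF'
dcgmExporter:
  env:
    - name: DCGM_EXPORTER_KUBERNETES_ENABLE_POD_LABELS
      value: "true"
    - name: DCGM_EXPORTER_KUBERNETES_ENABLE_POD_UID
      value: "true"
EOF

# Apply the override (hypothetical release, chart, and namespace names):
# helm upgrade --install gpu-operator nvidia/gpu-operator \
#   -n gpu-operator --version v23.6.1 -f dcgm-values.yaml
```

Note that a list supplied this way replaces any default `env` list in the chart values rather than merging with it, so default entries may need to be repeated in the override.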

**Expected behavior**
The deployment should:

- Create a ClusterRole and ClusterRoleBinding bound to the ServiceAccount used by the DCGM Exporter pods
- Automount the ServiceAccount token so that the pods can read from the Kubernetes API
- Mount the kubelet path as a volume in the DCGM Exporter pods
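
One way to check whether the first expectation is met is to ask the API server whether the exporter's ServiceAccount may read pods. This is only a sketch: the namespace `gpu-operator` and the ServiceAccount name `nvidia-dcgm-exporter` are assumptions and should be replaced with the values used in the cluster.

```shell
# Assumed defaults; override by exporting NAMESPACE / SERVICE_ACCOUNT beforehand.
NAMESPACE="${NAMESPACE:-gpu-operator}"
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-nvidia-dcgm-exporter}"

# Fully qualified user name that Kubernetes assigns to a ServiceAccount.
SUBJECT="system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}"
echo "Checking permissions for ${SUBJECT}"

# Only query the cluster when kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  # Prints "yes" if the ServiceAccount may list pods in every namespace, "no" otherwise.
  kubectl auth can-i list pods --all-namespaces --as="${SUBJECT}" || true
fi
```

An answer of `no` would be consistent with the missing ClusterRole and ClusterRoleBinding described above.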

**Environment (please provide the following information):**
 - GPU Operator Version: v23.6.1
 - OS: Rocky-Linux-8.10
 - Kernel Version: [e.g. 6.8.0-generic]
 - Container Runtime Version: containerd v1.7.1
 - Kubernetes Distro and Version: K8s v1.24.12



**Information to [attach](https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/)** (optional if deemed irrelevant)

 - [ ] kubernetes pods status: `kubectl get pods -n OPERATOR_NAMESPACE`
 - [ ] kubernetes daemonset status: `kubectl get ds -n OPERATOR_NAMESPACE`
 - [ ] If a pod/ds is in an error state or pending state `kubectl describe pod -n OPERATOR_NAMESPACE POD_NAME`
 - [ ] If a pod/ds is in an error state or pending state `kubectl logs -n OPERATOR_NAMESPACE POD_NAME --all-containers`
 - [ ] Output from running `nvidia-smi` from the driver container: `kubectl exec DRIVER_POD_NAME -n OPERATOR_NAMESPACE -c nvidia-driver-ctr -- nvidia-smi`
 - [ ] containerd logs `journalctl -u containerd > containerd.log`


Collecting full debug bundle (optional):

```
curl -o must-gather.sh -L https://raw.githubusercontent.com/NVIDIA/gpu-operator/main/hack/must-gather.sh
chmod +x must-gather.sh
./must-gather.sh
```
**NOTE**: Please refer to the [must-gather](https://raw.githubusercontent.com/NVIDIA/gpu-operator/main/hack/must-gather.sh) script to see which debug data is collected.

This bundle can be submitted to us via email: **operator_feedback@nvidia.com**


Pod Label not visible in DCGM Exporter Metrics #2009

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Pod Label not visible in DCGM Exporter Metrics #2009

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions