Merged
4 changes: 4 additions & 0 deletions parts/linux/cloud-init/artifacts/cse_config.sh
@@ -1256,16 +1256,20 @@ configureManagedGPUExperience() {
    if [ "${GPU_NODE}" != "true" ] || [ "${skip_nvidia_driver_install}" = "true" ]; then
        return
    fi
    local managed_gpu_marker="/opt/azure/containers/managed-gpu-experience.enabled"
    if [ "${ENABLE_MANAGED_GPU_EXPERIENCE}" = "true" ]; then
        logs_to_events "AKS.CSE.installNvidiaManagedExpPkgFromCache" "installNvidiaManagedExpPkgFromCache" || exit $ERR_NVIDIA_DCGM_INSTALL
        logs_to_events "AKS.CSE.startNvidiaManagedExpServices" "startNvidiaManagedExpServices" || exit $ERR_NVIDIA_DCGM_EXPORTER_FAIL
        addKubeletNodeLabel "kubernetes.azure.com/dcgm-exporter=enabled"
        mkdir -p "$(dirname "${managed_gpu_marker}")"
        touch "${managed_gpu_marker}"
    else
        # EnableManagedGPUExperience is mutable, so services may have been
        # installed on a previous CSE run. Stop them if they exist.
        logs_to_events "AKS.CSE.stop.nvidia-device-plugin" "systemctlDisableAndStop nvidia-device-plugin"
        logs_to_events "AKS.CSE.stop.nvidia-dcgm" "systemctlDisableAndStop nvidia-dcgm"
        logs_to_events "AKS.CSE.stop.nvidia-dcgm-exporter" "systemctlDisableAndStop nvidia-dcgm-exporter"
        rm -f "${managed_gpu_marker}"
    fi
}
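
The marker-file toggle in this hunk can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the PR's code: `set_managed_gpu_marker` and `is_managed_gpu_enabled` are hypothetical helper names, and the marker is placed under a temporary directory rather than /opt/azure/containers so the example runs without root.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and MARKER_ROOT are assumptions, not part of the PR.
MARKER_ROOT="$(mktemp -d)"
MANAGED_GPU_MARKER="${MARKER_ROOT}/containers/managed-gpu-experience.enabled"

# Create or remove the marker so it always reflects the latest setting.
set_managed_gpu_marker() {
    local enabled="$1"
    if [ "${enabled}" = "true" ]; then
        mkdir -p "$(dirname "${MANAGED_GPU_MARKER}")"
        touch "${MANAGED_GPU_MARKER}"
    else
        # rm -f succeeds even when the marker was never created.
        rm -f "${MANAGED_GPU_MARKER}"
    fi
}

# Succeeds (exit status 0) only while the marker file exists.
is_managed_gpu_enabled() {
    [ -f "${MANAGED_GPU_MARKER}" ]
}

set_managed_gpu_marker "true"
is_managed_gpu_enabled && echo "enabled"

set_managed_gpu_marker "false"
is_managed_gpu_enabled || echo "disabled"
```

Because the setting is mutable, the disable path has to be safe to run on a node where the marker never existed, which is why the sketch relies on `rm -f` rather than a plain `rm`.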

2 changes: 1 addition & 1 deletion pkg/agent/testdata/AKSUbuntu2204+China/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/AKSUbuntu2204+CustomCloud/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/AKSUbuntu2204+SSHStatusOff/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/AzureLinuxV2+Kata/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/AzureLinuxV3+Kata/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/CustomizedImage/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/CustomizedImageKata/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/Flatcar+CustomCloud+USSec/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/Flatcar+CustomCloud/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/Flatcar+CustomCloud/CustomData.inner

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/Flatcar/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/Flatcar/CustomData.inner

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/MarinerV2+CustomCloud/CustomData

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pkg/agent/testdata/MarinerV2+Kata/CustomData

Large diffs are not rendered by default.

20 changes: 20 additions & 0 deletions spec/parts/linux/cloud-init/artifacts/cse_config_spec.sh
@@ -1236,6 +1236,18 @@ providers:
            fi
        }

        mkdir() {
            echo "mkdir $@"
        }

        touch() {
            echo "touch $@"
        }

        rm() {
            echo "rm $@"
        }

        BeforeEach 'KUBELET_NODE_LABELS=""'

        It 'should not enable managed GPU experience if not GPU node'
@@ -1246,6 +1258,8 @@ providers:
            The output should not include "installNvidiaManagedExpPkgFromCache called"
            The output should not include "startNvidiaManagedExpServices called"
            The output should not include "addKubeletNodeLabel kubernetes.azure.com/dcgm-exporter=enabled"
            The output should not include "touch /opt/azure/containers/managed-gpu-experience.enabled"
            The output should not include "rm -f /opt/azure/containers/managed-gpu-experience.enabled"
        End

        It 'should not enable managed GPU experience when skip_nvidia_driver_install is true'
@@ -1258,6 +1272,8 @@ providers:
            The output should not include "installNvidiaManagedExpPkgFromCache called"
            The output should not include "startNvidiaManagedExpServices called"
            The output should not include "addKubeletNodeLabel kubernetes.azure.com/dcgm-exporter=enabled"
            The output should not include "touch /opt/azure/containers/managed-gpu-experience.enabled"
            The output should not include "rm -f /opt/azure/containers/managed-gpu-experience.enabled"
        End

        It 'should not enable managed GPU experience when ENABLE_MANAGED_GPU_EXPERIENCE is unspecified'
@@ -1270,6 +1286,7 @@ providers:
            The output should not include "installNvidiaManagedExpPkgFromCache called"
            The output should not include "startNvidiaManagedExpServices called"
            The output should not include "addKubeletNodeLabel kubernetes.azure.com/dcgm-exporter=enabled"
            The output should include "rm -f /opt/azure/containers/managed-gpu-experience.enabled"
        End

        It 'should enable managed GPU experience when ENABLE_MANAGED_GPU_EXPERIENCE is true'
@@ -1283,6 +1300,8 @@ providers:
            The output should include "startNvidiaManagedExpServices called"
            The output should include "addKubeletNodeLabel kubernetes.azure.com/dcgm-exporter=enabled"
            The variable KUBELET_NODE_LABELS should equal 'kubernetes.azure.com/dcgm-exporter=enabled'
            The output should include "mkdir -p /opt/azure/containers"
            The output should include "touch /opt/azure/containers/managed-gpu-experience.enabled"
        End

        It 'should disable managed GPU experience when ENABLE_MANAGED_GPU_EXPERIENCE is false'
@@ -1296,6 +1315,7 @@ providers:
            The output should include "systemctlDisableAndStop nvidia-dcgm"
            The output should include "systemctlDisableAndStop nvidia-dcgm-exporter"
            The output should not include "addKubeletNodeLabel kubernetes.azure.com/dcgm-exporter=enabled"
            The output should include "rm -f /opt/azure/containers/managed-gpu-experience.enabled"
        End
    End
End
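
The spec relies on shell functions named `mkdir`, `touch`, and `rm` shadowing the real commands, so the test can assert on what would have been run without touching the filesystem. The same idea can be sketched in plain shell outside ShellSpec. `toggle_marker` here is a hypothetical stand-in for the function under test, not the PR's `configureManagedGPUExperience`.

```shell
#!/usr/bin/env bash
# Sketch only: toggle_marker is a hypothetical stand-in for the function under test.

# Stubs shadow the real commands and print what they were asked to do.
mkdir() { echo "mkdir $*"; }
touch() { echo "touch $*"; }
rm()    { echo "rm $*"; }

toggle_marker() {
    local marker="/opt/azure/containers/managed-gpu-experience.enabled"
    if [ "$1" = "true" ]; then
        mkdir -p "$(dirname "${marker}")"
        touch "${marker}"
    else
        rm -f "${marker}"
    fi
}

# Capture what each path would have executed.
enabled_output="$(toggle_marker true)"
disabled_output="$(toggle_marker false)"

# Remove the stubs so later commands use the real binaries again.
unset -f mkdir touch rm

echo "${enabled_output}"
echo "${disabled_output}"
```

Since the stubs only echo their arguments, nothing is created under /opt/azure/containers, and the captured text can be matched the same way the `The output should include ...` assertions do.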