test: add NPD GPU clock throttling validation to e2e tests #7919

Open
ganeshkumarashok wants to merge 2 commits into main from aganeshkumar/gpu-clock-throttling-e2e

@ganeshkumarashok

This PR adds validation for the GPUClockThrottling NPD (Node Problem Detector) condition to the e2e GPU NPD scenario tests. The new validation checks that
NPD correctly detects and reports the absence of problematic GPU clock throttling on GPU-enabled nodes.

Changes:

  • Added ValidateNPDGPUClockThrottlingCondition function to validate the GPUClockThrottling NPD condition
  • Integrated the new validation into the runScenarioGPUNPD test flow

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Add validation for GPU clock throttling NPD condition in the GPU NPD scenario tests.
This ensures that NPD is correctly detecting and reporting the absence of problematic
GPU clock throttling on GPU-enabled nodes.

Copilot AI left a comment

Pull request overview

This PR adds e2e test validation for the NPD (Node Problem Detector) GPU clock throttling condition. The change integrates a new validation function into the existing GPU NPD test scenario to ensure that NPD correctly detects and reports the absence of problematic GPU clock throttling on GPU-enabled nodes.

Changes:

  • Added ValidateNPDGPUClockThrottlingCondition validator function to check NPD's GPUClockThrottling condition
  • Integrated the new validation into the runScenarioGPUNPD test flow

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
e2e/validators.go | Adds new ValidateNPDGPUClockThrottlingCondition function that validates NPD reports no problematic GPU clock throttling (ConditionFalse with reason "GPUClockThrottlingIsNotPresent")
e2e/test_helpers.go | Integrates GPU clock throttling validation into the GPU NPD test scenario between GPU count validation and IB link flapping validation
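
For readers unfamiliar with NPD conditions, the check described above amounts to finding one node condition by type and comparing its status, reason, and message. The following is a minimal, self-contained sketch of that idea, not the repository's validateNPDCondition helper: NodeCondition is a local stand-in for corev1.NodeCondition, checkCondition is a hypothetical name, and matching the message by substring is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeCondition is a local stand-in for the Kubernetes corev1.NodeCondition
// type, reduced to the fields this sketch needs.
type NodeCondition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// checkCondition (hypothetical helper) returns nil only when a condition of
// the given type exists with the expected status and reason and its message
// contains msgSubstr. Substring matching is an assumption of this sketch.
func checkCondition(conds []NodeCondition, condType, reason, status, msgSubstr string) error {
	for _, c := range conds {
		if c.Type != condType {
			continue
		}
		if c.Status != status {
			return fmt.Errorf("condition %s: status %q, want %q", condType, c.Status, status)
		}
		if c.Reason != reason {
			return fmt.Errorf("condition %s: reason %q, want %q", condType, c.Reason, reason)
		}
		if !strings.Contains(c.Message, msgSubstr) {
			return fmt.Errorf("condition %s: message %q does not contain %q", condType, c.Message, msgSubstr)
		}
		return nil
	}
	return fmt.Errorf("condition %s not found", condType)
}

func main() {
	// Example condition list as a healthy GPU node might report it.
	conds := []NodeCondition{{
		Type:    "GPUClockThrottling",
		Status:  "False",
		Reason:  "GPUClockThrottlingIsNotPresent",
		Message: "No problematic GPU clock throttling detected",
	}}
	err := checkCondition(conds, "GPUClockThrottling", "GPUClockThrottlingIsNotPresent",
		"False", "No problematic GPU clock throttling detected")
	fmt.Println("healthy node passes:", err == nil)
}
```

A status of "False" is the passing case here because the condition names a problem, so False means the problem is absent.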

Comment on lines +1154 to +1160
// ValidateNPDGPUClockThrottlingCondition validates that NPD is reporting no problematic GPU clock throttling
func ValidateNPDGPUClockThrottlingCondition(ctx context.Context, s *Scenario) {
	s.T.Helper()
	// Validate that NPD is reporting no problematic GPU clock throttling
	validateNPDCondition(ctx, s, "GPUClockThrottling", "GPUClockThrottlingIsNotPresent", corev1.ConditionFalse,
		"No problematic GPU clock throttling detected", "expected GPUClockThrottling message to indicate no throttling")
}
Copilot AI Feb 21, 2026

Missing NPD configuration file validation before checking the condition. All other NPD validators in this file follow the pattern of first validating that the NPD plugin configuration file exists before checking the condition status.

For consistency with other NPD validators (ValidateNPDUnhealthyNvidiaDevicePlugin, ValidateNPDUnhealthyNvidiaDCGMServices, ValidateNPDHealthyNvidiaGridLicenseStatus, ValidateNPDGPUCountPlugin), this function should first verify that the GPU clock throttling NPD plugin configuration file exists at /etc/node-problem-detector.d/custom-plugin-monitor/gpu_checks/ before attempting to validate the condition.

Consider splitting this into two functions:

  1. ValidateNPDGPUClockThrottlingPlugin - to check configuration file exists
  2. ValidateNPDGPUClockThrottlingCondition - to validate the condition

Then call ValidateNPDGPUClockThrottlingPlugin first in the test flow.

Replace hardcoded "gzip" string literals with encodingGzip constant
to avoid goconst linter error about repeated string occurrences.
2 participants