Enable CUDA CI #1473

Merged
XuehaoSun merged 128 commits into main from xuehao/cuda-ci
Mar 20, 2026

XuehaoSun (Contributor) commented Feb 27, 2026

Description

Enable CUDA CI

TODO

  • Skip gptqmodel, auto-gptq test
  • Fix absolute path (like /models/xxx)
  • Skip tests that require a lot of hardware resources
  • Separate the vLLM-related unit tests @xin3he
  • triton issue
  • CUDA compatibility
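
The skip and path items in the TODO list could be handled along these lines. This is only a minimal sketch using the standard library: the `MODEL_ROOT` environment variable, the `model_path` and `has_package` helpers, and the test class are hypothetical names chosen for illustration, not taken from this PR.

```python
import importlib.util
import os
import unittest

# Hypothetical environment variable; the PR does not name one.
MODEL_ROOT = os.environ.get("MODEL_ROOT", "/models")


def model_path(name: str) -> str:
    """Build a model path from a configurable root instead of a hard-coded one."""
    return os.path.join(MODEL_ROOT, name)


def has_package(name: str) -> bool:
    """Return True if the top-level package can be found in this environment."""
    return importlib.util.find_spec(name) is not None


class ExampleBackendTest(unittest.TestCase):
    # Skipped automatically on machines where gptqmodel is not installed.
    @unittest.skipUnless(has_package("gptqmodel"), "gptqmodel is not installed")
    def test_gptqmodel_backend(self):
        import gptqmodel  # only reached when the package is present

        self.assertIsNotNone(gptqmodel)
```

A guard like this keeps the test file importable everywhere and reports the skip reason in the test output, rather than commenting the test out.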

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Copilot AI review requested due to automatic review settings February 27, 2026 02:55
XuehaoSun marked this pull request as draft on February 27, 2026 02:55
Comment thread .azure-pipelines/scripts/cuda_unit_test/runpod_manager.py Fixed
Comment thread .azure-pipelines/scripts/cuda_unit_test/runpod_manager.py Fixed
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces CUDA CI infrastructure using RunPod for GPU-based testing in Azure Pipelines. The implementation creates a three-stage pipeline that dynamically provisions GPU instances, runs CUDA unit tests, and ensures proper cleanup of cloud resources.

Changes:

  • Added Azure pipeline configuration for CUDA tests with RunPod integration
  • Created Python scripts to manage RunPod instance lifecycle and Azure DevOps agent registration
  • Implemented a bash script to execute CUDA unit tests with multiple test suites (standard, LLMC, SGLang)
  • Commented out the auto-gptq requirement in test dependencies
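
The provision, test, and cleanup flow described in the overview could be sketched as follows. This illustrates the general pattern only, not the code in `runpod_manager.py`: the `run_gpu_ci` function and its three callables are hypothetical, and no real RunPod or Azure DevOps API is called.

```python
from typing import Callable


def run_gpu_ci(
    provision: Callable[[], str],
    run_tests: Callable[[str], int],
    terminate: Callable[[str], None],
) -> int:
    """Provision a pod, run tests on it, and always terminate it afterwards."""
    pod_id = provision()
    try:
        # The return value stands in for the test run's exit code.
        return run_tests(pod_id)
    finally:
        # Runs even if run_tests raises, so the pod is not left running.
        terminate(pod_id)


if __name__ == "__main__":
    # Stand-in callables; real ones would talk to the cloud provider.
    exit_code = run_gpu_ci(
        provision=lambda: "pod-123",
        run_tests=lambda pod_id: 0,
        terminate=lambda pod_id: print(f"terminated {pod_id}"),
    )
    print(f"exit code: {exit_code}")
```

In a pipeline, the analogous safeguard would be a cleanup stage that runs regardless of earlier failures (in Azure Pipelines, for instance, a stage with `condition: always()`), though the PR's YAML is not shown here.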

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| test/test_cuda/requirements.txt | Commented out auto-gptq dependency per TODO item |
| .azure-pipelines/unit-test-cuda.yml | New pipeline with 3 stages: pod provisioning, GPU testing, and cleanup |
| .azure-pipelines/scripts/cuda_unit_test/runpod_manager.py | Manages RunPod GPU instance creation, monitoring, and termination |
| .azure-pipelines/scripts/cuda_unit_test/run_cuda_ut.sh | Executes CUDA unit tests with separate functions for standard, LLMC, and SGLang tests |
| .azure-pipelines/scripts/cuda_unit_test/azure_agent.py | Manages Azure DevOps agent registration and deregistration |

Comment thread .azure-pipelines/scripts/cuda_unit_test/runpod_manager.py Outdated
Comment thread .azure-pipelines/scripts/cuda_unit_test/run_cuda_ut.sh Outdated
Comment thread .azure-pipelines/unit-test-cuda.yml Outdated
Comment thread .azure-pipelines/scripts/cuda_unit_test/runpod_manager.py
Comment thread .azure-pipelines/unit-test-cuda.yml Outdated
Comment thread .azure-pipelines/scripts/cuda_unit_test/runpod_manager.py
Comment thread .azure-pipelines/scripts/cuda_unit_test/runpod_manager.py
Comment thread .azure-pipelines/scripts/cuda_unit_test/azure_agent.py
Comment thread .azure-pipelines/scripts/cuda_unit_test/runpod_manager.py
Comment thread .azure-pipelines/scripts/cuda_unit_test/run_cuda_ut.sh
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
xin3he (Contributor) commented Mar 19, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Xin He <xin3.he@intel.com>
xin3he (Contributor) commented Mar 19, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Xin He <xin3.he@intel.com>
xin3he (Contributor) commented Mar 19, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

xin3he added 3 commits March 20, 2026 09:43
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
xin3he (Contributor) commented Mar 20, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Xin He <xin3.he@intel.com>
xin3he (Contributor) commented Mar 20, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Xin He <xin3.he@intel.com>
xin3he (Contributor) commented Mar 20, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

xin3he (Contributor) commented Mar 20, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Xin He <xin3.he@intel.com>
xin3he (Contributor) commented Mar 20, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Xin He <xin3.he@intel.com>
XuehaoSun merged commit 4f1a8de into main on Mar 20, 2026
19 of 27 checks passed
XuehaoSun deleted the xuehao/cuda-ci branch on March 20, 2026 09:09
Development

Successfully merging this pull request may close these issues.

  • [Feature]: Add parts of CUDA UT into CI
  • Refactor CUDA Unit Tests: Segregate into Weekly Full Suite and CI Lightweight Suite

6 participants