refactor: reusable workflows accept config-file input (PoC) by Eren-Jeager123 · Pull Request #6126 · aws/deep-learning-containers

Eren-Jeager123 · 2026-05-22T02:21:25Z

Summary

Add an optional `config-file` input to the 4 PyTorch-relevant reusable workflows (sanity, security, telemetry, EFA)
When `config-file` is provided, the reusable workflow reads metadata directly from the YAML config, so callers no longer need to parse and forward 10+ individual parameters
Backward compatible: existing callers (vLLM, SGLang, Ray, Base) pass individual inputs and work unchanged
Updated `pr-pytorch-ec2-cuda.yml` as the PoC, eliminating the verbose load-config→parse→forward pattern
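
As a rough illustration of the interface change summarized above, a reusable workflow could declare `config-file` next to its existing inputs along these lines. This is a minimal sketch, not the actual diff: the description text, the empty-string defaults, and which inputs remain required are assumptions.

```yaml
# Sketch of the `workflow_call` interface for one reusable workflow.
# Input names come from the inventory in this PR; defaults and
# `required` flags are assumptions for illustration.
on:
  workflow_call:
    inputs:
      image-uri:
        required: true
        type: string
      config-file:
        description: "Path to the image's YAML config (hypothetical wording)"
        required: false
        type: string
        default: ""
      # Legacy inputs stay optional so existing callers keep working.
      framework:
        required: false
        type: string
        default: ""
      framework-version:
        required: false
        type: string
        default: ""
```

An empty default lets the workflow tell "caller passed a config path" apart from "caller passed individual inputs" with a simple string check.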

Motivation

GitHub Actions limitation: a matrix job exposes a single set of outputs, so when several legs set the same output, the value from the last leg to finish overwrites the others. If multiple PyTorch versions change in the same PR, downstream reusable workflow calls receive metadata from only one version.

By making reusable workflows self-sufficient (reading their own config), each matrix leg can pass its own config-file path without shared outputs clobbering each other.
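
To make the per-leg idea concrete, a caller could fan out over config paths and hand each leg's path straight to the reusable workflow, so no metadata has to travel through shared job outputs. This is a sketch under assumptions: the job name, the matrix key, and both config paths are hypothetical, and other inputs such as `image-uri` are left out for brevity.

```yaml
jobs:
  sanity-test:
    strategy:
      fail-fast: false
      matrix:
        # Hypothetical config paths, one per changed PyTorch version.
        config-file:
          - path/to/config-a.yml
          - path/to/config-b.yml
    # Each leg calls the reusable workflow with its own path, so nothing
    # is read back from a shared `outputs` map.
    uses: ./.github/workflows/reusable-sanity-tests.yml
    with:
      config-file: ${{ matrix.config-file }}
      # image-uri and any other required inputs omitted in this sketch.
```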

Workflow & Caller Inventory

13 Reusable Workflows

| # | Reusable Workflow | Key Inputs |
|---|---|---|
| 1 | `reusable-sanity-tests.yml` | image-uri, framework, framework-version, python-version, cuda-version, os-version, customer-type, arch-type, device-type, contributor, container-type |
| 2 | `reusable-security-tests.yml` | image-uri, framework, framework-version |
| 3 | `reusable-telemetry-tests.yml` | image-uri, framework, framework-version, container-type |
| 4 | `reusable-efa-tests.yml` | image-uri, aws-account-id, aws-region |
| 5 | `reusable-vllm-model-tests.yml` | image-uri, aws-account-id, aws-region |
| 6 | `reusable-vllm-upstream-tests.yml` | image-uri, framework-version, vllm-ref, setup-script, example-test-script |
| 7 | `reusable-vllm-sagemaker-tests.yml` | image-uri only |
| 8 | `reusable-vllm-omni-model-tests.yml` | image-uri, customer-type |
| 9 | `reusable-sglang-model-tests.yml` | image-uri, aws-account-id, aws-region |
| 10 | `reusable-sglang-upstream-tests.yml` | image-uri, framework-version, benchmark-start-command |
| 11 | `reusable-sglang-sagemaker-tests.yml` | image-uri only |
| 12 | `reusable-release-image.yml` | source-image-uri, release-spec, environment |
| 13 | `reusable-sagemaker-xgboost-integ-tests.yml` | image-uri, aws-account-id, aws-region |

PR Workflows (20 callers)

| Caller | Reusable(s) Called | Multi-version Matrix? |
|---|---|---|
| `pr-pytorch-ec2-cuda.yml` (this PoC) | sanity, security, telemetry, efa | Future: yes |
| `pr-pytorch-ec2-cpu.yml` | sanity, security, telemetry | Future: yes |
| `pr-pytorch-sagemaker-cpu.yml` | sanity, security, telemetry | Future: yes |
| `pr-pytorch-sagemaker-cuda.yml` | sanity, security, telemetry | Future: yes |
| `pr-base-v1.yml` | sanity, security | No |
| `pr-base-v2.yml` | sanity, security | No (matrix over targets) |
| `pr-vllm-ec2-amzn2023.yml` | sanity, security, telemetry, upstream, model | No |
| `pr-vllm-ec2.yml` | sanity, security, telemetry, upstream, model | No |
| `pr-vllm-sagemaker-amzn2023.yml` | sanity, security, telemetry, sagemaker | No |
| `pr-vllm-sagemaker.yml` | sanity, security, telemetry, sagemaker | No |
| `pr-vllm-hyperpod-amzn2023.yml` | sanity, security, telemetry | No |
| `pr-vllm-omni-ec2-amzn2023.yml` | sanity, security, telemetry, vllm-omni-model | No |
| `pr-vllm-omni-sagemaker-amzn2023.yml` | sanity, security, telemetry, vllm-omni-model | No |
| `pr-sglang-ec2-amzn2023.yml` | sanity, security, telemetry, sglang-upstream, sglang-model | No |
| `pr-sglang-ec2.yml` | sanity, security, telemetry, sglang-upstream, sglang-model | No |
| `pr-sglang-sagemaker-amzn2023.yml` | sanity, security, telemetry, sglang-sagemaker | No |
| `pr-sglang-sagemaker.yml` | sanity, security, telemetry, sglang-sagemaker | No |
| `pr-ray-ec2-cpu.yml` | sanity, security | No |
| `pr-ray-ec2-gpu.yml` | sanity, security | No |
| `pr-ray-sagemaker-cpu/gpu.yml` | sanity, security | No |
| `pr-sagemaker-xgboost.yml` | xgboost-integ | No |

Autorelease Workflows (14 callers)

| Caller | Reusable(s) Called |
|---|---|
| `autorelease-pytorch-ec2-cuda.yml` | sanity, security, telemetry, efa, release |
| `autorelease-pytorch-ec2-cpu.yml` | sanity, security, telemetry, release |
| `autorelease-pytorch-sagemaker-cpu.yml` | sanity, security, telemetry, release |
| `autorelease-pytorch-sagemaker-cuda.yml` | sanity, security, telemetry, efa, release |
| `autorelease-vllm-ec2-amzn2023.yml` | sanity, security, telemetry, upstream, model, release |
| `autorelease-vllm-ec2.yml` | sanity, security, telemetry, upstream, model, release |
| `autorelease-vllm-sagemaker-amzn2023.yml` | sanity, security, telemetry, sagemaker, release |
| `autorelease-vllm-sagemaker.yml` | sanity, security, telemetry, sagemaker, release |
| `autorelease-vllm-hyperpod-amzn2023.yml` | sanity, security, telemetry, release |
| `autorelease-vllm-omni.yml` | sanity, security, telemetry, vllm-omni-model, release |
| `autorelease-sglang-ec2.yml` | sanity, security, telemetry, sglang-upstream, sglang-model, release |
| `autorelease-sglang-sagemaker.yml` | sanity, security, telemetry, sglang-sagemaker, release |
| `autorelease-ray.yml` | sanity, security, release |
| `autorelease-base.yml` | sanity, security, release |

Dispatch Workflows (3 callers)

| Caller | Reusable(s) Called |
|---|---|
| `dispatch-partner-release.yml` | release |
| `dispatch-release-sagemaker-xgboost.yml` | xgboost-integ, release |

Design

Before (10+ inputs forwarded per call):
┌─────────────────┐     ┌──────────────────────────────────────┐
│ PR Workflow      │     │ Reusable Workflow                    │
│                  │     │                                      │
│ load-config ────────┐  │  inputs:                            │
│   parse YAML    │   │  │    framework: "pytorch"             │
│   forward each  │   ├──│    framework-version: "2.11.0"      │
│   field as      │   │  │    python-version: "py312"          │
│   input         │   │  │    cuda-version: "cu130"            │
│                  │   │  │    os-version: "amzn2023"           │
│                  │   │  │    ... (10+ more)                   │
└─────────────────┘   │  └──────────────────────────────────────┘
                      └──→  (each field passed individually)

After (single config-file input):
┌─────────────────┐     ┌──────────────────────────────────────┐
│ PR Workflow      │     │ Reusable Workflow                    │
│                  │     │                                      │
│ config-file ─────────── │  inputs:                            │
│   just pass     │     │    config-file: "path/to/config.yml" │
│   the path      │     │                                      │
│                  │     │  load-config job:                    │
│                  │     │    reads YAML → resolves all fields  │
└─────────────────┘     └──────────────────────────────────────┘
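
The "After" flow could be implemented inside the reusable workflow with a small resolver job. The following is a minimal sketch, not the code in this PR: it assumes `yq` is available on the runner, that the config has top-level `framework` and `framework_version` keys, and that the job and step names (`load-config`, `resolve`) match; none of that is confirmed by the description.

```yaml
jobs:
  load-config:
    runs-on: ubuntu-latest   # assumed runner; the repo may use its own
    outputs:
      framework: ${{ steps.resolve.outputs.framework }}
      framework-version: ${{ steps.resolve.outputs.framework-version }}
    steps:
      - uses: actions/checkout@v4
      - id: resolve
        env:
          # Pass inputs through env rather than inlining them in the script.
          CONFIG_FILE: ${{ inputs.config-file }}
          IN_FRAMEWORK: ${{ inputs.framework }}
          IN_FRAMEWORK_VERSION: ${{ inputs.framework-version }}
        run: |
          if [ -n "$CONFIG_FILE" ]; then
            # New path: read metadata from the config (key names are assumed).
            echo "framework=$(yq '.framework' "$CONFIG_FILE")" >> "$GITHUB_OUTPUT"
            echo "framework-version=$(yq '.framework_version' "$CONFIG_FILE")" >> "$GITHUB_OUTPUT"
          else
            # Legacy path: forward the individual inputs unchanged.
            echo "framework=$IN_FRAMEWORK" >> "$GITHUB_OUTPUT"
            echo "framework-version=$IN_FRAMEWORK_VERSION" >> "$GITHUB_OUTPUT"
          fi
```

Downstream jobs in the same reusable workflow would then read `needs.load-config.outputs.*` regardless of which branch produced the values.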

Blast Radius

Only `pr-pytorch-ec2-cuda.yml` is triggered by this PR (it watches its own path)
Non-PyTorch workflows do NOT list the reusable sanity/security/telemetry/EFA workflows in their path triggers, so they are not re-run by this change
Backward compatible: vLLM/SGLang/Ray/Base callers pass individual inputs, and the else branch forwards them unchanged
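
For context on why the blast radius stays small, a caller's trigger block might look roughly like the following. This is a hypothetical sketch: the actual `paths` list of `pr-pytorch-ec2-cuda.yml` is not shown in this description, and the `docker/pytorch/**` entry in particular is an invented placeholder.

```yaml
# Hypothetical trigger block for pr-pytorch-ec2-cuda.yml.
on:
  pull_request:
    branches: [main]
    paths:
      # The workflow watches its own file, which is why this PR triggers it.
      - ".github/workflows/pr-pytorch-ec2-cuda.yml"
      # Placeholder for image-specific sources; the real entries are not known here.
      - "docker/pytorch/**"
```

A caller whose `paths` filter does not include the `reusable-*.yml` files is not re-run when only those files change, which matches the behavior described for the non-PyTorch workflows.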

Test plan

Verify YAML syntax passes (done locally)
Confirm that only the "PR - PyTorch EC2 CUDA" workflow triggers on this PR
Validate that existing callers (vLLM, SGLang, etc.) work unchanged on their next PR
Future: migrate remaining 3 PyTorch PR workflows + 4 autorelease workflows to config-file interface

Add optional `config-file` input to the 4 PyTorch-relevant reusable workflows (sanity, security, telemetry, EFA). When provided, the reusable workflow reads all metadata directly from the YAML config file instead of requiring callers to parse and forward 10+ individual input parameters. This enables a future where matrix jobs can pass per-leg config file paths without the GitHub Actions limitation of matrix outputs only exposing the last-finishing leg. Backward compatible: existing callers that pass individual inputs continue to work unchanged (the else branch passes them through). Updated pr-pytorch-ec2-cuda.yml as the first caller to use the new config-file interface, eliminating the verbose load-config→parse→forward pattern for test jobs.

Make config-file a required input for the 4 shared reusable workflows (sanity, security, telemetry, EFA). Each reusable workflow now reads its own metadata from the YAML config file internally, eliminating the need for callers to parse and forward 10+ individual parameters. Migrate all 37 caller workflows (PR, autorelease, dispatch) to pass config-file instead of individual metadata inputs. This removes ~440 lines of boilerplate forwarding. Add a Dockerfile comment to trigger the PyTorch EC2 CUDA PR workflow for CI validation.

aws-deep-learning-containers-ci Bot added the authorized label May 22, 2026

Eren-Jeager123 force-pushed the refactor/reusable-workflow-config-input branch from 80e8723 to dd3b901 Compare May 22, 2026 02:26

Eren-Jeager123 wants to merge 2 commits into main from refactor/reusable-workflow-config-input
