
refactor: reusable workflows accept config-file input (PoC) #6126

Open
Eren-Jeager123 wants to merge 2 commits into main from refactor/reusable-workflow-config-input


@Eren-Jeager123
Contributor

Summary

  • Add optional config-file input to 4 PyTorch-relevant reusable workflows (sanity, security, telemetry, EFA)
  • When config-file is provided, the reusable workflow reads metadata from the YAML config directly — callers no longer need to parse and forward 10+ individual parameters
  • Backward compatible: existing callers (vLLM, SGLang, Ray, Base) pass individual inputs and work unchanged
  • Updated pr-pytorch-ec2-cuda.yml as PoC — eliminates the verbose load-config→parse→forward pattern

Motivation

GitHub Actions limitation: when a matrix job defines outputs, every leg writes to the same output names, so downstream jobs only see the values from the last leg to finish. If multiple PyTorch versions change in the same PR, downstream reusable workflow calls only receive metadata from one version.

By making reusable workflows self-sufficient (reading their own config), each matrix leg can pass its own config-file path without shared outputs clobbering each other.
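A minimal sketch of the per-leg pattern on the caller side. GitHub Actions allows `strategy.matrix` on a job that calls a reusable workflow, so each leg can hand its own path straight to the reusable workflow without going through shared matrix outputs. The config paths are hypothetical placeholders, and other required inputs such as `image-uri` are deliberately left out:

```yaml
jobs:
  sanity-test:
    strategy:
      fail-fast: false
      matrix:
        # Hypothetical config paths, one per PyTorch version changed in the PR
        config-file:
          - path/to/config-a.yml
          - path/to/config-b.yml
    uses: ./.github/workflows/reusable-sanity-tests.yml
    with:
      # Each leg forwards only its own path; the reusable workflow resolves
      # framework, framework-version, etc. from that file by itself
      config-file: ${{ matrix.config-file }}
      # Other required inputs (e.g. image-uri) are omitted from this sketch
```

Because no value flows through `jobs.<job_id>.outputs`, two legs cannot overwrite each other's metadata.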

Workflow & Caller Inventory

13 Reusable Workflows

| # | Reusable Workflow | Key Inputs |
|---|---|---|
| 1 | reusable-sanity-tests.yml | image-uri, framework, framework-version, python-version, cuda-version, os-version, customer-type, arch-type, device-type, contributor, container-type |
| 2 | reusable-security-tests.yml | image-uri, framework, framework-version |
| 3 | reusable-telemetry-tests.yml | image-uri, framework, framework-version, container-type |
| 4 | reusable-efa-tests.yml | image-uri, aws-account-id, aws-region |
| 5 | reusable-vllm-model-tests.yml | image-uri, aws-account-id, aws-region |
| 6 | reusable-vllm-upstream-tests.yml | image-uri, framework-version, vllm-ref, setup-script, example-test-script |
| 7 | reusable-vllm-sagemaker-tests.yml | image-uri only |
| 8 | reusable-vllm-omni-model-tests.yml | image-uri, customer-type |
| 9 | reusable-sglang-model-tests.yml | image-uri, aws-account-id, aws-region |
| 10 | reusable-sglang-upstream-tests.yml | image-uri, framework-version, benchmark-start-command |
| 11 | reusable-sglang-sagemaker-tests.yml | image-uri only |
| 12 | reusable-release-image.yml | source-image-uri, release-spec, environment |
| 13 | reusable-sagemaker-xgboost-integ-tests.yml | image-uri, aws-account-id, aws-region |

PR Workflows (20 callers)

| Caller | Reusable(s) Called | Multi-version Matrix? |
|---|---|---|
| pr-pytorch-ec2-cuda.yml (this PoC) | sanity, security, telemetry, efa | Future: yes |
| pr-pytorch-ec2-cpu.yml | sanity, security, telemetry | Future: yes |
| pr-pytorch-sagemaker-cpu.yml | sanity, security, telemetry | Future: yes |
| pr-pytorch-sagemaker-cuda.yml | sanity, security, telemetry | Future: yes |
| pr-base-v1.yml | sanity, security | No |
| pr-base-v2.yml | sanity, security | No (matrix over targets) |
| pr-vllm-ec2-amzn2023.yml | sanity, security, telemetry, upstream, model | No |
| pr-vllm-ec2.yml | sanity, security, telemetry, upstream, model | No |
| pr-vllm-sagemaker-amzn2023.yml | sanity, security, telemetry, sagemaker | No |
| pr-vllm-sagemaker.yml | sanity, security, telemetry, sagemaker | No |
| pr-vllm-hyperpod-amzn2023.yml | sanity, security, telemetry | No |
| pr-vllm-omni-ec2-amzn2023.yml | sanity, security, telemetry, vllm-omni-model | No |
| pr-vllm-omni-sagemaker-amzn2023.yml | sanity, security, telemetry, vllm-omni-model | No |
| pr-sglang-ec2-amzn2023.yml | sanity, security, telemetry, sglang-upstream, sglang-model | No |
| pr-sglang-ec2.yml | sanity, security, telemetry, sglang-upstream, sglang-model | No |
| pr-sglang-sagemaker-amzn2023.yml | sanity, security, telemetry, sglang-sagemaker | No |
| pr-sglang-sagemaker.yml | sanity, security, telemetry, sglang-sagemaker | No |
| pr-ray-ec2-cpu.yml | sanity, security | No |
| pr-ray-ec2-gpu.yml | sanity, security | No |
| pr-ray-sagemaker-cpu/gpu.yml | sanity, security | No |
| pr-sagemaker-xgboost.yml | xgboost-integ | No |

Autorelease Workflows (14 callers)

| Caller | Reusable(s) Called |
|---|---|
| autorelease-pytorch-ec2-cuda.yml | sanity, security, telemetry, efa, release |
| autorelease-pytorch-ec2-cpu.yml | sanity, security, telemetry, release |
| autorelease-pytorch-sagemaker-cpu.yml | sanity, security, telemetry, release |
| autorelease-pytorch-sagemaker-cuda.yml | sanity, security, telemetry, efa, release |
| autorelease-vllm-ec2-amzn2023.yml | sanity, security, telemetry, upstream, model, release |
| autorelease-vllm-ec2.yml | sanity, security, telemetry, upstream, model, release |
| autorelease-vllm-sagemaker-amzn2023.yml | sanity, security, telemetry, sagemaker, release |
| autorelease-vllm-sagemaker.yml | sanity, security, telemetry, sagemaker, release |
| autorelease-vllm-hyperpod-amzn2023.yml | sanity, security, telemetry, release |
| autorelease-vllm-omni.yml | sanity, security, telemetry, vllm-omni-model, release |
| autorelease-sglang-ec2.yml | sanity, security, telemetry, sglang-upstream, sglang-model, release |
| autorelease-sglang-sagemaker.yml | sanity, security, telemetry, sglang-sagemaker, release |
| autorelease-ray.yml | sanity, security, release |
| autorelease-base.yml | sanity, security, release |

Dispatch Workflows (3 callers)

| Caller | Reusable(s) Called |
|---|---|
| dispatch-partner-release.yml | release |
| dispatch-release-sagemaker-xgboost.yml | xgboost-integ, release |

Design

Before (10+ inputs forwarded per call):
┌──────────────────┐     ┌──────────────────────────────────────┐
│ PR Workflow      │     │ Reusable Workflow                    │
│                  │     │                                      │
│ load-config      │     │  inputs:                             │
│   parse YAML     │     │    framework: "pytorch"              │
│   forward each   │────→│    framework-version: "2.11.0"       │
│   field as       │     │    python-version: "py312"           │
│   input          │     │    cuda-version: "cu130"             │
│                  │     │    os-version: "amzn2023"            │
│                  │     │    ... (and more)                    │
└──────────────────┘     └──────────────────────────────────────┘
                         (each field passed individually)

After (single config-file input):
┌──────────────────┐     ┌──────────────────────────────────────┐
│ PR Workflow      │     │ Reusable Workflow                    │
│                  │     │                                      │
│ config-file      │     │  inputs:                             │
│   just pass      │────→│    config-file: "path/to/config.yml" │
│   the path       │     │                                      │
│                  │     │  load-config job:                    │
│                  │     │    reads YAML → resolves all fields  │
└──────────────────┘     └──────────────────────────────────────┘
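On the caller side, the change looks roughly like the sketch below. The job names (`build-image`, `load-config`), their output names, and the config path are hypothetical placeholders, not copied from pr-pytorch-ec2-cuda.yml:

```yaml
# Before: the caller parses the config, then forwards each field
jobs:
  sanity-test:
    needs: [build-image, load-config]
    uses: ./.github/workflows/reusable-sanity-tests.yml
    with:
      image-uri: ${{ needs.build-image.outputs.image-uri }}
      framework: ${{ needs.load-config.outputs.framework }}
      framework-version: ${{ needs.load-config.outputs.framework-version }}
      python-version: ${{ needs.load-config.outputs.python-version }}
      # ... one line per remaining metadata field
---
# After: the caller passes the image and the config path only
jobs:
  sanity-test:
    needs: [build-image]
    uses: ./.github/workflows/reusable-sanity-tests.yml
    with:
      image-uri: ${{ needs.build-image.outputs.image-uri }}
      config-file: path/to/config.yml
```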

Blast Radius

  • Only pr-pytorch-ec2-cuda.yml is triggered by this PR (it watches its own path)
  • Non-pytorch workflows do NOT have reusable-sanity/security/telemetry/efa in their path triggers
  • Backward compatible: vLLM/SGLang/Ray/Base callers still pass individual inputs, and the reusable workflow's else branch (taken when config-file is not provided) forwards them unchanged
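A sketch of how that branch could look inside one reusable workflow. It assumes `yq` v4 is installed on the runner and that the config file uses top-level keys named `framework` and `framework_version`; both are assumptions, since the config schema is not shown here. The `runs-on` label is a placeholder, and only two fields are resolved:

```yaml
on:
  workflow_call:
    inputs:
      config-file:
        type: string
        required: false
        default: ""
      framework:
        type: string
        required: false
      framework-version:
        type: string
        required: false

jobs:
  load-config:
    runs-on: ubuntu-latest  # placeholder runner label
    outputs:
      framework: ${{ steps.meta.outputs.framework }}
      framework-version: ${{ steps.meta.outputs.framework-version }}
    steps:
      - uses: actions/checkout@v4
      - id: meta
        shell: bash
        env:
          # Pass inputs through env rather than inlining them into the script
          CONFIG_FILE: ${{ inputs.config-file }}
          IN_FRAMEWORK: ${{ inputs.framework }}
          IN_FRAMEWORK_VERSION: ${{ inputs.framework-version }}
        run: |
          if [[ -n "$CONFIG_FILE" ]]; then
            # New path: read metadata from the config file (hypothetical keys)
            framework=$(yq '.framework' "$CONFIG_FILE")
            framework_version=$(yq '.framework_version' "$CONFIG_FILE")
          else
            # Legacy path: forward the caller's individual inputs unchanged
            framework="$IN_FRAMEWORK"
            framework_version="$IN_FRAMEWORK_VERSION"
          fi
          echo "framework=$framework" >> "$GITHUB_OUTPUT"
          echo "framework-version=$framework_version" >> "$GITHUB_OUTPUT"
```

Test jobs in the same reusable workflow would then read `needs.load-config.outputs.framework` regardless of which branch ran. Note that `yq` prints `null` for a missing key, so a real implementation would probably want to validate the resolved values.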

Test plan

  • Verify YAML syntax passes (done locally)
  • Confirm that only the "PR - PyTorch EC2 CUDA" workflow triggers on this PR
  • Validate that existing callers (vLLM, SGLang, etc.) work unchanged on their next PR
  • Future: migrate remaining 3 PyTorch PR workflows + 4 autorelease workflows to config-file interface

Add optional `config-file` input to the 4 PyTorch-relevant reusable
workflows (sanity, security, telemetry, EFA). When provided, the
reusable workflow reads all metadata directly from the YAML config
file instead of requiring callers to parse and forward 10+ individual
input parameters.

This enables a future where matrix jobs can pass per-leg config file
paths without hitting the GitHub Actions limitation that matrix job
outputs only expose the last-finishing leg.

Backward compatible: existing callers that pass individual inputs
continue to work unchanged (the else branch passes them through).

Updated pr-pytorch-ec2-cuda.yml as the first caller to use the new
config-file interface, eliminating the verbose load-config→parse→forward
pattern for test jobs.
Eren-Jeager123 force-pushed the refactor/reusable-workflow-config-input branch from 80e8723 to dd3b901 on May 22, 2026 02:26
Make config-file a required input for the 4 shared reusable workflows
(sanity, security, telemetry, EFA). Each reusable workflow now reads
its own metadata from the YAML config file internally, eliminating
the need for callers to parse and forward 10+ individual parameters.

Migrate all 37 caller workflows (PR, autorelease, dispatch) to pass
config-file instead of individual metadata inputs. This removes ~440
lines of boilerplate forwarding.

Add a Dockerfile comment to trigger the PyTorch EC2 CUDA PR workflow
for CI validation.
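With `config-file` required (per the second commit), the input block of each of the four shared reusable workflows could shrink to something like the sketch below. It uses reusable-sanity-tests.yml as the example and assumes `image-uri` is still passed as a separate input, which the commit message does not state either way:

```yaml
# Hypothetical excerpt of reusable-sanity-tests.yml after the second commit
on:
  workflow_call:
    inputs:
      image-uri:
        type: string
        required: true
      config-file:
        description: Path to the YAML config that holds the image metadata
        type: string
        required: true
      # framework, framework-version, python-version, cuda-version, ...
      # are no longer declared here; they are read from config-file instead
```

Declaring the input as required means a caller that omits it is rejected by GitHub Actions instead of falling through to an empty default, which is what makes the PoC's else branch unnecessary.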