Add release engineering infrastructure by sunway513 · Pull Request #2670 · ROCm/aiter

sunway513 · 2026-04-09T14:49:43Z

Summary

Establish formal release engineering infrastructure for AITER to support regular release cycles.

Changes

Release workflow (aiter-release.yaml) — full rewrite:

Aligned with CI nightly pipeline for precompiled kernel wheels
Added prebuild kernel validation (requires ≥10 compiled .so files)
Added smoke test (import + version check)
Docker image push on tag push (rocm/aiter-ci:{tag}-py{ver})
S3 wheel upload to both release-specific and staging paths
GitHub Release creation with wheel assets attached
workflow_dispatch with configurable inputs (runner, GPU archs, Docker images, Python versions)
Fixed non-standard top-level description: field that broke workflow parsing
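
The prebuilt kernel validation and smoke test in the list above could look roughly like the following. This is a minimal sketch, not the workflow's actual step: the function names `count_prebuilt_kernels` and `validate_prebuilt_kernels`, and the idea of pointing them at an installed package directory, are assumptions for illustration. Only the threshold of 10 `.so` files comes from the PR description.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and directory layout are assumptions.

# Count compiled shared objects (*.so) anywhere under a directory.
count_prebuilt_kernels() {
    find "$1" -type f -name '*.so' | wc -l | tr -d ' '
}

# Succeed only if at least `min` (default 10) .so files are present.
validate_prebuilt_kernels() {
    local dir="$1"
    local min="${2:-10}"
    local found
    found=$(count_prebuilt_kernels "$dir")
    echo "Found ${found} prebuilt kernel(s) in ${dir} (minimum ${min})"
    [ "$found" -ge "$min" ]
}
```

A release step might call `validate_prebuilt_kernels` on the directory of the installed package and then run a version check such as `python3 -c 'import aiter; print(aiter.__version__)'`, assuming the package exposes `__version__`.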

CI workflow triggers — added release/** branch support:

aiter-test.yaml
atom-test.yaml
sglang_downstream.yaml
triton-test.yaml
vllm_benchmark.yaml

Release process documentation:

RELEASE_PROCESS.md — full release lifecycle (branch → RC → validation → publish → hotfix)
scripts/generate_changelog.sh — auto-categorizes commits by PR prefix
scripts/release_checklist.md — pre/post release validation checklist

Motivation

AITER has accumulated 334 commits since v0.1.11.post1 (2026-03-05) without a release. SGLang upgrade has been blocked for 7 weeks. This PR establishes the infrastructure needed for a sustainable 2-week release cadence.

ROCm Version Support Plan

Target support matrix for upcoming releases:

ROCm 7.0 + gfx942/gfx950 (Tier 1)
ROCm 7.2.1 + gfx942/gfx950 (Tier 1)

Test plan

Verify CI workflows trigger on release/** branch push
Verify workflow_dispatch is recognized on main after merge
Test tag push triggers release workflow end-to-end
Validate prebuild kernel count ≥ 10 in built wheel
Smoke test passes (import aiter, check version)

🤖 Generated with Claude Code

- Rewrite aiter-release.yaml to align with CI nightly pipeline:
  - Prebuild kernel validation (>=10 .so files)
  - Smoke test (import + version check)
  - Docker image push on tag (rocm/aiter-ci:{tag}-py{ver})
  - S3 wheel upload (releases + staging paths)
  - GitHub Release creation with wheel assets
  - workflow_dispatch with configurable inputs
- Add release/** branch triggers to all CI workflows: aiter-test, atom-test, sglang, triton, vllm
- Add RELEASE_PROCESS.md documenting release lifecycle
- Add scripts/generate_changelog.sh for auto-categorized changelogs
- Add scripts/release_checklist.md for pre/post release validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-09T14:50:28Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-355`	Run Triton tests on MI355 in addition to MI325
`ci:sglang`	SGLang integration tests
`ci:atom`	ATOM benchmark (DeepSeek-R1 + GPT-OSS)
`ci:vllm`	vLLM benchmark
`ci:all`	All of the above

Add labels via the sidebar or `gh pr edit 2670 --add-label <label>`

Copilot

Pull request overview

This PR introduces release engineering infrastructure for AITER by adding a tag-triggered release workflow, documenting the release lifecycle, and ensuring CI runs on release/** branches to support regular release cycles.

Changes:

Rewrote aiter-release.yaml to build precompiled-kernel wheels on tag push, validate prebuilts, upload wheels (artifacts + S3), push tagged Docker images, and create GitHub Releases.
Updated multiple CI workflows to also trigger on release/** branches.
Added release process documentation plus a changelog generator and release checklist.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

File	Description
`.github/workflows/aiter-release.yaml`	New end-to-end release workflow (build/validate/upload/release) triggered by tags and workflow dispatch.
`.github/workflows/aiter-test.yaml`	Allow CI to run on `release/**` branches.
`.github/workflows/atom-test.yaml`	Allow downstream ATOM validation to run on `release/**` branches.
`.github/workflows/sglang_downstream.yaml`	Allow downstream SGLang validation to run on `release/**` branches.
`.github/workflows/triton-test.yaml`	Allow Triton tests to run on `release/**` branches.
`.github/workflows/vllm_benchmark.yaml`	Allow vLLM benchmark workflow to run on `release/**` branches.
`RELEASE_PROCESS.md`	Documents the release/RC/hotfix flow and expectations.
`scripts/generate_changelog.sh`	Generates categorized release notes from git history.
`scripts/release_checklist.md`	Operational checklist for pre-release/release/post-release steps.

Copilot · 2026-04-09T14:55:04Z

+
+  # Generate changelog and create GitHub Release on tag push
+  create_release:
+    if: ${{ startsWith(github.ref, 'refs/tags/v') && github.event.inputs.skip_github_release != 'true' }}


In create_release.if, skip_github_release is declared as a boolean input but the condition compares it to the string 'true'. Depending on how GitHub evaluates boolean inputs, this can cause the release to be created even when the checkbox is set. Prefer a boolean-safe check (e.g., !github.event.inputs.skip_github_release with a fallback for tag-push where inputs is unset).

Suggested change

-    if: ${{ startsWith(github.ref, 'refs/tags/v') && github.event.inputs.skip_github_release != 'true' }}
+    if: ${{ startsWith(github.ref, 'refs/tags/v') && (github.event.inputs == null || !github.event.inputs.skip_github_release) }}

Copilot · 2026-04-09T14:55:05Z

+          # Find previous tag
+          PREV_TAG=$(git tag --sort=-version:refname | grep -v "$TAG" | head -1)
+          echo "Generating changelog: ${PREV_TAG}..${TAG}"
+
+          if [ -f "scripts/generate_changelog.sh" ]; then
+            bash scripts/generate_changelog.sh "$PREV_TAG" "$TAG" /tmp/release_notes.md
+          else
+            # Fallback: use git log
+            echo "# ${TAG} Release Notes" > /tmp/release_notes.md
+            echo "" >> /tmp/release_notes.md
+            git log "${PREV_TAG}..${TAG}" --format="- %s" >> /tmp/release_notes.md


PREV_TAG selection will be empty when the repo has no prior tags (or if grep -v "$TAG" filters everything), which will break changelog generation (git log "${PREV_TAG}..${TAG}"). Add a guard for the “first release tag” case (e.g., fall back to the first commit / skip the range and generate notes from git log "$TAG").

Suggested change

-          # Find previous tag
-          PREV_TAG=$(git tag --sort=-version:refname | grep -v "$TAG" | head -1)
-          echo "Generating changelog: ${PREV_TAG}..${TAG}"
-          if [ -f "scripts/generate_changelog.sh" ]; then
-            bash scripts/generate_changelog.sh "$PREV_TAG" "$TAG" /tmp/release_notes.md
-          else
-            # Fallback: use git log
-            echo "# ${TAG} Release Notes" > /tmp/release_notes.md
-            echo "" >> /tmp/release_notes.md
-            git log "${PREV_TAG}..${TAG}" --format="- %s" >> /tmp/release_notes.md
+          # Find previous tag; allow no match for the first release tag
+          PREV_TAG=$(git tag --sort=-version:refname | grep -Fxv "$TAG" | head -1 || true)
+          if [ -n "$PREV_TAG" ]; then
+            echo "Generating changelog: ${PREV_TAG}..${TAG}"
+            if [ -f "scripts/generate_changelog.sh" ]; then
+              bash scripts/generate_changelog.sh "$PREV_TAG" "$TAG" /tmp/release_notes.md
+            else
+              # Fallback: use git log between the previous tag and this tag
+              echo "# ${TAG} Release Notes" > /tmp/release_notes.md
+              echo "" >> /tmp/release_notes.md
+              git log "${PREV_TAG}..${TAG}" --format="- %s" >> /tmp/release_notes.md
+            fi
+          else
+            echo "No previous tag found; generating changelog from all commits in ${TAG}"
+            echo "# ${TAG} Release Notes" > /tmp/release_notes.md
+            echo "" >> /tmp/release_notes.md
+            git log "$TAG" --format="- %s" >> /tmp/release_notes.md

Copilot · 2026-04-09T14:55:05Z

+          for WHL in dist/*.whl; do
+            WHL_NAME=$(basename ${WHL})
+            echo "Uploading ${WHL_NAME} to S3..."
+            # Upload to release-specific path
+            aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-releases/gfx942-gfx950/${TAG}/${WHL_NAME}
+            # Also upload to staging for downstream CI compatibility
+            aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-staging/gfx942-gfx950/${WHL_NAME}


The workflow allows overriding GPU_ARCHS, but the S3 upload path is hard-coded to gfx942-gfx950. If someone dispatches a build with a different arch set, the wheels will be uploaded under the wrong folder. Either derive the S3 prefix from ${GPU_ARCHS} (normalized) or remove the arch override to keep the uploaded path consistent with what was built.

Suggested change

-          for WHL in dist/*.whl; do
-            WHL_NAME=$(basename ${WHL})
-            echo "Uploading ${WHL_NAME} to S3..."
-            # Upload to release-specific path
-            aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-releases/gfx942-gfx950/${TAG}/${WHL_NAME}
-            # Also upload to staging for downstream CI compatibility
-            aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-staging/gfx942-gfx950/${WHL_NAME}
+          RAW_GPU_ARCHS="${GPU_ARCHS:-gfx942;gfx950}"
+          S3_GPU_ARCHS=$(printf '%s' "${RAW_GPU_ARCHS}" | tr ',;: ' '-' | tr -s '-')
+          for WHL in dist/*.whl; do
+            WHL_NAME=$(basename ${WHL})
+            echo "Uploading ${WHL_NAME} to S3 under ${S3_GPU_ARCHS}..."
+            # Upload to release-specific path
+            aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-releases/${S3_GPU_ARCHS}/${TAG}/${WHL_NAME}
+            # Also upload to staging for downstream CI compatibility
+            aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-staging/${S3_GPU_ARCHS}/${WHL_NAME}
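
To make the arch-to-prefix mapping easier to check in isolation, the normalization can be pulled into a small function. The sketch below is an illustration only: `normalize_gpu_archs` and `s3_release_key` are hypothetical names, and it swaps the `tr` pipeline for a single `sed` expression that collapses any run of separators into one hyphen. The `whl-releases/<archs>/<tag>/<wheel>` layout is copied from the snippet above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a GPU arch list into an S3 path segment.
# Runs of separators (comma, semicolon, colon, space) become one hyphen.
normalize_gpu_archs() {
    printf '%s' "$1" | sed 's/[,;: ][,;: ]*/-/g'
}

# Build the release-specific object key for one wheel.
# Arguments: arch list, tag, wheel file name.
s3_release_key() {
    local archs
    archs=$(normalize_gpu_archs "$1")
    printf 'whl-releases/%s/%s/%s' "$archs" "$2" "$3"
}
```

With this shape, a dispatch that passes `gfx942, gfx950` and one that passes `gfx942;gfx950` resolve to the same prefix.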

Copilot · 2026-04-09T14:55:06Z

+            curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
+            unzip -q awscliv2.zip
+            sudo ./aws/install
+            rm -rf awscliv2.zip aws


Installing AWS CLI via sudo ./aws/install assumes the runner has sudo and permits system-level installs. On many self-hosted/k8s runners this will fail. Consider using a GitHub Action that provides the AWS CLI, installing via pip install --user awscli, or running aws s3 cp inside the build container where you control privileges.

Suggested change

-            curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
-            unzip -q awscliv2.zip
-            sudo ./aws/install
-            rm -rf awscliv2.zip aws
+            python3 -m pip install --user awscli
+            echo "$HOME/.local/bin" >> "$GITHUB_PATH"

Copilot · 2026-04-09T14:55:06Z

+# Get all commit subjects with PR numbers
+COMMITS=$(git log "${FROM_REF}..${TO_REF}" --format="%s" --reverse)
+TOTAL=$(echo "$COMMITS" | wc -l)
+
+# Temp files for categories
+TMP=$(mktemp -d)
+trap 'rm -rf "$TMP"' EXIT
+
+touch "$TMP/features" "$TMP/performance" "$TMP/fixes" "$TMP/refactor" "$TMP/ci" "$TMP/other"
+
+while IFS= read -r line; do
+    # Extract PR number if present
+    PR_NUM=$(echo "$line" | grep -oP '#\d+' | tail -1 || true)
+    PR_LINK=""
+    if [ -n "$PR_NUM" ]; then
+        PR_LINK=" (${REPO_URL}/pull/${PR_NUM#\#})"
+    fi
+
+    # Clean up subject (remove trailing PR reference for display)
+    SUBJECT=$(echo "$line" | sed 's/ (#[0-9]*)$//')
+
+    ENTRY="- ${SUBJECT}${PR_LINK}"
+
+    # Categorize by prefix/keywords
+    case "$line" in
+        *"[feat]"*|*"feat("*|*"feat:"*|"Add "*|"add "*|"support"*|"Support"*|"Enable "*|"enable "*|"Introduce "*|"new "*|"New "*)
+            echo "$ENTRY" >> "$TMP/features" ;;
+        *"[Perf]"*|*"tune"*|*"Tune"*|*"tuned"*|*"Retune"*|*"retune"*|*"optim"*|*"Optim"*|*"perf"*|*"speed"*)
+            echo "$ENTRY" >> "$TMP/performance" ;;
+        *"fix"*|*"Fix"*|*"FIX"*|*"bug"*|*"Bug"*|*"hotfix"*|*"Revert"*|*"revert"*|*"accuracy"*)
+            echo "$ENTRY" >> "$TMP/fixes" ;;
+        *"refactor"*|*"Refactor"*|*"replace"*|*"Replace"*|*"remove"*|*"Remove"*|*"rm "*|*"[OPUS]"*|*"opus"*|*"migrate"*|*"clean"*)
+            echo "$ENTRY" >> "$TMP/refactor" ;;
+        "CI:"*|"CI "*|*"[CI]"*|*"test"*|*"Test"*|*"build"*|*"Build"*)
+            echo "$ENTRY" >> "$TMP/ci" ;;
+        *)
+            echo "$ENTRY" >> "$TMP/other" ;;
+    esac
+done <<< "$COMMITS"


If there are zero commits between the two refs, COMMITS becomes empty but the here-string loop still runs once with an empty line, producing a bogus - entry and counts of 1. Handle the empty range explicitly (or iterate directly over git log ... | while read ... and compute TOTAL via git rev-list --count).
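
One way to handle the empty range explicitly is sketched below. It is a minimal illustration under stated assumptions, not the script's actual fix: `count_subjects` and `format_entries` are hypothetical names, and the commit subjects are passed in as a string the way `COMMITS` is in the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count commit subjects held in a string, treating an
# empty string as zero commits instead of one empty line.
count_subjects() {
    local commits="$1"
    if [ -z "$commits" ]; then
        echo 0
    else
        printf '%s\n' "$commits" | wc -l | tr -d ' '
    fi
}

# Emit one "- subject" entry per non-empty line. The guard skips the single
# blank line that a here-string produces for an empty variable.
format_entries() {
    local commits="$1"
    local line
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        echo "- ${line}"
    done <<< "$commits"
}
```

The alternative mentioned in the comment, computing the total with `git rev-list --count "${FROM_REF}..${TO_REF}"`, avoids the string handling entirely and reports 0 for an empty range.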

Copilot · 2026-04-09T14:55:06Z

+while IFS= read -r line; do
+    # Extract PR number if present
+    PR_NUM=$(echo "$line" | grep -oP '#\d+' | tail -1 || true)
+    PR_LINK=""
+    if [ -n "$PR_NUM" ]; then
+        PR_LINK=" (${REPO_URL}/pull/${PR_NUM#\#})"
+    fi


grep -oP requires GNU grep with PCRE support and will fail on default macOS/BSD grep. Since this script is likely to be run locally during releases, consider switching to a POSIX-compatible extraction (e.g., sed/grep -E) or documenting the GNU grep dependency.
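
A portable extraction could be sketched as follows. The function name `extract_pr_num` is hypothetical. The sketch relies on `grep -E` (extended regular expressions, part of POSIX) plus the `-o` flag, which is not in POSIX but is, as far as I know, implemented by both GNU and BSD grep.

```shell
#!/usr/bin/env bash
# Hypothetical replacement for `grep -oP '#\d+'`: uses an extended regular
# expression (-E) rather than PCRE (-P). Prints the last "#<digits>" token
# in the subject, or nothing when there is none.
extract_pr_num() {
    printf '%s\n' "$1" | grep -oE '#[0-9]+' | tail -n 1 || true
}
```

The result keeps the leading `#`, so the existing `${PR_NUM#\#}` expansion in the script would still strip it when building the link.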

The Checks workflow (pre-checks.yaml) produces the signal artifact that all other CI workflows depend on via check_signal.sh. Without this trigger, all CI jobs on release branches fail at the check-signal step because no artifact exists for that commit SHA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

[P1] Verify prebuilt kernels and smoke test from /tmp instead of /workspace to ensure Python imports from site-packages (the installed wheel) rather than the mounted source tree. Without this, validation passes even if the wheel has no prebuilt kernels.

[P2] Handle first-ever release tag gracefully: PREV_TAG can be empty. Also check for hand-written RELEASE_NOTES file first and skip changelog generation entirely if found.

[P2] Derive S3 upload path from GPU_ARCHS input instead of hardcoding gfx942-gfx950. Manual dispatches with different arch sets now upload to the correct prefix.

[Doc] Fix RELEASE_PROCESS.md: CI runs on both push and PRs targeting release/** branches, not "push only".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
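
The [P1] item comes down to where Python resolves the import from. A rough sketch of such a check is shown below. The helper names `module_origin` and `is_installed_path` are hypothetical, and matching on `site-packages` or `dist-packages` in the path is an assumption about how the wheel is installed, not something the commit states.

```shell
#!/usr/bin/env bash
# Print the file a module is imported from, running Python from /tmp so the
# current directory (for example a mounted source tree) is not searched first.
module_origin() {
    ( cd /tmp && python3 -c 'import importlib, sys; print(importlib.import_module(sys.argv[1]).__file__)' "$1" )
}

# Succeed if a path looks like it lives in an installed-packages directory.
is_installed_path() {
    case "$1" in
        */site-packages/*|*/dist-packages/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A validation step could then run `is_installed_path "$(module_origin aiter)"` before counting prebuilt kernels, and fail early if the import resolved to the source checkout.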

sunway513 requested review from a team and Copilot April 9, 2026 14:49

sunway513 and others added 2 commits April 9, 2026 15:13

sunway513 mentioned this pull request Apr 9, 2026

Release v0.1.12: CI Validation Tracking #2674

Open

3 tasks

Merge branch 'main' into feat/release-infra

50728f4

valarLip assigned gyohuangxin Apr 14, 2026

sunway513 wants to merge 4 commits into main from feat/release-infra

3 participants
