Add release engineering infrastructure#2670

Open
sunway513 wants to merge 4 commits into main from feat/release-infra

@sunway513
Collaborator

Summary

Establish formal release engineering infrastructure for AITER to support regular release cycles.

Changes

Release workflow (aiter-release.yaml) — full rewrite:

  • Aligned with CI nightly pipeline for precompiled kernel wheels
  • Added prebuild kernel validation (requires ≥10 compiled .so files)
  • Added smoke test (import + version check)
  • Docker image push on tag push (rocm/aiter-ci:{tag}-py{ver})
  • S3 wheel upload to both release-specific and staging paths
  • GitHub Release creation with wheel assets attached
  • workflow_dispatch with configurable inputs (runner, GPU archs, Docker images, Python versions)
  • Fixed non-standard top-level description: field that broke workflow parsing
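A minimal sketch of the prebuild kernel validation listed above, assuming the compiled kernels of the installed wheel can be found as .so files under one directory. The function name check_prebuilt_count and its directory argument are hypothetical and are not taken from aiter-release.yaml; only the threshold of 10 comes from this PR.

```shell
# Hypothetical helper: succeed only if DIR contains at least MIN compiled .so files.
# MIN defaults to 10, the threshold named in this PR.
check_prebuilt_count() {
  dir="$1"
  min="${2:-10}"
  # Count regular files ending in .so anywhere below DIR.
  count=$(find "$dir" -type f -name '*.so' | wc -l | tr -d ' ')
  echo "found ${count} prebuilt kernels (minimum ${min})"
  [ "$count" -ge "$min" ]
}
```

A workflow step could call this on the installed package directory and fail the job on a non-zero exit status; which directory to pass is an assumption that depends on how the wheel lays out its kernels.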

CI workflow triggers — added release/** branch support:

  • aiter-test.yaml
  • atom-test.yaml
  • sglang_downstream.yaml
  • triton-test.yaml
  • vllm_benchmark.yaml

Release process documentation:

  • RELEASE_PROCESS.md — full release lifecycle (branch → RC → validation → publish → hotfix)
  • scripts/generate_changelog.sh — auto-categorizes commits by PR prefix
  • scripts/release_checklist.md — pre/post release validation checklist

Motivation

AITER has accumulated 334 commits since v0.1.11.post1 (2026-03-05) without a release, and the SGLang upgrade has been blocked for 7 weeks. This PR establishes the infrastructure needed for a sustainable 2-week release cadence.

ROCm Version Support Plan

Target support matrix for upcoming releases:

  • ROCm 7.0 + gfx942/gfx950 (Tier 1)
  • ROCm 7.2.1 + gfx942/gfx950 (Tier 1)

Test plan

  • Verify CI workflows trigger on release/** branch push
  • Verify workflow_dispatch is recognized on main after merge
  • Test tag push triggers release workflow end-to-end
  • Validate prebuild kernel count ≥ 10 in built wheel
  • Smoke test passes (import aiter, check version)

🤖 Generated with Claude Code

- Rewrite aiter-release.yaml to align with CI nightly pipeline:
  - Prebuild kernel validation (>=10 .so files)
  - Smoke test (import + version check)
  - Docker image push on tag (rocm/aiter-ci:{tag}-py{ver})
  - S3 wheel upload (releases + staging paths)
  - GitHub Release creation with wheel assets
  - workflow_dispatch with configurable inputs
- Add release/** branch triggers to all CI workflows:
  aiter-test, atom-test, sglang, triton, vllm
- Add RELEASE_PROCESS.md documenting release lifecycle
- Add scripts/generate_changelog.sh for auto-categorized changelogs
- Add scripts/release_checklist.md for pre/post release validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sunway513 sunway513 requested review from a team and Copilot April 9, 2026 14:49
@github-actions
Contributor

github-actions bot commented Apr 9, 2026

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-355 | Run Triton tests on MI355 in addition to MI325 |
| ci:sglang | SGLang integration tests |
| ci:atom | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| ci:vllm | vLLM benchmark |
| ci:all | All of the above |

Add labels via the sidebar or gh pr edit 2670 --add-label <label>

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces release engineering infrastructure for AITER by adding a tag-triggered release workflow, documenting the release lifecycle, and ensuring CI runs on release/** branches to support regular release cycles.

Changes:

  • Rewrote aiter-release.yaml to build precompiled-kernel wheels on tag push, validate prebuilts, upload wheels (artifacts + S3), push tagged Docker images, and create GitHub Releases.
  • Updated multiple CI workflows to also trigger on release/** branches.
  • Added release process documentation plus a changelog generator and release checklist.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| .github/workflows/aiter-release.yaml | New end-to-end release workflow (build/validate/upload/release) triggered by tags and workflow dispatch. |
| .github/workflows/aiter-test.yaml | Allow CI to run on release/** branches. |
| .github/workflows/atom-test.yaml | Allow downstream ATOM validation to run on release/** branches. |
| .github/workflows/sglang_downstream.yaml | Allow downstream SGLang validation to run on release/** branches. |
| .github/workflows/triton-test.yaml | Allow Triton tests to run on release/** branches. |
| .github/workflows/vllm_benchmark.yaml | Allow vLLM benchmark workflow to run on release/** branches. |
| RELEASE_PROCESS.md | Documents the release/RC/hotfix flow and expectations. |
| scripts/generate_changelog.sh | Generates categorized release notes from git history. |
| scripts/release_checklist.md | Operational checklist for pre-release/release/post-release steps. |


# Generate changelog and create GitHub Release on tag push
create_release:
if: ${{ startsWith(github.ref, 'refs/tags/v') && github.event.inputs.skip_github_release != 'true' }}

Copilot AI Apr 9, 2026

In create_release.if, skip_github_release is declared as a boolean input but the condition compares it to the string 'true'. Depending on how GitHub evaluates boolean inputs, this can cause the release to be created even when the checkbox is set. Prefer a boolean-safe check (e.g., !github.event.inputs.skip_github_release with a fallback for tag-push where inputs is unset).

Suggested change
- if: ${{ startsWith(github.ref, 'refs/tags/v') && github.event.inputs.skip_github_release != 'true' }}
+ if: ${{ startsWith(github.ref, 'refs/tags/v') && (github.event.inputs == null || !github.event.inputs.skip_github_release) }}

Comment thread .github/workflows/aiter-release.yaml Outdated
Comment on lines +295 to +305
# Find previous tag
PREV_TAG=$(git tag --sort=-version:refname | grep -v "$TAG" | head -1)
echo "Generating changelog: ${PREV_TAG}..${TAG}"

if [ -f "scripts/generate_changelog.sh" ]; then
  bash scripts/generate_changelog.sh "$PREV_TAG" "$TAG" /tmp/release_notes.md
else
  # Fallback: use git log
  echo "# ${TAG} Release Notes" > /tmp/release_notes.md
  echo "" >> /tmp/release_notes.md
  git log "${PREV_TAG}..${TAG}" --format="- %s" >> /tmp/release_notes.md

Copilot AI Apr 9, 2026

PREV_TAG selection will be empty when the repo has no prior tags (or if grep -v "$TAG" filters everything), which will break changelog generation (git log "${PREV_TAG}..${TAG}"). Add a guard for the “first release tag” case (e.g., fall back to the first commit / skip the range and generate notes from git log "$TAG").

Suggested change
- # Find previous tag
- PREV_TAG=$(git tag --sort=-version:refname | grep -v "$TAG" | head -1)
- echo "Generating changelog: ${PREV_TAG}..${TAG}"
- if [ -f "scripts/generate_changelog.sh" ]; then
-   bash scripts/generate_changelog.sh "$PREV_TAG" "$TAG" /tmp/release_notes.md
- else
-   # Fallback: use git log
-   echo "# ${TAG} Release Notes" > /tmp/release_notes.md
-   echo "" >> /tmp/release_notes.md
-   git log "${PREV_TAG}..${TAG}" --format="- %s" >> /tmp/release_notes.md
+ # Find previous tag; allow no match for the first release tag
+ PREV_TAG=$(git tag --sort=-version:refname | grep -Fxv "$TAG" | head -1 || true)
+ if [ -n "$PREV_TAG" ]; then
+   echo "Generating changelog: ${PREV_TAG}..${TAG}"
+   if [ -f "scripts/generate_changelog.sh" ]; then
+     bash scripts/generate_changelog.sh "$PREV_TAG" "$TAG" /tmp/release_notes.md
+   else
+     # Fallback: use git log between the previous tag and this tag
+     echo "# ${TAG} Release Notes" > /tmp/release_notes.md
+     echo "" >> /tmp/release_notes.md
+     git log "${PREV_TAG}..${TAG}" --format="- %s" >> /tmp/release_notes.md
+   fi
+ else
+   echo "No previous tag found; generating changelog from all commits in ${TAG}"
+   echo "# ${TAG} Release Notes" > /tmp/release_notes.md
+   echo "" >> /tmp/release_notes.md
+   git log "$TAG" --format="- %s" >> /tmp/release_notes.md
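
To illustrate why the whole-line, fixed-string match matters here, the following is a small sketch under stated assumptions, not the workflow's actual code. A plain grep -v "$TAG" treats the tag as a regular expression and matches substrings, so filtering out v0.1.1 would also drop v0.1.11. The helper name previous_tag is hypothetical; it reads an already sorted tag list on stdin so it can be exercised without a repository.

```shell
# Hypothetical helper: read version-sorted tags (newest first) on stdin and
# print the first one that is not exactly CURRENT. Prints nothing if no other
# tag exists, which is the "first release tag" case.
previous_tag() {
  current="$1"
  # -F: fixed string, -x: match the whole line, -v: invert the match.
  grep -Fxv -- "$current" | head -n 1 || true
}
```

In the workflow this would presumably be fed from git tag --sort=-version:refname, with an empty result handled by the else branch of the suggestion.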

Comment thread .github/workflows/aiter-release.yaml Outdated
Comment on lines +261 to +267
for WHL in dist/*.whl; do
  WHL_NAME=$(basename ${WHL})
  echo "Uploading ${WHL_NAME} to S3..."
  # Upload to release-specific path
  aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-releases/gfx942-gfx950/${TAG}/${WHL_NAME}
  # Also upload to staging for downstream CI compatibility
  aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-staging/gfx942-gfx950/${WHL_NAME}

Copilot AI Apr 9, 2026

The workflow allows overriding GPU_ARCHS, but the S3 upload path is hard-coded to gfx942-gfx950. If someone dispatches a build with a different arch set, the wheels will be uploaded under the wrong folder. Either derive the S3 prefix from ${GPU_ARCHS} (normalized) or remove the arch override to keep the uploaded path consistent with what was built.

Suggested change
- for WHL in dist/*.whl; do
-   WHL_NAME=$(basename ${WHL})
-   echo "Uploading ${WHL_NAME} to S3..."
-   # Upload to release-specific path
-   aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-releases/gfx942-gfx950/${TAG}/${WHL_NAME}
-   # Also upload to staging for downstream CI compatibility
-   aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-staging/gfx942-gfx950/${WHL_NAME}
+ RAW_GPU_ARCHS="${GPU_ARCHS:-gfx942;gfx950}"
+ S3_GPU_ARCHS=$(printf '%s' "${RAW_GPU_ARCHS}" | tr ',;: ' '-' | tr -s '-')
+ for WHL in dist/*.whl; do
+   WHL_NAME=$(basename ${WHL})
+   echo "Uploading ${WHL_NAME} to S3 under ${S3_GPU_ARCHS}..."
+   # Upload to release-specific path
+   aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-releases/${S3_GPU_ARCHS}/${TAG}/${WHL_NAME}
+   # Also upload to staging for downstream CI compatibility
+   aws s3 cp ${WHL} s3://framework-whls-nightlies/whl-staging/${S3_GPU_ARCHS}/${WHL_NAME}
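
As a rough illustration of what the normalization in this suggestion produces, here is the same tr pipeline wrapped in a function so it can be tried in isolation. The name s3_arch_prefix is hypothetical, and the set of accepted separators (comma, semicolon, colon, space) is simply the one used in the suggestion; whether the workflow's GPU_ARCHS input ever uses anything other than a semicolon is not stated in this thread.

```shell
# Hypothetical helper: turn an arch list such as "gfx942;gfx950" into an
# S3-friendly prefix such as "gfx942-gfx950". Falls back to the default
# arch pair when the argument is empty or unset.
s3_arch_prefix() {
  raw="${1:-gfx942;gfx950}"
  # Map each separator to '-', then squeeze runs of '-' into one.
  printf '%s' "$raw" | tr ',;: ' '----' | tr -s '-'
}
```

Note that a leading or trailing separator in the input would survive as a leading or trailing '-' in the prefix; this sketch does not trim it.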

Comment on lines +251 to +254
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -q awscliv2.zip
sudo ./aws/install
rm -rf awscliv2.zip aws

Copilot AI Apr 9, 2026

Installing AWS CLI via sudo ./aws/install assumes the runner has sudo and permits system-level installs. On many self-hosted/k8s runners this will fail. Consider using a GitHub Action that provides the AWS CLI, installing via pip install --user awscli, or running aws s3 cp inside the build container where you control privileges.

Suggested change
- curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
- unzip -q awscliv2.zip
- sudo ./aws/install
- rm -rf awscliv2.zip aws
+ python3 -m pip install --user awscli
+ echo "$HOME/.local/bin" >> "$GITHUB_PATH"

Comment on lines +15 to +53
# Get all commit subjects with PR numbers
COMMITS=$(git log "${FROM_REF}..${TO_REF}" --format="%s" --reverse)
TOTAL=$(echo "$COMMITS" | wc -l)

# Temp files for categories
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT

touch "$TMP/features" "$TMP/performance" "$TMP/fixes" "$TMP/refactor" "$TMP/ci" "$TMP/other"

while IFS= read -r line; do
  # Extract PR number if present
  PR_NUM=$(echo "$line" | grep -oP '#\d+' | tail -1 || true)
  PR_LINK=""
  if [ -n "$PR_NUM" ]; then
    PR_LINK=" (${REPO_URL}/pull/${PR_NUM#\#})"
  fi

  # Clean up subject (remove trailing PR reference for display)
  SUBJECT=$(echo "$line" | sed 's/ (#[0-9]*)$//')

  ENTRY="- ${SUBJECT}${PR_LINK}"

  # Categorize by prefix/keywords
  case "$line" in
    *"[feat]"*|*"feat("*|*"feat:"*|"Add "*|"add "*|"support"*|"Support"*|"Enable "*|"enable "*|"Introduce "*|"new "*|"New "*)
      echo "$ENTRY" >> "$TMP/features" ;;
    *"[Perf]"*|*"tune"*|*"Tune"*|*"tuned"*|*"Retune"*|*"retune"*|*"optim"*|*"Optim"*|*"perf"*|*"speed"*)
      echo "$ENTRY" >> "$TMP/performance" ;;
    *"fix"*|*"Fix"*|*"FIX"*|*"bug"*|*"Bug"*|*"hotfix"*|*"Revert"*|*"revert"*|*"accuracy"*)
      echo "$ENTRY" >> "$TMP/fixes" ;;
    *"refactor"*|*"Refactor"*|*"replace"*|*"Replace"*|*"remove"*|*"Remove"*|*"rm "*|*"[OPUS]"*|*"opus"*|*"migrate"*|*"clean"*)
      echo "$ENTRY" >> "$TMP/refactor" ;;
    "CI:"*|"CI "*|*"[CI]"*|*"test"*|*"Test"*|*"build"*|*"Build"*)
      echo "$ENTRY" >> "$TMP/ci" ;;
    *)
      echo "$ENTRY" >> "$TMP/other" ;;
  esac
done <<< "$COMMITS"

Copilot AI Apr 9, 2026

If there are zero commits between the two refs, COMMITS becomes empty but the here-string loop still runs once with an empty line, producing a bogus - entry and counts of 1. Handle the empty range explicitly (or iterate directly over git log ... | while read ... and compute TOTAL via git rev-list --count).
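
One way the empty-range case might be handled is sketched below, assuming the commit subjects are still collected into a single string first. Both helper names (count_subjects, emit_entries) are hypothetical and do not appear in scripts/generate_changelog.sh; the sketch only shows the guard, not the categorization.

```shell
# Hypothetical helper: count non-empty lines in $1. An empty string yields 0,
# whereas `echo "" | wc -l` would report 1.
count_subjects() {
  printf '%s\n' "$1" | grep -c . || true
}

# Hypothetical helper: print one "- subject" entry per non-empty line of $1,
# and nothing at all when $1 is empty.
emit_entries() {
  [ -n "$1" ] || return 0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    printf '%s\n' "- $line"
  done <<EOF
$1
EOF
}
```

The alternative named in the comment, counting with git rev-list --count and piping git log straight into the loop, avoids the intermediate string entirely.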

Comment on lines +25 to +31
while IFS= read -r line; do
  # Extract PR number if present
  PR_NUM=$(echo "$line" | grep -oP '#\d+' | tail -1 || true)
  PR_LINK=""
  if [ -n "$PR_NUM" ]; then
    PR_LINK=" (${REPO_URL}/pull/${PR_NUM#\#})"
  fi

Copilot AI Apr 9, 2026

grep -oP requires GNU grep with PCRE support and will fail on default macOS/BSD grep. Since this script is likely to be run locally during releases, consider switching to a POSIX-compatible extraction (e.g., sed/grep -E) or documenting the GNU grep dependency.
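
A possible portable replacement is sketched here using sed with a basic regular expression, which should behave the same on GNU and BSD userlands. The function name extract_pr_number is hypothetical. Unlike the original snippet, it prints the number without the leading "#", so the ${PR_NUM#\#} strip would no longer be needed.

```shell
# Hypothetical helper: print the digits of the last "#<digits>" reference in a
# commit subject, or nothing if the subject has no such reference.
extract_pr_number() {
  # The greedy leading .* pushes the match to the last '#' followed by digits.
  printf '%s\n' "$1" | sed -n 's/.*#\([0-9][0-9]*\).*/\1/p'
}
```

Usage would look like PR_NUM=$(extract_pr_number "$line"), followed by the existing non-empty check before building the link.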

sunway513 and others added 2 commits April 9, 2026 15:13
The Checks workflow (pre-checks.yaml) produces the signal artifact
that all other CI workflows depend on via check_signal.sh. Without
this trigger, all CI jobs on release branches fail at the
check-signal step because no artifact exists for that commit SHA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
[P1] Verify prebuilt kernels and smoke test from /tmp instead of
/workspace to ensure Python imports from site-packages (the installed
wheel) rather than the mounted source tree. Without this, validation
passes even if the wheel has no prebuilt kernels.

[P2] Handle first-ever release tag gracefully — PREV_TAG can be empty.
Also check for hand-written RELEASE_NOTES file first and skip
changelog generation entirely if found.

[P2] Derive S3 upload path from GPU_ARCHS input instead of hardcoding
gfx942-gfx950. Manual dispatches with different arch sets now upload
to the correct prefix.

[Doc] Fix RELEASE_PROCESS.md — CI runs on both push and PRs targeting
release/** branches, not "push only".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>