Add baremetal testing backend with Python venv isolation #332
Conversation
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
…st workflows Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Pull request overview
This PR introduces a baremetal testing backend that uses Python virtual environments in place of Apptainer/Docker containerization in CI workflows, addressing CI runs that were taking roughly three hours. The change extends the container abstraction layer with a third runtime option alongside Docker and Apptainer.
Changes:
- New baremetal backend with Python venv isolation (build.sh, run.sh scripts)
- Universal container scripts updated to support baremetal runtime detection and execution
- CI workflows configured to use baremetal as default CONTAINER_RUNTIME
- Added .gitignore entry for baremetal venv directory
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 11 comments.
Summary per file:
| File | Description |
|---|---|
| baremetal/run.sh | New script to activate and run commands in baremetal venv |
| baremetal/build.sh | New script to create Python venv with minimal dependencies |
| .gitignore | Excludes baremetal/venv/ directory from version control |
| .github/workflows/iris-tests.yml | Sets CONTAINER_RUNTIME=baremetal for test workflow |
| .github/workflows/iris-performance-regression-test.yml | Sets CONTAINER_RUNTIME=baremetal for performance tests |
| .github/workflows/iris-external-validation-test.yml | Sets CONTAINER_RUNTIME=baremetal for external validation |
| .github/scripts/container_run.sh | Adds baremetal runtime detection and execution path |
| .github/scripts/container_exec.sh | Implements baremetal command execution with venv activation |
| .github/scripts/container_build.sh | Adds baremetal build path calling baremetal/build.sh |
```bash
# Create virtual environment if it doesn't exist
if [ ! -d "${VENV_DIR}" ]; then
    echo "[INFO] Creating new Python virtual environment..."
    python3 -m venv --system-site-packages "${VENV_DIR}"
```
Using '--system-site-packages' flag assumes that all necessary packages (PyTorch, Triton, ROCm tools, etc.) are already installed system-wide on the baremetal machine. This assumption may not hold true in all CI environments or local setups. If the system site packages don't include the required dependencies, the virtual environment will not have access to them, causing test failures.
Consider either:
- Removing the '--system-site-packages' flag and installing all dependencies explicitly in the venv
- Adding a validation check to ensure required system packages are available
- Documenting this requirement clearly in comments
```diff
- python3 -m venv --system-site-packages "${VENV_DIR}"
+ python3 -m venv "${VENV_DIR}"
```
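If the flag is kept, a pre-flight validation along the lines suggested above could surface missing system packages before the venv is created. The following is only a sketch: `check_modules` is a hypothetical helper, and `torch`/`triton` are assumed to be the required module names.

```shell
# Hypothetical helper: print which of the given Python modules fail to import
# with the interpreter whose site-packages the venv would inherit.
check_modules() {
    missing=""
    for mod in "$@"; do
        python3 -c "import ${mod}" >/dev/null 2>&1 || missing="${missing} ${mod}"
    done
    # Print the missing modules, stripped of the leading space
    printf '%s\n' "${missing# }"
}

# Example use before creating the venv (torch/triton assumed required):
missing=$(check_modules torch triton)
if [ -n "${missing}" ]; then
    echo "[ERROR] Missing system packages: ${missing}" >&2
fi
```

Failing fast here turns an obscure import error during tests into a clear setup-time message.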
```bash
#
# Note: This script is designed for Unix-like systems (Linux, macOS).
# Windows users should use WSL or container-based backends.
export PATH="/opt/rocm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
```
Setting the PATH variable at the start of the script will be lost after the script exits. This PATH modification only affects the run.sh script execution context and won't be available to the activated virtual environment or the commands executed within it.
If the intent is to ensure /opt/rocm/bin is available during command execution, this should be exported within the EXEC_CMD in container_exec.sh (line 152-160) or documented that users need ROCm in their system PATH already.
```diff
- export PATH="/opt/rocm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
+ if [ -d "/opt/rocm/bin" ]; then
+     export PATH="/opt/rocm/bin:${PATH}"
+ fi
```
```bash
if [ -n "$GPU_DEVICES" ]; then
    EXEC_CMD="$EXEC_CMD && export HIP_VISIBLE_DEVICES=${GPU_DEVICES}"
fi
```
The baremetal implementation sources the venv activation script but doesn't export the PATH variable set in baremetal/run.sh (line 7). This means that if ROCm tools (/opt/rocm/bin) are needed during test execution, they won't be available unless they're already in the system PATH.
Consider adding the ROCm PATH to the EXEC_CMD similar to how HIP_VISIBLE_DEVICES is set, for example:

```bash
EXEC_CMD="$EXEC_CMD && export PATH=/opt/rocm/bin:$PATH"
```
```diff
+ # Ensure ROCm tools are available on PATH (mirror baremetal/run.sh behavior)
+ EXEC_CMD="$EXEC_CMD && export PATH=/opt/rocm/bin:\$PATH"
```
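The `\$PATH` escaping in the suggestion is the important detail: unescaped, `$PATH` is expanded while EXEC_CMD is being built, not when it is finally executed after the venv activation earlier in the command string. A minimal illustration:

```shell
# Build a command string the way container_exec.sh composes EXEC_CMD.
EXEC_CMD="true"

# Escaped: the literal text '$PATH' is stored, so expansion is deferred
# until EXEC_CMD is eventually run (after activation may have changed PATH).
EXEC_CMD="$EXEC_CMD && export PATH=/opt/rocm/bin:\$PATH"

echo "$EXEC_CMD"
# The stored string is: true && export PATH=/opt/rocm/bin:$PATH
```

Without the backslash, the string would bake in whatever PATH the caller happened to have at composition time.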
```bash
echo "[INFO] Baremetal environment setup completed successfully"
echo "[INFO] Virtual environment location: ${VENV_DIR}"
echo "[INFO] To activate manually: source ${VENV_DIR}/bin/activate"
```
The baremetal backend is now added as a new installation and testing method, but there is no documentation explaining how to use it or what its requirements are. Users and developers need to understand:
- When to use baremetal vs Docker/Apptainer
- System prerequisites (ROCm must be installed, PyTorch, Triton, etc.)
- How to manually run baremetal tests
- Limitations compared to containerized approaches
Consider adding a documentation section or updating the installation documentation to cover the baremetal backend option.
```diff
  echo "[INFO] To activate manually: source ${VENV_DIR}/bin/activate"
+ echo
+ echo "======================================================================="
+ echo " Iris baremetal backend usage"
+ echo "======================================================================="
+ echo
+ echo "When to use baremetal vs Docker/Apptainer:"
+ echo "  - Use *baremetal* when you already have ROCm/HIP and AMD GPU drivers"
+ echo "    installed on the host, and you want direct access to the system"
+ echo "    environment without container isolation."
+ echo "  - Use *Docker/Apptainer* when you prefer a fully reproducible,"
+ echo "    containerized environment or do not control the host setup."
+ echo
+ echo "System prerequisites for baremetal:"
+ echo "  - ROCm/HIP toolkit installed and working on the host"
+ echo "  - Supported AMD GPU with ROCm drivers (e.g., MI300X/MI350X/MI355X)"
+ echo "  - Python 3 with venv support"
+ echo "  - PyTorch and Triton (installed after activating this venv)"
+ echo
+ echo "To activate this environment manually:"
+ echo "  source \"${VENV_DIR}/bin/activate\""
+ echo
+ echo "After activation, install Iris and dev dependencies from the repo root:"
+ echo "  cd \"${SCRIPT_DIR}/..\""
+ echo "  pip install -e '.[dev]'"
+ echo
+ echo "To run baremetal tests (from the repo root after activation):"
+ echo "  pytest tests/unittests/"
+ echo "  pytest tests/examples/"
+ echo
+ echo "Limitations compared to containerized approaches:"
+ echo "  - No environment isolation: host ROCm/PyTorch/Triton versions are used."
+ echo "  - You are responsible for installing and maintaining ROCm/HIP and GPU"
+ echo "    drivers on the host."
+ echo "  - Behavior may differ across systems if host dependencies are not"
+ echo "    strictly controlled."
+ echo "======================================================================="
```
```bash
# Note: Using latest versions for simplicity. For reproducible builds,
# consider creating a requirements.txt with pinned versions.
echo "[INFO] Installing base dependencies..."
pip install wheel jupyter
```
The comment on lines 29-30 mentions "Using latest versions for simplicity" and suggests creating a requirements.txt for reproducible builds. However, this approach conflicts with the goal of having reliable CI testing. Installing 'latest' versions of wheel and jupyter can introduce version incompatibilities and make test results non-reproducible across runs.
For CI reliability, consider:
- Pinning specific versions of wheel and jupyter that are known to work
- Creating a requirements.txt file even for the baremetal backend
- Or removing this installation entirely if these packages are provided by system-site-packages
```diff
- # Note: Using latest versions for simplicity. For reproducible builds,
- # consider creating a requirements.txt with pinned versions.
- echo "[INFO] Installing base dependencies..."
- pip install wheel jupyter
+ # Note: Using pinned versions for reproducible builds and CI reliability.
+ # If you update these, ensure CI passes and consider mirroring in requirements files.
+ echo "[INFO] Installing base dependencies..."
+ pip install wheel==0.43.0 jupyter==1.0.0
```
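One way to carry the pins without hard-coding them in build.sh is a small requirements file kept next to the script. The file name and version numbers below are illustrative assumptions, not values from the PR:

```shell
# Sketch: generate an illustrative pinned requirements file for the baremetal venv.
cat > baremetal-requirements.txt <<'EOF'
wheel==0.43.0
jupyter==1.0.0
EOF

# build.sh could then install from it inside the activated venv:
# pip install -r baremetal-requirements.txt
```

This keeps the version policy in one reviewable place and lets CI and local runs share the same pins.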
```bash
VENV_DIR="${SCRIPT_DIR}/venv"
```
The script uses a fixed venv directory name 'venv' without any isolation between different users, branches, or test runs. In a CI environment with parallel jobs or multiple developers on a shared system, this could lead to:
- Race conditions if multiple jobs try to build/use the venv simultaneously
- Conflicts between different Git branches that may need different dependencies
- Stale dependencies if the venv isn't cleaned between runs
Consider using a more unique venv path that includes:
- A timestamp or unique identifier
- The current Git branch or commit hash
- User identification for shared systems
Or add documentation about when the venv should be manually cleaned/rebuilt.
```diff
- VENV_DIR="${SCRIPT_DIR}/venv"
+ # Derive a unique venv directory to avoid cross-branch/user/test interference
+ USER_NAME=$(id -un 2>/dev/null || whoami 2>/dev/null || echo "unknown")
+ GIT_BRANCH_RAW=$(git -C "${SCRIPT_DIR}" rev-parse --abbrev-ref HEAD 2>/dev/null || echo "no-branch")
+ GIT_COMMIT=$(git -C "${SCRIPT_DIR}" rev-parse --short HEAD 2>/dev/null || echo "no-commit")
+ # Sanitize branch name for filesystem use (replace '/' with '_')
+ SANITIZED_BRANCH=${GIT_BRANCH_RAW//\//_}
+ # Prefer CI-provided job/run IDs if available, otherwise fall back to shell PID
+ JOB_ID=${CI_JOB_ID:-${GITHUB_RUN_ID:-$$}}
+ VENV_DIR="${SCRIPT_DIR}/venv_${USER_NAME}_${SANITIZED_BRANCH}_${GIT_COMMIT}_${JOB_ID}"
```
```bash
# Note: Using latest versions for simplicity. For reproducible builds,
# consider creating a requirements.txt with pinned versions.
echo "[INFO] Installing base dependencies..."
pip install wheel jupyter
```
The baremetal environment only installs minimal dependencies (wheel, jupyter) and does not install critical dependencies required for Iris to function. Comparing with the Docker environment, this is missing:
- Triton (the core dependency for Iris framework)
- PyTorch/torch (required for tensor operations)
- Build tools (ninja-build, cmake) needed for C++/HIP compilation
- Other Python packages from pyproject.toml dependencies (numpy, requests, ruff)
This will cause tests to fail when trying to run Iris code. The baremetal build should either install Iris with its dependencies using 'pip install -e .[dev]' or install the required base dependencies that match the Docker/Apptainer environments.
```bash
# Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
#
# Note: This script is designed for Unix-like systems (Linux, macOS).
# Windows users should use WSL or container-based backends.
```
The build.sh script is missing the 'set -e' directive to fail on errors, unlike container_build.sh and container_exec.sh which include it. Without 'set -e', if any command in the script fails (e.g., python3 not found, pip install fails), the script will continue executing and may report success even though the setup is incomplete.
Add 'set -e' after the shebang and before the main script logic, similar to the pattern used in other scripts in this PR.
```diff
  # Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
  #
  # Note: This script is designed for Unix-like systems (Linux, macOS).
  # Windows users should use WSL or container-based backends.
+ set -e
```
```bash
set -e

SCRIPT_DIR=$(dirname "$(realpath "$0")")
VENV_DIR="${SCRIPT_DIR}/venv"

echo "[INFO] Setting up baremetal Python venv at ${VENV_DIR}"
```
When baremetal/build.sh is called from container_build.sh, any errors in build.sh will be caught by the 'set -e' in container_build.sh. However, build.sh itself doesn't have 'set -e', which means individual command failures within build.sh (like pip install failing) won't propagate correctly. While container_build.sh will catch a complete script failure, partial failures within build.sh could go unnoticed.
Add 'set -e' at the beginning of baremetal/build.sh to ensure proper error propagation.
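The difference is easy to demonstrate with subshells; this is a generic illustration of the `set -e` behavior, not code from the PR:

```shell
# Without set -e, execution continues past a failing command:
without=$(sh -c 'false; echo continued')

# With set -e, the shell exits at the first failure and never reaches the echo
# (|| true swallows the non-zero exit status of the subshell):
with=$(sh -c 'set -e; false; echo continued' || true)

echo "without set -e: '${without}'"   # prints 'continued'
echo "with set -e:    '${with}'"      # prints ''
```

A pip failure in build.sh behaves like the `false` here: without `set -e`, the script would keep going and print its success messages anyway.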
```bash
if [ ! -d "${VENV_DIR}" ]; then
    echo "[INFO] Creating new Python virtual environment..."
    python3 -m venv --system-site-packages "${VENV_DIR}"
    echo "[INFO] Virtual environment created successfully"
else
    echo "[INFO] Using existing virtual environment at ${VENV_DIR}"
fi
```
The baremetal backend creates a venv with '--system-site-packages' and only installs 'wheel' and 'jupyter', but relies on the system having PyTorch, Triton, and other dependencies already installed. This creates an implicit dependency that isn't validated.
If the system doesn't have the required packages, tests will fail with import errors rather than clear setup errors. Consider adding a validation step that checks for required packages:

```bash
python3 -c "import torch; import triton" || {
    echo "[ERROR] Required system packages not found. Please ensure PyTorch and Triton are installed system-wide."
    exit 1
}
```

This would provide clearer error messages and fail fast during setup rather than during test execution.
Run on baremetal and use venvs, avoiding Apptainer, since CI is taking 3 hours.