PySDK Version
Describe the bug
Unable to run SageMaker training in "local" mode when Docker Compose v5 is installed.
Component: sagemaker.train.local.local_container._LocalContainer._get_compose_cmd_prefix
_get_compose_cmd_prefix() only recognizes Docker Compose v2 by
checking "v2" in output. Docker Compose v5.x (and v3, v4, or any
future major version) is silently rejected, causing::
ImportError: Docker Compose is not installed.
Local Mode features will not work without docker compose.
even though docker compose is fully installed and functional.
The substring check "v2" in output is too narrow. Every Compose
version with a major number other than 2 is treated as "not installed".
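The narrowness of the check can be illustrated in isolation. The snippet below is a minimal sketch: `legacy_check` is a hypothetical helper that only mirrors the `output and "v2" in output.strip()` condition from `_get_compose_cmd_prefix`; it is not part of the SDK.

```python
from typing import Optional


def legacy_check(output: Optional[str]) -> bool:
    """Hypothetical mirror of the current condition in _get_compose_cmd_prefix."""
    return bool(output) and "v2" in output.strip()


# Version strings in the format reported by `docker compose version`.
samples = [
    "Docker Compose version v2.27.0",
    "Docker Compose version v5.1.1",
]

for sample in samples:
    print(f"{sample!r}: accepted={legacy_check(sample)}")
```

Only the v2 string passes; the v5 string falls through to the `docker-compose` lookup and then to the `ImportError`.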
To reproduce
Required for Test:
docker pull docker:latest
# Alternatively, tag an existing local image as docker:latest:
#   docker tag <image> docker:latest
# Make sure the Compose major version in the image is greater than 2
docker run --rm docker:latest docker compose version
# Docker Compose version v5.1.1
import re
import subprocess
from unittest import mock
def test_local_train_triggers_compose_v5_bug(tmp_path):
"""Invoke sagemaker local-mode training to prove Docker Compose v5 triggers the bug.
This test exercises the real call chain:
ModelTrainer.train()
→ _LocalContainer.train()
→ _LocalContainer._generate_compose_command()
→ _LocalContainer._get_compose_cmd_prefix() ← bug lives here
The bug: ``_get_compose_cmd_prefix`` only accepts output containing "v2".
Any Compose v3/v4/v5 output causes an ``ImportError`` even though Docker
Compose is fully installed and functional.
Assumes ``docker:latest`` is already pulled on the local workstation.
"""
import os
import boto3
from moto import mock_aws
from sagemaker.core.local import LocalSession
from sagemaker.core.training.configs import Compute, InputData, OutputDataConfig, SourceCode
from sagemaker.train.model_trainer import ModelTrainer, Mode
# Assumes a local docker:latest image is already pulled and that it reports a
# real Compose version string. If this test stops reproducing the bug after a
# Compose version change, point PUBLIC_DOCKER_IMAGE at an image whose Compose
# major version triggers the bug (v3+).
PUBLIC_DOCKER_IMAGE = "docker:latest"
# Step 1: get the real compose version string from the local docker:latest image
result = subprocess.run( # nosec B603 B607
["docker", "run", "--rm", PUBLIC_DOCKER_IMAGE, "docker", "compose", "version"],
capture_output=True,
text=True,
timeout=30, # image assumed already pulled — no pull delay
)
assert result.returncode == 0, (
f"Could not get compose version from {PUBLIC_DOCKER_IMAGE}:\n{result.stderr}"
)
real_version_output = result.stdout # e.g. "Docker Compose version v5.1.1\n"
match = re.search(r"v(\d+)", real_version_output.strip())
assert match is not None, f"Could not parse version from: {real_version_output!r}"
major_version = int(match.group(1))
if major_version == 2:
print(
f"docker:latest ships Compose v{major_version} — bug only triggers on v3+. "
"Update PUBLIC_DOCKER_IMAGE to an image that ships Compose v3+."
)
# Step 2: build a minimal LocalSession backed by moto
with mock_aws():
boto_session = boto3.Session(region_name="us-east-1")
s3 = boto_session.client("s3")
s3.create_bucket(Bucket="sagemaker-bug-repro")
sm_session = LocalSession(
boto_session=boto_session,
default_bucket="sagemaker-bug-repro",
sagemaker_config={
"SchemaVersion": "1.0",
"SageMaker": {"PythonSDK": {"Modules": {"TelemetryOptOut": True}}},
},
)
sm_session.config = {
"local": {"local_code": True, "container_root": str(tmp_path)}
}
sm_session._default_bucket = "sagemaker-bug-repro"
# Step 3: build a ModelTrainer pointing at the public docker:latest image.
# We mock subprocess.check_output so the SDK receives the *real* version
# string from the public image instead of whatever is installed on this host.
trainer = ModelTrainer(
training_image=PUBLIC_DOCKER_IMAGE,
role="arn:aws:iam::123456789012:role/fake-role",
compute=Compute(instance_type="local_cpu", instance_count=1),
sagemaker_session=sm_session,
output_data_config=OutputDataConfig(s3_output_path="s3://sagemaker-bug-repro/output"),
base_job_name="compose-v5-bug",
training_mode=Mode.LOCAL_CONTAINER,
local_container_root=str(tmp_path),
source_code=SourceCode(command="true"), # no-op command
)
# Step 4: patch subprocess so _get_compose_cmd_prefix sees the real
# version string from docker:latest (v5.x). On affected SDK versions the
# train() call below raises ImportError.
with (
mock.patch("subprocess.check_output", return_value=real_version_output),
mock.patch("shutil.which", return_value=None),
):
trainer.train(wait=False, logs=False)
Expected behavior
SageMaker training in local mode should accept any installed Docker Compose plugin of version 2 or later and should not raise ImportError.
Screenshots or logs --- STACK Trace
.tox/py313/lib/python3.13/site-packages/sagemaker/core/telemetry/telemetry_logging.py:187: in wrapper
raise caught_ex
.tox/py313/lib/python3.13/site-packages/sagemaker/core/telemetry/telemetry_logging.py:153: in wrapper
response = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
.tox/py313/lib/python3.13/site-packages/sagemaker/core/workflow/pipeline_context.py:346: in wrapper
return run_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
.tox/py313/lib/python3.13/site-packages/pydantic/_internal/_validate_call.py:39: in wrapper_function
return wrapper(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
.tox/py313/lib/python3.13/site-packages/pydantic/_internal/_validate_call.py:136: in __call__
res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.tox/py313/lib/python3.13/site-packages/sagemaker/train/model_trainer.py:813: in train
local_container.train(wait)
.tox/py313/lib/python3.13/site-packages/sagemaker/train/local/local_container.py:237: in train
compose_command = self._generate_compose_command(wait)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.tox/py313/lib/python3.13/site-packages/sagemaker/train/local/local_container.py:479: in _generate_compose_command
_compose_cmd_prefix = self._get_compose_cmd_prefix()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = _LocalContainer(training_job_name='a-test-20260408132342', instance_type='local_cpu', instance_count=1, image='glm-sta...er_arguments=['-c', 'chmod +x /opt/ml/input/data/sm_drivers/sm_train.sh && /opt/ml/input/data/sm_drivers/sm_train.sh'])
def _get_compose_cmd_prefix(self) -> List[str]:
"""Gets the Docker Compose command.
The method initially looks for 'docker compose' v2
executable, if not found looks for 'docker-compose' executable.
Returns:
List[str]: Docker Compose executable split into list.
Raises:
ImportError: If Docker Compose executable was not found.
"""
compose_cmd_prefix = []
output = None
try:
output = subprocess.check_output(
["docker", "compose", "version"],
stderr=subprocess.DEVNULL,
encoding="UTF-8",
)
except subprocess.CalledProcessError:
logger.info(
"'Docker Compose' is not installed. "
"Proceeding to check for 'docker-compose' CLI."
)
if output and "v2" in output.strip():
logger.info("'Docker Compose' found using Docker CLI.")
compose_cmd_prefix.extend(["docker", "compose"])
return compose_cmd_prefix
if shutil.which("docker-compose") is not None:
logger.info("'Docker Compose' found using Docker Compose CLI.")
compose_cmd_prefix.extend(["docker-compose"])
return compose_cmd_prefix
> raise ImportError(
"Docker Compose is not installed. "
"Local Mode features will not work without docker compose. "
"For more information on how to install 'docker compose', please, see "
"https://docs.docker.com/compose/install/"
)
E ImportError: Docker Compose is not installed. Local Mode features will not work without docker compose. For more information on how to install 'docker compose', please, see https://docs.docker.com/compose/install/
.tox/py313/lib/python3.13/site-packages/sagemaker/train/local/local_container.py:638: ImportError
# ═══════════════════════════════════════════════════════════════════════════
# PROPOSED FIX – standalone validation of the corrected logic
# ═══════════════════════════════════════════════════════════════════════════
def _fixed_compose_check(output: str | None) -> bool:
"""Proposed replacement for the ``"v2" in output`` check.
Accepts any Docker Compose plugin version >= 2.0.0.
Examples
--------
>>> _fixed_compose_check("Docker Compose version v5.1.1")
True
>>> _fixed_compose_check("Docker Compose version v2.27.0")
True
>>> _fixed_compose_check("Docker Compose version v1.29.2")
False
>>> _fixed_compose_check("")
False
>>> _fixed_compose_check(None)
False
"""
if not output:
return False
match = re.search(r"v(\d+)", output.strip())
return match is not None and int(match.group(1)) >= 2
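One way the proposed check could be wired into the prefix lookup is sketched below. This is an illustration under assumptions, not the SDK's implementation: `compose_major_version`, `_run_compose_version`, and `resolve_compose_cmd_prefix` are hypothetical names, the injectable callables exist only to make the logic testable without Docker, and catching `FileNotFoundError` is an addition that the current method does not have.

```python
import re
import shutil
import subprocess
from typing import Callable, List, Optional


def compose_major_version(output: Optional[str]) -> Optional[int]:
    """Return the major version parsed from `docker compose version` output."""
    if not output:
        return None
    match = re.search(r"v(\d+)", output.strip())
    return int(match.group(1)) if match else None


def _run_compose_version() -> Optional[str]:
    """Run `docker compose version`; return None if the command is unavailable."""
    try:
        return subprocess.check_output(
            ["docker", "compose", "version"],
            stderr=subprocess.DEVNULL,
            encoding="UTF-8",
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        # FileNotFoundError handling is an assumption added for this sketch.
        return None


def resolve_compose_cmd_prefix(
    get_version_output: Callable[[], Optional[str]] = _run_compose_version,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Prefer the `docker compose` plugin (major >= 2), else `docker-compose`."""
    major = compose_major_version(get_version_output())
    if major is not None and major >= 2:
        return ["docker", "compose"]
    if which("docker-compose") is not None:
        return ["docker-compose"]
    raise ImportError("Docker Compose is not installed.")


# Usage with stubbed inputs, so no Docker installation is needed.
prefix = resolve_compose_cmd_prefix(
    get_version_output=lambda: "Docker Compose version v5.1.1\n",
    which=lambda name: None,
)
print(prefix)
```

Comparing the parsed major version as an integer avoids listing accepted versions one by one, so future major releases are accepted without further changes.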
System information
- SageMaker Python SDK version: sagemaker-train (>=1.6.0,<2.0.0) / sagemaker-core (>=2.7.1,<3.0.0)
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): NA
- Framework version: sagemaker-train (>=1.6.0,<2.0.0) / sagemaker-core (>=2.7.1,<3.0.0)
- Python version: 3.13
- CPU or GPU: CPU
- Custom Docker image (Y/N): N