Path traversal protection in unzip to temp file#45691
Path traversal protection in unzip to temp file#45691ayushhgarg-work wants to merge 2 commits intoAzure:mainfrom
Conversation
|
@ayushhgarg-work please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
There was a problem hiding this comment.
Pull request overview
This PR hardens azure-ai-ml local job execution utilities against archive extraction vulnerabilities (ZipSlip/TarSlip) by validating ZIP members before extraction and adding a safer TAR extraction helper, along with new unit tests for the security behavior.
Changes:
- Add ZIP member validation to block path traversal before
extractall()inunzip_to_temporary_file. - Introduce
_safe_tar_extractalland use it when extracting the bootstrapper archive from a Docker container. - Add unit tests covering ZIP/TAR path traversal and link-handling scenarios.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| sdk/ml/azure-ai-ml/azure/ai/ml/operations/_local_job_invoker.py | Adds ZIP path validation and a safe TAR extraction helper; switches bootstrapper extraction to use the safer path. |
| sdk/ml/azure-ai-ml/tests/job_common/unittests/test_local_job_invoker.py | Adds unit tests for ZIP/TAR extraction safety and related rejection cases. |
You can also share your feedback on Copilot code review. Take the survey.
| from azure.ai.ml.operations._local_job_invoker import ( | ||
| _get_creationflags_and_startupinfo_for_background_process, | ||
| _safe_tar_extractall, | ||
| patch_invocation_script_serialization, | ||
| unzip_to_temporary_file, | ||
| ) | ||
|
|
||
| import zipfile |
| with tarfile.open(fileobj=buf, mode="r") as tar: | ||
| with pytest.raises((ValueError, Exception)): | ||
| _safe_tar_extractall(tar, dest) |
| with tarfile.open(fileobj=buf, mode="r") as tar: | ||
| with pytest.raises((ValueError, Exception)): | ||
| _safe_tar_extractall(tar, dest) |
| with tarfile.open(fileobj=buf, mode="r") as tar: | ||
| with pytest.raises((ValueError, Exception)): | ||
| _safe_tar_extractall(tar, dest) No newline at end of file |
|
|
||
| # Python 3.12+ has built-in data_filter for safe extraction | ||
| if hasattr(tarfile, "data_filter"): | ||
| tar.extractall(resolved_dest, filter="data") |
| for member in tar.getmembers(): | ||
| if member.issym() or member.islnk(): | ||
| raise ValueError( | ||
| f"Tar archive contains a symbolic or hard link and cannot be extracted safely: {member.name}" | ||
| ) | ||
| member_path = os.path.realpath(os.path.join(resolved_dest, member.name)) | ||
| if member_path != resolved_dest and not member_path.startswith(resolved_dest + os.sep): | ||
| raise ValueError( | ||
| f"Tar archive contains a path traversal entry and cannot be extracted safely: {member.name}" | ||
| ) | ||
| # All members validated; safe to extract | ||
| tar.extractall(resolved_dest) |
| job_def = _make_job_definition("safe-run") | ||
| result = unzip_to_temporary_file(job_def, zip_bytes) | ||
|
|
||
| assert result.exists() | ||
| assert (result / "azureml-setup" / "invocation.sh").exists() | ||
| assert (result / "azureml-setup" / "config.json").exists() |
Fixes MSRC [110063] — two path traversal vulnerabilities in _local_job_invoker.py.
Fix 1: unzip_to_temporary_file now validates all ZIP member paths resolve within the target directory before extraction.
Fix 2: New _safe_tar_extractall helper replaces raw tarfile.extract() in copy_bootstrapper_from_container. Uses filter='data' on Python 3.12+; on older versions, manually blocks path traversal, symlinks, and hard links.
7 security regression tests added. All 10 tests passing, 0 regressions.