Backport six-CVE security review (release 20.3.4+security.1) #1

Merged: icanhasmath merged 6 commits into 20.3.4.x from 20.3.4-security on Jun 2, 2026

@icanhasmath

Summary

Backports security fixes to the Python 2.7-compatible pip 20.3.4 line and cuts release 20.3.4+security.1. Five of the six reported CVEs apply to 20.3.4 and are fixed here (one commit each); the sixth was assessed as not applicable (rationale below).

All fixes were verified against the actual 20.3.4 source and adapted for Python 2.7 (no f-strings, no os.path.commonpath, comment-style type hints).

Fixed (one commit per CVE)

| CVE | GHSA | Area | Fix |
| --- | --- | --- | --- |
| CVE-2021-3572 | GHSA-5xp3-jfq3-5q8x | vcs/git.py | Parse `git show-ref` with `split("\n")`/single-space split instead of `splitlines()`, so unicode separators in a ref name can't spoof the resolved revision. (upstream PR pypa#9827) |
| CVE-2023-5752 | GHSA-mq26-g339-26xf | vcs/mercurial.py | Pass the hg revision as a single `-r=<rev>` token so it can't be parsed as an option (e.g. `--config`). (upstream PR pypa#12306) |
| CVE-2025-8869 | GHSA-4xh5-x5gv-qwph | utils/unpacking.py | Validate tar symlink targets are members of the same archive before extraction (no-`data_filter` fallback path). (upstream 2490eb2..b154d06) |
| CVE-2026-3219 | GHSA-58qw-9mgm-455v | utils/unpacking.py | Disambiguate archive detection in `unpack_file` (content-type → extension → unambiguous magic); reject tar+zip polyglots. (upstream PR pypa#13870) |
| CVE-2026-1703 | GHSA-6vgw-5pg2-w6jp | utils/unpacking.py | Enforce a path-component boundary in `is_within_directory` instead of a character prefix. Uses an explicit boundary check rather than `os.path.commonpath` (absent on Python 2.7). (upstream 4c651b7) |

Not applicable

CVE-2026-6357 / GHSA-jp4c-xjxw-mgf9 (self-version-check imports well-known modules after wheel install). In 20.3.4 the self-version-check and all its dependencies are imported eagerly at startup, and the post-install check touches only vendored (pip._vendor.*) modules. A newly-installed wheel has no well-known top-level module name to shadow, so the import-shadowing primitive the CVE depends on does not exist. Documented in NEWS.

Testing

Validated with Python 2.7.18: the targeted unit suites tests/unit/test_utils_unpacking.py and tests/unit/test_vcs.py, run with the repo src/ on PYTHONPATH, gave 118 passed, 2 skipped (the 2 errors under --noconftest are the pre-existing data-fixture tests, unrelated). New regression tests were added for each fix. Edited sources compile clean on both Python 2.7 and 3.

🤖 Generated with Claude Code

icanhasmath and others added 6 commits June 2, 2026 00:05
`Git.get_revision_sha` parsed `git show-ref` output with `str.splitlines()`,
which splits on unicode line separators (U+2028, U+0085, etc.) that git
permits inside a tag name. A crafted ref name could inject a spoofed
"<sha> refs/tags/<rev>" line, causing pip to resolve a revision to an
attacker-chosen commit and install a different revision than intended.

Parse with `split("\n")` (stripping a trailing "\r", skipping blank lines)
and split each line only on the ASCII space, mirroring upstream PR pypa#9827
(pip 21.1, commit ca832b2). Uses a positional maxsplit for Python 2.7
compatibility instead of upstream's `maxsplit=` keyword.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
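
The parsing rule described in this commit can be illustrated with a minimal sketch. The function name `parse_show_ref` and the returned dictionary are assumptions made for illustration; the actual change lives inside `Git.get_revision_sha` and may be structured differently.

```python
def parse_show_ref(output):
    """Return a {ref_name: sha} dict from `git show-ref` output.

    Hypothetical helper for illustration only; not pip's actual code.
    """
    refs = {}
    # Split only on "\n": str.splitlines() would also split on unicode
    # separators such as U+2028, which may appear inside a ref name.
    for line in output.split("\n"):
        line = line.rstrip("\r")
        if not line:
            continue
        try:
            # Positional maxsplit (works on Python 2.7), ASCII space only.
            sha, ref_name = line.split(" ", 1)
        except ValueError:
            raise ValueError("unexpected show-ref line: {!r}".format(line))
        refs[ref_name] = sha
    return refs
```

With this rule, a ref name containing U+2028 stays part of a single entry, so the text after the separator is never read as its own "<sha> refs/tags/<rev>" line.
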
`Mercurial.get_base_rev_args` returned the bare revision as its own argv
element, so a revision such as `--config=alias.update=!<cmd>` was passed to
`hg` as a standalone token and parsed as a Mercurial command-line option,
enabling arbitrary configuration injection and code execution when installing
from an hg URL.

Return a single `-r=<rev>` token so hg always treats the whole value as the
revision. Mirrors upstream PR pypa#12306 (pip 23.3, commit 389cb79); uses
`str.format` instead of an f-string for Python 2.7 compatibility.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
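
A minimal sketch of the argv change described in this commit, written as free functions rather than the `Mercurial` static method. The name `get_base_rev_args_old` is hypothetical and only models the previous behaviour as stated above. The sketch shows the shape of the argument list only; how `hg` itself parses the token is not exercised here.

```python
def get_base_rev_args_old(rev):
    """Previous behaviour as described above: the bare revision is its own token."""
    return [rev]


def get_base_rev_args(rev):
    """Patched behaviour: the revision is always glued to the -r option."""
    # str.format instead of an f-string, for Python 2.7 compatibility.
    return ["-r={}".format(rev)]


malicious = "--config=alias.update=!<cmd>"
old_args = get_base_rev_args_old(malicious)
new_args = get_base_rev_args(malicious)
```

In `old_args` the attacker-controlled string stands alone and begins with `--`, so it can be read as an option. In `new_args` the only token begins with `-r=`, so the whole value is carried as the revision.
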
…tory

`is_within_directory` used `os.path.commonprefix`, which compares paths
character-by-character. It therefore treated a sibling like
`/dest/parentfoo` as being within `/dest/parent`, so a crafted archive could
write files into a sibling directory whose name is a string prefix of the
install path.

Upstream (commit 4c651b7) switched to `os.path.commonpath`, but that is
unavailable on Python 2.7 (which pip 20.3.4 still supports). Instead, compare
on explicit path-component boundaries: the target must equal the directory or
start with it followed by a path separator. Adds the prefix-substring
regression case to test_is_within_directory.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
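
The component-boundary comparison described in this commit could look roughly like the following. This is a sketch under the stated rule (equal to the directory, or starting with it followed by a separator), not the exact body of the patched `is_within_directory`.

```python
import os


def is_within_directory(directory, target):
    """Return True if target is the directory itself or lies beneath it."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    if abs_target == abs_directory:
        return True
    # Require a separator right after the directory so that
    # "/dest/parentfoo" is not mistaken for a child of "/dest/parent".
    if abs_directory.endswith(os.sep):
        prefix = abs_directory  # e.g. the filesystem root
    else:
        prefix = abs_directory + os.sep
    return abs_target.startswith(prefix)
```

For comparison, `os.path.commonprefix(["/dest/parent", "/dest/parentfoo/x"])` returns `"/dest/parent"`, which is why the character-based check accepted the sibling directory.
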
…n fallback

pip's `untar_file` extracted symlink members without checking where they
point. A malicious sdist could ship a symlink whose target is an arbitrary
host path; pip would then follow that link during subsequent writes, creating
or overwriting files outside the install directory. Python's PEP 706
`tarfile.data_filter` protects newer interpreters, but pip 20.3.4 supports
Python 2.7 and older 3.x where no such filter exists, so this fallback path is
the only line of defence.

Add `is_symlink_target_in_tar`, which confirms a symlink member's target is
itself a member of the same archive, and call it from `untar_file` before
extracting any symlink, raising InstallationError otherwise. Adapted from the
upstream series (commits 2490eb2..b154d06); helper applied directly to the
single untar_file path that 20.3.4 has.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
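
The membership check described in this commit can be sketched as follows. The body of `is_symlink_target_in_tar` here is an assumption based on the description (resolve the link relative to the member's directory, then look it up in the archive), and `build_demo_tar` is a hypothetical helper that exists only to exercise it.

```python
import io
import os
import tarfile


def is_symlink_target_in_tar(tar, tarinfo):
    """Return True if the symlink member points at another member of tar."""
    # Resolve the link target relative to the directory holding the link.
    linkname = os.path.join(os.path.dirname(tarinfo.name), tarinfo.linkname)
    linkname = os.path.normpath(linkname).replace("\\", "/")
    try:
        tar.getmember(linkname)
        return True
    except KeyError:
        return False


def build_demo_tar():
    """Build an in-memory tar with one file, one safe and one unsafe symlink."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello"
        info = tarfile.TarInfo("pkg/real.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        for name, target in (("pkg/good", "real.txt"), ("pkg/evil", "/etc/passwd")):
            link = tarfile.TarInfo(name)
            link.type = tarfile.SYMTYPE
            link.linkname = target
            tar.addfile(link)
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")
```

In the demo archive, `pkg/good` resolves to the member `pkg/real.txt` and passes, while `pkg/evil` points at an absolute host path that is not a member and is refused. A caller in the style of `untar_file` would raise an error for the second case before extracting the link.
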
…ack_file

`unpack_file` ran `zipfile.is_zipfile()` for any archive whose name/content-
type pip did not already recognise. Because `is_zipfile` scans for a ZIP
end-of-central-directory record anywhere in the file, a tar+ZIP polyglot
(a ZIP appended to an otherwise valid `.tar.gz`) was always handled as a ZIP
regardless of its filename, letting an attacker control which set of files
pip actually extracts.

Choose the format by decreasing reliability -- content-type, then filename
extension, then magic signature -- and only trust the magic signature when it
is unambiguous. A file that is simultaneously a valid zip and a valid tar is
now rejected with InstallationError. Mirrors upstream PR pypa#13870 (pip 26.1);
written with `.format`/nested defs for Python 2.7 compatibility.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
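
The detection order described in this commit might be sketched like this. Everything here is illustrative: `detect_archive_format`, `AmbiguousArchiveError` (a stand-in for pip's InstallationError), the extension tuples, and `write_polyglot` are assumptions. In particular, this sketch only probes magic signatures when neither content-type nor extension decides, and whether the real patch also rejects polyglots that carry a recognised extension is not stated above.

```python
import io
import tarfile
import zipfile

# Illustrative extension lists; pip's own constants may differ.
ZIP_EXTENSIONS = (".zip", ".whl")
TAR_EXTENSIONS = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tbz", ".tar.xz", ".txz")


class AmbiguousArchiveError(Exception):
    """Stand-in for pip's InstallationError in this sketch."""


def detect_archive_format(filename, content_type=None):
    """Return "zip", "tar", or None, trying the most reliable signal first."""
    lowered = filename.lower()
    # 1. Declared content type.
    if content_type == "application/zip":
        return "zip"
    if content_type == "application/x-gzip":
        return "tar"
    # 2. Filename extension.
    if lowered.endswith(ZIP_EXTENSIONS):
        return "zip"
    if lowered.endswith(TAR_EXTENSIONS):
        return "tar"
    # 3. Magic signature, trusted only when exactly one probe matches.
    looks_zip = zipfile.is_zipfile(filename)
    looks_tar = tarfile.is_tarfile(filename)
    if looks_zip and looks_tar:
        raise AmbiguousArchiveError(
            "{} is both a valid zip and a valid tar".format(filename)
        )
    if looks_zip:
        return "zip"
    if looks_tar:
        return "tar"
    return None


def write_polyglot(path):
    """Write a plain tar with a ZIP appended, for demonstration only."""
    with tarfile.open(path, "w") as tar:
        data = b"from tar"
        info = tarfile.TarInfo("a.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    # Mode "a" on a non-ZIP file appends a new ZIP archive to it.
    with zipfile.ZipFile(path, "a") as zf:
        zf.writestr("b.txt", "from zip")
```

A file produced by `write_polyglot` and saved under an unrecognised name satisfies both probes, so `detect_archive_format` raises instead of silently picking the ZIP view of the file.
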
Bump version 20.3.4 -> 20.3.4+security.1 and add a NEWS section covering the
five backported CVE fixes (CVE-2021-3572, CVE-2023-5752, CVE-2025-8869,
CVE-2026-3219, CVE-2026-1703). CVE-2026-6357 was assessed as not applicable to
20.3.4 and is documented as such in NEWS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@icanhasmath icanhasmath requested a review from martinPavesio June 2, 2026 05:22

@martinPavesio martinPavesio left a comment

Looks good!

@icanhasmath icanhasmath merged commit 626cd74 into 20.3.4.x Jun 2, 2026
1 of 7 checks passed