exp: use allow_fail True for downloads after checking the index contents by aignas · Pull Request #3657 · bazel-contrib/rules_python

aignas · 2026-03-10T11:16:14Z

Stacked on #3656.

I am leaving this here in case someone is interested and motivated to
collaborate on this (i.e. help with writing code and doing testing with their
setup).

Summary:

We first check the index contents and only then we download from the appropriate locations.

Things that I would like to see in the final solution:

Confirmation from someone that it works with private indexes.
Unit test coverage.
Download of index contents in parallel if enabled and if multiple indexes are specified.
Using the URLs that we find on the index package to construct the URLs we
use for each package. That way we can be sure that it works as we see that
the package names for some of the packages are not the same as the package
name itself. (you can do this if you output the return value of the index.
Cache the result in the in-memory cache.
Should we write the facts for this information?

Experiment for #2632
Related to #3260

This makes the logic in the parser a little bit more sophisticated, but we also start handling the yank reason. This fixes the issue where the `data-yank` presence but no value would be interpreted as a yanked package. With this it should start working. This implementation assumes that we have HTML escaped sequences as tag values. It also unescapes them when returning the strings. The posibilities that it gives us are: - Use the `data-requires-python` to potentially discard any Python packages that are unsupported in the `select_whl` function. Work towards bazel-contrib#260. Work towards bazel-contrib#2731.

gemini-code-assist · 2026-03-10T12:06:57Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This experimental pull request introduces a significant change to how Python packages are resolved and downloaded from PyPI. Instead of directly attempting downloads, the system now first queries the PyPI index contents to identify available packages and their precise download URLs. This approach aims to improve reliability, enable cross-compilation by fetching platform-specific wheels, and lay the groundwork for more sophisticated package resolution, especially in multi-index or private index scenarios. The changes also refine the handling of "yanked" packages by allowing a more descriptive status.

Highlights

Enhanced Yanked Package Handling: The yanked field in package metadata now supports None (not yanked) or a string (yank reason), providing more granular status than a simple boolean.
Optimized HTML Parsing: The parse_simpleapi_html function has been significantly refactored for improved performance and now supports parsing the main PyPI index page to discover package URLs.
Pre-download Index Content Check: A new mechanism (_get_index_overrides) was introduced to query PyPI indexes for available packages and their specific URLs before attempting to download, allowing for more intelligent and targeted downloads.
Refined Download Failure Handling: The allow_fail=True parameter was removed from ctx.download calls, indicating that download failures are now expected to be handled by the pre-check logic rather than being silently allowed.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

python/private/pypi/parse_requirements.bzl
- Updated the yanked field's default value from False to None in _package_srcs and _add_dists functions.
- Adjusted logic to check for yanked != None instead of yanked or not yanked to correctly interpret the new None state.
python/private/pypi/parse_simpleapi_html.bzl
- Modified parse_simpleapi_html to accept a parse_index boolean parameter, enabling it to parse the main PyPI index page for package URLs.
- Refactored HTML parsing logic for improved efficiency, replacing content.split with content.find for iteration.
- Introduced _parse_attrs helper function to efficiently extract HTML attributes.
- Added _unescape_pypi_html utility to decode HTML entities in attribute values, particularly for data-yanked reasons.
- Updated the yanked field in returned structs to be str | None instead of bool.
python/private/pypi/simpleapi_download.bzl
- Added a new _get_index_overrides function to pre-fetch package availability and URLs from all configured indexes.
- Modified the simpleapi_download function to utilize _get_index_overrides to determine the correct index URL for each package.
- Removed allow_fail = True from ctx.download calls, relying on the pre-check logic to prevent unnecessary download attempts.
- Updated the docstring to reflect the new pre-check and targeted download strategy.
- Propagated the parse_index parameter to _read_simpleapi and _read_index_result.
tests/pypi/hub_builder/hub_builder_tests.bzl
- Updated expected yanked values in test structs from False to None to align with the new yanked field type.
tests/pypi/parse_requirements/parse_requirements_tests.bzl
- Updated expected yanked values in test structs from False to None to align with the new yanked field type.
tests/pypi/parse_simpleapi_html/parse_simpleapi_html_tests.bzl
- Added new test cases to verify correct parsing of data-yanked attributes, including empty reasons and reasons with HTML entities.
- Updated assertions to expect yanked as str | None instead of bool.

Activity

This is an experimental pull request, stacked on fix(pypi): return yank reason from SimpleAPI HTML #3656.
The author is actively seeking collaboration and feedback, specifically on testing with private indexes, unit test coverage, parallel downloads for index contents, using index-provided URLs for package construction, and caching results.
The changes are related to existing issues pypi downloads blocked by experimental_downloader_config fail silently #2632 and SimpleAPI download failures are cached and require bazel clean #3260.
No reviewer comments or approvals have been recorded yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request significantly improves the handling of multiple PyPI indexes by introducing a more efficient method for querying index roots and refactoring the HTML parser to support modern features like yanked packages. A security audit found no medium, high, or critical vulnerabilities, confirming adherence to Bazel's security model and absence of injection or data leakage issues. However, a critical error handling placeholder (fail("TODO")) for unavailable indexes still needs to be addressed.

gemini-code-assist · 2026-03-10T12:18:56Z

python/private/pypi/simpleapi_download.bzl

+        if not result.success:
+            fail("TODO")


Calling fail() here will cause the entire dependency resolution to crash if an index URL is unavailable or returns an error. This could be due to transient network issues or a misconfigured URL. It would be more resilient to handle this failure gracefully by logging a warning and skipping the faulty index, allowing the process to continue with the remaining ones.

Suggested change

if not result.success:

fail("TODO")

if not result.success:

# TODO: Add logging to warn the user about the unreachable index.

continue

gemini-code-assist · 2026-03-10T12:18:56Z

python/private/pypi/simpleapi_download.bzl

               {obj}`http_file` for docs.
        cache: {type}`struct` the `pypi_cache` instance.
        get_auth: A function to get auth information. Used in tests.
+        parse_index: TODO


The documentation for the parse_index parameter is a 'TODO'. Please provide a clear explanation of its purpose to improve maintainability.

Suggested change

parse_index: TODO

parse_index: {type}`bool` Whether to parse the content as a root index page (e.g. `/simple/`) instead of a package-specific page.

aignas added 3 commits March 10, 2026 19:27

wip

9dd238c

wip

60b4cd6

aignas added help wanted do not merge Tag that prevents merging labels Mar 10, 2026

gemini-code-assist bot reviewed Mar 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

exp: use allow_fail True for downloads after checking the index contents#3657

exp: use allow_fail True for downloads after checking the index contents#3657
aignas wants to merge 3 commits intobazel-contrib:mainfrom
aignas:aignas.feat.check_first

aignas commented Mar 10, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Mar 10, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Mar 10, 2026

Uh oh!

gemini-code-assist bot Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	parse_index: TODO
	parse_index: {type}`bool` Whether to parse the content as a root index page (e.g. `/simple/`) instead of a package-specific page.

Uh oh!

Conversation

aignas commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot commented Mar 10, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aignas commented Mar 10, 2026 •

edited

Loading