Skip to content

exp: use allow_fail True for downloads after checking the index contents#3657

Draft
aignas wants to merge 3 commits intobazel-contrib:mainfrom
aignas:aignas.feat.check_first
Draft

exp: use allow_fail True for downloads after checking the index contents#3657
aignas wants to merge 3 commits intobazel-contrib:mainfrom
aignas:aignas.feat.check_first

Conversation

@aignas
Copy link
Collaborator

@aignas aignas commented Mar 10, 2026

Stacked on #3656.

I am leaving this here in case someone is interested and motivated to
collaborate on this (i.e. help with writing code and doing testing with their
setup).

Summary:

  • We first check the index contents and only then we download from the appropriate locations.

Things that I would like to see in the final solution:

  • Confirmation from someone that it works with private indexes.
  • Unit test coverage.
  • Download of index contents in parallel if enabled and if multiple indexes are specified.
  • Using the URLs that we find on the index package to construct the URLs we
    use for each package. That way we can be sure that it works as we see that
    the package names for some of the packages are not the same as the package
    name itself. (you can do this if you output the return value of the index.
  • Cache the result in the in-memory cache.
  • Should we write the facts for this information?

Experiment for #2632
Related to #3260

aignas added 3 commits March 10, 2026 19:27
This makes the logic in the parser a little bit more sophisticated, but
we also start handling the yank reason. This fixes the issue where the
`data-yank` presence but no value would be interpreted as a yanked
package. With this it should start working.

This implementation assumes that we have HTML escaped sequences as tag
values. It also unescapes them when returning the strings.

The posibilities that it gives us are:
- Use the `data-requires-python` to potentially discard any Python
  packages that are unsupported in the `select_whl` function.

Work towards bazel-contrib#260.
Work towards bazel-contrib#2731.
@aignas aignas added help wanted do not merge Tag that prevents merging labels Mar 10, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This experimental pull request introduces a significant change to how Python packages are resolved and downloaded from PyPI. Instead of directly attempting downloads, the system now first queries the PyPI index contents to identify available packages and their precise download URLs. This approach aims to improve reliability, enable cross-compilation by fetching platform-specific wheels, and lay the groundwork for more sophisticated package resolution, especially in multi-index or private index scenarios. The changes also refine the handling of "yanked" packages by allowing a more descriptive status.

Highlights

  • Enhanced Yanked Package Handling: The yanked field in package metadata now supports None (not yanked) or a string (yank reason), providing more granular status than a simple boolean.
  • Optimized HTML Parsing: The parse_simpleapi_html function has been significantly refactored for improved performance and now supports parsing the main PyPI index page to discover package URLs.
  • Pre-download Index Content Check: A new mechanism (_get_index_overrides) was introduced to query PyPI indexes for available packages and their specific URLs before attempting to download, allowing for more intelligent and targeted downloads.
  • Refined Download Failure Handling: The allow_fail=True parameter was removed from ctx.download calls, indicating that download failures are now expected to be handled by the pre-check logic rather than being silently allowed.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • python/private/pypi/parse_requirements.bzl
    • Updated the yanked field's default value from False to None in _package_srcs and _add_dists functions.
    • Adjusted logic to check for yanked != None instead of yanked or not yanked to correctly interpret the new None state.
  • python/private/pypi/parse_simpleapi_html.bzl
    • Modified parse_simpleapi_html to accept a parse_index boolean parameter, enabling it to parse the main PyPI index page for package URLs.
    • Refactored HTML parsing logic for improved efficiency, replacing content.split with content.find for iteration.
    • Introduced _parse_attrs helper function to efficiently extract HTML attributes.
    • Added _unescape_pypi_html utility to decode HTML entities in attribute values, particularly for data-yanked reasons.
    • Updated the yanked field in returned structs to be str | None instead of bool.
  • python/private/pypi/simpleapi_download.bzl
    • Added a new _get_index_overrides function to pre-fetch package availability and URLs from all configured indexes.
    • Modified the simpleapi_download function to utilize _get_index_overrides to determine the correct index URL for each package.
    • Removed allow_fail = True from ctx.download calls, relying on the pre-check logic to prevent unnecessary download attempts.
    • Updated the docstring to reflect the new pre-check and targeted download strategy.
    • Propagated the parse_index parameter to _read_simpleapi and _read_index_result.
  • tests/pypi/hub_builder/hub_builder_tests.bzl
    • Updated expected yanked values in test structs from False to None to align with the new yanked field type.
  • tests/pypi/parse_requirements/parse_requirements_tests.bzl
    • Updated expected yanked values in test structs from False to None to align with the new yanked field type.
  • tests/pypi/parse_simpleapi_html/parse_simpleapi_html_tests.bzl
    • Added new test cases to verify correct parsing of data-yanked attributes, including empty reasons and reasons with HTML entities.
    • Updated assertions to expect yanked as str | None instead of bool.
Activity
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly improves the handling of multiple PyPI indexes by introducing a more efficient method for querying index roots and refactoring the HTML parser to support modern features like yanked packages. A security audit found no medium, high, or critical vulnerabilities, confirming adherence to Bazel's security model and absence of injection or data leakage issues. However, a critical error handling placeholder (fail("TODO")) for unavailable indexes still needs to be addressed.

Comment on lines +205 to +206
if not result.success:
fail("TODO")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Calling fail() here will cause the entire dependency resolution to crash if an index URL is unavailable or returns an error. This could be due to transient network issues or a misconfigured URL. It would be more resilient to handle this failure gracefully by logging a warning and skipping the faulty index, allowing the process to continue with the remaining ones.

Suggested change
if not result.success:
fail("TODO")
if not result.success:
# TODO: Add logging to warn the user about the unreachable index.
continue

{obj}`http_file` for docs.
cache: {type}`struct` the `pypi_cache` instance.
get_auth: A function to get auth information. Used in tests.
parse_index: TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The documentation for the parse_index parameter is a 'TODO'. Please provide a clear explanation of its purpose to improve maintainability.

Suggested change
parse_index: TODO
parse_index: {type}`bool` Whether to parse the content as a root index page (e.g. `/simple/`) instead of a package-specific page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not merge Tag that prevents merging help wanted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant