feat(graph): add suggest-collection command for collection optimization #1030
smoparth wants to merge 1 commit into python-wheel-build:main
Conversation
📝 Walkthrough
Adds a new CLI command.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fromager/commands/graph.py`:
- Around line 1031-1033: The current check in the onboarding output path always
prints a human message when toplevel_nodes is empty; update the block that
references toplevel_nodes so that when output_format == "json" it emits a valid
JSON empty list (e.g., using click.echo("[]")) and returns, otherwise keep the
human-friendly message "No top-level packages found in onboarding graph." This
change touches the conditional that checks toplevel_nodes in the function
handling the onboarding graph output and should reference the output_format
variable to decide which output to produce.
- Around line 916-918: The current sort used for selecting best_score
(scores.sort(...); best_score = scores[0]) only considers new_packages and
coverage_percentage, which leaves true ties to fall back to the original
collection_graphs order; make tie-breaking deterministic by adding a stable
filename-derived key as a tertiary sort field (e.g., change the sort key to
(score.new_packages, -score.coverage_percentage, score.filename) or, if Score
objects don’t carry filenames, derive the filename from the corresponding
collection_graphs entry when building scores) so best_score / best_fit is chosen
predictably by filename when the first two metrics tie.
In `@tests/test_suggest_collection.py`:
- Line 5: Replace the fragile regex-based JSON extraction in
tests/test_suggest_collection.py with direct JSON parsing of the entire CLI
output: remove the re import and any use of re.search that extracts a bracketed
block, and instead call json.loads(...) on the full output string produced by
the CLI (e.g., the variable holding stdout or `cli_output`/`result.stdout`) in
each affected test (the blocks around lines 24-29, 294-295, 327 etc.); ensure
the tests assert against the parsed dict/list returned by json.loads so the test
fails if extra bytes surround the JSON payload.
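The deterministic tie-break suggested for `best_score` can be sketched as follows. The `Score` dataclass and its fields are stand-ins inferred from the review comment, not the actual fromager types:

```python
from dataclasses import dataclass


@dataclass
class Score:
    # Minimal stand-in for the reviewed Score object; the real class may differ.
    collection: str
    new_packages: int
    coverage_percentage: float
    filename: str


scores = [
    Score("collection-b", 2, 80.0, "graphs/b.json"),
    Score("collection-a", 2, 80.0, "graphs/a.json"),
    Score("collection-c", 1, 50.0, "graphs/c.json"),
]

# Fewest new packages first, then highest coverage, then a stable
# filename-derived key so true ties resolve deterministically.
scores.sort(key=lambda s: (s.new_packages, -s.coverage_percentage, s.filename))
best_score = scores[0]
```

With this key, `collection-c` wins outright on `new_packages`, and the tie between the other two is broken by filename rather than by input order.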
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 13fe9c39-2bda-4331-802b-d350ab77b34e
📒 Files selected for processing (2)
- src/fromager/commands/graph.py
- tests/test_suggest_collection.py
Force-pushed 55faca2 to 1836b93
🧹 Nitpick comments (1)
src/fromager/commands/graph.py (1)
1048-1067: Consider loading collection graphs in parallel for large inputs.
For many collection graphs, sequential loading could be slow. This is fine for typical usage but worth noting if performance becomes a concern with many large graphs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fromager/commands/graph.py` around lines 1048 - 1067, Parallelize loading by extracting collection names first (call extract_collection_name for each graph_path, check for duplicates) and then use a concurrent.futures.ThreadPoolExecutor to call DependencyGraph.from_file(graph_path) for each unique path in parallel; for each future, on success call get_package_names(collection_graph) and populate collection_packages[collection_name], and on exception wrap and raise a click.ClickException that includes the graph_path and original error (preserve the existing logger.debug call per collection_name). Reference symbols: collection_graphs, extract_collection_name, DependencyGraph.from_file, get_package_names, collection_packages.
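A minimal sketch of that parallel-loading suggestion. Here `load_graph` and `get_package_names` are injected stand-ins for `DependencyGraph.from_file` and the real helper, and error handling is simplified to a plain `RuntimeError` instead of `click.ClickException`:

```python
import concurrent.futures


def load_collections_parallel(collection_graphs, load_graph, get_package_names):
    """Load each collection graph in a thread pool and map name -> package set.

    collection_graphs maps a collection name to its graph path; duplicate-name
    detection would still happen up front, before submitting work to the pool.
    """
    collection_packages = {}
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {
            pool.submit(load_graph, path): name
            for name, path in collection_graphs.items()
        }
        for future, name in futures.items():
            try:
                graph = future.result()
            except Exception as err:
                raise RuntimeError(
                    f"failed to load {collection_graphs[name]}: {err}"
                ) from err
            collection_packages[name] = get_package_names(graph)
    return collection_packages
```

Because the work is I/O-bound file parsing, a thread pool is usually enough; a process pool would only pay off for very large graphs.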
jlarkin09 left a comment
LGTM other than the one suggestion by coderabbit for json.loads()
I'm not convinced that it makes sense to add this feature to Fromager upstream. Collections are a downstream concept, not a Fromager feature.
@tiran, I had the same thought when I picked this.
I want this feature here. Let's think of a better name for it.
How about
Force-pushed 1836b93 to 2ec1705
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fromager/commands/graph.py`:
- Around line 903-923: The JSON being emitted in the results dict in
src/fromager/commands/graph.py must follow the documented schema: replace the
top-level
best_fit/new_packages/existing_packages/coverage_percentage/all_collections
shape with package, version, total_dependencies, and an ordered collections list
named "collections" where each entry has keys name (use score.collection),
new_packages, existing_packages, total_dependencies (compute per-score total if
available), and coverage_pct (rounded to 1 decimal); remove the flattened
top-level best_* fields from this serialized dict and, if you still need those
flattened values for the human table, compute them in
_print_suggest_collection_table() by taking the first-ranked collection from the
collections list rather than emitting a second JSON shape here.
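A sketch of the documented schema described above; the helper name and the way per-collection totals are passed in are illustrative assumptions, not the real fromager code:

```python
def build_result(package, version, total_dependencies, ranked_scores):
    """Serialize one suggestion using the documented schema.

    ranked_scores is an ordered list of (name, new_packages, existing_packages)
    tuples, best fit first; coverage_pct is derived and rounded to 1 decimal.
    """
    collections = [
        {
            "name": name,
            "new_packages": new,
            "existing_packages": existing,
            "total_dependencies": total_dependencies,
            "coverage_pct": round(existing / total_dependencies * 100, 1),
        }
        for name, new, existing in ranked_scores
    ]
    return {
        "package": package,
        "version": version,
        "total_dependencies": total_dependencies,
        "collections": collections,
    }


result = build_result("example-pkg", "1.0.0", 5, [("core", 1, 4), ("extras", 3, 2)])
best = result["collections"][0]  # the human table derives best-fit from here
```

Keeping a single ordered `collections` list avoids emitting a second, flattened JSON shape with top-level `best_*` fields.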
I think that's moving us in the right direction. What do you think, @smoparth?
@dhellmann, Thanks for confirming. I'll update the command name, correct the CLI argument types to clickext.ClickPath instead of str, and update the PR.
Force-pushed 2ec1705 to 15236a5
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/fromager/commands/lint_requirements.py (1)
82-97: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Guard resolution after parse failures and empty resolution results.
Line 90 can reference `requirement` before assignment when line 64 raises `InvalidRequirement`, which crashes linting under `--resolve-requirements`. Line 96 also assumes `results` is non-empty.
Suggested fix:
```diff
     except InvalidRequirement as err:
         msg = f"{path}: {line}: {err}"
         logger.error(msg)
         failures.append(msg)
+        continue

     # Resolve the requirement to ensure it can be found
     # Skip resolution for constraints files as they should only specify versions
     if resolve_requirements and not is_constraints:
         token = requirement_ctxvar.set(requirement)
         try:
             results = bt.resolve_versions(
                 req=requirement,
                 req_type=RequirementType.TOP_LEVEL,
             )
+            if not results:
+                raise RuntimeError("No versions returned by resolver")
             _, version = results[0]
             logger.info(f"{requirement} resolves to {version}")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fromager/commands/lint_requirements.py` around lines 82 - 97, The code attempts to resolve a requirement even when parsing failed or when resolve results are empty; update the flow so resolution only runs for successfully parsed requirements and guard against empty resolve results: inside the block handling parsing (where InvalidRequirement is caught) ensure you continue/skip resolution when an exception occurs (so the variable requirement isn't referenced after the except), and when calling bt.resolve_versions(req=requirement, ...) check that results is non-empty before doing results[0] (log an error and append to failures if empty). Refer to requirement, InvalidRequirement, resolve_requirements, is_constraints, requirement_ctxvar, bt.resolve_versions, and results when making the changes.
src/fromager/bootstrapper.py (1)
1112-1137: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Scan all cache hits before rejecting on build tag.
`find_all_matching_from_provider()` can return multiple wheel URLs for the same pinned version. This code only inspects `results[0]`, so a mismatched first wheel makes us miss a later candidate with the expected build tag and rebuild unnecessarily.
Suggested fix:
```diff
-        results = resolver.find_all_matching_from_provider(provider, pinned_req)
-        wheel_url, _ = results[0]
-        wheelfile_name = pathlib.Path(urlparse(wheel_url).path)
         pbi = self.ctx.package_build_info(req)
         expected_build_tag = pbi.build_tag(resolved_version)
-        # Log the expected build tag for debugging
         logger.info(f"has expected build tag {expected_build_tag}")
-        # Get changelogs for debug info
         changelogs = pbi.get_changelog(resolved_version)
         logger.debug(f"has change logs {changelogs}")
-
-        _, _, build_tag, _ = wheels.extract_info_from_wheel_file(
-            req, wheelfile_name
-        )
-        if expected_build_tag and expected_build_tag != build_tag:
-            logger.info(
-                f"found wheel for {resolved_version} in cache but build tag does not match. Got {build_tag} but expected {expected_build_tag}"
-            )
-            return None, None
+        results = resolver.find_all_matching_from_provider(provider, pinned_req)
+
+        wheel_url = None
+        for candidate_url, _ in results:
+            candidate_name = pathlib.Path(urlparse(candidate_url).path)
+            _, _, build_tag, _ = wheels.extract_info_from_wheel_file(
+                req, candidate_name
+            )
+            if not expected_build_tag or expected_build_tag == build_tag:
+                wheel_url = candidate_url
+                break
+
+        if wheel_url is None:
+            logger.info(
+                "found wheel(s) for %s in cache but none matched build tag %s",
+                resolved_version,
+                expected_build_tag,
+            )
+            return None, None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fromager/bootstrapper.py` around lines 1112 - 1137, The code inspects only results[0] from resolver.find_all_matching_from_provider(provider, pinned_req) and rejects on a build-tag mismatch too early; instead, iterate over all entries in results, for each wheel_url derive wheelfile_name, call wheels.extract_info_from_wheel_file(req, wheelfile_name) to get build_tag and compare against pbi.build_tag(resolved_version) (use the existing pbi and expected_build_tag), return the matching wheel_url and its metadata immediately when build_tag matches, and only return (None, None) after all candidates are checked and none match.
🧹 Nitpick comments (1)
src/fromager/context.py (1)
66-67: ⚡ Quick win
Keep `max_release_age` validation consistent between `__init__` and setter.
`set_max_release_age()` rejects non-positive values, but direct constructor input currently bypasses that rule.
Suggested fix:
```diff
-        self.cooldown: Cooldown | None = cooldown
-        self._max_release_age: datetime.timedelta | None = max_release_age
+        self.cooldown: Cooldown | None = cooldown
+        if max_release_age is not None and max_release_age <= datetime.timedelta(0):
+            raise ValueError(
+                f"max_release_age must be positive, got {max_release_age}"
+            )
+        self._max_release_age: datetime.timedelta | None = max_release_age
```
Also applies to: 116-127
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fromager/context.py` around lines 66 - 67, The constructor for the Context class currently accepts max_release_age without applying the same validation as set_max_release_age; update the __init__ to validate max_release_age the same way set_max_release_age does (reject None? or non-positive timedeltas as set_max_release_age enforces) and raise the same exception/type on invalid values; locate the __init__ parameter max_release_age and the set_max_release_age method and apply identical checks/path and error messages so direct construction and setter behavior are consistent (also mirror this fix for the related validation between lines 116-127 referenced in the review).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/proposals/release-cooldown.md`:
- Line 236: The "Pre-built wheel exemption" line in the Phase 1 section
conflicts with the earlier "cooldown policy" that applies to pre-built wheels;
update the Phase 1 content so it aligns with the global cooldown rules—either
remove the "Pre-built wheel exemption" clause or clarify that pre-built wheels
remain subject to cooldown checks and describe any narrow, explicitly-scoped
exception; ensure you modify the Phase 1 heading/paragraph containing "Pre-built
wheel exemption" and mention the cooldown policy name or section so readers can
reconcile the two.
In `@e2e/test_bootstrap_cooldown_github.sh`:
- Around line 30-31: The install uses bare pip while the trap uninstall uses
python3 -m pip, which can target different interpreters; change the install
invocation so both use the same interpreter (mirror the uninstall) by running
the install via python3 -m pip with the same package path
("$SCRIPTDIR/github_override_example") to ensure teardown isolation and avoid
plugin leakage.
In `@e2e/test_bootstrap_cooldown_gitlab.sh`:
- Around line 34-35: The uninstall in the trap uses "python3 -m pip uninstall"
but the install uses a bare "pip install", which can target a different
interpreter; change the install invocation to use the same installer as the trap
(e.g., replace the pip install "$SCRIPTDIR/gitlab_override_example" call with
"python3 -m pip install \"$SCRIPTDIR/gitlab_override_example\"") so the install
and teardown use the identical Python/pip environment (refer to the existing
trap line and the current pip install invocation).
In `@e2e/test_mode_fallback.sh`:
- Around line 68-75: The script currently reads only the first match into
failures_file and ignores other test-mode-failures-*.json artifacts; update the
logic to iterate over all matches returned by find (e.g., using find ... -print0
with while read -r -d '' file or a for loop over results), compute the failures
for each file by running jq '.failures | length' on every matched file (or
accumulate a total), and if any file has failure_count > 0 print the error
including the file name and set pass=false (ensure you still send the message to
stderr with 1>&2); reference symbols: failures_file, failure_count, OUTDIR, and
pass.
In `@e2e/test_mode_resolution.sh`:
- Around line 39-40: The script currently reads the first failure entry
(`.failures[0]`) into failure_type which can return the wrong result when
multiple failures exist; update the jq expression that sets failure_type (the
assignment using failures_file and variable failure_type) to select the failure
object whose "dist" equals the shell variable $DIST and then extract its
"failure_type" (i.e., use a jq filter that finds .failures[] | select(.dist ==
env.DIST) | .failure_type and pass DIST into jq as an env/arg) so the test
checks the failure_type for the correct distribution rather than the first JSON
entry.
In `@pyproject.toml`:
- Around line 72-76: The coverage dependency constraint currently allows
versions down to 4.0 ("coverage[toml]!=4.4,>=4.0") but the test command uses
`coverage report --format=markdown` which requires coverage >=7.0; update the
dependency string to require at least 7.0 (for example replace the existing
"coverage[toml]!=4.4,>=4.0" entry with "coverage[toml]>=7.0" or
"coverage[toml]>=7.0,<8.0") so the resolver cannot pick an incompatible coverage
version.
In `@src/fromager/bootstrapper.py`:
- Around line 379-391: When a version bootstrap fails in the multiple_versions
branch, the requirement remains marked in _seen_requirements so later visits are
skipped; modify the failure handling in the block that appends to
_failed_versions (the branch guarded by self.multiple_versions) to also clear
the seen-state for that specific requirement/version (so _has_been_seen() will
not prevent a retry and _add_to_graph() won't reintroduce a failed node).
Specifically, after calling self.ctx.dependency_graph.remove_dependency(...) and
before returning, remove the corresponding entry from self._seen_requirements
(the same key/tuple used by _has_been_seen() and _bootstrap_impl()) and then
call self.ctx.write_to_graph_to_file().
In `@src/fromager/commands/graph.py`:
- Around line 804-810: The code incorrectly adds node.canonicalized_name into
dependency_names before iterating dependencies, causing packages with no deps to
report a dependency on themselves; remove the conditional that adds
node.canonicalized_name and rely solely on node.iter_all_dependencies()
(filtering out ROOT as already done) to build dependency_names so the closure
excludes the package itself. Ensure the code still initializes dependency_names:
set[NormalizedName] and only adds dependency.canonicalized_name for dependencies
where dependency.canonicalized_name != ROOT.
In `@src/fromager/dependency_graph.py`:
- Around line 387-405: After deleting the node and cleaning direct edges,
unreachable descendant nodes remain in self.nodes; update the removal logic in
dependency_graph.py (around deleted_node / self.nodes handling) to prune any
nodes no longer reachable from ROOT: compute the set of reachable nodes (e.g.,
BFS/DFS starting at ROOT) and then remove from self.nodes any key not in that
reachable set (or iteratively delete nodes with zero parents except ROOT), so
get_all_nodes() / get_build_topology() only see reachable packages.
In `@src/fromager/wheels.py`:
- Around line 552-560: The call to resolver.find_all_matching_from_provider is
missing the max_age_cutoff so prebuilt wheels can bypass ctx.max_release_age;
update the call to pass the max release age (e.g.
max_age_cutoff=ctx.max_release_age) so provider.cooldown (set via
resolver.resolve_package_cooldown) and age filtering are applied
consistently—ensure the same cutoff logic used by resolve_source is used here
and that the existing wheel_server_url/supports_upload_time handling remains
unchanged.
In `@tests/test_bootstrap.py`:
- Around line 546-639: The tests currently assert log text instead of the real
side effect; change them to verify constraints generation behavior by either (a)
asserting the Bootstrapper mock's write_constraints_file method was or was not
called (reference mock_bootstrapper.return_value.write_constraints_file) or (b)
checking the presence/absence of the generated constraints file
(constraints.txt) in the isolated filesystem after invoking bootstrap.bootstrap;
update test_multiple_versions_auto_disables_constraints to assert no
write_constraints_file call / no constraints.txt and update
test_multiple_versions_with_skip_constraints_no_duplicate_log similarly to
assert no write_constraints_file call / no constraints.txt so the tests validate
functionality rather than brittle log messages.
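The reachability pruning suggested for `src/fromager/dependency_graph.py` above can be sketched like this, with plain dicts standing in for the real node and edge structures:

```python
from collections import deque

ROOT = ""  # stand-in for the graph's ROOT sentinel


def prune_unreachable(nodes, children):
    """Drop every node that is no longer reachable from ROOT, via BFS."""
    reachable = {ROOT}
    queue = deque([ROOT])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in reachable:
                reachable.add(child)
                queue.append(child)
    # Delete anything BFS never touched so traversal helpers only see
    # reachable packages.
    for key in list(nodes):
        if key not in reachable:
            del nodes[key]


nodes = {ROOT: object(), "a": object(), "b": object(), "orphan": object()}
children = {ROOT: ["a"], "a": ["b"], "orphan": ["b"]}
prune_unreachable(nodes, children)
```

Note that "b" survives because it is reachable through "a", even though its other parent ("orphan") is pruned; the iterative zero-parent deletion mentioned in the comment would reach the same end state.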
Force-pushed 15236a5 to 8f18545
Add fromager graph suggest-collection that analyzes dependency overlap between onboarding packages and existing collections to recommend the best-fit collection for each package. Helps pipeline maintainers make data-driven decisions when assigning new packages to permanent collections.

- Iterative DFS traversal over all edge types for full dependency closure
- Ranks collections by fewest new packages, then highest coverage
- Rich table (default) and JSON output formats
- Error handling for bad graph files and duplicate collection names
- Tests covering helpers, CLI, and analysis logic

Co-Authored-By: Claude-4.6-opus-high <claude@anthropic.com>
Closes: python-wheel-build#971
Signed-off-by: Shanmukh Pawan <smoparth@redhat.com>
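The iterative DFS closure mentioned in the commit message might look roughly like this; the flat `edges` mapping is a simplification of the real graph model, which walks several edge types:

```python
def dependency_closure(start, edges):
    """Return the set of packages reachable from start (start itself is
    excluded unless a cycle leads back to it).

    edges maps a package name to the names it depends on, across all edge types.
    """
    closure: set[str] = set()
    stack = [start]  # explicit stack instead of recursion, as in the PR
    while stack:
        for dep in edges.get(stack.pop(), ()):
            if dep not in closure:
                closure.add(dep)
                stack.append(dep)
    return closure
```

An explicit stack keeps deep dependency chains from hitting Python's recursion limit, which matters for large onboarding graphs.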
Force-pushed 8f18545 to 6e989dc
We may want to update the name in #1103 as well. Let me know what you think.
@coderabbitai resolve
✅ Actions performed: Comments resolved. Approval is disabled; enable
IMHO, I preferred the original names (suggest-collection and suggest-base). They were more intuitive, even though they used downstream terminology. find-best-fit does describe the action, but it lacks a clear noun and doesn't feel immediately intuitive.
The point about not having a noun is interesting, but we don't have nouns in all of the other graph-related subcommands so I'm not sure it's a requirement. I do agree that we might be able to come up with something more precise than "best fit". I'll leave the naming up to you two. You've seen what happens when I'm allowed to name things. :-)
In that case, we could go with find-best-fit as-is for #1030 and name suggest-base something like find-shared-packages or find-common-packages. I've also addressed all the previous review feedback -- I think it's ready to merge. PTAL when you get a chance. cc: @dhellmann, @rd4398
Pull Request Description
What
Add fromager graph suggest-collection that analyzes dependency overlap between onboarding packages and existing collections to recommend the best-fit collection for each package. Helps pipeline maintainers make data-driven decisions when assigning new packages to permanent collections.
Why
Closes: #971