refactor(bootstrapper): convert recursive bootstrap to iterative proessing#1116
refactor(bootstrapper): convert recursive bootstrap to iterative proessing#1116rd4398 wants to merge 1 commit intopython-wheel-build:mainfrom
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
We have to choose between #1109 and this PR to proceed forward. Please take a look and let me know what you think. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/fromager/bootstrapper.py`:
- Around line 1489-1494: The code currently unions item.build_backend_deps and
item.build_sdist_deps which relabels BUILD_SDIST edges as BUILD_BACKEND and
loses the duplicate-edge semantics; change the logic in the place that builds
dep_items (call _create_dep_work_items) to handle the two dependency sets
separately instead of doing item.build_backend_deps | item.build_sdist_deps:
call _create_dep_work_items(item.build_backend_deps,
RequirementType.BUILD_BACKEND, item.req, item.resolved_version) and separately
_create_dep_work_items(item.build_sdist_deps, RequirementType.BUILD_SDIST,
item.req, item.resolved_version), then combine/extend their results so both
typed edges are preserved (so self.why, _add_to_graph and deduping behavior
remain correct).
- Around line 1667-1677: When removing a failed version in the multiple_versions
branch, also remove the "seen" marker that _phase_start() previously set so
future traversals can retry that version; specifically after calling
self.ctx.dependency_graph.remove_dependency(...) and before
self.ctx.write_to_graph_to_file(), drop the entry from whatever in-memory
seen-tracking structure used by _phase_start() (e.g.
self._seen.discard((pkg_name, str(item.resolved_version))) or call the
corresponding clear/unmark method), and ensure _failed_versions append and
logging remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cacc4cdb-4289-4fbf-8845-dab740b0d233
📒 Files selected for processing (2)
src/fromager/bootstrapper.pytests/test_bootstrapper.py
…essing Replace the recursive DFS in Bootstrapper.bootstrap() with an explicit LIFO stack and a six-phase state machine (START, PREPARE_SOURCE, PREPARE_BUILD, BUILD, PROCESS_INSTALL_DEPS, COMPLETE). This eliminates Python stack overflow on deep/wide dependency graphs, especially with --multiple-versions enabled. Key changes: - Add BootstrapPhase enum and WorkItem dataclass for phase-based processing - Add six phase handler methods that replace the monolithic _bootstrap_impl - Rewrite bootstrap() to use an iterative DFS loop with explicit work stack - Remove _bootstrap_single_version, _bootstrap_impl, _build_from_source - Preserve all three error handling modes (normal, test, multiple-versions) - Keep _prepare_build_dependencies for git URL resolution path only - Update tests to match new internal structure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Rohan Devasthale <rdevasth@redhat.com>
88e0673 to
b8e3f29
Compare
| populated across phases as processing advances. | ||
| """ | ||
|
|
||
| # Identity (set at creation) |
There was a problem hiding this comment.
How important is it that these "identity" values don't change? I wonder if we want to put them in their own data class so we can mark it as frozen, for example. Maybe that's overkill?
| # (the loop modifies self.why for each work item) | ||
| saved_why = list(self.why) | ||
|
|
||
| # Create initial work items (reversed so first version is on top of stack) |
There was a problem hiding this comment.
What does "first" mean in this context? Lowest version number?
| # Main iterative DFS loop | ||
| while stack: | ||
| item = stack.pop() | ||
| self.why = list(item.why_snapshot) |
There was a problem hiding this comment.
I wonder if instead of resetting self.why we could use item.why_snapshot in all of those places? Do we have the item value everywhere we would need to do that?
| if item.phase == BootstrapPhase.START: | ||
| # START phase runs outside _track_why to match | ||
| # original behavior where seen-check precedes | ||
| # pushing onto the why stack. |
There was a problem hiding this comment.
What would be the effect of eliminating that special case?
| if not new_items: | ||
| self.progressbar.update() | ||
|
|
||
| # Push items so first in list ends up on top (processed first) |
There was a problem hiding this comment.
I think we need to reverse the list here because we're not treating the list in the phase functions as a stack. I would have expected the phase to return something like [item] + new_items and that resulting list would then need to have new_items processed before trying item again. That way extend() here just adds to the stack in the same natural order.
|
|
||
| Called outside _track_why context to match original behavior | ||
| where graph addition and seen-check happen before pushing | ||
| the current item onto the why stack. |
There was a problem hiding this comment.
It would help to explicitly explain what the return list is for each phase function. It's here in this paragraph, but calling it out as something like this would help make the flow more obvious:
Returns the current item to push it back onto the stack for the prepare-source phase.
| ) | ||
|
|
||
| item.phase = BootstrapPhase.PREPARE_BUILD | ||
| return dep_items + [item] |
There was a problem hiding this comment.
Treat the return value as a stack. We want to process dep_items before looking at item again.
| return dep_items + [item] | |
| return [item] + dep_items |
| return [] | ||
|
|
||
| def _dispatch_phase(self, item: WorkItem) -> list[WorkItem]: | ||
| """Route a work item to the appropriate phase handler.""" |
|
|
||
| # Create initial work items (reversed so first version is on top of stack) | ||
| stack: list[WorkItem] = [] | ||
| for source_url, resolved_version in reversed(resolved_versions): |
There was a problem hiding this comment.
What do you think about making resolving the versions a phase, too?
| items: list[WorkItem] = [] | ||
| for dep in self._sort_requirements(deps): | ||
| try: | ||
| resolved = self.resolve_versions( |
There was a problem hiding this comment.
Making the version resolution a phase would mean we would only have this logic in 1 place.
Replace the recursive DFS in Bootstrapper.bootstrap() with an explicit LIFO stack and a six-phase state machine (START, PREPARE_SOURCE, PREPARE_BUILD, BUILD, PROCESS_INSTALL_DEPS, COMPLETE). This eliminates Python stack overflow on deep/wide dependency graphs, especially with --multiple-versions enabled.
Key changes: