fix: install __hpc_base_packages early via dedicated task#83
fix: install __hpc_base_packages early via dedicated task#83ggoklani wants to merge 1 commit intolinux-system-roles:mainfrom
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideIntroduces a dedicated, early "Install base packages" task in the main role flow that installs a new __hpc_base_packages list (currently just pssh) using the existing OSTree-aware package backend pattern, ensuring baseline packages are present independent of optional feature blocks like Moneo. Class diagram for __hpc_base_packages variableclassDiagram
class HPCVars {
list~string~ __hpc_base_packages
}
class BasePackageList {
+string pssh
}
HPCVars "1" --> "1" BasePackageList : defines
Flow diagram for early base package installation taskflowchart TD
A[Role start] --> B[Configure repositories
__hpc_nvidia_cuda_repo
__hpc_microsoft_prod_repo]
B --> C[Install base packages
package name __hpc_base_packages
state present
use ostree aware backend]
C --> D[Install optional components
such as Moneo]
C --> E[Continue remaining role tasks]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
Install base packagestask uses anuntil: ... is successloop but does not configureretries/delay, so the loop has no practical effect; either add explicit retry settings or drop theuntilfor clarity. - Consider guarding the
Install base packagestask with a condition that__hpc_base_packagesis defined and non-empty (e.g.when: __hpc_base_packages | default([]) | length > 0) to make the intent explicit and avoid running a no-op task if the list is empty in other contexts.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `Install base packages` task uses an `until: ... is success` loop but does not configure `retries`/`delay`, so the loop has no practical effect; either add explicit retry settings or drop the `until` for clarity.
- Consider guarding the `Install base packages` task with a condition that `__hpc_base_packages` is defined and non-empty (e.g. `when: __hpc_base_packages | default([]) | length > 0`) to make the intent explicit and avoid running a no-op task if the list is empty in other contexts.
## Individual Comments
### Comment 1
<location> `tasks/main.yml:74-81` </location>
<code_context>
- "{{ __hpc_nvidia_cuda_repo }}"
- "{{ __hpc_microsoft_prod_repo }}"
+- name: Install base packages
+ package:
+ name: "{{ __hpc_base_packages }}"
+ state: present
+ use: "{{ (__hpc_server_is_ostree | d(false)) |
+ ternary('ansible.posix.rhel_rpm_ostree', omit) }}"
+ register: __hpc_base_packages_install
+ until: __hpc_base_packages_install is success
+
- name: Replace default RHUI Azure repository with the EUS repository
</code_context>
<issue_to_address>
**suggestion:** The `until` clause is effectively a no-op without explicit `retries`/`delay` and may not provide the intended resilience.
As written, this still executes only once because no `retries`/`delay` are defined, so `until: __hpc_base_packages_install is success` doesn’t add real resilience. If you want to handle transient repo/lock errors, add explicit `retries` and `delay`; otherwise consider removing the `until` to avoid suggesting a retry mechanism that isn’t there.
```suggestion
- name: Install base packages
package:
name: "{{ __hpc_base_packages }}"
state: present
use: "{{ (__hpc_server_is_ostree | d(false)) |
ternary('ansible.posix.rhel_rpm_ostree', omit) }}"
register: __hpc_base_packages_install
retries: 5
delay: 10
until: __hpc_base_packages_install is success
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| - name: Install base packages | ||
| package: | ||
| name: "{{ __hpc_base_packages }}" | ||
| state: present | ||
| use: "{{ (__hpc_server_is_ostree | d(false)) | | ||
| ternary('ansible.posix.rhel_rpm_ostree', omit) }}" | ||
| register: __hpc_base_packages_install | ||
| until: __hpc_base_packages_install is success |
There was a problem hiding this comment.
suggestion: The until clause is effectively a no-op without explicit retries/delay and may not provide the intended resilience.
As written, this still executes only once because no retries/delay are defined, so until: __hpc_base_packages_install is success doesn’t add real resilience. If you want to handle transient repo/lock errors, add explicit retries and delay; otherwise consider removing the until to avoid suggesting a retry mechanism that isn’t there.
| - name: Install base packages | |
| package: | |
| name: "{{ __hpc_base_packages }}" | |
| state: present | |
| use: "{{ (__hpc_server_is_ostree | d(false)) | | |
| ternary('ansible.posix.rhel_rpm_ostree', omit) }}" | |
| register: __hpc_base_packages_install | |
| until: __hpc_base_packages_install is success | |
| - name: Install base packages | |
| package: | |
| name: "{{ __hpc_base_packages }}" | |
| state: present | |
| use: "{{ (__hpc_server_is_ostree | d(false)) | | |
| ternary('ansible.posix.rhel_rpm_ostree', omit) }}" | |
| register: __hpc_base_packages_install | |
| retries: 5 | |
| delay: 10 | |
| until: __hpc_base_packages_install is success |
Add a new top-level "Install base packages" task to ensure the role’s baseline packages (defined in __hpc_base_packages) are installed consistently and independently of optional feature blocks such as Moneo. The task is placed immediately after repository configuration so required repos are available before package installation, and it uses the same OSTree-safe package backend selection pattern used elsewhere in the role. Anything we need that isn't a direct dependency of some other install target should be listed here. Signed-off-by: Gaurav Goklani <ggoklani@redhat.com>
74212e0 to
2fc2198
Compare
|
|
||
| __hpc_microsoft_prod_rpm_key: https://packages.microsoft.com/keys/microsoft.asc | ||
|
|
||
| __hpc_base_packages: |
There was a problem hiding this comment.
The naming here is too vague to me, can you come up with a more descriptive name? Base for what is this?
There was a problem hiding this comment.
"base" as in "we can't build a house without first preparing a solid base to build on". e.g. RHEL 9.6 is the "base image" we build the HPC product on top of.
The "base HPC packages" are packages that we always need to install into the HPC cluster image, regardless of whether there are any downstream installation dependencies on them or not.
FWIW, you've said many times we should group all the packages we really need into a single install rule to avoid having to run DNF over and over again as teh system role script runs. This is where we are going to define that packages list to provide form the base level environment everything else in the system role script is built upon...
Enhancement: Add a new top-level "Install base packages" task to ensure the role’s baseline packages (defined in __hpc_base_packages) are installed consistently and independently of optional feature blocks such as Moneo.
The task is placed immediately after repository configuration so required repos are available before package installation, and it uses the same OSTree-safe package backend selection pattern used elsewhere in the role.
this change only introduces a common baseline package installation step.
Reason: 'pssh required to configure moneo'
Result: TASK [linux-system-roles.hpc : Install base packages] **************************
changed: [localhost] => {"attempts": 1, "changed": true, "msg": "", "rc": 0, "results": ["Installed: pssh-2.3.4-1.el9.noarch"]}
Issue Tracker Tickets (Jira or BZ if any): https://issues.redhat.com/browse/RHELHPC-166
Summary by Sourcery
Add a dedicated task to install baseline HPC packages immediately after repository configuration to ensure required utilities are always present.
New Features:
Enhancements: