feat: install the Azure HPC Diagnostics script#76
Merged
dgchinner merged 3 commits intolinux-system-roles:mainfrom Feb 19, 2026
Merged
feat: install the Azure HPC Diagnostics script#76dgchinner merged 3 commits intolinux-system-roles:mainfrom
dgchinner merged 3 commits intolinux-system-roles:mainfrom
Conversation
Reviewer's GuideAdds automated installation of the Azure HPC Diagnostics tool to the role, including package dependencies, archive download, templated patching of the upstream script, relocation of diagnostics output, and documentation of the new hpc_install_diagnostics toggle. Sequence diagram for Azure HPC Diagnostics installation via Ansible rolesequenceDiagram
participant ControlNode
participant TargetVM
participant GitHubAzhpcDiagnostics
ControlNode->>TargetVM: Run role tasks/main.yml
rect rgb(235,235,235)
Note over ControlNode,TargetVM: Install Azure HPC Diagnostics tool block (when hpc_install_diagnostics)
ControlNode->>TargetVM: stat path=__hpc_azure_tools_dir/gather_azhpc_vm_diagnostics.sh
TargetVM-->>ControlNode: __hpc_azure_diags_installed
alt diagnostics_not_installed
ControlNode->>TargetVM: package name=__hpc_azure_diagnostics_packages state=present
loop until_success
TargetVM-->>ControlNode: __hpc_azure_diagnostics_packages_install
end
ControlNode->>TargetVM: include_tasks download_extract_package.yml
ControlNode->>GitHubAzhpcDiagnostics: HTTP GET hpcdiag_archive (url in __hpc_azhpc_diags_info)
GitHubAzhpcDiagnostics-->>ControlNode: tarball
ControlNode->>TargetVM: Upload and extract archive to __hpc_pkg_extracted.path
ControlNode->>ControlNode: tempfile hpc_diags_XXXX.patch
ControlNode->>ControlNode: template azhpc_vm_diagnostics.sh.patch.j2 -> __hpc_diags_patch_file.path
ControlNode->>TargetVM: patch src=__hpc_diags_patch_file.path dest=__hpc_pkg_extracted.path/Linux/src/gather_azhpc_vm_diagnostics.sh
ControlNode->>TargetVM: copy patched script -> __hpc_azure_tools_dir/gather_azhpc_vm_diagnostics.sh mode=0755
ControlNode->>ControlNode: file state=absent path=__hpc_diags_patch_file.path
ControlNode->>TargetVM: file state=absent path=__hpc_pkg_extracted.path
else diagnostics_already_installed
Note over ControlNode,TargetVM: Skips download, patch, and install steps
end
end
ControlNode-->>TargetVM: Continue with remaining role tasks
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
f12a9f3 to
bd20207
Compare
There was a problem hiding this comment.
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `tasks/main.yml:1447-1451` </location>
<code_context>
+ dest: "{{ __hpc_diags_patch_file.path }}"
+ mode: '0644'
+
+ - name: Patch Diagnostics script
+ patch:
+ src: "{{ __hpc_diags_patch_file.path }}"
+ dest: "{{ __hpc_pkg_extracted.path }}/Linux/src/gather_azhpc_vm_diagnostics.sh"
+ remote_src: false
+ state: present
+ strip: 1
</code_context>
<issue_to_address>
**issue (bug_risk):** Patch task likely mixes up controller/remote paths via `remote_src: false`.
Since `__hpc_diags_patch_file.path` is created via `tempfile` on the remote host, `src` points to a remote path. With `remote_src: false`, the `patch` module will look for this file on the controller instead and fail with file-not-found. Set `remote_src: true` so `patch` reads the file from the remote host where it actually exists.
</issue_to_address>
### Comment 2
<location> `README.md:203` </location>
<code_context>
+
+The Azure HPC Diagnostics tool gathers system information for triage and
+debugging purposes. It collects information and state from the hardware, OS,
+azure envinroment and installed applications and packages it into a tarball
+to simplify the process of system support of bug triage.
+
</code_context>
<issue_to_address>
**issue (typo):** Fix typo and capitalization in "azure envinroment".
Suggest using "Azure environment" here to fix the spelling and align capitalization with earlier usage.
```suggestion
Azure environment and installed applications and packages it into a tarball
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
02bd5e0 to
2c6078e
Compare
Define up the Azure HPC diagnostics tarball download location, set up a SHA256 sum for it (as github does not supply one), then add the config documentation and infrastructure to pull it down and unpack it, ready to extract the diagnostics script from it. Signed-off-by: Dave Chinner <dchinner@redhat.com>
2c6078e to
686a52d
Compare
spetrosi
reviewed
Feb 12, 2026
The script itself installs certain tools via repository deep links. Some of these tools are provided by OS pacakages, so pull them in via the system-role as a dependent package. Signed-off-by: Dave Chinner <dchinner@redhat.com>
The Azure HPC diagnostics script captures information about the
system hardware and software for dianostic purposes. It is intended
to supplement the RHEL sosreport diagnostics to cover the Azure
specific hardware and software that the sosreport does not capture.
This script will be used for information gathering in support
contexts, it is not intended to be run on active HPC nodes.
Before we install the downloaded diagnostic script, we need to
change a few things in the script:
- the output should be in {{ __hpc_azure_runtime_dir }}/diagnostics
- permanently disable the auto update code
- fix the version number instead of assuming the script it running
from a local git repository
- change from defaulting to online mode (requires internet access)
to offline mode. --offline option goes away, replaced by --online
option
- Indicate that the diagnostic log files should be passed on to
Red Hat, not Microsoft.
To make this easy, we will add a patch file to the system role that
contains the code changes we need to make to the script. This is
much simpler to apply that needing to do complex parser based
matches and replacements to make the changes we need.
The resultant patch file will then need to be treated as a
template to do path substitution for the runtime output directory.
This will place the diagnostic output in a well known place by
default, rather than where-ever the script was run from.
The script will be installed to {{ __hpc_azure_tools_dir }}. If the
script is already present in this location, then we will skip over
the installation entirely.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
686a52d to
fd05459
Compare
richm
approved these changes
Feb 17, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Enhancement:
The Azure HPC diagnostics script captures information about the
system hardware and software for dianostic purposes. It is intended
to supplement the RHEL sosreport diagnostics to cover the Azure
specific hardware and software that the sosreport does not capture.
This script will be used for information gathering in support
contexts, it is not intended to be run on active HPC nodes.
The script will be installed to /opt/hpc/azure/tools, and should be run
from there.
The log files generated from this script must be treated with the same
care as sosreport log files as they may contain customer information
and/or PII which falls under various data protection laws.
Result:
When run, a tarball containing all the log files will be deposited in /var/hpc/azure/diagnostics/.
The last lines output by the script will indicate the name and location of the tarball containing
the diagnostic information.
Issue Tracker Tickets (Jira or BZ if any): https://issues.redhat.com/browse/RHELHPC-110
Summary by Sourcery
Install and integrate the Azure HPC Diagnostics tool into the Azure HPC role for automated deployment and configuration.
New Features:
Enhancements:
Documentation: