New DeltaAI Module #1306

Merged
sbryngelson merged 1 commit into MFlowCode:master from wilfonba:deltaAIFix
Mar 13, 2026

Conversation

@wilfonba
Contributor

@wilfonba wilfonba commented Mar 13, 2026

Description

Replaces old DeltaAI modules with new working module.

Fixes #(issue)

Type of change

  • Bug fix

Testing

I ran the examples/3D_performance_test/ case with a 360³ grid on 1, 2, 4, and 8 GPUs on DeltaAI. Results:

  • 1 rank: 0.4866 ns/gp/eq/rhs
  • 2 ranks: 0.2626 ns/gp/eq/rhs
  • 4 ranks: 0.1421 ns/gp/eq/rhs
  • 8 ranks: 0.0779 ns/gp/eq/rhs (2 nodes)
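These grind times imply the following strong-scaling figures (an illustrative calculation on the numbers above, not part of the PR):

```python
# Strong-scaling speedup and parallel efficiency from the grind times above.
# Grind time is ns per grid point per equation per RHS evaluation; lower is better.
times = {1: 0.4866, 2: 0.2626, 4: 0.1421, 8: 0.0779}  # ranks -> ns/gp/eq/rhs

base = times[1]
for ranks in sorted(times):
    speedup = base / times[ranks]   # relative to the 1-rank run
    efficiency = speedup / ranks    # 1.0 would be ideal strong scaling
    print(f"{ranks:>2} ranks: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

On these numbers the 8-rank run retains about 78% parallel efficiency.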

Copilot AI review requested due to automatic review settings March 13, 2026 17:36
@github-actions

Claude Code Review

Head SHA: 3bebe44

Files changed: 1

  • toolchain/modules

Summary

  • Replaces two separate modules (nvhpc-openmpi3/24.3 + cuda) with the unified nvidia/25.5 module on NCSA DeltaAI
  • The change is minimal (1 addition, 1 deletion) and isolated to the module loader configuration
  • Testing was performed on examples/3D_performance_test/ at 360³ resolution on 1, 2, 4, and 8 GPUs with near-ideal strong scaling observed
  • No Fortran source code changes; no GPU macro, precision, or parameter-system rules apply here

Findings

No issues found.


Improvement Opportunities

  1. PR body references unfiled issue (toolchain/modules, line N/A): The body contains Fixes #(issue) as a placeholder — this looks like an unfilled template. If there's a tracking issue for the broken DeltaAI module, linking it would be helpful; otherwise consider removing the placeholder before merge.

  2. Old module version documentation (toolchain/modules, line 72): The old module string (nvhpc-openmpi3/24.3 cuda) is simply deleted with no comment on why it broke (e.g., module was retired). A brief comment in the modules file or in the PR body about when the old module was deprecated would help future maintainers understand the change history.

  3. dai-gpu CUDA CC list (toolchain/modules, line 73): MFC_CUDA_CC=89,90 targets SM 89 (Ada Lovelace) and SM 90 (Hopper). Verifying that the H100s on DeltaAI are indeed SM 90 (they are) and that no other GPU arch is present on DeltaAI nodes is worth confirming, but this is unchanged and pre-existing.

Contributor

Copilot AI left a comment

Pull request overview

Updates the DeltaAI (NCSA) cluster module configuration used by the MFC toolchain’s module loader (./mfc.sh load) to reflect a newer NVIDIA toolchain module.

Changes:

  • Replaces the DeltaAI common module list entry from nvhpc-openmpi3/24.3 cuda to nvidia/25.5.

@coderabbitai
Contributor

coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

The pull request modifies a module configuration for NCSA DeltaAI in the toolchain/modules directory. The dai-all module line is updated to replace the nvhpc-openmpi3/24.3 cuda toolchain with nvidia/25.5. All other configuration parameters on the line, including Python, CMake, and compiler settings (CC/CXX/FC), remain unchanged. The dai-gpu module configuration is unaffected by this change. No public or exported entities are altered.

🚥 Pre-merge checks (2 passed, 1 inconclusive)

❓ Inconclusive (1)

  • Title check: The title 'New DeltaAI Module' is vague and generic, and doesn't convey the specific technical change. Resolution: replace it with a more specific title, such as 'Replace DeltaAI nvhpc-openmpi3 module with nvidia/25.5' or 'Update DeltaAI toolchain module configuration'.

✅ Passed (2)

  • Description check: The description covers the main points but is missing critical details: the specific issue number is not filled in, and the testing section lacks context about what was being validated.
  • Docstring coverage: No functions found in the changed files to evaluate; docstring coverage check skipped.


Contributor

coderabbitai bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e3f345d3-3178-4fb4-be84-a650b3fcbdc0

📥 Commits

Reviewing files that changed from the base of the PR and between 598f5a5 and 3bebe44.

📒 Files selected for processing (1)
  • toolchain/modules


dai        NCSA DeltaAI
- dai-all  python cmake nvhpc-openmpi3/24.3 cuda
+ dai-all  python cmake nvidia/25.5
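A quick sanity check of the replacement module on a DeltaAI login node might look like the following (a sketch, not from the PR; it assumes DeltaAI uses Lmod and that nvidia/25.5 puts the HPC SDK compilers on PATH):

```shell
# Hypothetical verification on a DeltaAI login node (assumes Lmod).
module reset                 # start from the default environment
module load nvidia/25.5      # the module this PR switches to
which nvfortran nvc nvcc     # HPC SDK compilers should now be on PATH
nvfortran --version          # expect a 25.5-series HPC SDK banner
```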

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For NCSA DeltaAI, what components are included in the nvidia/25.5 module?
Specifically confirm whether it provides:

  1. CUDA toolkit/runtime paths, and
  2. MPI (OpenMPI/HPC-X) usable for multi-rank jobs with nvc/nvc++/nvfortran.

Please include official cluster/module documentation links.

💡 Result:

What the DeltaAI docs publicly confirm (re: NVIDIA HPC SDK 25.5)

1) CUDA toolkit/runtime paths

Yes—DeltaAI provides CUDA via its cudatoolkit module, and the documented nvcc path is inside the NVIDIA HPC SDK 25.5 install tree:

  • cudatoolkit/25.5_12.9
  • nvcc at /opt/nvidia/hpc_sdk/Linux_aarch64/25.5/compilers/bin/nvcc (CUDA 12.9) (docs.ncsa.illinois.edu)

So regardless of whether you reach it through a higher-level nvidia/25.5-style module, the official DeltaAI documentation confirms CUDA paths are being set from the HPC SDK 25.5 installation. (docs.ncsa.illinois.edu)

2) MPI (OpenMPI/HPC‑X) usable for multi-rank jobs with nvc/nvc++/nvfortran

Not as OpenMPI/HPC‑X per the DeltaAI docs. DeltaAI’s documented MPI for multi-rank jobs is Cray MPICH, and GPU-aware MPI is enabled via MPICH_GPU_SUPPORT_ENABLED=1.

The docs also describe the system programming environments (modules) as PrgEnv-gnu, PrgEnv-cray, and prgenv-nvidia (lowercase), i.e., a Cray Programming Environment workflow rather than an OpenMPI/HPC‑X one. (docs.ncsa.illinois.edu)

Separately, NVIDIA’s own HPC SDK 25.5 release notes indicate HPC‑X is selected by loading a dedicated nvhpc-hpcx-... environment module (i.e., it’s not implied to be automatically “the MPI” you get just by loading the base HPC SDK). (docs.nvidia.com)

Official documentation links (cluster + vendor)

  • DeltaAI “Programming Environment (Building Software)” (CUDA module + GPUDirect/Cray MPICH notes). (docs.ncsa.illinois.edu)
  • DeltaAI “Installed Software” (module system context). (docs.ncsa.illinois.edu)
  • NVIDIA HPC SDK 25.5 Release Notes (HPC‑X selection via nvhpc-hpcx-... module). (docs.nvidia.com)

What I could not confirm from public docs

I did not find a public DeltaAI page that enumerates the exact contents of a module named nvidia/25.5 (e.g., the full module show nvidia/25.5 output and whether it directly exposes an OpenMPI/HPC‑X tree).

If you paste the output of:

module show nvidia/25.5
module show prgenv-nvidia
module spider hpcx

I can tell you definitively whether nvidia/25.5 is setting CUDA vars/paths itself and whether it also wires in HPC‑X/OpenMPI on DeltaAI.


🏁 Script executed:

# Locate and inspect the toolchain/modules file
fd -t f "modules" -path "*toolchain*" | head -20

Repository: MFlowCode/MFC

Length of output: 225


🏁 Script executed:

# Find the toolchain/modules file
git ls-files | grep -E "toolchain.*modules" | head -20

Repository: MFlowCode/MFC

Length of output: 105


🏁 Script executed:

# Read the toolchain/modules file with line numbers
cat -n toolchain/modules | head -90

Repository: MFlowCode/MFC

Length of output: 3374


🏁 Script executed:

# Check the git diff to see what was replaced on Line 73
git diff HEAD -- toolchain/modules | head -100

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

# Check git status and recent commits for this file
git status toolchain/modules
echo "---"
git log --oneline -10 -- toolchain/modules

Repository: MFlowCode/MFC

Length of output: 155


🏁 Script executed:

# Show the previous version of toolchain/modules (before "new delta AI module" commit)
git show 3bebe447~1:toolchain/modules | sed -n '59,76p'

Repository: MFlowCode/MFC

Length of output: 97


🏁 Script executed:

# Show the diff for the commit that added "new delta AI module"
git show 3bebe447 -- toolchain/modules

Repository: MFlowCode/MFC

Length of output: 3771


Add explicit MPI configuration to dai-all modules.

The nvidia/25.5 module provides CUDA, but DeltaAI's documented multi-rank workflow requires Cray MPICH with GPU support (set via MPICH_GPU_SUPPORT_ENABLED=1). Add either an explicit prgenv-nvidia module or cray-mpich module to the dai-all line to ensure MPI is available for multi-rank builds and runs, consistent with DeltaAI's programming environment guidelines.
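If this suggestion were adopted, the change might look like the following (a hypothetical sketch; the exact DeltaAI module names should be confirmed with `module spider` before committing):

```shell
# Hypothetical job-script fragment for multi-rank runs on DeltaAI.
# Module names below are assumptions based on the review comment, not verified.
module load nvidia/25.5 cray-mpich
export MPICH_GPU_SUPPORT_ENABLED=1   # GPU-aware Cray MPICH, per DeltaAI docs
```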

@sbryngelson sbryngelson merged commit 92e751f into MFlowCode:master Mar 13, 2026
36 checks passed
