Merged
2 changes: 1 addition & 1 deletion toolchain/modules

@@ -70,7 +70,7 @@ o-gpu nvhpc cuda/12.3.0 cmake/3.26.3
 o-gpu   CC=nvc CXX=nvc++ FC=nvfortran

 dai     NCSA DeltaAI
-dai-all python cmake nvhpc-openmpi3/24.3 cuda
+dai-all python cmake nvidia/25.5
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For NCSA DeltaAI, what components are included in the `nvidia/25.5` module?
Specifically, confirm whether it provides:

  1. CUDA toolkit/runtime paths, and
  2. MPI (OpenMPI/HPC-X) usable for multi-rank jobs with nvc/nvc++/nvfortran.
     Please include official cluster/module documentation links.

💡 Result:

What the DeltaAI docs publicly confirm (re: NVIDIA HPC SDK 25.5)

1) CUDA toolkit/runtime paths

Yes—DeltaAI provides CUDA via its cudatoolkit module, and the documented nvcc path is inside the NVIDIA HPC SDK 25.5 install tree:

  • cudatoolkit/25.5_12.9
  • nvcc at /opt/nvidia/hpc_sdk/Linux_aarch64/25.5/compilers/bin/nvcc (CUDA 12.9) (docs.ncsa.illinois.edu)

So regardless of whether you reach it through a higher-level nvidia/25.5-style module, the official DeltaAI documentation confirms CUDA paths are being set from the HPC SDK 25.5 installation. (docs.ncsa.illinois.edu)
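A quick sanity check of this on a DeltaAI login node can be sketched as follows. The module name and nvcc path are taken from the docs quoted above; the `module` commands are cluster-specific and untested here, so they are left as comments rather than executed:

```shell
# Illustrative check on DeltaAI (cluster-specific lines kept as comments):
#   module load cudatoolkit/25.5_12.9
#   command -v nvcc && nvcc --version    # docs say CUDA 12.9
# Expected nvcc location per the DeltaAI docs (HPC SDK 25.5 install tree):
expected_nvcc=/opt/nvidia/hpc_sdk/Linux_aarch64/25.5/compilers/bin/nvcc
echo "expected nvcc path: $expected_nvcc"
```

Comparing `command -v nvcc` against this path would confirm whether the loaded module is in fact resolving CUDA from the HPC SDK 25.5 tree.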

2) MPI (OpenMPI/HPC‑X) usable for multi-rank jobs with nvc/nvc++/nvfortran

Not as OpenMPI/HPC‑X, per the DeltaAI docs. DeltaAI's documented MPI for multi-rank jobs is Cray MPICH, and GPU-aware MPI is enabled via MPICH_GPU_SUPPORT_ENABLED=1.

The docs also describe the system programming environments (modules) as PrgEnv-gnu, PrgEnv-cray, and prgenv-nvidia (lowercase), i.e., a Cray Programming Environment workflow rather than an OpenMPI/HPC‑X one. (docs.ncsa.illinois.edu)

Separately, NVIDIA’s own HPC SDK 25.5 release notes indicate HPC‑X is selected by loading a dedicated nvhpc-hpcx-... environment module (i.e., it’s not implied to be automatically “the MPI” you get just by loading the base HPC SDK). (docs.nvidia.com)

Official documentation links (cluster + vendor)

  • DeltaAI “Programming Environment (Building Software)” (CUDA module + GPUDirect/Cray MPICH notes). (docs.ncsa.illinois.edu)
  • DeltaAI “Installed Software” (module system context). (docs.ncsa.illinois.edu)
  • NVIDIA HPC SDK 25.5 Release Notes (HPC‑X selection via nvhpc-hpcx-... module). (docs.nvidia.com)

What I could not confirm from public docs

I did not find a public DeltaAI page that enumerates the exact contents of a module named nvidia/25.5 (e.g., the full module show nvidia/25.5 output and whether it directly exposes an OpenMPI/HPC‑X tree).

If you paste the output of:

module show nvidia/25.5
module show prgenv-nvidia
module spider hpcx

I can tell you definitively whether nvidia/25.5 is setting CUDA vars/paths itself and whether it also wires in HPC‑X/OpenMPI on DeltaAI.

🏁 Script executed:

# Locate and inspect the toolchain/modules file
fd -t f -p "toolchain.*modules" | head -20

Repository: MFlowCode/MFC

Length of output: 225


🏁 Script executed:

# Find the toolchain/modules file
git ls-files | grep -E "toolchain.*modules" | head -20


Length of output: 105


🏁 Script executed:

# Read the toolchain/modules file with line numbers
cat -n toolchain/modules | head -90


Length of output: 3374


🏁 Script executed:

# Check the git diff to see what was replaced on Line 73
git diff HEAD -- toolchain/modules | head -100


Length of output: 39


🏁 Script executed:

# Check git status and recent commits for this file
git status toolchain/modules
echo "---"
git log --oneline -10 -- toolchain/modules


Length of output: 155


🏁 Script executed:

# Show the previous version of toolchain/modules (before "new delta AI module" commit)
git show 3bebe447~1:toolchain/modules | sed -n '59,76p'


Length of output: 97


🏁 Script executed:

# Show the diff for the commit that added "new delta AI module"
git show 3bebe447 -- toolchain/modules


Length of output: 3771


Add explicit MPI configuration to dai-all modules.

The nvidia/25.5 module provides CUDA, but DeltaAI's documented multi-rank workflow relies on Cray MPICH with GPU support (enabled via MPICH_GPU_SUPPORT_ENABLED=1). Add an explicit prgenv-nvidia or cray-mpich module to the dai-all line so MPI is available for multi-rank builds and runs, consistent with DeltaAI's programming environment guidelines.

dai-all CC=nvc CXX=nvc++ FC=nvfortran
dai-gpu MFC_CUDA_CC=89,90
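A hedged sketch of what the suggested environment could look like at run time. The module combination is this review's suggestion and has not been verified against the cluster, so it stays commented; only the environment variable comes directly from the DeltaAI docs:

```shell
# Sketch: suggested dai-all setup on DeltaAI (module names are the review's
# unverified suggestion, so they are shown as comments):
#   module load python cmake nvidia/25.5 cray-mpich
# GPU-aware Cray MPICH is enabled via this variable, per the DeltaAI docs:
export MPICH_GPU_SUPPORT_ENABLED=1
echo "MPICH_GPU_SUPPORT_ENABLED=$MPICH_GPU_SUPPORT_ENABLED"
```

Exporting the variable in the job script (rather than relying on a module default) keeps multi-rank GPU runs reproducible regardless of which programming environment is active.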
