Add CUDA extension, fix docs, configure CI for MPICH_jll
- Add LinearAlgebraMPICUDAExt.jl with cu()/cpu() conversions and
  CuDSSFactorizationMPI for distributed sparse direct solves via NCCL
- Add codecov.yml to exclude GPU extensions from coverage
- Document cuDSS MGMN bug (status=5 on narrow-bandwidth matrices)
- Fix CUDA setup docs: requires CUDA, NCCL, CUDSS_jll
- Fix MPI.Init() placement in docs (after using statements)
- Configure CI to use MPICH_jll to avoid MUMPS hang
- Add commit message policy to CLAUDE.md
CLAUDE.md: 53 additions & 8 deletions
@@ -2,6 +2,10 @@
 
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
+## Commit Messages
+
+Do NOT add "Co-Authored-By" lines or any other self-attribution to commit messages. Do NOT advertise Claude or Anthropic in commits. Keep commit messages focused on describing the changes only.
+
 ## Build and Test Commands
 
 ```bash
@@ -29,15 +33,26 @@ mpiexec -n 4 julia --project=. test/test_factorization.jl
 julia --project=. -e 'using Pkg; Pkg.precompile()'
 ```
 
+## MPI Configuration
+
+By default, MPI.jl uses MPItrampoline_jll. On some Linux clusters, this causes MUMPS to hang during the solve phase. If you experience hangs with multi-rank MUMPS tests, switch to MPICH_jll:
+
+```julia
+using MPIPreferences
+MPIPreferences.use_jll_binary("MPICH_jll")
+```
+
+This creates/updates `LocalPreferences.toml` (which is gitignored). Restart Julia after changing MPI preferences.
+
 ## GPU Support
 
-GPU acceleration is supported via Metal.jl (macOS) as a package extension.
+GPU acceleration is supported via Metal.jl (macOS) or CUDA.jl (Linux/Windows) as package extensions.
 
 ### Type Parameters
 
-- `VectorMPI{T,AV}` where `AV` is `Vector{T}` (CPU) or `MtlVector{T}` (GPU)
-- `MatrixMPI{T,AM}` where `AM` is `Matrix{T}` (CPU) or `MtlMatrix{T}` (GPU)
-- `SparseMatrixMPI{T,Ti,AV}` where `AV` is `Vector{T}` (CPU) or `MtlVector{T}` (GPU) for the `nzval` array
+- `VectorMPI{T,AV}` where `AV` is `Vector{T}` (CPU), `MtlVector{T}` (Metal), or `CuVector{T}` (CUDA)
+- `MatrixMPI{T,AM}` where `AM` is `Matrix{T}` (CPU), `MtlMatrix{T}` (Metal), or `CuMatrix{T}` (CUDA)
+- `SparseMatrixMPI{T,Ti,AV}` where `AV` is `Vector{T}` (CPU), `MtlVector{T}`, or `CuVector{T}` for the `nzval` array
 - Type aliases: `VectorMPI_CPU{T}`, `MatrixMPI_CPU{T}`, `SparseMatrixMPI_CPU{T,Ti}` for CPU-backed types
 
 ### Creating Zero Arrays
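
For reference, a minimal sketch of these type parameters in use, assuming the `zeros` constructors documented above and the `cu()`/`cpu()` conversions the commit adds (exact method signatures are assumptions):

```julia
# Sketch only: cu()/cpu() come from the CUDA extension; exact signatures assumed.
using CUDA                # load CUDA before LinearAlgebraMPI so the extension activates
using MPI, LinearAlgebraMPI
MPI.Init()                # per the docs fix: Init() goes after the using statements

v = zeros(VectorMPI_CPU{Float64}, 100)   # VectorMPI{Float64,Vector{Float64}}
v_gpu = cu(v)                            # assumed: VectorMPI{Float64,CuVector{Float64}}
v_cpu = cpu(v_gpu)                       # back to a CPU-backed vector
```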
@@ -55,15 +70,20 @@ A = zeros(MatrixMPI_CPU{Float64}, 50, 30)
 S = zeros(SparseMatrixMPI{Float64,Int,Vector{Float64}}, 100, 100)
 S = zeros(SparseMatrixMPI_CPU{Float64,Int}, 100, 100)
 
-# GPU zero arrays (requires Metal.jl loaded)
+# GPU zero arrays (requires Metal.jl or CUDA.jl loaded)
[…]
-MPI communication always uses CPU buffers (no Metal-aware MPI exists). GPU data is staged through CPU:
+MPI communication always uses CPU buffers (no GPU-aware MPI). GPU data is staged through CPU:
 
 1. GPU vector data copied to CPU staging buffer
 2. MPI communication on CPU buffers
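
The staged-communication steps above amount to the following pattern; a hypothetical sketch using stock MPI.jl and CUDA.jl calls, not the package's actual internals:

```julia
using MPI, CUDA

# Hypothetical illustration of CPU staging; not the package's internal code.
function staged_allreduce!(v_gpu::CuVector{Float64}, comm::MPI.Comm)
    host = Array(v_gpu)            # 1. GPU vector data copied to a CPU staging buffer
    MPI.Allreduce!(host, +, comm)  # 2. MPI communication on the CPU buffer
    copyto!(v_gpu, host)           # result staged back to the GPU
    return v_gpu
end
```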
@@ -84,7 +104,32 @@ Sparse matrices remain on CPU (Julia's `SparseMatrixCSC` doesn't support GPU arr
 ### Extension Files
 
 - `ext/LinearAlgebraMPIMetalExt.jl` - Metal extension with `mtl()` and `cpu()` functions
-- Loaded automatically when `using Metal` before `using LinearAlgebraMPI`
+- `ext/LinearAlgebraMPICUDAExt.jl` - CUDA extension with `cu()` and `cpu()` functions, plus cuDSS multi-GPU solver
+- Loaded automatically when `using Metal` or `using CUDA` before `using LinearAlgebraMPI`
+
+### CUDA-Specific: cuDSS Multi-GPU Solver
+
+The CUDA extension includes `CuDSSFactorizationMPI` for distributed sparse direct solves using NVIDIA's cuDSS library with NCCL inter-GPU communication:
[…]
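
A hypothetical usage sketch of the solver: only the type name, `solve(F, y)`, and the required packages come from the diff; the constructor call is an assumption:

```julia
# Hypothetical sketch; the CuDSSFactorizationMPI constructor signature is assumed.
using MPI, CUDA, NCCL, CUDSS_jll   # per the setup docs: requires CUDA, NCCL, CUDSS_jll
using LinearAlgebraMPI
MPI.Init()

# Factor a distributed sparse matrix A and solve A * x = y on the GPUs.
function cudss_solve(A::SparseMatrixMPI, y::VectorMPI)
    F = CuDSSFactorizationMPI(A)   # assumed constructor from ext/LinearAlgebraMPICUDAExt.jl
    return solve(F, y)             # solve(F, y) mirrors the ldlt usage shown in the README
end
```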
 - **Vector operations**: norms, reductions, arithmetic with automatic partition alignment
 - Support for both `Float64` and `ComplexF64` element types
-- **GPU acceleration** via Metal.jl (macOS) with automatic CPU staging for MPI
+- **GPU acceleration** via Metal.jl (macOS) or CUDA.jl (Linux/Windows) with automatic CPU staging for MPI
+- **Multi-GPU sparse direct solver** via cuDSS with NCCL communication (CUDA only)
 
 ## Installation
 
@@ -66,11 +67,11 @@ F = ldlt(A_sym_dist) # LDLT factorization
 x_sol = solve(F, y)   # Solve A_sym * x_sol = y
 ```
 
-## GPU Support (Metal)
+## GPU Support
 
-LinearAlgebraMPI supports GPU acceleration on macOS via Metal.jl. GPU support is optional - Metal.jl is loaded as a weak dependency.
+LinearAlgebraMPI supports GPU acceleration via Metal.jl (macOS) or CUDA.jl (Linux/Windows). GPU support is optional - extensions are loaded as weak dependencies.
 
-### Converting between CPU and GPU
+### Metal (macOS)
 
 ```julia
 using Metal # Load Metal BEFORE MPI for GPU detection
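
A short sketch of the Metal conversion flow, assuming the `mtl()` and `cpu()` functions listed under Extension Files (their exact signatures are assumptions):

```julia
using Metal               # load Metal before MPI so the GPU is detected
using MPI, LinearAlgebraMPI
MPI.Init()

v = zeros(VectorMPI_CPU{Float64}, 100)   # CPU-backed distributed vector
v_gpu = mtl(v)                           # assumed: VectorMPI{Float64,MtlVector{Float64}}
v_cpu = cpu(v_gpu)                       # stage back to the CPU

M = zeros(MatrixMPI_CPU{Float64}, 50, 30)
M_gpu = mtl(M)                           # assumed: MatrixMPI{Float64,MtlMatrix{Float64}}
```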