Commit b528867

sbryngelson and claude authored
Fix GPU example, compiler matrix, and AMD flang consistency (#1256)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 7997663 commit b528867

3 files changed

Lines changed: 31 additions & 19 deletions

.claude/rules/common-pitfalls.md

Lines changed: 2 additions & 2 deletions
@@ -36,10 +36,10 @@
 - Boundary condition symmetry requirements must be maintained

 ## Compiler-Specific Issues
-- Code must compile on gfortran, nvfortran, Cray ftn, and Intel ifx
+- CI-gated compilers (must always pass): gfortran, nvfortran, Cray ftn, and Intel ifx
+- AMD flang is additionally supported for `--gpu mp` builds but not in the CI matrix
 - Each compiler has different strictness levels and warning behavior
 - Fypp macros must expand correctly for both GPU and CPU builds
-- GPU builds only work with nvfortran, Cray ftn, and AMD flang

 ## Test System
 - Tests are generated **programmatically** in `toolchain/mfc/test/cases.py`, not standalone files

.claude/rules/gpu-and-mpi.md

Lines changed: 26 additions & 15 deletions
@@ -38,20 +38,27 @@ Inline macros (use `$:` prefix):
 - `$:GPU_WAIT()` — Synchronization barrier.

 Block macros (use `#:call`/`#:endcall`):
-- `GPU_PARALLEL(...)` — GPU parallel region wrapping a code block.
+- `GPU_PARALLEL(...)` — GPU parallel region (used for scalar reductions like `maxval`/`minval`).
 - `GPU_DATA(copy=..., create=..., ...)` — Scoped data region.
 - `GPU_HOST_DATA(use_device_addr=[...])` — Host code with device pointers.

-Block macro usage:
+Typical GPU loop pattern (used 750+ times in the codebase):
 ```
-#:call GPU_PARALLEL(copyin='[var1]', copyout='[var2]')
-$:GPU_LOOP(collapse=N)
-do k = 0, n; do j = 0, m
-! loop body
-end do; end do
-#:endcall GPU_PARALLEL
+$:GPU_PARALLEL_LOOP(private='[i,j,k,l]', collapse=3)
+do l = idwbuff(3)%beg, idwbuff(3)%end
+do k = idwbuff(2)%beg, idwbuff(2)%end
+do j = idwbuff(1)%beg, idwbuff(1)%end
+! loop body
+end do
+end do
+end do
+$:END_GPU_PARALLEL_LOOP()
 ```

+WARNING: Do NOT use `GPU_PARALLEL` wrapping `GPU_LOOP` for spatial loops. `GPU_LOOP`
+emits empty directives on Cray and AMD compilers, causing silent serial execution.
+Use `GPU_PARALLEL_LOOP` / `END_GPU_PARALLEL_LOOP` for all parallel spatial loops.
+
 NEVER write raw `!$acc` or `!$omp` directives. Always use `GPU_*` Fypp macros.
 The precheck source lint will catch raw directives and fail.

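The two rules in this hunk (no raw `!$acc`/`!$omp` directives, and no `GPU_LOOP` inside a `GPU_PARALLEL` block) lend themselves to a mechanical check. The commit does not show how the precheck source lint is implemented, so the following is only a minimal sketch of the idea in Python; the function names `find_raw_directives` and `find_gpu_loop_in_parallel` and the regular expressions are assumptions made for illustration, not part of the MFC toolchain.

```python
import re

# Assumed pattern: a line that begins (after optional whitespace) with a raw
# OpenACC or OpenMP sentinel such as "!$acc" or "!$omp".
RAW_DIRECTIVE = re.compile(r"^\s*!\$(acc|omp)\b", re.IGNORECASE)

# Assumed patterns for the block form that the WARNING above discourages.
# The "(" keeps "#:call GPU_PARALLEL(" from matching GPU_PARALLEL_LOOP.
CALL_PARALLEL = re.compile(r"^\s*#:call\s+GPU_PARALLEL\(")
ENDCALL_PARALLEL = re.compile(r"^\s*#:endcall\s+GPU_PARALLEL\s*$")
GPU_LOOP = re.compile(r"^\s*\$:GPU_LOOP\(")


def find_raw_directives(source):
    """Return (line_number, stripped_text) for every raw directive line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if RAW_DIRECTIVE.match(line):
            hits.append((lineno, line.strip()))
    return hits


def find_gpu_loop_in_parallel(source):
    """Return line numbers of $:GPU_LOOP calls inside a GPU_PARALLEL block."""
    hits = []
    inside = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        if CALL_PARALLEL.match(line):
            inside = True
        elif ENDCALL_PARALLEL.match(line):
            inside = False
        elif inside and GPU_LOOP.match(line):
            hits.append(lineno)
    return hits
```

Both functions take the text of a `.fpp` source file as a string; a hypothetical lint driver would fail the check whenever either list is non-empty.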
@@ -67,13 +74,17 @@ The precheck source lint will catch raw directives and fail.
 - These compile only for Cray (`_CRAYFTN`); other compilers skip them

 ### Compiler-Backend Matrix
-| Compiler | `--gpu acc` (OpenACC) | `--gpu mp` (OpenMP) | CPU-only |
-|-----------------|----------------------|---------------------|----------|
-| GNU gfortran | No | No | Yes |
-| NVIDIA nvfortran| Yes (primary) | Yes | Yes |
-| Cray ftn (CCE) | Yes | Yes (primary) | Yes |
-| Intel ifx | No | No | Yes |
-| AMD flang | No | Yes | Yes |
+
+CI-gated compilers (must always pass): gfortran, nvfortran, Cray ftn, Intel ifx.
+AMD flang is additionally supported for GPU builds but not in the CI matrix.
+
+| Compiler | `--gpu acc` (OpenACC) | `--gpu mp` (OpenMP) | CPU-only |
+|-----------------|----------------------|------------------------|----------|
+| GNU gfortran | No | Experimental (AMD GCN) | Yes |
+| NVIDIA nvfortran| Yes (primary) | Yes | Yes |
+| Cray ftn (CCE) | Yes | Yes (primary) | Yes |
+| Intel ifx | No | Experimental (SPIR64) | Yes |
+| AMD flang | No | Yes | Yes |

 ## Preprocessor Defines (`#ifdef` / `#ifndef`)

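For a reader who wants to query the updated matrix programmatically, it can be transcribed into a small lookup table. This is a sketch only: the short compiler keys (`gfortran`, `ftn`, and so on), the three-level `yes`/`experimental`/`no` encoding, and the helper `compilers_for` are illustrative assumptions, and nothing here is taken from the MFC toolchain itself.

```python
# Support levels transcribed from the new (+) rows of the matrix above.
# "primary" annotations are dropped; only yes / experimental / no is kept.
MATRIX = {
    "gfortran":  {"acc": "no",  "mp": "experimental", "cpu": "yes"},
    "nvfortran": {"acc": "yes", "mp": "yes",          "cpu": "yes"},
    "ftn":       {"acc": "yes", "mp": "yes",          "cpu": "yes"},
    "ifx":       {"acc": "no",  "mp": "experimental", "cpu": "yes"},
    "flang":     {"acc": "no",  "mp": "yes",          "cpu": "yes"},
}

# Compilers the commit describes as CI-gated; AMD flang is deliberately absent.
CI_GATED = {"gfortran", "nvfortran", "ftn", "ifx"}


def compilers_for(backend, include_experimental=False):
    """Return a sorted list of compilers that support the given backend."""
    accepted = {"yes", "experimental"} if include_experimental else {"yes"}
    return sorted(name for name, row in MATRIX.items() if row[backend] in accepted)


def is_ci_gated(compiler):
    """True if the compiler is one of the four that must always pass CI."""
    return compiler in CI_GATED
```

Under this encoding, `compilers_for("mp")` lists the fully supported OpenMP offload compilers, and passing `include_experimental=True` adds the two experimental entries introduced by this commit.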
CLAUDE.md

Lines changed: 3 additions & 2 deletions
@@ -3,7 +3,8 @@
 MFC is an exascale multi-physics CFD solver written in modern Fortran 2008+ with Fypp
 preprocessing. It has three executables (pre_process, simulation, post_process), a Python
 toolchain for building/running/testing, and supports GPU acceleration via OpenACC and
-OpenMP target offload. It must compile with gfortran, nvfortran, Cray ftn, and Intel ifx.
+OpenMP target offload. It must compile with gfortran, nvfortran, Cray ftn, and Intel ifx (CI-gated).
+AMD flang is additionally supported for OpenMP target offload GPU builds.

 ## Commands

@@ -167,4 +168,4 @@ When reviewing PRs, prioritize in this order:
 4. MPI correctness (halo exchange, buffer sizing, GPU_UPDATE calls)
 5. GPU code (GPU_* Fypp macros only, no raw pragmas)
 6. Physics consistency (pressure formula matches model_eqns)
-7. Compiler portability (all four compilers)
+7. Compiler portability (4 CI-gated compilers + AMD flang for GPU)
