@@ -38,20 +38,27 @@ Inline macros (use `$:` prefix):
 - `$:GPU_WAIT()` — Synchronization barrier.
 
 Block macros (use `#:call`/`#:endcall`):
-- `GPU_PARALLEL(...)` — GPU parallel region wrapping a code block.
+- `GPU_PARALLEL(...)` — GPU parallel region (used for scalar reductions like `maxval`/`minval`).
 - `GPU_DATA(copy=..., create=..., ...)` — Scoped data region.
 - `GPU_HOST_DATA(use_device_addr=[...])` — Host code with device pointers.
 
-Block macro usage:
+Typical GPU loop pattern (used 750+ times in the codebase):
 ```
-#:call GPU_PARALLEL(copyin='[var1]', copyout='[var2]')
-    $:GPU_LOOP(collapse=N)
-    do k = 0, n; do j = 0, m
-        ! loop body
-    end do; end do
-#:endcall GPU_PARALLEL
+$:GPU_PARALLEL_LOOP(private='[i,j,k,l]', collapse=3)
+do l = idwbuff(3)%beg, idwbuff(3)%end
+    do k = idwbuff(2)%beg, idwbuff(2)%end
+        do j = idwbuff(1)%beg, idwbuff(1)%end
+            ! loop body
+        end do
+    end do
+end do
+$:END_GPU_PARALLEL_LOOP()
 ```
 
+WARNING: Do NOT use `GPU_PARALLEL` wrapping `GPU_LOOP` for spatial loops. `GPU_LOOP`
+emits empty directives on Cray and AMD compilers, causing silent serial execution.
+Use `GPU_PARALLEL_LOOP` / `END_GPU_PARALLEL_LOOP` for all parallel spatial loops.
+
 NEVER write raw `!$acc` or `!$omp` directives. Always use `GPU_*` Fypp macros.
 The precheck source lint will catch raw directives and fail.
 
@@ -67,13 +74,17 @@ The precheck source lint will catch raw directives and fail.
 - These compile only for Cray (`_CRAYFTN`); other compilers skip them
 
 ### Compiler-Backend Matrix
-| Compiler         | `--gpu acc` (OpenACC) | `--gpu mp` (OpenMP) | CPU-only |
-|------------------|-----------------------|---------------------|----------|
-| GNU gfortran     | No                    | No                  | Yes      |
-| NVIDIA nvfortran | Yes (primary)         | Yes                 | Yes      |
-| Cray ftn (CCE)   | Yes                   | Yes (primary)       | Yes      |
-| Intel ifx        | No                    | No                  | Yes      |
-| AMD flang        | No                    | Yes                 | Yes      |
+
+CI-gated compilers (must always pass): gfortran, nvfortran, Cray ftn, Intel ifx.
+AMD flang is additionally supported for GPU builds but not in the CI matrix.
+
+| Compiler         | `--gpu acc` (OpenACC) | `--gpu mp` (OpenMP)    | CPU-only |
+|------------------|-----------------------|------------------------|----------|
+| GNU gfortran     | No                    | Experimental (AMD GCN) | Yes      |
+| NVIDIA nvfortran | Yes (primary)         | Yes                    | Yes      |
+| Cray ftn (CCE)   | Yes                   | Yes (primary)          | Yes      |
+| Intel ifx        | No                    | Experimental (SPIR64)  | Yes      |
+| AMD flang        | No                    | Yes                    | Yes      |
 
 ## Preprocessor Defines (`#ifdef` / `#ifndef`)
 
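
Reviewer note: the rule in the first hunk ("NEVER write raw `!$acc` or `!$omp` directives") is said to be enforced by a precheck source lint. The sketch below shows roughly what such a check could look like. It is not the project's actual lint: the function name `find_raw_directives` and the regular expression are assumptions made for illustration, and a real check may treat continuation lines or Fypp output differently.

```python
import re

# Assumed pattern (hypothetical, not taken from the project's lint):
# a line whose first non-blank text is a raw OpenACC (!$acc) or
# OpenMP (!$omp) sentinel, matched case-insensitively.
RAW_DIRECTIVE = re.compile(r"^\s*!\$(acc|omp)\b", re.IGNORECASE)


def find_raw_directives(source):
    """Return a list of (line_number, stripped_text) for raw directive lines."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if RAW_DIRECTIVE.match(line):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "$:GPU_PARALLEL_LOOP(private='[i,j,k,l]', collapse=3)",
        "do l = 1, n",
        "!$acc parallel loop",
        "end do",
        "$:END_GPU_PARALLEL_LOOP()",
    ])
    for lineno, text in find_raw_directives(sample):
        print(f"line {lineno}: raw directive found: {text}")
```

Macro invocations such as `$:GPU_PARALLEL_LOOP(...)` do not start with `!$`, so only hand-written sentinels are reported.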