Commit 93cb53b

Merge pull request #32 from converged-computing/add-compute-engine-gpu-size-32
compute-engine gpu size 32
2 parents bdae907 + 6d4c84a commit 93cb53b

182 files changed

Lines changed: 1820328 additions & 2 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -50,8 +50,8 @@ This is a checklist for the setups we have tested and timed:
 - [x] size 4 (vsoch 9/6/2024)
 - [x] size 8 (vsoch 9/7/2024)
 - [x] size 16 (vsoch 9/8/2024)
-- [ ] size 32 (vsoch TBA 9/2024)
-- [ ] quicksilver and osu all reduce need runs at all sizes if/when bug figured out
+- [x] size 32 (vsoch 9/8/2024)
+- [ ] quicksilver and osu all reduce need runs at all sizes.
 
 ### Kubernetes
 

experiments/google/compute-engine/gpu/size32/README.md

Lines changed: 465 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+Running with these driver parameters:
+Problem ID = 2
+
+=============================================
+Hypre init times:
+=============================================
+Hypre init:
+wall clock time = 0.000063 seconds
+Laplacian_7pt:
+(Nx, Ny, Nz) = (1024, 1024, 512)
+(Px, Py, Pz) = (8, 8, 4)
+
+=============================================
+Generate Matrix:
+=============================================
+Spatial Operator:
+wall clock time = 1.495498 seconds
+RHS vector has unit components
+Initial guess is 0
+=============================================
+IJ Vector Setup:
+=============================================
+RHS and Initial Guess:
+wall clock time = 0.162705 seconds
+=============================================
+Problem 2: AMG Setup Time:
+=============================================
+PCG Setup:
+wall clock time = 18.367167 seconds
+
+FOM_Setup: nnz_AP / Setup Phase Time: 3.256486e+08
+
+=============================================
+Problem 2: AMG-PCG Solve Time:
+=============================================
+PCG Solve:
+wall clock time = 26.138888 seconds
+
+Iterations = 53
+Final Relative Residual Norm = 7.053063e-09
+
+
+FOM_Solve: nnz_AP * iterations / Solve Phase Time: 2.288254e+08
+
+
+
+Figure of Merit (FOM): nnz_AP / (Setup Phase Time + 3 * Solve Phase Time) 6.180002e+07
+
+START OF JOBSPEC
+{"resources": [{"type": "node", "count": 32, "with": [{"type": "slot", "count": 8, "with": [{"type": "core", "count": 1}, {"type": "gpu", "count": 1}], "label": "task"}]}], "tasks": [{"command": ["singularity", "exec", "--nv", "/opt/containers/metric-amg2023_spack-older-intel.sif", "/opt/view/bin/amg", "-n", "128", "128", "128", "-P", "8", "8", "4", "-problem", "2"], "slot": "task", "count": {"per_slot": 1}}], "attributes": {"system": {"duration": 0, "cwd": "/home/sochat1_llnl_gov", "shell": {"options": {"rlimit": {"cpu": -1, "fsize": -1, "data": -1, "stack": 8388608, "core": 0, "nofile": 1048576, "as": -1, "rss": -1, "nproc": -1}, "pmi": "pmix", "gpu-affinity": "per-task", "cpu-affinity": "per-task"}}}, "user": {"study_id": "amg2023-32-iter-1"}}, "version": 1}
+START OF EVENTLOG
+{"timestamp":1725847458.4148834,"name":"init"}
+{"timestamp":1725847458.4158838,"name":"starting"}
+{"timestamp":1725847458.8013756,"name":"shell.init","context":{"service":"501043911-shell-fYzBAJ1u","leader-rank":0,"size":32}}
+{"timestamp":1725847458.8297007,"name":"shell.start","context":{"taskmap":{"version":1,"map":[[0,32,8,1]]}}}
+{"timestamp":1725847511.7282193,"name":"shell.task-exit","context":{"localid":0,"rank":24,"state":"Exited","pid":5304,"wait_status":0,"signaled":0,"exitcode":0}}
+{"timestamp":1725847512.0038643,"name":"complete","context":{"status":0}}
+{"timestamp":1725847512.003891,"name":"done"}
+
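As a sanity check on the figures in the log above, the overall FOM can be recomputed from the reported times. This is a minimal sketch and not part of the commit: it assumes nnz_AP can be recovered by inverting the printed `FOM_Setup` definition, and the function names are hypothetical.

```python
def recover_nnz_ap(fom_setup: float, setup_seconds: float) -> float:
    """Invert FOM_Setup = nnz_AP / setup time to estimate nnz_AP."""
    return fom_setup * setup_seconds


def figure_of_merit(nnz_ap: float, setup_seconds: float, solve_seconds: float) -> float:
    """Apply the formula printed in the log: nnz_AP / (setup + 3 * solve)."""
    return nnz_ap / (setup_seconds + 3.0 * solve_seconds)


# Values copied from the log above (128x128x128 per-task run, iter-1).
setup_s = 18.367167
solve_s = 26.138888
fom_setup = 3.256486e+08

nnz_ap = recover_nnz_ap(fom_setup, setup_s)
fom = figure_of_merit(nnz_ap, setup_s, solve_s)
solve_ratio = nnz_ap / solve_s  # nnz_AP over solve time, no iteration factor

print(f"estimated nnz_AP = {nnz_ap:.6e}")
print(f"recomputed FOM   = {fom:.6e}")
print(f"nnz_AP / solve   = {solve_ratio:.6e}")
```

The recomputed FOM agrees with the logged 6.180002e+07 to the precision of the inputs. In this log, `nnz_ap / solve_s` also reproduces the logged `FOM_Solve` value (about 2.288e+08), that is, without an iterations factor, despite the label.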
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+Running with these driver parameters:
+Problem ID = 2
+
+=============================================
+Hypre init times:
+=============================================
+Hypre init:
+wall clock time = 0.000050 seconds
+Laplacian_7pt:
+(Nx, Ny, Nz) = (2048, 1024, 512)
+(Px, Py, Pz) = (8, 8, 4)
+
+=============================================
+Generate Matrix:
+=============================================
+Spatial Operator:
+wall clock time = 1.805819 seconds
+RHS vector has unit components
+Initial guess is 0
+=============================================
+IJ Vector Setup:
+=============================================
+RHS and Initial Guess:
+wall clock time = 0.268025 seconds
+=============================================
+Problem 2: AMG Setup Time:
+=============================================
+PCG Setup:
+wall clock time = 17.291235 seconds
+
+FOM_Setup: nnz_AP / Setup Phase Time: 6.924117e+08
+
+=============================================
+Problem 2: AMG-PCG Solve Time:
+=============================================
+PCG Solve:
+wall clock time = 25.487557 seconds
+
+Iterations = 45
+Final Relative Residual Norm = 8.092724e-09
+
+
+FOM_Solve: nnz_AP * iterations / Solve Phase Time: 4.697451e+08
+
+
+
+Figure of Merit (FOM): nnz_AP / (Setup Phase Time + 3 * Solve Phase Time) 1.277030e+08
+
+START OF JOBSPEC
+{"resources": [{"type": "node", "count": 32, "with": [{"type": "slot", "count": 8, "with": [{"type": "core", "count": 1}, {"type": "gpu", "count": 1}], "label": "task"}]}], "tasks": [{"command": ["singularity", "exec", "--nv", "/opt/containers/metric-amg2023_spack-older-intel.sif", "/opt/view/bin/amg", "-n", "256", "128", "128", "-P", "8", "8", "4", "-problem", "2"], "slot": "task", "count": {"per_slot": 1}}], "attributes": {"system": {"duration": 0, "cwd": "/home/sochat1_llnl_gov", "shell": {"options": {"rlimit": {"cpu": -1, "fsize": -1, "data": -1, "stack": 8388608, "core": 0, "nofile": 1048576, "as": -1, "rss": -1, "nproc": -1}, "pmi": "pmix", "gpu-affinity": "per-task", "cpu-affinity": "per-task"}}}, "user": {"study_id": "amg2023-32-iter-1"}}, "version": 1}
+START OF EVENTLOG
+{"timestamp":1725847405.3015261,"name":"init"}
+{"timestamp":1725847405.302557,"name":"starting"}
+{"timestamp":1725847405.4819455,"name":"shell.init","context":{"service":"501043911-shell-f9aYjyP5","leader-rank":0,"size":32}}
+{"timestamp":1725847405.5198989,"name":"shell.start","context":{"taskmap":{"version":1,"map":[[0,32,8,1]]}}}
+{"timestamp":1725847457.7163939,"name":"shell.task-exit","context":{"localid":6,"rank":94,"state":"Exited","pid":4929,"wait_status":0,"signaled":0,"exitcode":0}}
+{"timestamp":1725847458.1463132,"name":"complete","context":{"status":0}}
+{"timestamp":1725847458.1463392,"name":"done"}
+
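The jobspec arguments and the driver output in the log above can be cross-checked against each other. The sketch below assumes that `-n` gives the per-process grid and `-P` the process topology, which is how the logged numbers line up; the helper name is hypothetical and not part of the commit.

```python
from math import prod


def global_grid(per_task, topology):
    """Elementwise product of the per-task grid and the process topology."""
    return tuple(n * p for n, p in zip(per_task, topology))


# Values copied from the jobspec above: "-n 256 128 128 -P 8 8 4".
per_task = (256, 128, 128)
topology = (8, 8, 4)

# Resources copied from the jobspec: 32 nodes, 8 slots (1 core + 1 GPU) each.
nodes, slots_per_node = 32, 8

grid = global_grid(per_task, topology)
ranks = prod(topology)

print("global grid:", grid)
print("MPI ranks from -P:", ranks)
print("slots requested:", nodes * slots_per_node)
```

Under that assumption the product matches the logged `(Nx, Ny, Nz) = (2048, 1024, 512)`, and the 256 ranks implied by `-P` equal the 32 × 8 slots requested, one GPU per task.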
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+Running with these driver parameters:
+Problem ID = 2
+
+=============================================
+Hypre init times:
+=============================================
+Hypre init:
+wall clock time = 0.000055 seconds
+Laplacian_7pt:
+(Nx, Ny, Nz) = (2048, 1024, 512)
+(Px, Py, Pz) = (8, 8, 4)
+
+=============================================
+Generate Matrix:
+=============================================
+Spatial Operator:
+wall clock time = 1.578065 seconds
+RHS vector has unit components
+Initial guess is 0
+=============================================
+IJ Vector Setup:
+=============================================
+RHS and Initial Guess:
+wall clock time = 0.524260 seconds
+=============================================
+Problem 2: AMG Setup Time:
+=============================================
+PCG Setup:
+wall clock time = 18.042698 seconds
+
+FOM_Setup: nnz_AP / Setup Phase Time: 6.635734e+08
+
+=============================================
+Problem 2: AMG-PCG Solve Time:
+=============================================
+PCG Solve:
+wall clock time = 22.877173 seconds
+
+Iterations = 45
+Final Relative Residual Norm = 8.092724e-09
+
+
+FOM_Solve: nnz_AP * iterations / Solve Phase Time: 5.233450e+08
+
+
+
+Figure of Merit (FOM): nnz_AP / (Setup Phase Time + 3 * Solve Phase Time) 1.381340e+08
+
+START OF JOBSPEC
+{"resources": [{"type": "node", "count": 32, "with": [{"type": "slot", "count": 8, "with": [{"type": "core", "count": 1}, {"type": "gpu", "count": 1}], "label": "task"}]}], "tasks": [{"command": ["singularity", "exec", "--nv", "/opt/containers/metric-amg2023_spack-older-intel.sif", "/opt/view/bin/amg", "-n", "256", "128", "128", "-P", "8", "8", "4", "-problem", "2"], "slot": "task", "count": {"per_slot": 1}}], "attributes": {"system": {"duration": 0, "cwd": "/home/sochat1_llnl_gov", "shell": {"options": {"rlimit": {"cpu": -1, "fsize": -1, "data": -1, "stack": 8388608, "core": 0, "nofile": 1048576, "as": -1, "rss": -1, "nproc": -1}, "pmi": "pmix", "gpu-affinity": "per-task", "cpu-affinity": "per-task"}}}, "user": {"study_id": "amg2023-32-iter-2"}}, "version": 1}
+START OF EVENTLOG
+{"timestamp":1725847512.2747788,"name":"init"}
+{"timestamp":1725847512.2757919,"name":"starting"}
+{"timestamp":1725847512.6651525,"name":"shell.init","context":{"service":"501043911-shell-fxivXMSf","leader-rank":0,"size":32}}
+{"timestamp":1725847512.6925321,"name":"shell.start","context":{"taskmap":{"version":1,"map":[[0,32,8,1]]}}}
+{"timestamp":1725847562.0019486,"name":"shell.task-exit","context":{"localid":5,"rank":181,"state":"Exited","pid":5545,"wait_status":0,"signaled":0,"exitcode":0}}
+{"timestamp":1725847562.4609094,"name":"complete","context":{"status":0}}
+{"timestamp":1725847562.4609354,"name":"done"}
+
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+Running with these driver parameters:
+Problem ID = 2
+
+=============================================
+Hypre init times:
+=============================================
+Hypre init:
+wall clock time = 0.000052 seconds
+Laplacian_7pt:
+(Nx, Ny, Nz) = (1024, 1024, 512)
+(Px, Py, Pz) = (8, 8, 4)
+
+=============================================
+Generate Matrix:
+=============================================
+Spatial Operator:
+wall clock time = 1.502375 seconds
+RHS vector has unit components
+Initial guess is 0
+=============================================
+IJ Vector Setup:
+=============================================
+RHS and Initial Guess:
+wall clock time = 0.288517 seconds
+=============================================
+Problem 2: AMG Setup Time:
+=============================================
+PCG Setup:
+wall clock time = 17.709456 seconds
+
+FOM_Setup: nnz_AP / Setup Phase Time: 3.377429e+08
+
+=============================================
+Problem 2: AMG-PCG Solve Time:
+=============================================
+PCG Solve:
+wall clock time = 25.612248 seconds
+
+Iterations = 53
+Final Relative Residual Norm = 7.053063e-09
+
+
+FOM_Solve: nnz_AP * iterations / Solve Phase Time: 2.335306e+08
+
+
+
+Figure of Merit (FOM): nnz_AP / (Setup Phase Time + 3 * Solve Phase Time) 6.326264e+07
+
+START OF JOBSPEC
+{"resources": [{"type": "node", "count": 32, "with": [{"type": "slot", "count": 8, "with": [{"type": "core", "count": 1}, {"type": "gpu", "count": 1}], "label": "task"}]}], "tasks": [{"command": ["singularity", "exec", "--nv", "/opt/containers/metric-amg2023_spack-older-intel.sif", "/opt/view/bin/amg", "-n", "128", "128", "128", "-P", "8", "8", "4", "-problem", "2"], "slot": "task", "count": {"per_slot": 1}}], "attributes": {"system": {"duration": 0, "cwd": "/home/sochat1_llnl_gov", "shell": {"options": {"rlimit": {"cpu": -1, "fsize": -1, "data": -1, "stack": 8388608, "core": 0, "nofile": 1048576, "as": -1, "rss": -1, "nproc": -1}, "pmi": "pmix", "gpu-affinity": "per-task", "cpu-affinity": "per-task"}}}, "user": {"study_id": "amg2023-32-iter-2"}}, "version": 1}
+START OF EVENTLOG
+{"timestamp":1725847562.7275102,"name":"init"}
+{"timestamp":1725847562.7285142,"name":"starting"}
+{"timestamp":1725847563.1316919,"name":"shell.init","context":{"service":"501043911-shell-f2LxXtdBu","leader-rank":0,"size":32}}
+{"timestamp":1725847563.1602051,"name":"shell.start","context":{"taskmap":{"version":1,"map":[[0,32,8,1]]}}}
+{"timestamp":1725847615.1242874,"name":"shell.task-exit","context":{"localid":0,"rank":120,"state":"Exited","pid":5952,"wait_status":0,"signaled":0,"exitcode":0}}
+{"timestamp":1725847615.487427,"name":"complete","context":{"status":0}}
+{"timestamp":1725847615.4874606,"name":"done"}
+
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+Running with these driver parameters:
+Problem ID = 2
+
+=============================================
+Hypre init times:
+=============================================
+Hypre init:
+wall clock time = 0.000053 seconds
+Laplacian_7pt:
+(Nx, Ny, Nz) = (2048, 1024, 512)
+(Px, Py, Pz) = (8, 8, 4)
+
+=============================================
+Generate Matrix:
+=============================================
+Spatial Operator:
+wall clock time = 1.573947 seconds
+RHS vector has unit components
+Initial guess is 0
+=============================================
+IJ Vector Setup:
+=============================================
+RHS and Initial Guess:
+wall clock time = 0.328985 seconds
+=============================================
+Problem 2: AMG Setup Time:
+=============================================
+PCG Setup:
+wall clock time = 17.899451 seconds
+
+FOM_Setup: nnz_AP / Setup Phase Time: 6.688839e+08
+
+=============================================
+Problem 2: AMG-PCG Solve Time:
+=============================================
+PCG Solve:
+wall clock time = 24.904452 seconds
+
+Iterations = 45
+Final Relative Residual Norm = 8.092724e-09
+
+
+FOM_Solve: nnz_AP * iterations / Solve Phase Time: 4.807435e+08
+
+
+
+Figure of Merit (FOM): nnz_AP / (Setup Phase Time + 3 * Solve Phase Time) 1.292764e+08
+
+START OF JOBSPEC
+{"resources": [{"type": "node", "count": 32, "with": [{"type": "slot", "count": 8, "with": [{"type": "core", "count": 1}, {"type": "gpu", "count": 1}], "label": "task"}]}], "tasks": [{"command": ["singularity", "exec", "--nv", "/opt/containers/metric-amg2023_spack-older-intel.sif", "/opt/view/bin/amg", "-n", "256", "128", "128", "-P", "8", "8", "4", "-problem", "2"], "slot": "task", "count": {"per_slot": 1}}], "attributes": {"system": {"duration": 0, "cwd": "/home/sochat1_llnl_gov", "shell": {"options": {"rlimit": {"cpu": -1, "fsize": -1, "data": -1, "stack": 8388608, "core": 0, "nofile": 1048576, "as": -1, "rss": -1, "nproc": -1}, "pmi": "pmix", "gpu-affinity": "per-task", "cpu-affinity": "per-task"}}}, "user": {"study_id": "amg2023-32-iter-3"}}, "version": 1}
+START OF EVENTLOG
+{"timestamp":1725847615.7553241,"name":"init"}
+{"timestamp":1725847615.7563901,"name":"starting"}
+{"timestamp":1725847616.1450379,"name":"shell.init","context":{"service":"501043911-shell-f2kKzHzsR","leader-rank":0,"size":32}}
+{"timestamp":1725847616.1728113,"name":"shell.start","context":{"taskmap":{"version":1,"map":[[0,32,8,1]]}}}
+{"timestamp":1725847668.2972875,"name":"shell.task-exit","context":{"localid":3,"rank":91,"state":"Exited","pid":6212,"wait_status":0,"signaled":0,"exitcode":0}}
+{"timestamp":1725847668.7292132,"name":"complete","context":{"status":0}}
+{"timestamp":1725847668.7292409,"name":"done"}
+
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+Running with these driver parameters:
+Problem ID = 2
+
+=============================================
+Hypre init times:
+=============================================
+Hypre init:
+wall clock time = 0.000065 seconds
+Laplacian_7pt:
+(Nx, Ny, Nz) = (1024, 1024, 512)
+(Px, Py, Pz) = (8, 8, 4)
+
+=============================================
+Generate Matrix:
+=============================================
+Spatial Operator:
+wall clock time = 1.574598 seconds
+RHS vector has unit components
+Initial guess is 0
+=============================================
+IJ Vector Setup:
+=============================================
+RHS and Initial Guess:
+wall clock time = 0.489109 seconds
+=============================================
+Problem 2: AMG Setup Time:
+=============================================
+PCG Setup:
+wall clock time = 17.448968 seconds
+
+FOM_Setup: nnz_AP / Setup Phase Time: 3.427849e+08
+
+=============================================
+Problem 2: AMG-PCG Solve Time:
+=============================================
+PCG Solve:
+wall clock time = 27.760326 seconds
+
+Iterations = 53
+Final Relative Residual Norm = 7.053063e-09
+
+
+FOM_Solve: nnz_AP * iterations / Solve Phase Time: 2.154601e+08
+
+
+
+Figure of Merit (FOM): nnz_AP / (Setup Phase Time + 3 * Solve Phase Time) 5.937899e+07
+
+START OF JOBSPEC
+{"resources": [{"type": "node", "count": 32, "with": [{"type": "slot", "count": 8, "with": [{"type": "core", "count": 1}, {"type": "gpu", "count": 1}], "label": "task"}]}], "tasks": [{"command": ["singularity", "exec", "--nv", "/opt/containers/metric-amg2023_spack-older-intel.sif", "/opt/view/bin/amg", "-n", "128", "128", "128", "-P", "8", "8", "4", "-problem", "2"], "slot": "task", "count": {"per_slot": 1}}], "attributes": {"system": {"duration": 0, "cwd": "/home/sochat1_llnl_gov", "shell": {"options": {"rlimit": {"cpu": -1, "fsize": -1, "data": -1, "stack": 8388608, "core": 0, "nofile": 1048576, "as": -1, "rss": -1, "nproc": -1}, "pmi": "pmix", "gpu-affinity": "per-task", "cpu-affinity": "per-task"}}}, "user": {"study_id": "amg2023-32-iter-3"}}, "version": 1}
+START OF EVENTLOG
+{"timestamp":1725847668.9972515,"name":"init"}
+{"timestamp":1725847668.998266,"name":"starting"}
+{"timestamp":1725847669.3921137,"name":"shell.init","context":{"service":"501043911-shell-f39nuxirj","leader-rank":0,"size":32}}
+{"timestamp":1725847669.4202955,"name":"shell.start","context":{"taskmap":{"version":1,"map":[[0,32,8,1]]}}}
+{"timestamp":1725847722.7744753,"name":"shell.task-exit","context":{"localid":4,"rank":84,"state":"Exited","pid":6530,"wait_status":0,"signaled":0,"exitcode":0}}
+{"timestamp":1725847723.1359589,"name":"complete","context":{"status":0}}
+{"timestamp":1725847723.1359909,"name":"done"}
+
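Each log ends with a JSON-lines event log, so the end-to-end job time can be read off the timestamps. The following is a minimal sketch, not part of the commit: it assumes one JSON object per line with `timestamp` and `name` keys, as in the logs above, and the `elapsed` helper is a hypothetical name.

```python
import json

# A few events copied from the last event log above (128x128x128, iter-3).
EVENTLOG = """\
{"timestamp":1725847668.9972515,"name":"init"}
{"timestamp":1725847669.4202955,"name":"shell.start","context":{"taskmap":{"version":1,"map":[[0,32,8,1]]}}}
{"timestamp":1725847723.1359589,"name":"complete","context":{"status":0}}
{"timestamp":1725847723.1359909,"name":"done"}
"""


def elapsed(eventlog_text: str, start: str, end: str) -> float:
    """Seconds between the first `start` event and the first `end` event."""
    first_seen = {}
    for line in eventlog_text.splitlines():
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Keep only the first timestamp recorded for each event name.
        first_seen.setdefault(event["name"], event["timestamp"])
    return first_seen[end] - first_seen[start]


run_seconds = elapsed(EVENTLOG, "shell.start", "complete")
total_seconds = elapsed(EVENTLOG, "init", "complete")
print(f"shell.start -> complete: {run_seconds:.3f} s")
print(f"init -> complete:        {total_seconds:.3f} s")
```

For this run the interval from `shell.start` to `complete` is roughly 53.7 seconds, which is longer than the sum of the setup and solve phases reported by the application itself.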
