WIP: Start GPU porting #278

Draft
EmilyBourne wants to merge 38 commits into main from ebourne_gradual_gpu_port_part_1

@EmilyBourne
Collaborator

This PR is not ready for review, but I want to check the CUDA compiler output, as the output on GPU is not verbose enough.

Merge Request - GuideLine Checklist

Guideline to check code before resolve WIP and approval, respectively.
As many checkboxes as possible should be ticked.

Checks by code author:

Always to be checked:

  • There is at least one issue associated with the pull request.
  • New code adheres to the coding guidelines.
  • No large data files have been added to the repository. Maximum size for files should be of the order of KB not MB. In particular avoid adding of pdf, word, or other files that cannot be change-tracked correctly by git.

If functions were changed or functionality was added:

  • Tests for new functionality have been added.
  • A local test was successful.

If new functionality was added:

  • There is appropriate documentation of your work. (use doxygen style comments)

If new third party software is used:

  • Did you pay attention to its license? Please remember to add it to the wiki after successful merging.

If new mathematical methods or epidemiological terms are used:

  • Are new methods referenced? Did you provide further documentation?

Checks by code reviewer(s):

  • Is the code clean of development artifacts e.g., unnecessary comments, prints, ...
  • The ticket goals for each associated issue are reached or problems are clearly addressed (i.e., a new issue was introduced).
  • There are appropriate unit tests and they pass.
  • The git history is clean and linearized for the merge request. All reviewers should squash commits and write a simple and meaningful commit message.
  • Coverage report for new code is acceptable.
  • No large data files have been added to the repository. Maximum size for files should be of the order of KB not MB. In particular avoid adding of pdf, word, or other files that cannot be change-tracked correctly by git.

@EmilyBourne EmilyBourne marked this pull request as draft May 20, 2026 11:23
@EmilyBourne EmilyBourne marked this pull request as ready for review May 20, 2026 12:19
@codecov

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 99.69372% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.77%. Comparing base (a87782a) to head (9c83c51).

Files with missing lines Patch % Lines
include/GMGPolar/setup.h 90.00% 1 Missing ⚠️
include/GMGPolar/utils.h 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #278      +/-   ##
==========================================
+ Coverage   95.73%   95.77%   +0.04%     
==========================================
  Files          79       79              
  Lines        8492     8575      +83     
==========================================
+ Hits         8130     8213      +83     
  Misses        362      362              


@julianlitz julianlitz self-requested a review May 20, 2026 13:06
julianlitz previously approved these changes May 20, 2026
@EmilyBourne
Collaborator Author

@julianlitz
WIP = Work In Progress
This PR does not yet run correctly on GPU. I took it out of draft temporarily to check whether the problem is also present in the modified CPU version.

@EmilyBourne EmilyBourne dismissed julianlitz’s stale review May 20, 2026 13:11

Review left on unfinished modifications

@EmilyBourne EmilyBourne marked this pull request as draft May 20, 2026 13:12
@julianlitz
Collaborator

image
"Ready to review" 😊😉
I just approved it so you don't have to wait until someone approves it. You can just leave it unmerged and request a second review when you think you are done.

@EmilyBourne
Collaborator Author

On v100 at aaec652:

0% tests passed, 2 tests failed out of 2

Total Test time (real) =   0.59 sec

The following tests FAILED:
	102 - SerialDirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry (Failed)
	350 - ParallelDirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry (Failed)
Errors while running CTest
-bash $> ctest --rerun-failed --output-on-failure
Test project /home/EB030696/GMGPolar_2/build
    Start 102: SerialDirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry
1/2 Test #102: SerialDirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry .....***Failed    0.29 sec
Note: Google Test filter = DirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from DirectSolverTakeTest_CircularGeometry
[ RUN      ] DirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry
/home/EB030696/GMGPolar_2/tests/DirectSolver/directSolver.cpp:1026: Failure
The difference between infinity_norm(HostConstVector<double>(residuum)) and 0.0 is 1.0444978215673473e-12, which exceeds 1e-12, where
infinity_norm(HostConstVector<double>(residuum)) evaluates to 1.0444978215673473e-12,
0.0 evaluates to 0, and
1e-12 evaluates to 9.9999999999999998e-13.

[  FAILED  ] DirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry (8 ms)
[----------] 1 test from DirectSolverTakeTest_CircularGeometry (8 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry

 1 FAILED TEST

    Start 350: ParallelDirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry
2/2 Test #350: ParallelDirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry ...***Failed    0.25 sec
Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores.
  process threads available :   8,  requested thread :  16
Note: Google Test filter = DirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from DirectSolverTakeTest_CircularGeometry
[ RUN      ] DirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry
/home/EB030696/GMGPolar_2/tests/DirectSolver/directSolver.cpp:1026: Failure
The difference between infinity_norm(HostConstVector<double>(residuum)) and 0.0 is 1.0444978215673473e-12, which exceeds 1e-12, where
infinity_norm(HostConstVector<double>(residuum)) evaluates to 1.0444978215673473e-12,
0.0 evaluates to 0, and
1e-12 evaluates to 9.9999999999999998e-13.

[  FAILED  ] DirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry (7 ms)
[----------] 1 test from DirectSolverTakeTest_CircularGeometry (7 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry

 1 FAILED TEST


0% tests passed, 2 tests failed out of 2

Total Test time (real) =   0.59 sec

The following tests FAILED:
	102 - SerialDirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry (Failed)
	350 - ParallelDirectSolverTakeTest_CircularGeometry.DirectSolverAcrossOriginHigherPrecision2_CircularGeometry (Failed)
Errors while running CTest

@EmilyBourne
Collaborator Author

@julianlitz do you have any idea about this failure?
The failing assertion is:

ASSERT_NEAR(infinity_norm(HostConstVector<double>(residuum)), 0.0, 1e-12);

I assume that running on GPU has reordered the calculations, which introduces different rounding errors. However, 1e-12 is fairly large for a floating-point error. What does the "HigherPrecision" in the test title refer to?
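To illustrate the reordering effect in isolation, here is a minimal sketch that is not taken from GMGPolar. The function names `sum_forward`, `sum_backward` and `check_example`, and the input values, are made up for the illustration. It only shows that floating-point addition is not associative, so the same terms summed in a different order can give a different result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum the entries from first to last.
double sum_forward(const std::vector<double>& values)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        sum += values[i];
    }
    return sum;
}

// Hypothetical helper: sum the same entries from last to first.
double sum_backward(const std::vector<double>& values)
{
    double sum = 0.0;
    for (std::size_t i = values.size(); i > 0; --i) {
        sum += values[i - 1];
    }
    return sum;
}

// Self-check with illustrative values (assumes IEEE double arithmetic,
// i.e. no fast-math style compiler flags).
void check_example()
{
    const std::vector<double> values{1e17, -1e17, 1.0};
    // Forward: (1e17 - 1e17) + 1.0 keeps the small term.
    assert(sum_forward(values) == 1.0);
    // Backward: 1.0 is absorbed by -1e17 before the large terms cancel.
    assert(sum_backward(values) == 0.0);
}
```

The values here are deliberately extreme. In a real stencil the terms are closer in magnitude, so the difference between orderings would show up only in the last few digits.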

@julianlitz
Collaborator

julianlitz commented May 27, 2026

In the Give methods I also observed multiple times that reordering loops can cause these tests to fail by minor margins, e.g. turning 9.9e-12 into 1.1e-11, which fails the test.
You can just relax the tolerance to 1e-11.
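As a quick sanity check on that suggestion, here is a small sketch with a hypothetical helper, `is_near`, that is not part of GMGPolar or GoogleTest. It assumes the same criterion as `ASSERT_NEAR`, namely that the absolute difference must not exceed the given bound, and applies it to the residual norm reported in the log above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the ASSERT_NEAR criterion:
// true when |value - expected| does not exceed abs_tol.
bool is_near(double value, double expected, double abs_tol)
{
    assert(abs_tol >= 0.0); // a negative tolerance would never pass
    return std::fabs(value - expected) <= abs_tol;
}
```

With the reported norm, `is_near(1.0444978215673473e-12, 0.0, 1e-12)` is false, while `is_near(1.0444978215673473e-12, 0.0, 1e-11)` is true, so the 1e-11 bound would accept this result with roughly a factor of ten of headroom.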

@julianlitz
Collaborator

julianlitz commented May 27, 2026

Especially if Across Origin is true, since there is a lot of precision loss at the boundary depending on the addition order, which is what you have here as well.

@julianlitz
Collaborator

julianlitz commented May 27, 2026

@EmilyBourne High Precision refers to the fact that Rmin = 0.15 is quite big. That's the reason errors as small as 1e-11 can be reached in the first place. Otherwise they are in the range of 1e-7 for across origin. That's because 1/Rmin gets very large and the other contributions, like TopRight, are way smaller.
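A rough way to see why a large 1/Rmin hurts is sketched below. This is only an illustration under assumptions: `spacing_at` and `is_absorbed` are hypothetical helpers, the radius 1e-6 is an arbitrary small value chosen for contrast with 0.15, and none of this is the actual stencil code. It shows that the gap between neighbouring doubles grows with magnitude, so small contributions added to a large term lose more digits.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: gap between x and the next larger double,
// i.e. the resolution available at magnitude x.
double spacing_at(double x)
{
    assert(std::isfinite(x) && x > 0.0);
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// Hypothetical helper: true when adding `small` leaves `large` unchanged.
bool is_absorbed(double large, double small)
{
    return large + small == large;
}
```

For example, `spacing_at(1.0 / 0.15)` is below 1e-15, whereas `spacing_at(1.0 / 1e-6)` is around 1e-10. A contribution of 1e-12 therefore still changes the sum at Rmin = 0.15 but is absorbed entirely at the smaller radius.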

@julianlitz
Collaborator

julianlitz commented May 27, 2026

@EmilyBourne It is not even necessarily the order of execution of the grid nodes.

Loop

vs.

#pragma omp parallel for num_threads(1)
Loop

even produced different results in my testing.

I think this happens because the code gets compiled differently: the compiler chooses to add the terms together in a different order.
