Pr501 pcg#543
Conversation
Reuse PCG and preconditioner work fields across iterations instead of allocating temporary fields during every solve step. This also passes the solver field by reference through OperatorF to avoid extra halo-related allocations.
|
Two things I don't understand:
For the first one, I don't think I understand why this change is necessary. @srikrrish do you know? For the second one, I am a bit confused by the wording, and I also think the comment in the code regarding this could be made more informative for future developers, especially considering (if I understood correctly) that this was the source of deadlocks in PCG. |
I think it is due to the explanation written here |
Okay thanks! |
|
In general, I think I disagree with the use of "result buffer" in most of the comments regarding the fields stored in CG and PCG. It might create a confusion with the communication buffers, "results field" would be less ambiguous. |
|
Note: the comments I have checked random are the original ones, not some Hugo hallucination |
|
I think I checked on all of the items but not sure because git not allowing me to resolve some of them ... pls. check |
Saw the PR501_SPLIT_MAP.md is still there - this file does not belong in the IPPL repo and should not be committed.
PCG/Preconditioner
Summary
This PR extracts the PCG allocation/performance part from the larger PR501 work.
It keeps the scope intentionally small:
The primary motivation is to remove repeated allocation and avoid repeated
halo-buffer growth on copied fields in the PCG/preconditioner path. This was
observed as repeated allocation work in multi-rank profiling.
What Changed
PCG workspaces are reused
src/LinearSolvers/PCG.hIn the CG/PCG algorithm these are:
r: the current residual, initiallyrhs - A * lhs,d: the current search direction,q: the operator-applied search direction, i.e.A * d.PCG adds reusable
sandpcond_outbuffers.Here
sstores the preconditioned residualM^{-1} r, andpcond_outstores the initial preconditioner output before it is folded into
d.PCG::operator()refreshes workspace layouts instead of constructing localfields on every solve.
op_m(d)now assigns into the existingqstorage instead of forcing adeepCopy().Preconditioner applications now write into
pcond_out/sinstead ofreturning freshly allocated fields.
Important BC detail:
pcond_outandsintentionally keep their defaultNoBcFaceboundaryconditions. This preserves the old behavior where the preconditioner returned
a fresh field without periodic BCs. If these buffers inherited periodic BCs,
preconditioner-internal operators could trigger extra
PeriodicFace::applyMPI calls and diverge from the expected multi-rank communication sequence.
Preconditioners now write into caller-owned output
src/LinearSolvers/Preconditioner.hThe base preconditioner interface changed from returning
Fieldto:Concrete preconditioners now reuse member scratch fields where possible.
init_fields()allocates scratch fields once and updates their layout onsubsequent calls.
Several preconditioners still keep necessary
deepCopy()calls whereexpression aliasing can occur, especially for field-valued inverse-diagonal
operators.
Covered preconditioners include:
Solver operators take fields by reference
src/LinearSolvers/PCG.hsrc/LinearSolvers/Preconditioner.hsrc/PoissonSolvers/PoissonCG.hThe solver operator wrapper now takes the field by reference:
This matters because a by-value
Fieldcopy shares the underlying data views butdoes not share all mutable side state, notably halo exchange buffers that may be
grown through
Kokkos::realloc. When the operator acts on a copied field, thecopy can grow its halo buffer without updating the original. In multi-rank runs
this showed up as repeated allocation work.
PoissonCG.hno longer redefines the wrapper locally. Keeping a singleby-reference wrapper avoids silently reintroducing the by-value behavior.
ALPINE Kokkos view lifetime fixes
The ALPINE managers no longer take addresses of temporary view handles returned
by
getView().Changed files:
alpine/LandauDampingManager.halpine/BumponTailInstabilityManager.halpine/PenningTrapManager.hBefore:
view_type* R = &(this->pcontainer_m->R.getView()); samplingR.generate(*R, rand_pool64);After:
view_type R = this->pcontainer_m->R.getView(); samplingR.generate(R, rand_pool64);This fixes compilers/backends that reject taking the address of a temporary
Kokkos view object and avoids dangling view-handle pointers.
Documentation Check
The implementation is documented in the places where review context matters:
Preconditioner.hdocuments why the operator wrapper must take fields byreference.
PCG.hdocuments why workspaces are reused and whypcond_out/skeepNoBcFaceboundary conditions.PoissonCG.hdocuments why the local wrapper definition was removed.PR501_SPLIT_MAP.mddocuments the split rationale and dependencies betweenthe planned PR501 follow-up branches.
No user-facing manual update appears necessary because this PR changes internal
solver allocation behavior and preserves the public PCG/Poisson solver behavior.
Validation
Original split-map validation
The PCG split prototype was built and tested with unit tests:
Solver integration validation
A dedicated solver integration build was configured:
cmake -S . -B build-pr501-pcg-ssor-tests-release \ -DCMAKE_BUILD_TYPE=Release \ -DIPPL_PLATFORMS=SERIAL \ -DIPPL_ENABLE_TESTS=ON \ -DIPPL_ENABLE_UNIT_TESTS=OFF \ -DIPPL_ENABLE_SOLVERS=ON \ -DIPPL_ENABLE_FFT=OFFBuilt targets:
Results:
TestCGSolver: built and passed through CTest.TestCGSolver_convergence: compile-only target built successfully.TestSolverDesign: compile-only target built successfully.Registered CTest:
ctest --test-dir build-pr501-pcg-ssor-tests-release \ --output-on-failure -R '^TestCGSolver$'Result:
The SSOR path was also run explicitly because the registered CTest uses Jacobi:
and with MPI:
Results:
1.514653902775893e-131.196970356284836e-131.588802017354866e-13The same explicit SSOR test was temporarily rerun with the preconditioner
changes reverted. Residuals and iteration counts were identical for 1, 2, and 4
ranks.
Reviewer Notes
into a supplied output field rather than returning a new field.
buffer reuse.
pcond_outandsshould not inherit periodic BCs; that would change the MPIcall pattern inside preconditioner operator chains.
deepCopy()calls remain intentionally where expression aliasing canotherwise corrupt field-valued preconditioner operations.