Skip to content

Pr501 pcg#543

Merged
aaadelmann merged 7 commits into
masterfrom
pr501-pcg
Jun 15, 2026
Merged

Pr501 pcg#543
aaadelmann merged 7 commits into
masterfrom
pr501-pcg

Conversation

@aaadelmann

Copy link
Copy Markdown
Member

PCG/Preconditioner

Summary

This PR extracts the PCG allocation/performance part from the larger PR501 work.
It keeps the scope intentionally small:

  • reuse CG/PCG work fields instead of constructing them inside every solve,
  • make preconditioners write into caller-provided output fields,
  • pass solver operator fields by reference instead of by value,
  • keep preconditioner scratch fields synchronized with layout changes,
  • fix ALPINE particle initialization code that took pointers/references to temporary Kokkos views.

The primary motivation is to remove repeated allocation and avoid repeated
halo-buffer growth on copied fields in the PCG/preconditioner path. This was
observed as repeated allocation work in multi-rank profiling.

What Changed

PCG workspaces are reused

src/LinearSolvers/PCG.h

In the CG/PCG algorithm these are:

  • r: the current residual, initially rhs - A * lhs,

  • d: the current search direction,

  • q: the operator-applied search direction, i.e. A * d.

  • PCG adds reusable s and pcond_out buffers.
    Here s stores the preconditioned residual M^{-1} r, and pcond_out
    stores the initial preconditioner output before it is folded into d.

  • PCG::operator() refreshes workspace layouts instead of constructing local
    fields on every solve.

  • op_m(d) now assigns into the existing q storage instead of forcing a
    deepCopy().

  • Preconditioner applications now write into pcond_out / s instead of
    returning freshly allocated fields.

Important BC detail:

  • pcond_out and s intentionally keep their default NoBcFace boundary
    conditions. This preserves the old behavior where the preconditioner returned
    a fresh field without periodic BCs. If these buffers inherited periodic BCs,
    preconditioner-internal operators could trigger extra PeriodicFace::apply
    MPI calls and diverge from the expected multi-rank communication sequence.

Preconditioners now write into caller-owned output

src/LinearSolvers/Preconditioner.h

  • The base preconditioner interface changed from returning Field to:

    void operator()(Field& input, Field& result)
  • Concrete preconditioners now reuse member scratch fields where possible.

  • init_fields() allocates scratch fields once and updates their layout on
    subsequent calls.

  • Several preconditioners still keep necessary deepCopy() calls where
    expression aliasing can occur, especially for field-valued inverse-diagonal
    operators.

Covered preconditioners include:

  • identity/default,
  • Jacobi,
  • polynomial Newton,
  • polynomial Chebyshev,
  • Richardson,
  • alternative Richardson,
  • Gauss-Seidel,
  • SSOR.

Solver operators take fields by reference

src/LinearSolvers/PCG.h
src/LinearSolvers/Preconditioner.h
src/PoissonSolvers/PoissonCG.h

The solver operator wrapper now takes the field by reference:

#define IPPL_SOLVER_OPERATOR_WRAPPER(fun, type) \
    [](type& arg) {                             \
        return fun(arg);                        \
    }

This matters because a by-value Field copy shares the underlying data views but
does not share all mutable side state, notably halo exchange buffers that may be
grown through Kokkos::realloc. When the operator acts on a copied field, the
copy can grow its halo buffer without updating the original. In multi-rank runs
this showed up as repeated allocation work.

PoissonCG.h no longer redefines the wrapper locally. Keeping a single
by-reference wrapper avoids silently reintroducing the by-value behavior.

ALPINE Kokkos view lifetime fixes

The ALPINE managers no longer take addresses of temporary view handles returned
by getView().

Changed files:

  • alpine/LandauDampingManager.h
  • alpine/BumponTailInstabilityManager.h
  • alpine/PenningTrapManager.h

Before:

view_type* R = &(this->pcontainer_m->R.getView());
samplingR.generate(*R, rand_pool64);

After:

view_type R = this->pcontainer_m->R.getView();
samplingR.generate(R, rand_pool64);

This fixes compilers/backends that reject taking the address of a temporary
Kokkos view object and avoids dangling view-handle pointers.

Documentation Check

The implementation is documented in the places where review context matters:

  • Preconditioner.h documents why the operator wrapper must take fields by
    reference.
  • PCG.h documents why workspaces are reused and why pcond_out / s keep
    NoBcFace boundary conditions.
  • PoissonCG.h documents why the local wrapper definition was removed.
  • PR501_SPLIT_MAP.md documents the split rationale and dependencies between
    the planned PR501 follow-up branches.

No user-facing manual update appears necessary because this PR changes internal
solver allocation behavior and preserves the public PCG/Poisson solver behavior.

Validation

Original split-map validation

The PCG split prototype was built and tested with unit tests:

100% tests passed, 0 tests failed out of 34
Total Test time (real) = 54.17 sec

Solver integration validation

A dedicated solver integration build was configured:

cmake -S . -B build-pr501-pcg-ssor-tests-release \
  -DCMAKE_BUILD_TYPE=Release \
  -DIPPL_PLATFORMS=SERIAL \
  -DIPPL_ENABLE_TESTS=ON \
  -DIPPL_ENABLE_UNIT_TESTS=OFF \
  -DIPPL_ENABLE_SOLVERS=ON \
  -DIPPL_ENABLE_FFT=OFF

Built targets:

cmake --build build-pr501-pcg-ssor-tests-release \
  --target TestCGSolver TestCGSolver_convergence TestSolverDesign -j 8

Results:

  • TestCGSolver: built and passed through CTest.
  • TestCGSolver_convergence: compile-only target built successfully.
  • TestSolverDesign: compile-only target built successfully.

Registered CTest:

ctest --test-dir build-pr501-pcg-ssor-tests-release \
  --output-on-failure -R '^TestCGSolver$'

Result:

TestCGSolver passed

The SSOR path was also run explicitly because the registered CTest uses Jacobi:

./test/solver/TestCGSolver 4 s 2 1 1.57079632679 --info 5

and with MPI:

/opt/homebrew/Cellar/open-mpi/5.0.8/bin/mpiexec -n 2 \
  ./test/solver/TestCGSolver 4 s 2 1 1.57079632679 --info 5

/opt/homebrew/Cellar/open-mpi/5.0.8/bin/mpiexec -n 4 \
  ./test/solver/TestCGSolver 4 s 2 1 1.57079632679 --info 5

Results:

ranks residual iterations
1 1.514653902775893e-13 42
2 1.196970356284836e-13 44
4 1.588802017354866e-13 38

The same explicit SSOR test was temporarily rerun with the preconditioner
changes reverted. Residuals and iteration counts were identical for 1, 2, and 4
ranks.

Reviewer Notes

  • The key semantic change is the preconditioner API: preconditioners now write
    into a supplied output field rather than returning a new field.
  • The by-reference solver operator wrapper is intentional and important for halo
    buffer reuse.
  • pcond_out and s should not inherit periodic BCs; that would change the MPI
    call pattern inside preconditioner operator chains.
  • Some deepCopy() calls remain intentionally where expression aliasing can
    otherwise corrupt field-valued preconditioner operations.

PaulFisch and others added 4 commits May 31, 2026 21:34
Reuse PCG and preconditioner work fields across iterations instead of allocating temporary fields during every solve step. This also passes the solver field by reference through OperatorF to avoid extra halo-related allocations.
@aaadelmann aaadelmann self-assigned this Jun 10, 2026
@aaadelmann aaadelmann added enhancement New feature or request cleanup labels Jun 10, 2026
@aaadelmann aaadelmann requested a review from s-mayani June 10, 2026 21:06
@s-mayani

Copy link
Copy Markdown
Collaborator

Two things I don't understand:

  • "fix ALPINE particle initialization code that took pointers/references to temporary Kokkos views."
  • "pcond_out and s intentionally keep their default NoBcFace boundary
    conditions. This preserves the old behavior where the preconditioner returned
    a fresh field without periodic BCs. If these buffers inherited periodic BCs,
    preconditioner-internal operators could trigger extra PeriodicFace::apply
    MPI calls and diverge from the expected multi-rank communication sequence."

For the first one, I don't think I understand why this change is necessary. @srikrrish do you know?

For the second one, I am a bit confused by the wording, and I also think the comment in the code regarding this could be made more informative for future developers, especially considering (if I understood correctly) that this was the source of deadlocks in PCG.

Comment thread alpine/BumponTailInstabilityManager.h
Comment thread src/LinearSolvers/PCG.h Outdated
Comment thread src/LinearSolvers/PCG.h Outdated
Comment thread src/LinearSolvers/PCG.h
Comment thread src/LinearSolvers/PCG.h Outdated
Comment thread src/LinearSolvers/PCG.h
Comment thread src/LinearSolvers/PCG.h Outdated
Comment thread src/LinearSolvers/Preconditioner.h
Comment thread src/LinearSolvers/Preconditioner.h Outdated
Comment thread src/LinearSolvers/Preconditioner.h
Comment thread src/LinearSolvers/Preconditioner.h
Comment thread src/LinearSolvers/Preconditioner.h
Comment thread src/LinearSolvers/Preconditioner.h
Comment thread src/LinearSolvers/Preconditioner.h
Comment thread src/LinearSolvers/Preconditioner.h
Comment thread src/LinearSolvers/Preconditioner.h
Comment thread src/LinearSolvers/Preconditioner.h
Comment thread src/LinearSolvers/Preconditioner.h Outdated
Comment thread src/LinearSolvers/Preconditioner.h
Comment thread src/PoissonSolvers/PoissonCG.h
Comment thread PR501_SPLIT_MAP.md Outdated
@srikrrish

Copy link
Copy Markdown
Member

This fixes compilers/backends that reject taking the address of a temporary Kokkos view object and avoids dangling view-handle pointers.

I think it is due to the explanation written here
"This fixes compilers/backends that reject taking the address of a temporary
Kokkos view object and avoids dangling view-handle pointers."
I am not sure if these things were observed before or it is just a pre-cautionary measure.

@s-mayani

Copy link
Copy Markdown
Collaborator

This fixes compilers/backends that reject taking the address of a temporary Kokkos view object and avoids dangling view-handle pointers.

I think it is due to the explanation written here "This fixes compilers/backends that reject taking the address of a temporary Kokkos view object and avoids dangling view-handle pointers." I am not sure if these things were observed before or it is just a pre-cautionary measure.

Okay thanks!

@s-mayani

s-mayani commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

In general, I think I disagree with the use of "result buffer" in most of the comments regarding the fields stored in CG and PCG. It might create a confusion with the communication buffers, "results field" would be less ambiguous.

@aaadelmann

Copy link
Copy Markdown
Member Author

Note: the comments I have checked random are the original ones, not some Hugo hallucination

@aaadelmann aaadelmann requested a review from s-mayani June 15, 2026 12:05
@aaadelmann

Copy link
Copy Markdown
Member Author

I think I checked on all of the items but not sure because git not allowing me to resolve some of them ... pls. check

Comment thread src/LinearSolvers/PCG.h
s-mayani
s-mayani previously approved these changes Jun 15, 2026
Comment thread PR501_SPLIT_MAP.md Outdated
@s-mayani s-mayani dismissed their stale review June 15, 2026 12:16

Saw the PR501_SPLIT_MAP.md is still there - this file does not belong in the IPPL repo and should not be committed.

@aaadelmann aaadelmann requested a review from s-mayani June 15, 2026 12:28
@aaadelmann aaadelmann added this pull request to the merge queue Jun 15, 2026
Merged via the queue into master with commit b35146d Jun 15, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cleanup enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants