This repository contains a high-performance, matrix-free parallel solver for the 2D Laplace equation:
over a square domain
The solver utilizes the Jacobi iteration method to compute the discrete solution on a Cartesian grid. To achieve high computational efficiency and scalability, the architecture implements a hybrid parallelization strategy utilizing both MPI and OpenMP.
- Hybrid Parallelization: Combines MPI for distributed memory domain decomposition (block partitioning by rows) and OpenMP for shared-memory multi-threading within each node.
- Customizable Boundary Conditions: Supports both standard homogeneous Dirichlet boundary conditions (default) and user-defined non-homogeneous conditions.
- Matrix-Free Architecture: Implements a custom dense matrix class using flattened 1D
std::vector<double>arrays to optimize memory access patterns and cache utilization. - VTK Export: Automatically exports solutions and grid coordinates in
.vtkformat for immediate 3D visualization using tools like ParaView.
The codebase is structured around two primary C++ classes to separate mathematical logic from data structure management:
A custom data structure designed to handle dense matrices efficiently. Data is stored sequentially in a row-major 1D vector. The class provides:
- Overloaded operators (
+,-, etc.) for clean mathematical syntax. - Optimized data access methods.
- Built-in calculation of matrix norms for convergence checking.
- Native
.vtkfile generation for data export.
This class encapsulates the physics and numerical methods of the problem.
- State Management: Stores boundary conditions (
bc),max_iter, andtolerance. solver(n, f): The core parallel solver. It takes grid resolutionnand forcing termfas parameters.- MPI Implementation: The 2D grid is divided horizontally into row blocks and distributed across MPI ranks. Halo exchange (ghost cells) is implemented to communicate the top and bottom boundary rows between adjacent ranks during each Jacobi iteration.
- OpenMP Implementation: The heavy computational loops calculating the 4-point Jacobi stencil are parallelized across local threads. Edge rows are deliberately excluded from OpenMP parallelization to minimize thread overhead.
serial_solver: A baseline single-core implementation provided for direct performance comparison and benchmarking.
The repository includes a Makefile configured to compile the code and run automated test suites.
Use the following commands to execute specific test scenarios (make run#N):
| Command | Description | Output |
|---|---|---|
make run1 |
Standard parallel test. Solves with homogeneous boundary conditions. | Generates results.vtk
|
make run2 |
Non-homogeneous test. Solves with |
Generates results.vtk
|
make run3 |
Performance benchmark. Direct comparison between the parallel solver and the serial solver baseline. | Console output |
make run4 |
Convergence analysis. Evaluates how the computational error scales with an increasing number of grid partitions ( |
Console output |
make run5 |
Scalability test. Evaluates elapsed compute time as the number of CPU cores increases (1, 2, and 4 cores). | Console output |
To remove compiled objects and generated files, run:
make clean
- Accuracy: Tests 1 and 2 successfully validate the solver against both homogeneous and non-homogeneous boundary conditions.
-
Convergence: Test 4 confirms that the numerical error decreases smoothly as the grid resolution (
$n$ ) increases, proving the mathematical stability of the implementation. - Parallel Efficiency & Scalability: Test 3 demonstrates significant speedups over the serial baseline. Furthermore, Test 5 showcases excellent strong scaling: doubling the core count results in a near-perfect 50% reduction in elapsed execution time.