This repository implements two parallel algorithms, Parallel Prefix Sum (Scan) and Tree-Based Reduction, and benchmarks their performance across physical clusters and simulated network topologies (Star, Ring, Mesh) using SimGrid.
```
.
├── prefix_parallel/        # Q1: Python MPI implementation
│   ├── src/
│   │   ├── prefix_sum.py   # Main entry point and 3-step logic
│   │   └── utils.py        # Verification and timing utilities
│   ├── mpi_host            # Hostfile for the physical cluster
│   └── requirements.txt    # Python dependencies (mpi4py, numpy)
│
├── tree_reduce_imp/        # Q2: C++ SimGrid implementation
│   ├── config/             # Generated topologies
│   │   ├── gen_topo.py     # Script to generate Star, Ring, Mesh XMLs
│   │   ├── hostfile.txt    # SimGrid hostfile
│   │   └── platform_*.xml  # Platform descriptions
│   ├── tree_reduce.cpp     # C++ binary tree reduction logic
│   └── Makefile            # Build script for smpicxx
│
├── report/
│   └── main.tex            # Final report (LaTeX source)
│
└── README.md               # Project documentation
```
This module implements a distributed Cumulative Sum (Prefix Sum) algorithm. Since prefix sum calculation has a strict sequential dependency (each output element depends on the sum of all preceding elements), the work is split into three steps:
- Local Scan: Each process calculates the prefix sum of its local chunk.
- Offset Scan: The total sums of all chunks are gathered, scanned at the root, and scattered back as offsets.
- Local Update: Each process adds its specific offset to its local array.
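The three steps above can be sketched without MPI by treating each chunk as one process's local data. This is a simplified single-process illustration of the pipeline, not the actual `prefix_sum.py`:

```python
import numpy as np

def three_step_prefix_sum(data, num_procs):
    """Simulate the 3-step distributed prefix sum on chunked data."""
    chunks = np.array_split(np.asarray(data), num_procs)

    # Step 1: Local Scan -- each "process" scans its own chunk.
    local_scans = [np.cumsum(c) for c in chunks]

    # Step 2: Offset Scan -- gather chunk totals and exclusive-scan them,
    # so each chunk learns the sum of everything to its left.
    totals = np.array([s[-1] if len(s) else 0 for s in local_scans])
    offsets = np.concatenate(([0], np.cumsum(totals)[:-1]))

    # Step 3: Local Update -- each chunk adds its offset.
    return np.concatenate([s + off for s, off in zip(local_scans, offsets)])

result = three_step_prefix_sum(range(1, 9), num_procs=4)
# Matches the sequential cumulative sum: [1, 3, 6, 10, 15, 21, 28, 36]
```

In the real MPI version, Step 2 corresponds to a gather of chunk totals to the root followed by a scatter of the scanned offsets.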
- Python 3.10+
- OpenMPI or MPICH installed on the host.
- Python packages `mpi4py` and `numpy` (see `requirements.txt`).
To run the benchmark on a physical cluster (or locally with multiple slots):

```bash
cd prefix_parallel

# Install dependencies
pip install -r requirements.txt

# Run on 4 processes with N=1000 integers
mpiexec -n 4 --hostfile mpi_host python src/prefix_sum.py 1000
```

Key Features:
- Automatic Padding: Handles input sizes ($N$) that are not divisible by the processor count ($P$).
- Verification: Automatically compares the distributed result against a sequential ground truth on Rank 0.
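One common way to handle $N$ not divisible by $P$ is to pad the input with zeros (the additive identity) up to the next multiple of $P$, scatter equal-size chunks, and trim back to $N$ at the end. A minimal sketch of that idea, with a hypothetical helper name rather than the repo's actual code:

```python
import numpy as np

def pad_for_scatter(data, num_procs):
    """Pad with zeros so every rank receives an equal-size chunk.
    Hypothetical helper illustrating the padding strategy."""
    data = np.asarray(data)
    chunk = -(-len(data) // num_procs)           # ceil(N / P)
    padded = np.zeros(chunk * num_procs, dtype=data.dtype)
    padded[:len(data)] = data
    return padded, len(data)                     # keep N to trim the result

padded, n = pad_for_scatter([1, 2, 3, 4, 5], num_procs=4)
# len(padded) == 8; the prefix sum of padded[:n] equals that of the input,
# since trailing zeros never change the running totals of real elements.
```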
This module implements a Binary Tree Reduction algorithm in C++ and simulates its performance against a standard Sequential (Flat) Reduction. The goal is to analyze how a logically efficient algorithm ($O(\log P)$ communication steps) performs on different physical network topologies (Star, Ring, Mesh).
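The actual reduction in `tree_reduce.cpp` is C++ over SMPI; its communication pattern can be modeled in a few lines of Python. This sketch simulates the pairwise rounds (an illustrative model, not the repo's code): in each round, every second surviving rank sends its partial sum to a partner, so the round count grows logarithmically rather than linearly in $P$.

```python
import math

def tree_reduce(values):
    """Simulate a binary tree sum-reduction over len(values) ranks."""
    vals = dict(enumerate(values))               # rank -> local partial sum
    p, step, rounds = len(vals), 1, 0
    while step < p:
        for rank in range(0, p, 2 * step):       # receiving ranks this round
            partner = rank + step
            if partner < p:
                vals[rank] += vals.pop(partner)  # partner "sends" to rank
        step *= 2
        rounds += 1
    return vals[0], rounds                       # result lands on rank 0

total, rounds = tree_reduce([1] * 8)
# 8 ranks finish in 3 rounds (log2 P), versus 7 sequential
# receives at the root in a flat reduction.
```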
The gen_topo.py script generates XML platform descriptions for SimGrid:
- Star: Central switch (Low Diameter, Congestion at Root).
- Ring: Circular connection (High Diameter, High Latency).
- Mesh: Grid layout (Balanced Bandwidth/Latency).
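The real `gen_topo.py` emits SimGrid platform XML; the link structure that distinguishes the three topologies can be sketched as plain edge lists (hypothetical helper names, not the actual script):

```python
def star_links(n):
    """Star: every host attaches to one central switch 'sw' (diameter 2,
    but all traffic crosses the hub)."""
    return [("sw", f"h{i}") for i in range(n)]

def ring_links(n):
    """Ring: host i links only to host (i+1) mod n (diameter ~ n/2)."""
    return [(f"h{i}", f"h{(i + 1) % n}") for i in range(n)]

def mesh_links(rows, cols):
    """Mesh: 2-D grid with links to the right and downward neighbours."""
    links = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                links.append((f"h{r}_{c}", f"h{r}_{c + 1}"))
            if r + 1 < rows:
                links.append((f"h{r}_{c}", f"h{r + 1}_{c}"))
    return links
```

Each edge list would then be rendered as `<host>`, `<link>`, and `<route>` elements in the platform XML.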
This project uses SimGrid within a Docker container to ensure reproducible simulation environments.
1. Generate Configurations:

```bash
cd tree_reduce_imp
python3 config/gen_topo.py
# Output: config/platform_ring.xml, config/platform_mesh.xml, etc.
```

2. Enter the SimGrid Container:

```bash
docker run -it --rm -v $(pwd):/home/simgrid/project simgrid/tuto-smpi /bin/bash
```

3. Compile the C++ Code (inside the container):

```bash
make   # Creates the 'tree_reduce' binary linked with SMPI
```
Execute the binary using smpirun for each topology:

Ring Topology:

```bash
smpirun -platform config/platform_ring.xml -hostfile config/hostfile.txt ./tree_reduce
```

Star Topology:

```bash
smpirun -platform config/platform_star.xml -hostfile config/hostfile.txt ./tree_reduce
```

Mesh Topology:

```bash
smpirun -platform config/platform_mesh.xml -hostfile config/hostfile.txt ./tree_reduce
```

The implementation was successfully benchmarked. Detailed analysis is available in the report/ directory.
| Topology | Sequential Performance | Tree Performance | Observation |
|---|---|---|---|
| Star | Poor (Bottleneck) | Excellent | Tree relieves root congestion. |
| Ring | Moderate | Poor | Tree logic creates long-distance hops on the ring. |
| Mesh | Moderate | Good | Balanced performance. |
`report/report.pdf` (compiled from `report/main.tex`) contains the full write-up for this assignment.