This repository implements two parallel algorithms, Parallel Prefix Sum (Scan) and Tree-Based Reduction, and benchmarks their performance across physical clusters and simulated network topologies (Star, Ring, Mesh) using SimGrid.
```
.
├── prefix_parallel/        # Q1: Python MPI implementation
│   ├── src/
│   │   ├── prefix_sum.py   # Main entry point and 3-step logic
│   │   └── utils.py        # Verification and timing utilities
│   ├── mpi_host            # Hostfile for the physical cluster
│   └── requirements.txt    # Python dependencies (mpi4py, numpy)
│
├── tree_reduce_imp/        # Q2: C++ SimGrid implementation
│   ├── config/             # Generated topologies
│   │   ├── gen_topo.py     # Script to generate Star, Ring, Mesh XMLs
│   │   ├── hostfile.txt    # SimGrid hostfile
│   │   └── platform_*.xml  # Platform descriptions
│   ├── tree_reduce.cpp     # C++ binary tree reduction logic
│   └── Makefile            # Build script for smpicxx
│
├── report/
│   └── main.tex            # Final report (LaTeX source)
│
└── README.md               # Project documentation
```
This module implements a distributed Cumulative Sum (Prefix Sum) algorithm. Since prefix sum calculation has a strict sequential dependency (each output element depends on the sum of all preceding elements), the work is split into three steps:
- Local Scan: Each process calculates the prefix sum of its local chunk.
- Offset Scan: The total sums of all chunks are gathered, scanned at the root, and scattered back as offsets.
- Local Update: Each process adds its specific offset to its local array.
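The three steps above can be sketched without MPI by treating each chunk as one process's local data. This is a simplified single-process illustration of the pipeline, not the actual `prefix_sum.py`:

```python
import numpy as np

def three_step_prefix_sum(data, num_procs):
    """Simulate the 3-step distributed prefix sum on chunked data."""
    chunks = np.array_split(np.asarray(data), num_procs)

    # Step 1: Local Scan -- each "process" scans its own chunk.
    local_scans = [np.cumsum(c) for c in chunks]

    # Step 2: Offset Scan -- gather chunk totals and exclusive-scan them,
    # so each chunk learns the sum of everything to its left.
    totals = np.array([s[-1] if len(s) else 0 for s in local_scans])
    offsets = np.concatenate(([0], np.cumsum(totals)[:-1]))

    # Step 3: Local Update -- each chunk adds its offset.
    return np.concatenate([s + off for s, off in zip(local_scans, offsets)])

result = three_step_prefix_sum(range(1, 9), num_procs=4)
# Matches the sequential cumulative sum: [1, 3, 6, 10, 15, 21, 28, 36]
```

In the real MPI version, Step 2 corresponds to a gather of chunk totals to the root followed by a scatter of the scanned offsets.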
- Python 3.10+
- OpenMPI or MPICH installed on the host.
- Python packages `mpi4py` and `numpy` (see `requirements.txt`).
To run the benchmark on a physical cluster (or locally with multiple slots):

```bash
cd prefix_parallel

# Install dependencies
pip install -r requirements.txt

# Run on 4 processes with N=1000 integers
mpiexec -n 4 --hostfile mpi_host python src/prefix_sum.py 1000
```

Key Features:
- Automatic Padding: Handles input sizes ($N$) that are not divisible by the processor count ($P$).
- Verification: Automatically compares the distributed result against a sequential ground truth on Rank 0.
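One common way to handle $N$ not divisible by $P$ is to pad the input with zeros (the additive identity) up to the next multiple of $P$, scatter equal-size chunks, and trim back to $N$ at the end. A minimal sketch of that idea, with a hypothetical helper name rather than the repo's actual code:

```python
import numpy as np

def pad_for_scatter(data, num_procs):
    """Pad with zeros so every rank receives an equal-size chunk.
    Hypothetical helper illustrating the padding strategy."""
    data = np.asarray(data)
    chunk = -(-len(data) // num_procs)           # ceil(N / P)
    padded = np.zeros(chunk * num_procs, dtype=data.dtype)
    padded[:len(data)] = data
    return padded, len(data)                     # keep N to trim the result

padded, n = pad_for_scatter([1, 2, 3, 4, 5], num_procs=4)
# len(padded) == 8; the prefix sum of padded[:n] equals that of the input,
# since trailing zeros never change the running totals of real elements.
```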
This module implements a Binary Tree Reduction algorithm in C++ and simulates its performance against a standard Sequential (Flat) Reduction. The goal is to analyze how a logically efficient algorithm ($O(\log P)$ communication steps) performs on different physical network topologies (Star, Ring, Mesh).
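The actual reduction in `tree_reduce.cpp` is C++ over SMPI; its communication pattern can be modeled in a few lines of Python. This sketch simulates the pairwise rounds (an illustrative model, not the repo's code): in each round, every second surviving rank sends its partial sum to a partner, so the round count grows logarithmically rather than linearly in $P$.

```python
import math

def tree_reduce(values):
    """Simulate a binary tree sum-reduction over len(values) ranks."""
    vals = dict(enumerate(values))               # rank -> local partial sum
    p, step, rounds = len(vals), 1, 0
    while step < p:
        for rank in range(0, p, 2 * step):       # receiving ranks this round
            partner = rank + step
            if partner < p:
                vals[rank] += vals.pop(partner)  # partner "sends" to rank
        step *= 2
        rounds += 1
    return vals[0], rounds                       # result lands on rank 0

total, rounds = tree_reduce([1] * 8)
# 8 ranks finish in 3 rounds (log2 P), versus 7 sequential
# receives at the root in a flat reduction.
```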
The gen_topo.py script generates XML platform descriptions for SimGrid:
- Star: Central switch (Low Diameter, Congestion at Root).
- Ring: Circular connection (High Diameter, High Latency).
- Mesh: Grid layout (Balanced Bandwidth/Latency).
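The real `gen_topo.py` emits SimGrid platform XML; the link structure that distinguishes the three topologies can be sketched as plain edge lists (hypothetical helper names, not the actual script):

```python
def star_links(n):
    """Star: every host attaches to one central switch 'sw' (diameter 2,
    but all traffic crosses the hub)."""
    return [("sw", f"h{i}") for i in range(n)]

def ring_links(n):
    """Ring: host i links only to host (i+1) mod n (diameter ~ n/2)."""
    return [(f"h{i}", f"h{(i + 1) % n}") for i in range(n)]

def mesh_links(rows, cols):
    """Mesh: 2-D grid with links to the right and downward neighbours."""
    links = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                links.append((f"h{r}_{c}", f"h{r}_{c + 1}"))
            if r + 1 < rows:
                links.append((f"h{r}_{c}", f"h{r + 1}_{c}"))
    return links
```

Each edge list would then be rendered as `<host>`, `<link>`, and `<route>` elements in the platform XML.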
This project uses SimGrid within a Docker container to ensure reproducible simulation environments.
1. Generate Configurations:

```bash
cd tree_reduce_imp
python3 config/gen_topo.py
# Output: config/platform_ring.xml, config/platform_mesh.xml, etc.
```

2. Enter the SimGrid Container:

```bash
docker run -it --rm -v $(pwd):/home/simgrid/project simgrid/tuto-smpi /bin/bash
```

3. Compile the C++ Code (inside the container):

```bash
make   # Creates the 'tree_reduce' binary linked with SMPI
```
Execute the binary using smpirun for each topology:

Ring Topology:

```bash
smpirun -platform config/platform_ring.xml -hostfile config/hostfile.txt ./tree_reduce
```

Star Topology:

```bash
smpirun -platform config/platform_star.xml -hostfile config/hostfile.txt ./tree_reduce
```

Mesh Topology:

```bash
smpirun -platform config/platform_mesh.xml -hostfile config/hostfile.txt ./tree_reduce
```

The implementation was successfully benchmarked. Detailed analysis is available in the report/ directory.
| Topology | Sequential Performance | Tree Performance | Observation |
|---|---|---|---|
| Star | Poor (Bottleneck) | Excellent | Tree relieves root congestion. |
| Ring | Moderate | Poor | Tree logic creates long-distance hops on the ring. |
| Mesh | Moderate | Good | Balanced performance. |
`report/report.pdf` (compiled from `report/main.tex`) contains the full write-up for this assignment.