
Parallel Processing & Collective Communication

This repository implements Parallel Prefix Sum (Scan) and Tree-Based Reduction, and benchmarks their performance on a physical cluster and on simulated network topologies (Star, Ring, Mesh) using SimGrid.

📂 Repository Structure

.
├── prefix_parallel/              # Q1: Python MPI Implementation
│   ├── src/
│   │   ├── prefix_sum.py         # Main entry point and 3-step logic
│   │   └── utils.py              # Verification and timing utilities
│   ├── mpi_host                  # Hostfile for physical cluster
│   └── requirements.txt          # Python dependencies (mpi4py, numpy)
│
├── tree_reduce_imp/              # Q2: C++ SimGrid Implementation
│   ├── config/                   # Topology generator and platform files
│   │   ├── gen_topo.py           # Script to generate Star, Ring, Mesh XMLs
│   │   ├── hostfile.txt          # SimGrid hostfile
│   │   └── platform_*.xml        # Platform descriptions
│   ├── tree_reduce.cpp           # C++ Binary Tree Reduction Logic
│   └── Makefile                  # Build script for smpicxx
│
├── report/                     
│   └── main.tex                  # LaTeX source of the final report
│
└── README.md                     # Project Documentation

Part 1: Parallel Prefix Sum (MPI Scan)

Overview

This module implements a distributed Cumulative Sum (Prefix Sum) algorithm. Because the prefix sum has a strict sequential dependency ($S_i = S_{i-1} + A_i$), it cannot be parallelized naively element by element. We use a 3-Step Block Decomposition Algorithm (a sketch follows the list):

  1. Local Scan: Each process calculates the prefix sum of its local chunk.
  2. Offset Scan: The total sums of all chunks are gathered, scanned at the root, and scattered back as offsets.
  3. Local Update: Each process adds its specific offset to its local array.
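
A minimal mpi4py sketch of the three steps is shown below. It assumes $N$ is divisible by $P$ for brevity; the actual logic, padding, and verification in the repository's src/prefix_sum.py may differ.

# Minimal sketch of the 3-step block prefix sum (illustrative only)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 16                                            # assumes N % size == 0 here
data = np.arange(1, N + 1, dtype=np.int64) if rank == 0 else None

# Distribute equal-sized chunks across the ranks
local = np.empty(N // size, dtype=np.int64)
comm.Scatter(data, local, root=0)

# Step 1: local scan of each chunk
local_scan = np.cumsum(local)

# Step 2: gather chunk totals at the root, compute an exclusive scan of
# them, and scatter the results back as per-rank offsets
totals = comm.gather(int(local_scan[-1]), root=0)
offsets = None
if rank == 0:
    offsets = [0] * size
    for i in range(1, size):
        offsets[i] = offsets[i - 1] + totals[i - 1]
offset = comm.scatter(offsets, root=0)

# Step 3: add the offset to every element of the local scan
local_scan += offset

# Collect the full prefix sum on rank 0
result = np.empty(N, dtype=np.int64) if rank == 0 else None
comm.Gather(local_scan, result, root=0)
if rank == 0:
    print(result)                                 # 1, 3, 6, 10, ...

The sketch launches the same way as the repository script, e.g. with mpiexec -n 4.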

Prerequisites

  • Python 3.10+
  • OpenMPI / MPICH installed on the host.
  • mpi4py and numpy.

Execution

To run the benchmark on a physical cluster (or locally with multiple slots):

cd prefix_parallel
# Install dependencies
pip install -r requirements.txt

# Run on 4 processes with N=1000 integers
mpiexec -n 4 --hostfile mpi_host python src/prefix_sum.py 1000
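
The contents of mpi_host are not reproduced in this README. As an illustration, an Open MPI hostfile simply lists one machine per line with an optional slot count (MPICH uses a host:count syntax instead); the hostnames below are placeholders:

# Hypothetical mpi_host contents -- replace with your cluster's hostnames
node01 slots=2
node02 slots=2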

Key Features (see the sketch below):

  • Automatic Padding: Handles input sizes ($N$) that are not divisible by processor count ($P$).
  • Verification: Automatically compares distributed results against a sequential ground truth on Rank 0.
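
A short sketch of how padding and verification can be handled; the actual helpers in src/utils.py may be structured differently.

# Illustrative padding/verification helpers (the utilities in src/utils.py
# may differ)
import numpy as np

def pad_to_multiple(data, num_procs):
    """Pad with zeros so len(data) becomes a multiple of num_procs.

    Zeros are neutral for addition, so the padded tail does not alter
    the prefix sums of the original elements.
    """
    remainder = len(data) % num_procs
    if remainder == 0:
        return data
    return np.concatenate([data, np.zeros(num_procs - remainder, dtype=data.dtype)])

def verify(distributed_result, original):
    """Compare the gathered result against a sequential np.cumsum on rank 0."""
    return np.array_equal(distributed_result[:len(original)], np.cumsum(original))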

Part 2: Tree Reduction with SimGrid

Overview

This module implements a Binary Tree Reduction algorithm in C++ and compares its simulated performance against a standard Sequential (Flat) Reduction. The goal is to analyze how an algorithm with logical $O(\log P)$ depth performs on physical network topologies with different diameters $D$.
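
The repository's implementation lives in tree_reduce.cpp and is compiled against SMPI; the mpi4py sketch below only illustrates the communication pattern of a binary tree reduction, in which each round halves the number of active senders, giving $\lceil \log_2 P \rceil$ rounds.

# Conceptual sketch of a binary tree reduction with point-to-point messages
# (illustration only; the repository's tree_reduce.cpp is the C++/SMPI version)
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

value = float(rank + 1)          # each rank contributes one value

# In round k (step = 2**k), ranks at even multiples of 2*step receive from
# the partner `step` positions away; the senders drop out afterwards.
step = 1
while step < size:
    if rank % (2 * step) == 0:
        partner = rank + step
        if partner < size:
            value += comm.recv(source=partner, tag=step)
    else:
        comm.send(value, dest=rank - step, tag=step)
        break                    # this rank has handed off its partial sum
    step *= 2

if rank == 0:
    print("tree-reduced sum:", value)   # equals size * (size + 1) / 2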

Topologies Simulated

The gen_topo.py script generates XML platform descriptions for SimGrid (an illustrative generator sketch follows the list):

  • Star: Central switch (Low Diameter, Congestion at Root).
  • Ring: Circular connection (High Diameter, High Latency).
  • Mesh: Grid layout (Balanced Bandwidth/Latency).
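
As an illustration of what such a platform description looks like, the sketch below emits a ring in SimGrid's version 4.1 XML format. The host speeds, link bandwidths, and latencies are placeholders, and the structure and values produced by config/gen_topo.py may differ.

# Illustrative ring-platform generator (placeholder speeds and latencies;
# not necessarily how config/gen_topo.py builds its XML)
def write_ring_platform(num_hosts, path="platform_ring_example.xml"):
    lines = [
        "<?xml version='1.0'?>",
        '<!DOCTYPE platform SYSTEM "https://simgrid.org/simgrid.dtd">',
        '<platform version="4.1">',
        # "Floyd" routing derives multi-hop routes from the neighbour links below
        '  <zone id="ring" routing="Floyd">',
    ]
    for i in range(num_hosts):
        lines.append(f'    <host id="host-{i}" speed="1Gf"/>')
    for i in range(num_hosts):
        lines.append(f'    <link id="link-{i}" bandwidth="1GBps" latency="100us"/>')
    # Connect each host to its successor, wrapping around at the end
    for i in range(num_hosts):
        lines.append(f'    <route src="host-{i}" dst="host-{(i + 1) % num_hosts}">'
                     f'<link_ctn id="link-{i}"/></route>')
    lines += ["  </zone>", "</platform>"]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_ring_platform(8)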

Setup & Compilation

This project uses SimGrid within a Docker container to ensure reproducible simulation environments.

  1. Generate Configurations:

    cd tree_reduce_imp
    python3 config/gen_topo.py
    # Output: config/platform_ring.xml, config/platform_mesh.xml, etc.
  2. Enter SimGrid Container:

    docker run -it --rm -v $(pwd):/home/simgrid/project simgrid/tuto-smpi /bin/bash
  3. Compile C++ Code: Inside the container:

    make
    # Creates the 'tree_reduce' binary linked with SMPI

Running Simulations

Execute the binary using smpirun for specific topologies:

Ring Topology:

smpirun -platform config/platform_ring.xml -hostfile config/hostfile.txt ./tree_reduce

Star Topology:

smpirun -platform config/platform_star.xml -hostfile config/hostfile.txt ./tree_reduce

Mesh Topology:

smpirun -platform config/platform_mesh.xml -hostfile config/hostfile.txt ./tree_reduce

Results Summary

Both reduction strategies were benchmarked on all three simulated topologies. Detailed analysis is available in the report/ directory.

Topology   Sequential Performance   Tree Performance   Observation
Star       Poor (Bottleneck)        Excellent          Tree relieves root congestion.
Ring       Moderate                 Poor               Tree logic creates long-distance hops on the ring.
Mesh       Moderate                 Good               Balanced performance.

Report

report/report.pdf contains the documentation for the work done in this assignment.
