
HPC4DS_Project_GWO

[Project logo]

Yuhang Jiang, University of Trento (DISI)
Ebuka Nwafornso, University of Trento (DISI)
Comfort Munyaradzi, University of Trento (DISI)

Table of Contents

  1. Introduction
  2. Motivation
  3. Contribution
  4. Overview of Algorithm Ideas and Implementation
  5. Experiment Platform Introduction
  6. Usage
  7. License

Introduction

This project is the group project for the High Performance Computing for Data Science course at the University of Trento (UNITN). Its main theme is the parallelization of the Grey Wolf Optimizer (GWO). All instructions and code in this project are designed for UNITN's HPC cluster.


Motivation

  • The original Grey Wolf Optimizer, while simple and effective, relies heavily on the top three wolves (α, β, δ) and lacks additional information fusion, which often leads to unsatisfactory convergence speed.
  • Because GWO depends on the top three wolves, every iteration must collect information about all wolves before positions can be updated, creating significant sorting and communication overhead for the master process. As a result, the parallel algorithm often runs about as long as the serial one (the main cost is sorting), wasting computational resources.

Back to Table of Contents


Contribution

  1. Proposed the HGT-GWO algorithm:
    • Introduced global historical best positions and individual trend guidance (similar to velocity vectors).
    • Outperformed GWO on 3 test functions.
  2. Proposed a master-worker island parallelization scheme:
    • Subpopulations operate independently, with sync_interval controlling the frequency of global synchronization.
    • Most of the sorting work is done on the worker processes, reducing communication and sorting overhead, and improving parallel efficiency.
  3. Experimental validation:
    • HGT-GWO was first implemented in Python in November 2024 and demonstrated comprehensive superiority over GWO on 15 test functions.
    • The parallelized code was implemented using C, MPI, and OpenMP.

Back to Table of Contents


Overview of Algorithm Ideas and Implementation

| Aspect | GWO | HGT-GWO |
|---|---|---|
| Optimization mechanism | Follows the standard GWO, with updates based purely on the positions of alpha, beta, and delta. | Incorporates additional historical guidance and trend adjustment into the position update formula. |
| Position update formula | X = (X1 + X2 + X3) / 3, where X1, X2, X3 are guided by alpha, beta, and delta. | X = (1 - α_w - β_w) * X_GWO + α_w * X_history + β_w * X_trend, including historical and trend components. |
| History guidance | Not utilized. | Utilizes previous_best_pos (the global historical best position) to guide the search process. |
| Trend adjustment | Not included. | Accounts for the movement trend of individuals (previous_positions) to adjust their trajectories. |
| Data stored | Only the current positions and fitness of all individuals. | Also stores: 1) the global historical best position (previous_best_pos), 2) previous-generation positions (previous_positions). |
| Exploration vs. exploitation | Balances exploration and exploitation purely based on alpha, beta, and delta. | Enhances exploitation by leveraging historical information and trend data, improving convergence. |
| Convergence speed | Moderate. | Faster, due to history- and trend-based refinement. |
| Execution speed | Moderate; sorting is the most time-consuming operation. | Faster overall: faster convergence reduces the number of iterations and hence sorting time, despite more operations per iteration. |
| Avoiding local optima | May struggle on multi-modal functions; prone to premature convergence. | Better at escaping local optima by utilizing historical and trend information. |
| Computational overhead | Lower; no additional historical data required. | Moderate; must store and process historical positions and trends. |
| Application scenarios | Suitable for simpler or small-scale optimization problems. | More suitable for complex, high-dimensional, multi-modal optimization problems. |
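The two update formulas above can be sketched as follows. This is a minimal illustration, not the project's C code: the weight values ALPHA_W and BETA_W are assumed (the project does not fix them here), and the trend term is extrapolated from the last move, similar to a velocity vector, which is an interpretation of the table's description.

```python
import numpy as np

# Assumed blend weights (the table calls them alpha_w and beta_w but does
# not specify their values).
ALPHA_W, BETA_W = 0.1, 0.1

def gwo_step(x1, x2, x3):
    """Standard GWO update: average of the alpha/beta/delta-guided positions."""
    return (x1 + x2 + x3) / 3.0

def hgt_gwo_step(x_gwo, x, previous_position, previous_best_pos):
    """HGT-GWO update: blend the GWO step with a history term
    (previous_best_pos) and a trend term extrapolated from the
    individual's last move (this extrapolation is an assumption)."""
    x_trend = x + (x - previous_position)
    return ((1.0 - ALPHA_W - BETA_W) * x_gwo
            + ALPHA_W * previous_best_pos
            + BETA_W * x_trend)
```

With ALPHA_W = BETA_W = 0, the HGT-GWO step reduces exactly to the standard GWO step, which is why the scheme can be seen as a strict generalization of GWO.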

  • The larger sync_interval is, the more independently the subpopulations evolve, leading to faster runtime but slower convergence.
  • When sync_interval = 1, the islands synchronize every generation and the algorithm reduces to the standard (fully synchronized) parallel GWO or HGT-GWO.

| Code version | Responsibilities of the master process (rank 0) | Responsibilities of the worker processes (rank > 0) |
|---|---|---|
| Standard parallelization | 1. Initialize the entire population (population) and distribute it to the workers (Scatter). 2. Gather the subpopulation results from all processes (Gather) and combine them into the complete population. 3. Sort the complete population; update alpha, beta, delta, and all individual positions. 4. Redistribute the updated subpopulations to the workers, completing one generation's synchronization (Scatter). | 1. Receive the distributed subpopulation (local_pop) and calculate the fitness of the assigned individuals. 2. Send the calculated subpopulation back to the master process (Gather). 3. Wait for the updated subpopulation to be distributed, receive it, and begin the next round of fitness calculations. |
| Master-worker island parallelization | 1. Initialize the entire population (population) and distribute it to the workers (Scatter). 2. Gather the top 3 individuals (local_top3) from each worker (Gather) and combine them into the global candidates (global_top3). 3. Sort global_top3 and determine the global alpha, beta, and delta wolves. 4. Broadcast the global alpha, beta, and delta to all workers (Broadcast) to ensure global consistency. | 1. Receive the distributed subpopulation (local_pop) and calculate the fitness of the assigned individuals. 2. Select the top 3 individuals from the subpopulation (local_top3) and send them to the master process (Gather). 3. Receive the global best individuals (alpha, beta, delta) broadcast by the master (Broadcast). 4. Use them to update the positions of the local population and start the next round of calculations. |
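The master-worker island loop can be sketched in plain Python, simulating Gather/Broadcast with list operations instead of MPI calls. Everything here is a schematic under stated assumptions: the sphere objective, the toy "pull toward the local alpha" update, and the default sync_interval value are all placeholders, not the project's actual C/MPI code.

```python
def sphere(x):
    """Placeholder objective (the Sphere test function)."""
    return sum(v * v for v in x)

def top3(island):
    """Each worker sorts only its own subpopulation and keeps its 3 best,
    so most sorting work stays off the master process."""
    return sorted(island, key=sphere)[:3]

def island_gwo(islands, iterations, sync_interval=5):
    """Schematic master-worker island loop: islands evolve independently,
    and every sync_interval iterations the 'master' gathers each island's
    local_top3, sorts the candidates, and broadcasts the global leaders."""
    leaders = [top3(island) for island in islands]  # local alpha/beta/delta
    for it in range(1, iterations + 1):
        for k, island in enumerate(islands):
            a, b, c = leaders[k]
            for i, wolf in enumerate(island):
                # Toy update: pull each wolf halfway toward the local alpha.
                island[i] = [w + 0.5 * (a[d] - w) for d, w in enumerate(wolf)]
            leaders[k] = top3(island)
        if it % sync_interval == 0:
            # Master: Gather every island's local_top3, sort, Broadcast back.
            global_top3 = sorted((w for ld in leaders for w in ld),
                                 key=sphere)[:3]
            leaders = [global_top3 for _ in islands]
    return min((w for island in islands for w in island), key=sphere)
```

Note how the master only ever sorts a list of 3 * num_islands candidates rather than the full population, which is the point of the scheme.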

Some faster methods

  • Standard parallelization: the master process sends only the alpha, beta, and delta information back to each process, but the largest sorting overhead still falls on the master process.
  • Master-worker island parallelization: letting each worker's local alpha, beta, and delta directly update its own individuals converges faster (it is essentially an independent GWO per island), but it dilutes the "master-worker" idea too much.

Back to Table of Contents


Experiment Platform Introduction

This repository contains the code, scripts, and experimental results for GWO (serial, MPI, MPI+OpenMP) and HGT-GWO (serial, MPI, MPI+OpenMP). The experiments folder contains:

  • Standard_Parallelization_serial_mpi version
  • Standard_Parallelization_hybrid version
  • Master-Worker_Island_Parallelization_serial_mpi version
  • Master-Worker_Island_Parallelization_hybrid version

The above four use quick sort.

  • bubble_sort/Standard_Parallelization version
  • bubble_sort/Master-worker_Island_Parallelization version

These two versions demonstrate that execution time is mainly determined by the sorting algorithm.

Bubble sort runs in O(n²) time, while quick sort runs in O(n log n) time (close to linear for practical n). This is why, when the population size is halved (n → n/2), the speedup of the parallel version using bubble sort is approximately 4, while the speedup of the parallel version using quick sort is approximately 2. It also explains why letting rank 0 sort the entire population yields almost identical serial and parallel run times: most of the time is spent sorting.
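The back-of-the-envelope arithmetic behind these speedup ratios can be checked directly. The comparison-count cost models below are the standard asymptotic ones, and n = 1000 matches the population size used in the experiments:

```python
import math

def bubble_cost(n):
    """O(n^2) comparison count for bubble sort."""
    return n * n

def quick_cost(n):
    """O(n log n) comparison count for quick sort."""
    return n * math.log2(n)

n = 1000
# Halving the population each process must sort:
bubble_ratio = bubble_cost(n) / bubble_cost(n // 2)  # exactly 4.0
quick_ratio = quick_cost(n) / quick_cost(n // 2)     # about 2.2
```

So halving the sorted population cuts bubble-sort work by 4x but quick-sort work by only a bit more than 2x, matching the observed speedup ratios.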

Due to time constraints, the framework code for running serial, MPI, and MPI+OpenMP together has not been fully integrated (the core definitions differ between MPI and MPI+OpenMP, requiring additional code restructuring and bug testing). However, combining the scripts would complete the full testing framework.

Parameter Settings

| Parameter | Configuration |
|---|---|
| Maximum iterations | 500 |
| Population size | 1000 |
| Test functions | Sphere, Schwefel 2.22, Schwefel 1.2 |
| Dimension | 256, 512, 1024 |
| Cores | 1, 2, 4, 8, 16, 32, 64 |
| Threads | 1, 2, 4 |
| Sorting algorithms | Bubble sort, quick sort |

Experimental Conclusion

Back to Table of Contents


Usage

To avoid bugs caused by relative paths, this project uses absolute paths. Before running anything, rename the folder you want to use (experiments or experiments/bubble_sort) to Project, and replace yuhang.jiang in the scripts with your <user_id>.

```shell
cd Project
qsub src/scripts/0.cleanup.sh          # Clean all experiment files (compiled files, logs, results)
qsub src/scripts/1.compile.sh          # Compile the code
qsub src/scripts/2.run_experiments.sh  # Run the serial, MPI, and MPI+OpenMP executables
qsub src/scripts/3.generate_scripts.sh # Generate the scripts needed by 2.run_experiments.sh
qsub src/scripts/4.del_tasks.sh        # Query and delete tasks on the cluster (replace <user_id>)
```

If your scripts fail to run, the likely cause is Windows-style line endings; run dos2unix xxx.sh to convert them.

Back to Table of Contents


License

This project is licensed under the MIT License. See the LICENSE file for details.

Back to Table of Contents


About

This repository showcases the parallelization of the Grey Wolf Optimizer (GWO) and the development of the History-Guided Trend-adjusted GWO (HGT-GWO) with a focus on master-worker island parallelization, as part of the High Performance Computing for Data Science course group project at the University of Trento.
