This repository contains the source code and documentation for my Master's project, which investigates the presence of gender bias in facial recognition systems and evaluates technical methods for its mitigation.
Facial recognition technology is increasingly utilized across various sectors, including security and recruitment. However, studies have shown that these systems can demonstrate significant performance disparities across different demographic groups. This study specifically examines gender bias within a Convolutional Neural Network (CNN) framework.
The project involves building a robust gender classification model and using the IBM AI Fairness 360 (AIF360) toolkit to audit and mitigate bias. The objective is to achieve equitable model performance without compromising overall classification accuracy.
The model training process utilizes the FERET dataset along with other facial image sets. Preprocessing steps are implemented to ensure the model focuses on structural features rather than noise (a minimal code sketch of these steps follows the list):
- Resizing images to 64x64 pixels for computational efficiency.
- Converting images to grayscale to eliminate potential color-based bias.
- Applying histogram equalization to standardize lighting conditions and contrast across the dataset.
- Normalizing data to support reliable convergence during training.
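The snippet below is a minimal sketch of how these steps could be chained with OpenCV and NumPy. The helper name `preprocess_face` and the return conventions are illustrative assumptions, not the exact implementation used in the notebooks.

```python
import cv2
import numpy as np

def preprocess_face(image_path):
    """Illustrative preprocessing pipeline for a single face image."""
    # Load the image from disk (OpenCV loads in BGR order)
    image = cv2.imread(image_path)

    # Convert to grayscale to remove color information
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Resize to 64x64 pixels for computational efficiency
    resized = cv2.resize(gray, (64, 64))

    # Histogram equalization to standardize lighting and contrast
    equalized = cv2.equalizeHist(resized)

    # Normalize pixel values to [0, 1] and add a channel axis for the CNN
    normalized = equalized.astype(np.float32) / 255.0
    return normalized[..., np.newaxis]  # shape: (64, 64, 1)
```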
Bias is quantified using several fairness metrics provided by the AIF360 library (a short usage sketch follows the list):
- Disparate Impact: The ratio of favorable outcome rates between the unprivileged and privileged groups (a value of 1.0 indicates parity).
- Statistical Parity Difference: The difference in favorable outcome rates between the two groups.
- Equal Opportunity Difference: Evaluates the difference in true positive rates.
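A minimal sketch of computing these three metrics with AIF360 is shown below. It assumes the test data has already been wrapped in AIF360 `BinaryLabelDataset` objects (`test_dataset` for ground truth, `test_dataset_pred` with the model's predictions substituted in) with `gender` as the protected attribute; those variable names and encodings are assumptions for illustration.

```python
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

privileged_groups = [{'gender': 1}]
unprivileged_groups = [{'gender': 0}]

# Dataset-level metrics computed from the ground-truth labels
dataset_metric = BinaryLabelDatasetMetric(
    test_dataset,  # BinaryLabelDataset, assumed prepared earlier
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups)

# P(favorable | unprivileged) / P(favorable | privileged)
print("Disparate impact:", dataset_metric.disparate_impact())
# P(favorable | unprivileged) - P(favorable | privileged)
print("Statistical parity difference:", dataset_metric.statistical_parity_difference())

# Classification-level metric compares ground truth against predictions
clf_metric = ClassificationMetric(
    test_dataset, test_dataset_pred,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups)

# TPR(unprivileged) - TPR(privileged)
print("Equal opportunity difference:", clf_metric.equal_opportunity_difference())
```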
To mitigate the identified disparities, the Reweighing algorithm is implemented. This pre-processing technique assigns a weight to each training instance so that the distribution is balanced across the protected attribute (gender) and the class labels (male/female) before the model begins the learning phase.
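The sketch below shows one way to apply AIF360's Reweighing and feed the resulting instance weights into Keras training. The training dataset variable, array names, and hyperparameters are illustrative assumptions rather than the exact configuration used in the notebooks.

```python
from aif360.algorithms.preprocessing import Reweighing

# Compute per-instance weights that balance (group, label) combinations
rw = Reweighing(unprivileged_groups=[{'gender': 0}],
                privileged_groups=[{'gender': 1}])
train_transf = rw.fit_transform(train_dataset)  # same rows, new instance_weights

# Pass the weights to Keras; weights are aligned row-for-row with X_train/y_train
model.fit(X_train, y_train,
          sample_weight=train_transf.instance_weights,
          epochs=30, batch_size=32,
          validation_data=(X_val, y_val))
```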
The project is implemented in Python using the following primary libraries:
- TensorFlow and Keras: For designing and training the CNN architecture (a sketch of a representative architecture follows this list).
- AI Fairness 360 (AIF360): Used for both the auditing and mitigation phases of the study.
- OpenCV: Employed for image processing and manipulation.
- Scikit-learn: Used for evaluating traditional performance metrics such as precision and recall.
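As referenced above, the Keras snippet below sketches one plausible CNN for 64x64 grayscale input with a binary gender output. The layer sizes and hyperparameters are illustrative assumptions, not the exact architecture trained in the notebooks.

```python
from tensorflow.keras import layers, models

# Illustrative CNN for 64x64x1 (grayscale) face images, binary output
model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),  # probability of the positive gender class
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
```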
The effectiveness of the bias mitigation strategy was evaluated across multiple dataset configurations using standard classification metrics and fairness indicators.
The following table summarizes the comparative results between the baseline model and the de-biased model using the Reweighing technique:
| Configuration | Dataset | Test Accuracy | Bias Mitigation |
|---|---|---|---|
| Baseline | Dataset 1 | 95.77% | None |
| Mitigated | Dataset 1 | 95.91% | Reweighing |
| Baseline | Dataset 3 (Female Privileged) | 84.69% | None |
| Mitigated | Dataset 3 (Female Privileged) | 86.36% | Reweighing |
| Baseline | Dataset 3 (Male Privileged) | 84.69% | None |
| Mitigated | Dataset 3 (Male Privileged) | 83.97% | Reweighing |
- Bias Reduction: The application of the Reweighing algorithm significantly improved parity in classification errors. In Dataset 1, for instance, mitigation reduced false positive disparities while overall accuracy remained above 95%.
- Metric Trade-offs: The results highlight a common challenge in algorithmic fairness: the trade-off between raw accuracy and demographic parity. In some dataset configurations, achieving a more equitable outcome required a slight reduction in overall performance metrics.
- Reliability: The training curves demonstrate that the de-biased models achieve stable convergence, indicating that the reweighing process does not introduce training instabilities.
The following training curves illustrate model convergence across the different experimental dataset configurations:
Figure 1: Accuracy and Loss curves for the initial gender classification model on Dataset 1.
Figure 2: Performance metrics for a dataset configuration where female subjects were the privileged group.
Figure 3: Performance metrics for a dataset configuration where male subjects were the privileged group.
- `datasets/`: Contains the image data subsets used for training and validation.
- `images/`: Training performance visualizations and graphs.
- `models/`: Stores the saved weights and architectures of trained models.
- `gc-ds1.ipynb`: Notebook containing the investigation and mitigation steps for Dataset 1.
- `gc-ds3-femalepriv.ipynb`: Analysis focused on datasets with a female-privileged distribution.
- `gc-ds3-malepriv.ipynb`: Analysis focused on datasets with a male-privileged distribution.
- `notesandinstructions.py`: Documentation of utility functions and preprocessing logic.
- `codeanalysis.txt`: Detailed comparative results and metric summaries.
- `requirements.txt`: List of dependencies required to run the notebooks.
The project requires Python 3.8 or higher. To install the necessary dependencies, run:
    pip install -r requirements.txt

To view the experiments, launch Jupyter Notebook and open the relevant .ipynb file:

    jupyter notebook gc-ds1.ipynb

If you use this work in your research, please cite it as follows:
Patel, R. (2026). Detecting and Mitigating Algorithmic Bias in Face Recognition Algorithms: A Research Study. MSc Dissertation, University of Hertfordshire.
- Datasets: The facial images are drawn from the FERET database, together with various synthetic demographic splits.
- Fairness Framework: IBM AI Fairness 360 (AIF360) - https://aif360.mybluemix.net/
- Literature:
- Buolamwini, J., & Gebru, T. (2018). "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification."
- Bellamy, R. K. E., et al. (2018). "AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating AI bias."
Author: Rishabh Patel
Course: MSc Computer Science
Date: March 2026
Project: Detecting and Mitigating Algorithmic Bias in Face Recognition Algorithms