ASL Hand Sign Detection - Learning Project

Project Overview

ASL Hand Sign Detection is an end-to-end computer vision learning project focused on detecting American Sign Language (ASL) signs A, B, and C. The repository provides tools for custom data collection, preprocessing (including MediaPipe landmark extraction), training two model approaches (landmark-based and CNN), and running real-time inference from a webcam.

Purpose: Bridge the gap between ML theory and practical implementation by building a full pipeline: data collection → preprocessing → training → inference → evaluation.

Learning objectives:

Custom data collection and dataset curation
Feature engineering with MediaPipe landmarks vs end-to-end CNNs
Model evaluation and comparison (accuracy, confusion matrices, FPS)
Real-time inference and performance tuning

Project Architecture

Two complementary approaches are supported:

Landmarks-based: Use MediaPipe to extract hand landmarks and train a feedforward neural network on those features.
CNN-based: Train a convolutional neural network directly on images.

Key technologies: OpenCV, MediaPipe, PyTorch + torchvision, Python 3.8+.
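
To make the landmarks-based approach concrete: MediaPipe Hands reports 21 landmarks per detected hand, each with x, y, and z coordinates, so one hand can be flattened into 63 numbers. The sketch below shows one plausible way to build that feature vector. The function name `landmarks_to_features` and the wrist-relative, scale-normalized encoding are assumptions for illustration; the repository may encode landmarks differently.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand


def landmarks_to_features(points: Sequence[Point]) -> List[float]:
    """Flatten 21 (x, y, z) landmarks into a 63-value feature vector.

    Assumption: coordinates are made relative to the wrist (landmark 0)
    and divided by the largest wrist-to-landmark distance, so the
    features do not depend on where the hand is in the frame or how
    large it appears.
    """
    if len(points) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(points)}")

    wx, wy, wz = points[0]
    shifted = [(x - wx, y - wy, z - wz) for x, y, z in points]

    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in shifted)
    if scale == 0.0:
        scale = 1.0  # degenerate case: all landmarks coincide

    return [coord / scale for point in shifted for coord in point]
```

A vector like this is what a small feedforward network would take as input, whereas the CNN works on the raw pixels instead.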

graph TD
    A[Data Collection] --> B[Preprocessing]
    B --> C[Landmarks Extraction]
    B --> D[Image Dataset]
    C --> E[Landmarks Model Training]
    D --> F[CNN Model Training]
    E --> G[Inference]
    F --> G[Inference]
    G --> H[Results & Comparison]

Directory Structure

HandSignDetection/
├── src/
│   ├── __init__.py
│   ├── collect_data.py
│   ├── preprocess_data.py (TBD)
│   ├── train_landmarks.py (TBD)
│   ├── train_cnn.py (TBD)
│   ├── compare_models.py
│   └── utils/
│       ├── __init__.py
│       ├── config.py
│       ├── logger.py
│       ├── data.py
│       ├── mediapipe_utils.py
│       ├── metrics.py
│       └── visualization.py
├── config/
│   └── config.yaml
├── data/
│   ├── raw/
│   ├── processed/
│   └── landmarks/
├── models/
│   ├── landmarks/
│   └── cnn/
├── logs/
├── results/
├── README.md
├── requirements.txt
└── LICENSE

src/: Source code and scripts
config/: YAML configuration files
data/raw/: Collected images arranged by sign label
data/processed/: Validated and split datasets
data/landmarks/: MediaPipe landmark arrays
models/: Trained models (timestamped)
logs/: Runtime logs
results/: Evaluation reports and visualizations

Setup Instructions

Prerequisites: Python 3.8+, webcam for data collection.

Installation:

Create and activate a virtual environment:

macOS/Linux:

python -m venv venv
source venv/bin/activate

Windows:

python -m venv venv
venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Verify installation by importing the core packages (OpenCV, MediaPipe, PyTorch).
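
A quick way to run that check is a small helper that tries each import and reports the result. This is a minimal sketch; `check_imports` is a hypothetical helper, not part of the repository, and the import names `cv2`, `mediapipe`, and `torch` are assumed from the key technologies listed earlier.

```python
import importlib
from typing import Dict, Iterable


def check_imports(module_names: Iterable[str]) -> Dict[str, bool]:
    """Return {module_name: True/False} depending on whether it imports."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results
```

Calling `check_imports(["cv2", "mediapipe", "torch"])` inside the activated virtual environment should return `True` for every entry if the installation succeeded.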

Auto-create directories: Running any script that calls create_directories() from src.utils.config will create required folders automatically.
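
For readers curious what such a helper might look like, here is a minimal sketch built on `pathlib`. The folder list is taken from the directory structure above; the `root` parameter and the return value are assumptions, and the real `create_directories()` in `src.utils.config` may differ.

```python
from pathlib import Path
from typing import List, Union

# Folder layout taken from the directory structure above.
REQUIRED_DIRS = [
    "data/raw", "data/processed", "data/landmarks",
    "models/landmarks", "models/cnn", "logs", "results",
]


def create_directories(root: Union[str, Path] = ".") -> List[Path]:
    """Create every required folder under root; existing folders are kept."""
    created = []
    for rel in REQUIRED_DIRS:
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created
```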

Usage Guidelines

1) Data Collection — `src/collect_data.py`

Purpose: Capture hand sign images via webcam.
Usage: Run the script, then press the A, B, or C key to save the current frame for that sign. Press q to quit.
Output: Files saved to data/raw/{sign}/.

Collect 100–200 varied images per sign (different lighting, backgrounds, hand poses).
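
The key handling described above can be sketched without any camera code. The helpers below are hypothetical (`sign_for_key`, `next_image_path`), and the file naming pattern is an assumption; only the `data/raw/{sign}/` layout comes from this README. The key code is assumed to come from OpenCV's `cv2.waitKey`, which returns -1 when no key was pressed.

```python
from pathlib import Path
from typing import Optional

SIGNS = ("A", "B", "C")


def sign_for_key(key_code: int) -> Optional[str]:
    """Map a key code (as returned by cv2.waitKey) to a sign label.

    Returns None for keys that are not a sign, including 'q'.
    """
    if key_code < 0:  # cv2.waitKey returns -1 when no key was pressed
        return None
    char = chr(key_code & 0xFF).upper()
    return char if char in SIGNS else None


def next_image_path(raw_dir: Path, sign: str, index: int) -> Path:
    """Build data/raw/{sign}/{sign}_{index:04d}.jpg (naming is an assumption)."""
    return raw_dir / sign / f"{sign}_{index:04d}.jpg"
```

Accepting both upper and lower case means the script behaves the same whether or not Caps Lock is on.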

2) Preprocessing — `src/preprocess_data.py` (TBD)

Purpose: Validate images, extract landmarks, and create train/val/test splits.
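
Since this script is still marked TBD, the following is only a sketch of how the split step could work. The 70/15/15 ratios, the fixed seed, and the function name `split_items` are all assumptions. Applying it separately to each sign folder would keep the class balance similar across the three splits.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_items(
    items: Sequence[str],
    ratios: Tuple[float, float, float] = (0.7, 0.15, 0.15),
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Shuffle items reproducibly and split them into train/val/test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same split

    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }
```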

3) Model Training

src/train_landmarks.py (TBD): Train a feedforward network (FFN) on landmark features.
src/train_cnn.py (TBD): Train a CNN on images.

Models are saved under models/{landmarks,cnn}/ with timestamps and metrics.

4) Inference

src/inference_landmarks.py and src/inference_cnn.py (TBD): Run either script to perform real-time predictions on the webcam feed. Press q to quit.
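
Because FPS is one of the metrics this project compares, a real-time loop needs some way to measure it. The class below is a hypothetical helper, not taken from the repository: a minimal sketch of a rolling FPS estimate over the most recent frames.

```python
from collections import deque
from typing import Deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window: int = 30) -> None:
        self._stamps: Deque[float] = deque(maxlen=window)

    def tick(self, timestamp: float) -> float:
        """Record a frame time (seconds) and return the current FPS estimate."""
        self._stamps.append(timestamp)
        if len(self._stamps) < 2:
            return 0.0  # need at least two frames to measure an interval
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

In an inference loop, calling `meter.tick(time.perf_counter())` once per processed frame gives a value that can be drawn on the preview window. Averaging over a window smooths out the jitter of individual frames.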

5) Model Comparison — `src/compare_models.py`

Run:

python -m src.compare_models

Prerequisites:

Both model checkpoints must exist (i.e., training has been run for both pipelines):
- models/landmarks/model_latest.pth
- models/cnn/model_latest.pth
Test data must be populated by python -m src.preprocess_data:
- data/landmarks/test.npy
- data/processed/test/{A,B,C}/

Outputs written to results/ with a YYYYMMDD_HHMMSS timestamp suffix:

| File | Description |
| --- | --- |
| `comparison_{ts}.json` | Full comparison report (accuracy, confusion matrices, inference times, winners) |
| `confusion_matrices_{ts}.png` | Side-by-side confusion matrix plots for both models |
| `accuracy_comparison_{ts}.png` | Side-by-side overall accuracy bar chart |
| `per_class_accuracy_{ts}.png` | Per-class accuracy breakdown |
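
The `{ts}` suffix follows the YYYYMMDD_HHMMSS pattern, which corresponds to the `strftime` format `%Y%m%d_%H%M%S`. The sketch below shows how the four filenames could be derived from one shared timestamp. The function `result_paths` and its dictionary keys are assumptions for illustration, not the repository's actual code.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional


def result_paths(results_dir: str = "results",
                 now: Optional[datetime] = None) -> Dict[str, Path]:
    """Build the four output paths with a shared YYYYMMDD_HHMMSS suffix."""
    ts = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    base = Path(results_dir)
    return {
        "report": base / f"comparison_{ts}.json",
        "confusion_matrices": base / f"confusion_matrices_{ts}.png",
        "accuracy": base / f"accuracy_comparison_{ts}.png",
        "per_class": base / f"per_class_accuracy_{ts}.png",
    }
```

Computing the timestamp once and reusing it keeps all files from a single run grouped under the same suffix, even if the run crosses a second boundary.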

Latest results (results/comparison_20260503_113334.json, 242 test samples, classes A/B/C):

| Metric | Landmarks | CNN |
| --- | --- | --- |
| Test accuracy | 99.17% | 99.59% |
| Avg inference time | 0.019 ms/sample | 1.313 ms/sample |
| Speed advantage | 70× faster | - |

| Class | Winner |
| --- | --- |
| A | CNN (100% vs 98.7%) |
| B | Landmarks (100% vs 98.8%) |
| C | CNN (100% vs 98.8%) |
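
Per-class figures like these are typically read off a confusion matrix: for each true class, the share of samples on the diagonal. The sketch below illustrates that calculation in plain Python. The function names are hypothetical and this is not the code in `src/utils/metrics.py`.

```python
from typing import Dict, List, Sequence

LABELS = ("A", "B", "C")


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = LABELS) -> List[List[int]]:
    """matrix[i][j] counts samples of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_accuracy(matrix: List[List[int]],
                       labels: Sequence[str] = LABELS) -> Dict[str, float]:
    """Fraction of each true class predicted correctly (diagonal / row sum)."""
    return {
        label: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }
```

Defined this way, per-class accuracy is the same quantity as per-class recall.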

Overall winner: CNN (accuracy) · Faster model: Landmarks (about 70× faster, at a cost of about 0.4 percentage points of accuracy)
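
The headline numbers can be sanity-checked from the table values alone. The short calculation below uses only the rounded figures reported above. The ratio of the rounded times comes out near 69, which is consistent with the reported 70× if the report was computed from unrounded timings (an assumption).

```python
# Figures copied from the results table above.
landmarks_ms, cnn_ms = 0.019, 1.313      # average inference time per sample
landmarks_acc, cnn_acc = 99.17, 99.59    # test accuracy in percent
n_test = 242

speedup = cnn_ms / landmarks_ms          # how many times faster landmarks is
accuracy_gap = cnn_acc - landmarks_acc   # in percentage points

# Correct-prediction counts implied by the rounded accuracies.
landmarks_correct = round(landmarks_acc / 100 * n_test)
cnn_correct = round(cnn_acc / 100 * n_test)

print(f"speedup: {speedup:.1f}x")
print(f"accuracy gap: {accuracy_gap:.2f} percentage points")
print(f"correct: landmarks {landmarks_correct}/{n_test}, cnn {cnn_correct}/{n_test}")
```

With 242 test samples, the two accuracies correspond to 240 and 241 correct predictions, so the models differ by a single test image. That is worth keeping in mind before reading much into the accuracy ranking.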

Configuration

All runtime parameters (paths, hyperparameters, thresholds) are centralized in config/config.yaml. Modify this file to change behavior without editing code.
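
Once the YAML file has been parsed into nested dictionaries, a small lookup helper keeps call sites short. The sketch below assumes the parsed result is a plain `dict`. The helper `get_setting` and every key in `example_config` are hypothetical, apart from `min_detection_confidence`, which the troubleshooting notes mention.

```python
from typing import Any, Dict


def get_setting(config: Dict[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Look up a nested value such as 'mediapipe.min_detection_confidence'."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default  # missing key or non-dict level: fall back
        node = node[part]
    return node


# Hypothetical structure; the real keys in config/config.yaml may differ.
example_config = {
    "paths": {"raw": "data/raw", "results": "results"},
    "mediapipe": {"min_detection_confidence": 0.5},
}
```

Returning a default instead of raising lets scripts run with a partial config file while still making every tunable value overridable from one place.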

Logging

Use the centralized logger:

from src.utils.logger import setup_logger
logger = setup_logger(__name__)
logger.info("Starting data collection...")

Log files are written to logs/ with a timestamped filename.
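
For reference, a logger with this behavior can be built from the standard `logging` module. The following is a sketch of what `setup_logger` might look like, not the repository's implementation; the `log_dir` and `level` parameters, the log format, and the filename pattern are assumptions.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_logger(name: str, log_dir: str = "logs",
                 level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to the console and a timestamped file."""
    logger = logging.getLogger(name)
    if logger.handlers:  # already configured; avoid duplicate output
        return logger

    Path(log_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    for handler in (logging.StreamHandler(),
                    logging.FileHandler(Path(log_dir) / f"{name}_{stamp}.log")):
        handler.setFormatter(formatter)
        logger.addHandler(handler)

    logger.setLevel(level)
    return logger
```

The early return matters because `logging.getLogger` hands back the same object for the same name, so attaching handlers on every call would print each message multiple times.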

Troubleshooting

Webcam not detected: Check camera permissions, or try a different camera index.
MediaPipe failures: Improve lighting, adjust min_detection_confidence in the config.
Slow inference: Lower resolution or use the landmarks-based model.
Imports failing: Ensure virtualenv is activated and dependencies are installed.

Future Work

Expand supported signs (D–Z)
Add more robust preprocessing and augmentation
Evaluate MobileNet/ResNet architectures
Package the project for deployment (CLI, web, or mobile)

License & Acknowledgments

See LICENSE for licensing details. Thanks to the MediaPipe, PyTorch, and OpenCV projects.
