
# Object Detection Models Comparison


## 👥 Team Members

  1. Mohamed Hassan
  2. Omar Hany
  3. Mohamed Mohamed Ibrahim

## 1. Model Evaluation: Faster R-CNN ResNet-101 (1024×1024)

### Model Architecture

Faster R-CNN (ResNet-101 backbone) with 1024×1024 input. A two-stage detector consisting of:

- **Backbone (ResNet-101):** Deep residual backbone for high-level semantic feature extraction.
- **Region Proposal Network (RPN):** Generates candidate object regions from anchors.
- **RoI Align:** Pools each proposed region to a fixed resolution while preserving spatial alignment.
- **Detection Head:** Classifies each region and refines its bounding box.

### Quantitative Results

#### COCO 2017 Validation

| Metric | Value | Metric | Value |
|---|---|---|---|
| mAP @[0.50:0.95] | 0.317 | AP @ 0.50 | 0.456 |
| Avg IoU (Matched) | 0.830 | AP (Small/Med/Lrg) | 0.127 / 0.347 / 0.498 |

#### Pascal VOC 2007 (COCO metrics)

| Metric | Value | Metric | Value |
|---|---|---|---|
| mAP @[0.50:0.95] | 0.524 | AP @ 0.50 | 0.744 |
| Avg IoU (Matched) | 0.841 | AP (Small/Med/Lrg) | 0.168 / 0.407 / 0.619 |

### Analysis

- **Performance:** mAP 0.317 on COCO; markedly better on VOC (0.524), whose 20 classes are fewer and generally easier than COCO's 80.
- **Localization:** The high average IoU on matched detections (~0.83) confirms the two-stage box refinement is effective.
- **Scale:** Strong on medium and large objects (AP 0.347 / 0.498 on COCO) but weak on small ones (AP 0.127).
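The "Avg IoU (Matched)" metric above is the mean intersection-over-union between each matched prediction and its ground-truth box. A minimal sketch of the underlying computation (a hypothetical helper, not the evaluation script used for these numbers):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    # Intersection rectangle (degenerates to zero area if the boxes are disjoint).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Half-overlapping 10x10 boxes: intersection 50, union 150 -> IoU = 1/3.
score = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

A threshold such as "AP @ 0.50" counts a prediction as correct only when this IoU with a ground-truth box of the same class is at least 0.50.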

### Qualitative Analysis

**Successful Detections:**

*(figure: frcnn_success_1)*

**Failure Cases:**

*(figure: frcnn_failure_1)*

**Feature Maps:**

*(figure: frcnn_feature_map_1)*


## 2. Model Evaluation: SSDLite320 MobileNetV3-Large

### Model Architecture

SSDLite320 (MobileNetV3-Large backbone). A single-stage, efficient detector for edge devices.

- **Backbone (MobileNetV3-Large):** Lightweight backbone with squeeze-and-excitation (SE) modules.
- **SSD Head:** Predicts bounding boxes directly from feature maps, with no region-proposal stage.
- **Resolution:** Fixed 320×320 input keeps inference fast but limits small-object visibility.

### Quantitative Results

#### COCO 2017 Validation

| Metric | Value |
|---|---|
| mAP @[0.50:0.95] | 0.2107 |
| AP @ 0.50 | 0.3388 |
| AP @ 0.75 | 0.2190 |
| AP (Small) | 0.0034 |
| AP (Medium) | 0.0947 |
| AP (Large) | 0.4119 |
| Mean IoU (>0.7) | 0.8062 |

#### Pascal VOC 2007 (COCO metrics)

| Metric | Value |
|---|---|
| mAP @[0.50:0.95] | 0.4150 |
| AP @ 0.50 | 0.6364 |
| AP @ 0.75 | 0.4443 |
| AP (Small) | 0.0056 |
| AP (Medium) | 0.1784 |
| AP (Large) | 0.5668 |

### Analysis

- **Trade-off:** The VOC mAP of 0.4150 (vs. 0.524 for Faster R-CNN) reflects the accuracy cost of the lightweight, edge-oriented design.
- **Small-Object Blindness:** AP (Small) is near zero (0.0034 COCO / 0.0056 VOC) because the 320×320 input downsamples small objects below detectable size.
- **Strengths:** Competent on large, prominent objects (AP Large 0.5668 on VOC).
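A back-of-envelope illustration of that downsampling effect, assuming a hypothetical 640×480 source frame: even the largest COCO "small" object (area under 32² px) loses roughly two thirds of its area after the resize to 320×320.

```python
# A COCO "small" object occupies < 32*32 = 1024 px^2 in the source image.
# Resizing a (hypothetical) 640x480 frame to SSDLite's fixed 320x320 input:
orig_w, orig_h = 640, 480
scale_w, scale_h = 320 / orig_w, 320 / orig_h    # 0.5 and ~0.667

side = 32                                        # largest "small" object edge
scaled_w, scaled_h = side * scale_w, side * scale_h
scaled_area = scaled_w * scaled_h                # ~341 px^2, a third of 1024
```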

### Qualitative Analysis

**Successful Detections:**

*(figure: ssd_success_1)*

**Failure Cases:**

*(figure: ssd_failure_1)*

**Feature Maps:**

*(figure: ssd_feature_map_1)*


## 3. Model Evaluation: YOLOv12-X

### Model Architecture

YOLOv12-X: a high-speed, single-stage detector built on attention mechanisms (FlashAttention, Area Attention) and a hierarchical R-ELAN backbone.

### Quantitative Results

**Efficiency:** 59.1M parameters, 199.0 GFLOPs, 11.79 ms/img latency.
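A sketch of how numbers like these can be measured in PyTorch. The tiny conv stack is a stand-in for YOLOv12-X (not loaded here), and CPU timings will differ from the reported latency:

```python
import time
import torch

# Stand-in network; substitute the actual YOLOv12-X module to reproduce the
# 59.1M-parameter count and per-image latency reported above.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(16, 16, 3, padding=1),
).eval()

# Parameter count in millions.
params_m = sum(p.numel() for p in model.parameters()) / 1e6

image = torch.rand(1, 3, 128, 128)
with torch.no_grad():
    model(image)                                  # warm-up pass
    start = time.perf_counter()
    runs = 10
    for _ in range(runs):
        model(image)
    ms_per_img = (time.perf_counter() - start) / runs * 1e3
```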

#### Detection Performance

| Dataset | mAP @[0.50:0.95] | Avg IoU | mAP @ 0.50 |
|---|---|---|---|
| COCO 2017 | 0.718 | 0.554 | 0.841 |
| Pascal VOC 2007 | 0.898 | 0.714 | 0.888 |

### Qualitative Analysis

**Successful Cases (COCO & Pascal):**

*(figure: yolo_coco_success_3)*

**Failure Cases (COCO & Pascal):**

*(figure: yolo_coco_failure_3)*

**Feature Map Visualization:**

*(figure: yolo_feature_map_1_output_2)*
