
███████╗██╗ ██████╗ ███╗   ██╗██╗      █████╗ ███╗   ██╗ ██████╗      █████╗ ██╗
██╔════╝██║██╔════╝ ████╗  ██║██║     ██╔══██╗████╗  ██║██╔════╝     ██╔══██╗██║
███████╗██║██║  ███╗██╔██╗ ██║██║     ███████║██╔██╗ ██║██║  ███╗    ███████║██║
╚════██║██║██║   ██║██║╚██╗██║██║     ██╔══██║██║╚██╗██║██║   ██║    ██╔══██║██║
███████║██║╚██████╔╝██║ ╚████║███████╗██║  ██║██║ ╚████║╚██████╔╝    ██║  ██║██║
╚══════╝╚═╝ ╚═════╝ ╚═╝  ╚═══╝╚══════╝╚═╝  ╚═╝╚═╝  ╚═══╝ ╚═════╝     ╚═╝  ╚═╝╚═╝

✦ Real-time Indian Sign Language Recognition — Powered by Deep Learning ✦

Every Hand Has a Voice. Break the Silence with AI.


Live Demo GitHub Repo


Python Flask PyTorch MediaPipe OpenCV gTTS Docker HuggingFace


"7 crore Indians are deaf or hard of hearing. SignLang AI gives every one of them a voice — in real-time, through any webcam."



⚡ At a Glance

✋ Two-Hand Detection   MediaPipe dual track · 21 landmarks per hand · both skeletons drawn
🧠 LSTM Model           3-layer PyTorch · 99.9% val accuracy · 42,000+ samples
🎙️ Bilingual TTS        English + Hindi · gTTS engine · real-time speech
✨ NLP Polish           Rule-based NLP · auto grammar fix · sentence builder
📖 Sign Dictionary      35 ISL signs · slide-up drawer · click to hear


🏗️ System Architecture

╔══════════════════════════════════════════════════════════════════════════════╗
║                    🤙  SIGNLANG AI  —  SYSTEM ARCHITECTURE                  ║
╠══════════════════════════════════════════════════════════════════════════════╣
║                                                                              ║
║   ┌─────────────┐     HTTPS      ┌──────────────────────────────────────┐   ║
║   │   🌐 USER   │ ─────────────► │          FLASK APPLICATION           │   ║
║   │   BROWSER   │                │                                      │   ║
║   └─────────────┘                │  ┌────────────┐  ┌────────────────┐  │   ║
║                                  │  │  Auth      │  │  Route Handler │  │   ║
║   ┌─────────────┐                │  │  Middleware │  │  /dashboard    │  │   ║
║   │  📷 WEBCAM  │ ─────────────► │  │  (SHA-256) │  │  /api/camera   │  │   ║
║   │   30 FPS    │                │  └────────────┘  │  /api/tts      │  │   ║
║   └─────────────┘                │                  │  /api/nlp      │  │   ║
║                                  │                  └────────────────┘  │   ║
║                                  └──────────────┬───────────────────────┘   ║
║                                                 │                            ║
║        ┌────────────────────────────────────────┼────────────────────┐      ║
║        │                      │                 │                    │      ║
║        ▼                      ▼                 ▼                    ▼      ║
║  ┌───────────────┐  ┌──────────────────┐  ┌──────────┐  ┌─────────────────┐ ║
║  │ 📡 MEDIAPIPE  │  │  🧠 LSTM ENGINE  │  │ 🔤 NLP   │  │  🔊 TTS ENGINE  │ ║
║  │               │  │                  │  │  ENGINE  │  │                 │ ║
║  │  2-Hand track │  │  signlang_       │  │  Polish  │  │  gTTS English   │ ║
║  │  21 landmarks │  │  model.pt        │  │  Grammar │  │  gTTS Hindi     │ ║
║  │  per hand     │  │  label_map.json  │  │  Clean   │  │  deep-          │ ║
║  │  63 features  │  │  35 classes      │  │  Punct.  │  │  translator     │ ║
║  └───────────────┘  └──────────────────┘  └──────────┘  └─────────────────┘ ║
║                                                                              ║
╚══════════════════════════════════════════════════════════════════════════════╝


🤖 ML Pipeline

╔══════════════════════════════════════════════════════════════════════════════╗
║                        ML PIPELINE — END TO END                             ║
╠══════════════════════════════════════════════════════════════════════════════╣
║                                                                              ║
║   RAW DATA               PREPROCESSING           LANDMARK EXTRACTION        ║
║  ──────────             ───────────────          ──────────────────────      ║
║  42,745 ISL  ─────────► Folder scan   ─────────► MediaPipe Hands            ║
║  images                 Class mapping            21 (x,y,z) landmarks       ║
║  35 classes             80/20 split              Wrist-normalized            ║
║                                                  63 features flat            ║
║                                                                              ║
║   SEQUENCE BUILD          MODEL TRAINING            INFERENCE                ║
║  ───────────────         ────────────────          ──────────               ║
║  30 frames/seq ────────► 3-layer LSTM   ─────────► Majority vote            ║
║  Sliding window          hidden=128               buffer (5 frames)         ║
║  Augmentation            dropout=0.3              conf threshold 0.70       ║
║                          60 epochs                35-class softmax          ║
║                          MPS (Apple M4)           → word + confidence       ║
║                                                                              ║
╚══════════════════════════════════════════════════════════════════════════════╝
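The wrist-normalization step in the landmark-extraction stage above can be sketched as follows. This is a minimal sketch assuming MediaPipe's convention that landmark 0 is the wrist; the function name and the exact scale normalization are illustrative, not the project's actual code:

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Wrist-normalize 21 (x, y, z) hand landmarks into a flat 63-feature vector.

    landmarks: array-like of shape (21, 3); index 0 is the wrist
    (MediaPipe convention).
    """
    pts = np.asarray(landmarks, dtype=np.float32).reshape(21, 3)
    pts = pts - pts[0]                       # translate so the wrist is the origin
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts /= scale                         # scale-invariance across hand sizes
    return pts.flatten()                     # shape (63,)
```

Each 30-frame sequence is then just 30 of these vectors stacked into a (30, 63) array.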


📊 Model Performance

  SIGNLANG LSTM MODEL — TRAINING RESULTS
  ───────────────────────────────────────

  Val Accuracy   ████████████████████  99.9%   (Epoch 22 best)
  Train Accuracy ████████████████████  99.9%   (Epoch 60)
  Val Loss       ▓▓░░░░░░░░░░░░░░░░░░  0.0046  (Best checkpoint)
  Train Loss     ▓▓░░░░░░░░░░░░░░░░░░  0.0086

  Epoch 1   ████░░░░░░░░░░░░░░░░  70.1%  →  Val 90.4%  ✓ saved
  Epoch 4   ████████████████░░░░  97.8%  →  Val 99.1%  ✓ saved
  Epoch 7   ████████████████████  98.9%  →  Val 99.7%  ✓ saved
  Epoch 22  ████████████████████  99.8%  →  Val 99.9%  ✓ BEST
  Metric                       Value
  ──────                       ─────
  Best Validation Accuracy     99.9%
  Best Validation Loss         0.0046
  Total Training Samples       36,257
  Total Validation Samples     6,399
  Total Dataset                42,656 sequences
  Classes                      35 ISL signs
  Model Architecture           3-layer LSTM (hidden=128)
  Training Device              Apple MPS (M4)
  Epochs Trained               60


🔮 Detection Flow

                        WEBCAM FRAME (30 FPS)
                               │
                               ▼
             ┌─────────────────────────────────┐
             │       OpenCV — Frame Flip       │
             │    (mirror effect for user)     │
             └────────────────┬────────────────┘
                              │
                              ▼
             ┌─────────────────────────────────┐
             │    MediaPipe — 2-Hand Track     │
             │  Up to 2 hands detected         │
             │  21 (x,y,z) landmarks per hand  │
             │  Both skeletons drawn on feed   │
             └────────────────┬────────────────┘
                              │
                              ▼
             ┌─────────────────────────────────┐
             │   Dominant Hand Selection       │
             │  (largest bounding box = front) │
             │  Wrist-normalize → 63 features  │
             └────────────────┬────────────────┘
                              │
                              ▼
             ┌─────────────────────────────────┐
             │    Sliding Window Buffer        │
             │    30 frames accumulated        │
             │    → shape (30, 63)             │
             └────────────────┬────────────────┘
                              │
                              ▼
             ┌─────────────────────────────────┐
             │    LSTM Inference (PyTorch)     │
             │    → softmax probabilities      │
             │    → top class + confidence     │
             └────────────────┬────────────────┘
                              │
               ┌──────────────┼──────────────┐
               ▼              ▼              ▼
          conf < 0.70    0.70–0.85      conf > 0.85
               │              │              ▼
          ❌ Rejected    ⚠️ Accepted     ✅ High conf
                              │              │
                              └──────┬───────┘
                                     │
                              Majority Vote (5 frames)
                                     │
                                     ▼
             ┌─────────────────────────────────┐
             │    Sentence Builder             │
             │  • Cooldown deduplication       │
             │  • Max 30 words                 │
             │  • NLP polish on demand         │
             └────────────────┬────────────────┘
                              │
               ┌──────────────┴──────────────┐
               ▼                             ▼
      🔊 gTTS Speak (EN/HI)        📋 Sentence Display
      Background thread            + Confidence Arc
      Auto-triggered               + Hindi translation
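The threshold-and-vote stage in the flow above can be sketched as a small buffer class. Class and method names are hypothetical; the 0.70 threshold and 5-frame buffer are taken from the diagram:

```python
from collections import Counter, deque

class VoteBuffer:
    """Majority-vote smoothing over recent accepted predictions."""

    def __init__(self, size=5, threshold=0.70):
        self.buf = deque(maxlen=size)
        self.threshold = threshold

    def push(self, label, confidence):
        if confidence < self.threshold:
            return None                          # low-confidence frame rejected
        self.buf.append(label)
        if len(self.buf) < self.buf.maxlen:
            return None                          # not enough evidence yet
        winner, count = Counter(self.buf).most_common(1)[0]
        # emit a word only when one class wins a strict majority of the buffer
        return winner if count > len(self.buf) // 2 else None
```

In use, the buffer suppresses single-frame glitches: a stray misclassification in an otherwise stable stream never reaches the sentence builder.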


✨ Features

✋ Two-Hand Detection — click to expand

MediaPipe now tracks both hands simultaneously — both skeletons are rendered on the camera overlay so users can see exactly what's being detected.

  • Up to 2 hands tracked at 30 FPS
  • Both hand skeletons drawn with full MediaPipe styling
  • Dominant hand selector — picks the larger (closer) hand for LSTM prediction
  • num_hands indicator shown on dashboard HUD
  • No retraining required — model still receives single 63-feature vector
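The dominant-hand selection above (largest bounding box wins, on the assumption that the closer hand is the signing hand) can be sketched like this; the helper name is an assumption:

```python
def dominant_hand(hands):
    """Pick the hand with the largest landmark bounding box.

    hands: list of hands, each a list of (x, y, z) landmark tuples
    in normalized image coordinates. Returns None when no hand is visible.
    """
    def bbox_area(landmarks):
        xs = [p[0] for p in landmarks]
        ys = [p[1] for p in landmarks]
        return (max(xs) - min(xs)) * (max(ys) - min(ys))

    return max(hands, key=bbox_area) if hands else None
```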
🧠 LSTM Sign Recognition — click to expand

3-layer PyTorch LSTM trained from scratch on 42,000+ landmark sequences.

  Parameter               Value
  ─────────               ─────
  Input size              63 (21 landmarks × xyz)
  Hidden size             128
  Layers                  3
  Dropout                 0.3
  Sequence length         30 frames
  Output                  35 ISL classes
  Confidence threshold    70%
  Vote buffer             5 predictions

Supported signs:

Letters:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Numbers:  One Two Three Four Five Six Seven Eight Nine
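Given the hyperparameters in the table, a matching PyTorch model looks roughly like this. It is a sketch: the class name and the final-timestep classification head are assumptions, not the repo's actual train_lstm.py:

```python
import torch
import torch.nn as nn

class SignLSTM(nn.Module):
    """3-layer LSTM classifier: 63 features/frame, 30-frame sequences, 35 classes."""

    def __init__(self, input_size=63, hidden_size=128, num_layers=3,
                 num_classes=35, dropout=0.3):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                  # x: (batch, 30, 63)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])         # logits from the last time step
```

At inference time, a softmax over the logits yields the per-class probabilities that feed the confidence threshold and vote buffer.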
🎙️ Bilingual TTS + NLP — click to expand

Text-to-Speech:

  • Individual words spoken automatically on detection
  • Full sentence spoken on demand via button or S key
  • English and Hindi voices powered by gTTS
  • Hindi TTS uses deep-translator for accurate translation
  • Base64 audio returned via API — plays instantly in browser

NLP Sentence Polishing:

  • Number words → digits (One → 1, Five → 5)
  • Consecutive duplicate removal
  • Auto-capitalisation of first word
  • Trailing punctuation added
  • Single POST /api/nlp/polish call
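Those polishing rules can be sketched in a few lines of Python. This is illustrative only; the actual nlp_engine.py may implement them differently:

```python
import re

NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
                "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def polish(sentence):
    """Rule-based polish: digits, dedupe, capitalisation, punctuation."""
    out = []
    for w in sentence.split():
        w = NUMBER_WORDS.get(w.lower(), w)          # number words -> digits
        if not out or w.lower() != out[-1].lower():
            out.append(w)                           # drop consecutive duplicates
    if not out:
        return ""
    text = " ".join(out)
    text = text[0].upper() + text[1:]               # capitalise first word
    if not re.search(r"[.!?]$", text):
        text += "."                                 # trailing punctuation
    return text
```

For example, `polish("hello hello how are you Five")` returns `"Hello how are you 5."`.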
📖 Sign Dictionary — click to expand

Slide-up reference drawer showing all 35 supported ISL signs:

  • Press D or click Dictionary button to open
  • All signs displayed as clickable cards
  • Click any sign to hear it spoken aloud via TTS
  • Organized alphabetically
  • Works independently of camera
⌨️ Keyboard Shortcuts — click to expand

Full keyboard accessibility built in:

  Key      Action
  ───      ──────
  Space    Start / Stop camera
  C        Clear sentence
  H        Toggle Hindi voice
  D        Open sign dictionary
  S        Speak sentence (English)
  P        Polish sentence (NLP)
  ?        Toggle shortcuts overlay
  Esc      Close all overlays
📊 Confidence Sparkline — click to expand

Live Canvas chart rendered below the camera:

  • Plots last 60 prediction confidence values
  • Gradient fill under the line
  • Colour-coded dot: 🟢 jade (>85%) / 🟠 fire (65–85%) / 🔴 rose (<65%)
  • Updates every 280ms in sync with polling
  • Helps users adjust hand position for optimal accuracy
📄 Session PDF Reports — click to expand

Every session is automatically saved and downloadable as a PDF:

┌────────────────────────────────────────┐
│       🤙 SIGNLANG AI REPORT           │
│                                        │
│  User: Bhavya Kansal                   │
│  Date: 2026-03-14   Time: 14:32        │
│                                        │
│  Sentence: Hello how are you           │
│  Words: 4      Duration: 45s           │
│                                        │
│  WORD LOG:                             │
│  14:31:22  Hello     94.2%            │
│  14:31:28  How       88.7%            │
│  14:31:34  Are       91.3%            │
│  14:31:41  You       96.1%            │
└────────────────────────────────────────┘


🌐 API Routes

  PUBLIC ROUTES                          PRIVATE ROUTES (Auth Required)
  ─────────────                          ───────────────────────────────
  GET  /                  Landing        GET  /dashboard
  GET  /signin            Auth           GET  /history
  GET  /signup            Register       GET  /video_feed         (MJPEG stream)
  GET  /privacy           Policy         POST /api/camera/start
  GET  /terms             ToS            POST /api/camera/stop
                                         GET  /api/camera/status
                                         POST /api/sentence/clear
                                         POST /api/nlp/polish
                                         POST /api/tts/word
                                         POST /api/tts/sentence
                                         POST /api/translate/hindi
                                         POST /api/feedback
                                         GET  /api/report/latest


🗂️ Project Structure

signlang-ai/
│
├── 📄 app.py                     ← Flask application + all routes
├── 📋 requirements.txt           ← Python dependencies
├── 🐳 Dockerfile                 ← Docker config for deployment
│
├── 📂 model/
│   ├── signlang_model.pt         ← Trained 3-layer LSTM weights
│   ├── label_map.json            ← {0: "A", 1: "B", ...}
│   ├── train_lstm.py             ← Training script
│   ├── extract_landmarks.py      ← MediaPipe landmark extraction
│   ├── prepare_dataset.py        ← Dataset folder → class mapping
│   └── plots/
│       └── training_curves.png   ← Loss & accuracy plots
│
├── 📂 pipeline/
│   ├── landmark_extractor.py     ← 2-hand MediaPipe wrapper
│   ├── predictor.py              ← LSTM inference + vote buffer
│   └── sentence_builder.py       ← Word accumulator + cooldown
│
├── 📂 utils/
│   ├── tts_engine.py             ← gTTS English + Hindi TTS
│   ├── nlp_engine.py             ← Sentence polishing + translation
│   ├── report_gen.py             ← ReportLab PDF generation
│   └── sheets_client.py          ← Google Sheets session storage
│
├── 📂 static/
│   ├── logo.png                  ← Project logo (drop here)
│   ├── css/style.css             ← Full premium design system
│   └── js/
│       ├── dashboard.js          ← Camera, TTS, NLP, sparkline JS
│       └── main.js               ← Cursor, scroll reveal, animations
│
└── 📂 templates/
    ├── base.html                 ← Base layout + navbar + footer
    ├── index.html                ← Landing page
    ├── dashboard.html            ← Live detection UI
    ├── history.html              ← Session history table
    ├── signin.html               ← Auth
    ├── signup.html               ← Register
    ├── privacy.html              ← Privacy policy
    └── terms.html                ← Terms of use


⚙️ Setup & Run Locally

Prerequisites

python --version   # Python 3.12+
pip --version      # pip 23+

Step 1 — Clone

git clone https://github.com/BhavyaKansal20/SignLang-AI.git
cd SignLang-AI

Step 2 — Install Dependencies

pip install -r requirements.txt

Step 3 — Prepare Dataset (only if retraining)

python model/prepare_dataset.py --src /path/to/ISL/images
python model/extract_landmarks.py --mode image --src data/raw/images
python model/train_lstm.py

Step 4 — Launch

python app.py
# 🚀 Running on http://localhost:7860


🐳 Docker

# Build
docker build -t signlang-ai .

# Run
docker run -p 7860:7860 signlang-ai

# Open
# http://localhost:7860
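The image built above would come from a Dockerfile along these lines. This is a sketch under assumptions (Python 3.12 base image, app.py entry point on port 7860, matching the setup section); the repo's actual Dockerfile may differ:

```dockerfile
# Minimal sketch; the repo's actual Dockerfile may differ
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```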


🧰 Tech Stack

  Layer              Technology        Purpose
  ─────              ──────────        ───────
  Language           Python            Core runtime
  Backend            Flask             Web framework + MJPEG stream
  Deep Learning      PyTorch           LSTM model training + inference
  Computer Vision    MediaPipe         2-hand landmark extraction
  Video              OpenCV            Webcam capture + frame processing
  TTS                gTTS              English + Hindi text-to-speech
  Translation        deep-translator   EN → Hindi translation
  PDF                ReportLab         Session PDF report generation
  Storage            Google Sheets     Users + session storage
  Deployment         Hugging Face      Cloud hosting
  Container          Docker            Reproducible deployment


🔐 Security Architecture

  USER PASSWORD
       │
       ▼
  SHA-256 Hash ──────► Stored in Google Sheets (never plain text)

  LOGIN REQUEST
       │
       ├─► Hash & Compare ──► ✅ Match → Flask Session Created
       │                      ❌ Mismatch → Redirect to signin
       │
       └─► All private routes check session["user_email"]
           Decorator: @login_req applied to every API endpoint

  CAMERA DATA
       │
       └─► Processed entirely on the server/device
           No video frames stored or transmitted
           Only recognized text saved to session history


⚠️ Disclaimer

SignLang AI is developed for educational and accessibility research purposes. The model achieves 99.9% accuracy on its held-out validation split; real-world accuracy may vary with lighting, camera quality, and signing style. Do not rely on this tool as a sole communication aid in critical situations.



👨‍💻 Author

╔══════════════════════════════════════════════════════════════╗
║                                                              ║
║   👤  Bhavya Kansal                                          ║
║   🎓  AI Engineer | DeepTech Developer                       ║
║   🏢  Founder — MultiModex AI                                ║
║   🎓  B.Tech AI & ML — Thapar Institute of Engg. & Tech      ║
║   🔬  AI/ML Intern Trainee — IIT Ropar × NIELIT              ║
║   🌐  bhavyakansal.dev                                       ║
║   📧  kansalbhavya27@gmail.com                               ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝

Portfolio GitHub LinkedIn MultiModex AI



⭐ Support

If SignLang AI impressed you or helped you:

  1. ⭐ Star this repository
  2. 🍴 Fork and build on it
  3. 📣 Share with developers and accessibility advocates
  4. 🐛 Open issues or pull requests
  5. 🤝 Help expand the sign vocabulary

Every star helps this project reach more people who need it. 🙏



  ╔══════════════════════════════════════════════════════╗
  ║     🤙  S I G N L A N G   A I                        ║
  ║     MultiModex AI  •  © 2026 Bhavya Kansal           ║
  ╚══════════════════════════════════════════════════════╝
