███████╗██╗ ██████╗ ███╗ ██╗██╗ █████╗ ███╗ ██╗ ██████╗ █████╗ ██╗
██╔════╝██║██╔════╝ ████╗ ██║██║ ██╔══██╗████╗ ██║██╔════╝ ██╔══██╗██║
███████╗██║██║ ███╗██╔██╗ ██║██║ ███████║██╔██╗ ██║██║ ███╗ ███████║██║
╚════██║██║██║ ██║██║╚██╗██║██║ ██╔══██║██║╚██╗██║██║ ██║ ██╔══██║██║
███████║██║╚██████╔╝██║ ╚████║███████╗██║ ██║██║ ╚████║╚██████╔╝ ██║ ██║██║
╚══════╝╚═╝ ╚═════╝ ╚═╝ ╚═══╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝ ╚═════╝ ╚═╝ ╚═╝╚═╝
"7 crore Indians are deaf or hard of hearing. SignLang AI gives every one of them a voice — in real-time, through any webcam."
| ✋ Two-Hand Detection | 🧠 LSTM Model | 🎙️ Bilingual TTS | ✨ NLP Polish | 📖 Sign Dictionary |
|---|---|---|---|---|
| MediaPipe Dual Track | 3-Layer PyTorch | English + Hindi | Rule-based NLP | 35 ISL Signs |
| 21 landmarks × hand | 99.9% Val Acc | gTTS Engine | Auto grammar fix | Slide-up drawer |
| Both skeletons drawn | 42,000+ samples | Real-time speak | Sentence builder | Click to hear |
╔══════════════════════════════════════════════════════════════════════════════╗
║ 🤙 SIGNLANG AI — SYSTEM ARCHITECTURE ║
╠══════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌─────────────┐ HTTPS ┌──────────────────────────────────────┐ ║
║ │ 🌐 USER │ ─────────────► │ FLASK APPLICATION │ ║
║ │ BROWSER │ │ │ ║
║ └─────────────┘ │ ┌────────────┐ ┌────────────────┐ │ ║
║ │ │ Auth │ │ Route Handler │ │ ║
║ ┌─────────────┐ │ │ Middleware │ │ /dashboard │ │ ║
║ │ 📷 WEBCAM │ ─────────────► │ │ (SHA-256) │ │ /api/camera │ │ ║
║ │ 30 FPS │ │ └────────────┘ │ /api/tts │ │ ║
║ └─────────────┘ │ │ /api/nlp │ │ ║
║ └──────────────┬───└────────────────┘──┘ ║
║ │ ║
║ ┌────────────────────────────────────────┼────────────────────┐ ║
║ │ │ │ │ ║
║ ▼ ▼ ▼ ▼ ║
║ ┌───────────────┐ ┌──────────────────┐ ┌──────────┐ ┌─────────────────┐ ║
║ │ 📡 MEDIAPIPE │ │ 🧠 LSTM ENGINE │ │ 🔤 NLP │ │ 🔊 TTS ENGINE │ ║
║ │ │ │ │ │ ENGINE │ │ │ ║
║ │ 2-Hand track │ │ signlang_ │ │ Polish │ │ gTTS English │ ║
║ │ 21 landmarks │ │ model.pt │ │ Grammar │ │ gTTS Hindi │ ║
║ │ per hand │ │ label_map.json │ │ Clean │ │ deep- │ ║
║ │ 63 features │ │ 35 classes │ │ Punct. │ │ translator │ ║
║ └───────────────┘ └──────────────────┘ └──────────┘ └─────────────────┘ ║
║ ║
╚══════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════╗
║ ML PIPELINE — END TO END ║
╠══════════════════════════════════════════════════════════════════════════════╣
║ ║
║ RAW DATA PREPROCESSING LANDMARK EXTRACTION ║
║ ────────── ─────────────── ────────────────────── ║
║ 42,745 ISL ─────────► Folder scan ─────────► MediaPipe Hands ║
║ images Class mapping 21 (x,y,z) landmarks ║
║ 35 classes 80/20 split Wrist-normalized ║
║ 63 features flat ║
║ ║
║ SEQUENCE BUILD MODEL TRAINING INFERENCE ║
║ ─────────────── ──────────────── ────────── ║
║ 30 frames/seq ────────► 3-layer LSTM ─────────► Majority vote ║
║ Sliding window hidden=128 buffer (5 frames) ║
║ Augmentation dropout=0.3 conf threshold 0.70 ║
║ 60 epochs 35-class softmax ║
║ MPS (Apple M4) → word + confidence ║
║ ║
╚══════════════════════════════════════════════════════════════════════════════╝
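The landmark-extraction step ("21 (x,y,z) landmarks, wrist-normalized, 63 features flat") can be sketched in plain Python. This is an illustrative reimplementation, not the project's actual `extract_landmarks.py`; it assumes the 21 landmarks arrive as `(x, y, z)` tuples in MediaPipe's convention, where index 0 is the wrist.

```python
def normalize_landmarks(landmarks):
    """Translate 21 (x, y, z) hand landmarks so the wrist sits at the
    origin, then flatten them into the 63-feature vector the LSTM
    consumes. Index 0 is the wrist in MediaPipe's landmark ordering."""
    wx, wy, wz = landmarks[0]
    features = []
    for x, y, z in landmarks:
        features.extend((x - wx, y - wy, z - wz))
    return features  # length 21 * 3 = 63

# Dummy hand: wrist at (0.5, 0.5, 0.0), fingers fanned along x
hand = [(0.5, 0.5, 0.0)] + [(0.5 + i * 0.01, 0.5, 0.0) for i in range(1, 21)]
vec = normalize_landmarks(hand)  # wrist maps to (0, 0, 0)
```

Wrist-normalization makes the features translation-invariant, so the model sees the same vector wherever the hand appears in the frame.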
SIGNLANG LSTM MODEL — TRAINING RESULTS
───────────────────────────────────────
Val Accuracy ████████████████████ 99.9% (Epoch 22 best)
Train Accuracy ████████████████████ 99.9% (Epoch 60)
Val Loss ▓▓░░░░░░░░░░░░░░░░░░ 0.0046 (Best checkpoint)
Train Loss ▓▓░░░░░░░░░░░░░░░░░░ 0.0086
Epoch 1 ████░░░░░░░░░░░░░░░░ 70.1% → Val 90.4% ✓ saved
Epoch 4 ████████████████░░░░ 97.8% → Val 99.1% ✓ saved
Epoch 7 ████████████████████ 98.9% → Val 99.7% ✓ saved
Epoch 22 ████████████████████ 99.8% → Val 99.9% ✓ BEST
| Metric | Value |
|---|---|
| Best Validation Accuracy | 99.9% |
| Best Validation Loss | 0.0046 |
| Total Training Samples | 36,257 |
| Total Validation Samples | 6,399 |
| Total Dataset | 42,656 sequences |
| Classes | 35 ISL signs |
| Model Architecture | 3-layer LSTM (hidden=128) |
| Training Device | Apple MPS (M4) |
| Epochs Trained | 60 |
WEBCAM FRAME (30 FPS)
│
▼
┌─────────────────────────────────┐
│ OpenCV — Frame Flip │
│ (mirror effect for user) │
└────────────────┬────────────────┘
│
▼
┌─────────────────────────────────┐
│ MediaPipe — 2-Hand Track │
│ Up to 2 hands detected │
│ 21 (x,y,z) landmarks per hand │
│ Both skeletons drawn on feed │
└────────────────┬────────────────┘
│
▼
┌─────────────────────────────────┐
│ Dominant Hand Selection │
│ (largest bounding box = front) │
│ Wrist-normalize → 63 features │
└────────────────┬────────────────┘
│
▼
┌─────────────────────────────────┐
│ Sliding Window Buffer │
│ 30 frames accumulated │
│ → shape (30, 63) │
└────────────────┬────────────────┘
│
▼
┌─────────────────────────────────┐
│ LSTM Inference (PyTorch) │
│ → softmax probabilities │
│ → top class + confidence │
└────────────────┬────────────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
conf < 0.70 0.70–0.85 conf > 0.85
│ │ ▼
❌ Rejected ⚠️ Accepted ✅ High conf
│ │
└──────┬───────┘
│
Majority Vote (5 frames)
│
▼
┌─────────────────────────────────┐
│ Sentence Builder │
│ • Cooldown deduplication │
│ • Max 30 words │
│ • NLP polish on demand │
└────────────────┬────────────────┘
│
┌──────────────┴──────────────┐
▼ ▼
🔊 gTTS Speak (EN/HI) 📋 Sentence Display
Background thread + Confidence Arc
Auto-triggered + Hindi translation
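The confidence gate and 5-frame majority vote in the flow above can be sketched as follows. This is a minimal illustration of the smoothing logic, not the project's `predictor.py`; the class name `VoteBuffer` and its interface are assumptions.

```python
from collections import Counter, deque

CONF_THRESHOLD = 0.70   # predictions below this are rejected outright
VOTE_WINDOW = 5         # majority vote over the last 5 accepted frames

class VoteBuffer:
    """Smooths noisy per-frame LSTM predictions: low-confidence frames
    are dropped, and a word is emitted only once it wins a majority of
    the last VOTE_WINDOW accepted predictions."""

    def __init__(self):
        self.buffer = deque(maxlen=VOTE_WINDOW)

    def push(self, word, confidence):
        if confidence < CONF_THRESHOLD:
            return None  # rejected — never enters the vote
        self.buffer.append(word)
        if len(self.buffer) < VOTE_WINDOW:
            return None  # not enough evidence yet
        winner, count = Counter(self.buffer).most_common(1)[0]
        return winner if count > VOTE_WINDOW // 2 else None

votes = VoteBuffer()
for word, conf in [("A", 0.90), ("A", 0.80), ("B", 0.60),
                   ("A", 0.95), ("A", 0.72), ("A", 0.88)]:
    result = votes.push(word, conf)  # the ("B", 0.60) frame is rejected
# After the loop, result == "A"
```

The vote window trades a few frames of latency for stability: a single misclassified frame cannot flip the emitted word.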
✋ Two-Hand Detection — click to expand
MediaPipe now tracks both hands simultaneously — both skeletons are rendered on the camera overlay so users can see exactly what's being detected.
- Up to 2 hands tracked at 30 FPS
- Both hand skeletons drawn with full MediaPipe styling
- Dominant hand selector — picks the larger (closer) hand for LSTM prediction
- `num_hands` indicator shown on dashboard HUD
- No retraining required — model still receives a single 63-feature vector
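The "largest bounding box = front" heuristic for picking the dominant hand can be sketched in a few lines. This is an illustration of the idea, not the project's `landmark_extractor.py`; the helper names are hypothetical, and each hand is assumed to be a list of `(x, y, z)` landmark tuples.

```python
def bbox_area(landmarks):
    """Area of the axis-aligned bounding box around a hand's (x, y)
    landmarks — a cheap proxy for how close the hand is to the camera."""
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def dominant_hand(hands):
    """Given 0-2 detected hands, return the one whose landmarks span the
    largest bounding box (i.e. the nearer hand), or None if no hands."""
    if not hands:
        return None
    return max(hands, key=bbox_area)

near = [(0.10, 0.10, 0.0), (0.40, 0.50, 0.0)]   # spans 0.30 x 0.40
far  = [(0.40, 0.40, 0.0), (0.45, 0.45, 0.0)]   # spans 0.05 x 0.05
chosen = dominant_hand([far, near])  # → near
```

Since only the chosen hand's 63 features are fed to the LSTM, the second hand can be rendered for feedback without changing the model's input contract.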
🧠 LSTM Sign Recognition — click to expand
3-layer PyTorch LSTM trained from scratch on 42,000+ landmark sequences.
| Parameter | Value |
|---|---|
| Input size | 63 (21 landmarks × xyz) |
| Hidden size | 128 |
| Layers | 3 |
| Dropout | 0.3 |
| Sequence length | 30 frames |
| Output | 35 ISL classes |
| Confidence threshold | 70% |
| Vote buffer | 5 predictions |
Supported signs:
Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Numbers: One Two Three Four Five Six Seven Eight Nine
🎙️ Bilingual TTS + NLP — click to expand
Text-to-Speech:
- Individual words spoken automatically on detection
- Full sentence spoken on demand via button or the `S` key
- English and Hindi voices powered by gTTS
- Hindi TTS uses deep-translator for accurate translation
- Base64 audio returned via API — plays instantly in browser
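The "Base64 audio returned via API" step can be sketched with the standard library. In the real app the MP3 bytes would come from gTTS; here they are dummy bytes, and `make_audio_payload` is a hypothetical helper name for illustration.

```python
import base64

def make_audio_payload(mp3_bytes: bytes) -> dict:
    """Wrap raw MP3 bytes in a JSON-friendly payload. The browser can
    assign the data URI directly to an <audio> element's src."""
    b64 = base64.b64encode(mp3_bytes).decode("ascii")
    return {"audio": f"data:audio/mpeg;base64,{b64}"}

# In the real route the bytes would come from gTTS, e.g. (not run here):
#   buf = io.BytesIO(); gTTS("Hello", lang="en").write_to_fp(buf)
payload = make_audio_payload(b"\xff\xfb\x90\x00")  # dummy MP3 frame bytes
```

Returning a data URI avoids writing temp files on the server and lets playback start as soon as the JSON response arrives.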
NLP Sentence Polishing:
- Number words → digits (One → 1, Five → 5)
- Consecutive duplicate removal
- Auto-capitalisation of first word
- Trailing punctuation added
- Single `POST /api/nlp/polish` call
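The rule-based polish pass listed above can be sketched as a single function. This is an illustrative reimplementation, not the project's `nlp_engine.py`; the function name and dictionary are assumptions.

```python
NUM_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4",
             "five": "5", "six": "6", "seven": "7", "eight": "8",
             "nine": "9"}

def polish(words):
    """Rule-based polish: number words → digits, consecutive-duplicate
    removal, capitalised first word, trailing punctuation."""
    out = []
    for w in words:
        w = NUM_WORDS.get(w.lower(), w)
        if out and out[-1].lower() == w.lower():
            continue  # drop consecutive duplicates
        out.append(w)
    if not out:
        return ""
    sentence = " ".join(out)
    sentence = sentence[0].upper() + sentence[1:]  # capitalise first word
    if sentence[-1] not in ".!?":
        sentence += "."  # add trailing punctuation
    return sentence

polish(["hello", "hello", "i", "want", "Five"])  # → "Hello i want 5."
```

Because every rule is deterministic string manipulation, the polish endpoint adds no model-inference latency.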
📖 Sign Dictionary — click to expand
Slide-up reference drawer showing all 35 supported ISL signs:
- Press `D` or click the Dictionary button to open
- All signs displayed as clickable cards
- Click any sign to hear it spoken aloud via TTS
- Organized alphabetically
- Works independently of camera
⌨️ Keyboard Shortcuts — click to expand
Full keyboard accessibility built in:
| Key | Action |
|---|---|
| `Space` | Start / Stop camera |
| `C` | Clear sentence |
| `H` | Toggle Hindi voice |
| `D` | Open sign dictionary |
| `S` | Speak sentence (English) |
| `P` | Polish sentence (NLP) |
| `?` | Toggle shortcuts overlay |
| `Esc` | Close all overlays |
📊 Confidence Sparkline — click to expand
Live Canvas chart rendered below the camera:
- Plots last 60 prediction confidence values
- Gradient fill under the line
- Colour-coded dot: 🟢 jade (>85%) / 🟠 fire (65–85%) / 🔴 rose (<65%)
- Updates every 280ms in sync with polling
- Helps users adjust hand position for optimal accuracy
📄 Session PDF Reports — click to expand
Every session is automatically saved and downloadable as a PDF:
┌────────────────────────────────────────┐
│ 🤙 SIGNLANG AI REPORT │
│ │
│ User: Bhavya Kansal │
│ Date: 2026-03-14 Time: 14:32 │
│ │
│ Sentence: Hello how are you │
│ Words: 4 Duration: 45s │
│ │
│ WORD LOG: │
│ 14:31:22 Hello 94.2% │
│ 14:31:28 How 88.7% │
│ 14:31:34 Are 91.3% │
│ 14:31:41 You 96.1% │
└────────────────────────────────────────┘
PUBLIC ROUTES PRIVATE ROUTES (Auth Required)
───────────── ───────────────────────────────
GET / Landing GET /dashboard
GET /signin Auth GET /history
GET /signup Register GET /video_feed (MJPEG stream)
GET /privacy Policy POST /api/camera/start
GET /terms ToS POST /api/camera/stop
GET /api/camera/status
POST /api/sentence/clear
POST /api/nlp/polish
POST /api/tts/word
POST /api/tts/sentence
POST /api/translate/hindi
POST /api/feedback
GET /api/report/latest
signlang-ai/
│
├── 📄 app.py ← Flask application + all routes
├── 📋 requirements.txt ← Python dependencies
├── 🐳 Dockerfile ← Docker config for deployment
│
├── 📂 model/
│ ├── signlang_model.pt ← Trained 3-layer LSTM weights
│ ├── label_map.json ← {0: "A", 1: "B", ...}
│ ├── train_lstm.py ← Training script
│ ├── extract_landmarks.py ← MediaPipe landmark extraction
│ ├── prepare_dataset.py ← Dataset folder → class mapping
│ └── plots/
│ └── training_curves.png ← Loss & accuracy plots
│
├── 📂 pipeline/
│ ├── landmark_extractor.py ← 2-hand MediaPipe wrapper
│ ├── predictor.py ← LSTM inference + vote buffer
│ └── sentence_builder.py ← Word accumulator + cooldown
│
├── 📂 utils/
│ ├── tts_engine.py ← gTTS English + Hindi TTS
│ ├── nlp_engine.py ← Sentence polishing + translation
│ ├── report_gen.py ← ReportLab PDF generation
│ └── sheets_client.py ← Google Sheets session storage
│
├── 📂 static/
│ ├── logo.png ← Project logo (drop here)
│ ├── css/style.css ← Full premium design system
│ └── js/
│ ├── dashboard.js ← Camera, TTS, NLP, sparkline JS
│ └── main.js ← Cursor, scroll reveal, animations
│
└── 📂 templates/
├── base.html ← Base layout + navbar + footer
├── index.html ← Landing page
├── dashboard.html ← Live detection UI
├── history.html ← Session history table
├── signin.html ← Auth
├── signup.html ← Register
├── privacy.html ← Privacy policy
└── terms.html ← Terms of use
```bash
# Prerequisites
python --version   # Python 3.12+
pip --version      # pip 23+
```

```bash
# Clone the repository
git clone https://github.com/BhavyaKansal20/SignLang-AI.git
cd SignLang-AI

# Install dependencies
pip install -r requirements.txt
```

```bash
# Prepare the dataset and train the LSTM
python model/prepare_dataset.py --src /path/to/ISL/images
python model/extract_landmarks.py --mode image --src data/raw/images
python model/train_lstm.py
```

```bash
# Run the app
python app.py
# 🚀 Running on http://localhost:7860
```

```bash
# Build
docker build -t signlang-ai .

# Run
docker run -p 7860:7860 signlang-ai

# Open http://localhost:7860
```

USER PASSWORD
│
▼
SHA-256 Hash ──────► Stored in Google Sheets (never plain text)
LOGIN REQUEST
│
├─► Hash & Compare ──► ✅ Match → Flask Session Created
│ ❌ Mismatch → Redirect to signin
│
└─► All private routes check session["user_email"]
Decorator: @login_req applied to every API endpoint
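The hash-and-compare flow above can be sketched with the standard library. This is an illustration of the scheme the README describes (plain SHA-256), not the project's actual auth code; the function names are assumptions, and production systems would typically prefer a salted KDF such as bcrypt or argon2.

```python
import hashlib
import secrets

def hash_password(password: str) -> str:
    """SHA-256 digest — this is what gets stored, never the plain text."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def verify_password(attempt: str, stored_hash: str) -> bool:
    """Login check: hash the attempt and compare against storage in
    constant time to avoid timing side channels."""
    return secrets.compare_digest(hash_password(attempt), stored_hash)

stored = hash_password("hunter2")       # written to Google Sheets at signup
ok = verify_password("hunter2", stored)  # True → Flask session created
bad = verify_password("wrong", stored)   # False → redirect to signin
```

On a match, the route handler would set `session["user_email"]`, which every private route then checks.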
CAMERA DATA
│
└─► Processed entirely on the server/device
No video frames stored or transmitted
Only recognized text saved to session history
SignLang AI is developed for educational and accessibility research purposes. The model achieves 99.9% validation accuracy on the training distribution. Real-world accuracy may vary with lighting, camera quality and signing style. Do not rely on this tool as a sole communication aid in critical situations.
╔══════════════════════════════════════════════════════════════╗
║ ║
║ 👤 Bhavya Kansal ║
║ 🎓 AI Engineer | DeepTech Developer ║
║ 🏢 Founder — MultiModex AI ║
║ 🎓 B.Tech AI & ML — Thapar Institute of Engg. & Tech ║
║ 🔬 AI/ML Intern Trainee — IIT Ropar × NIELIT ║
║ 🌐 bhavyakansal.dev ║
║ 📧 kansalbhavya27@gmail.com ║
║ ║
╚══════════════════════════════════════════════════════════════╝
If SignLang AI impressed you or helped you:
1. ⭐ Star this repository
2. 🍴 Fork and build on it
3. 📣 Share with developers and accessibility advocates
4. 🐛 Open issues or pull requests
5. 🤝 Help expand the sign vocabulary
Every star helps this project reach more people who need it. 🙏