# SoundSigns: Speech to Sign Language Translator

## Overview

SoundSigns is a web-based application that translates spoken English into International Sign Language (ISL) in real time. The system captures speech, converts it to text, translates it into ISL gloss, and displays the translation through a 3D animated avatar assembled from pre-rendered video clips.

## Features

- **Real-time Speech Recognition**: Browser-based speech-to-text conversion using the Web Speech API
- **ISL Gloss Translation**: Converts English text to International Sign Language gloss using the ChatGPT API
- **3D Avatar Animation**: Visual sign language representation through pre-rendered MP4 video clips
- **Video Assembly**: Seamless concatenation of individual sign videos into coherent sentences
- **Interactive Interface**: Clean, responsive UI with microphone controls and video playback
- **Multi-format Support**: Covers alphabet letters (A-Z), numbers (0-9), and common vocabulary
- **Download Functionality**: Save translated sign language videos for offline use
- **Cross-browser Compatibility**: Works on modern browsers that support the Web Speech API

## Architecture

The application follows a modular three-tier architecture:

- **Frontend**: React.js with Tailwind CSS, handling user interaction and video processing
- **Backend**: Flask server managing API communication and text-to-gloss conversion
- **Dataset**: Curated collection of ~150 pre-rendered ISL sign videos
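
The repository's `backend/transcription.py` is not reproduced in this README, but the flow between the tiers can be sketched. Below is a minimal, hypothetical version of the Flask tier: the `/translate` route name, the port, and the prompt wording are illustrative assumptions, not the project's actual code.

```python
# Hypothetical sketch of the Flask tier (route, port, and prompt are assumed).
from dotenv import load_dotenv
from flask import Flask, jsonify, request
from flask_cors import CORS
from openai import OpenAI

load_dotenv()  # finds backend/.env when this file lives in backend/
app = Flask(__name__)
CORS(app)  # lets the React dev server call this API from a different origin
client = OpenAI()  # the client reads OPENAI_API_KEY from the environment

@app.route("/translate", methods=["POST"])  # route name is an assumption
def translate():
    text = (request.get_json(silent=True) or {}).get("text", "")
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Convert the user's English sentence to ISL gloss: "
                        "uppercase signs in ISL word order, no punctuation."},
            {"role": "user", "content": text},
        ],
    )
    return jsonify({"gloss": completion.choices[0].message.content.strip()})

if __name__ == "__main__":
    app.run(port=5000)  # Flask's default port; the real script may differ
```

In this flow, the frontend POSTs the transcribed text to the backend and receives the gloss string it then uses to assemble the sign videos.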

## Prerequisites

- Python 3.8+
- Node.js 14+
- OpenAI API Key
- Modern web browser with Web Speech API support (Chrome, Edge recommended)

## Installation

### Backend Setup

1. Install Python dependencies:
```bash
pip install sounddevice numpy openai flask flask-cors python-dotenv
```

2. Create a `.env` file in the `backend/` directory:
```bash
OPENAI_API_KEY=your_openai_key_here
```

**Security Note**: Never commit the `.env` file to version control.
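
Before starting the server, you can confirm the key is visible to Python with a quick check. This snippet is a hypothetical helper, not part of the repository, and assumes it is run from the project root:

```python
# Hypothetical sanity check for the API key; not part of the repository.
import os

from dotenv import load_dotenv

load_dotenv("backend/.env")  # explicit path, assuming this runs from the project root
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY missing - check backend/.env"
print("OpenAI key loaded.")
```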

### Frontend Setup

1. Navigate to the frontend directory and install dependencies:
```bash
cd frontend
npm install
```

## Running the Application

1. Start the frontend development server:
```bash
cd frontend
npm run dev
```

2. In a separate terminal, start the backend server from the project root:
```bash
py backend/transcription.py
```

3. Access the application at `http://localhost:3000` (or the port specified by your dev server)
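
Once both servers are running, the backend can be smoke-tested independently of the UI. This assumes the hypothetical `/translate` route and default Flask port from the sketch in the Architecture section; adjust both to match the real script:

```python
# Hypothetical smoke test; requires `pip install requests`.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/translate",  # port and route are assumptions
    json={"text": "hello how are you"},
)
print(resp.json())  # illustrative expected shape: {"gloss": "HELLO HOW YOU"}
```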

## Usage

1. **Voice Input**: Click the microphone button and speak clearly in English
2. **Transcription**: View the real-time speech-to-text conversion
3. **Translation**: See the ISL gloss translation displayed
4. **Video Playback**: Watch the 3D avatar perform the signed translation
5. **Controls**: Use the play, replay, and download buttons to control video playback

## Project Structure

```
project-root/
├── backend/
│   ├── .env               # Environment variables (not in version control)
│   └── transcription.py   # Flask server and API logic
├── frontend/
│   ├── src/
│   │   ├── components/    # React components
│   │   └── App.jsx        # Main application file
│   └── assets/
│       └── videos/        # Pre-rendered sign language videos
│           ├── letters/   # A-Z alphabet signs
│           ├── numbers/   # 0-9 numerical signs
│           └── words/     # Common vocabulary signs
```
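
The split between `letters/`, `numbers/`, and `words/` suggests how a gloss token is resolved to clips, including the finger-spelling fallback described under Known Limitations below. A minimal sketch, assuming clips are named after their lowercase token (the actual file-naming scheme is not documented here):

```python
# Hypothetical resolution of a gloss token to video clips; naming is assumed.
from pathlib import Path
from typing import List

VIDEO_ROOT = Path("frontend/assets/videos")  # path taken from the tree above

def clips_for_token(token: str) -> List[Path]:
    """Resolve one ISL gloss token to clip paths, finger-spelling as a fallback."""
    word_clip = VIDEO_ROOT / "words" / f"{token.lower()}.mp4"
    if word_clip.exists():
        return [word_clip]
    # Token outside the ~150-sign vocabulary: spell it character by character.
    clips = []
    for ch in token.lower():
        folder = "numbers" if ch.isdigit() else "letters"
        clip = VIDEO_ROOT / folder / f"{ch}.mp4"
        if clip.exists():
            clips.append(clip)
    return clips

# Example: "HELLO" -> [words/hello.mp4] if present, otherwise
# [letters/h.mp4, letters/e.mp4, letters/l.mp4, letters/l.mp4, letters/o.mp4]
```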

## Technologies Used

- **Frontend**: React.js, Tailwind CSS, Web Speech API
- **Backend**: Python, Flask, Flask-CORS
- **Translation**: OpenAI GPT-3.5-turbo API
- **Video Processing**: Browser-based video concatenation
- **Dataset**: Pre-rendered MP4 videos with a 3D ISL avatar

## System Requirements

- **Browser**: Chrome, Edge, or other browsers with Web Speech API support
- **Microphone**: Required for speech input
- **Internet Connection**: Required for OpenAI API access

## Known Limitations

- Limited vocabulary dataset (~150 signs)
- Words not in the dataset are finger-spelled letter by letter
- Translation accuracy depends on ChatGPT's ISL gloss generation
- Speech recognition works best in a quiet environment
- End-to-end latency of 3-5 seconds for the complete translation process

## Contributing

This project was developed as a capstone project by Ahmad Ataba and Waseem Saleem under the supervision of Dr. Reuven Cohen at Braude College.

## Dataset Attribution

The sign language video dataset is sourced from the open-source "Text-Speech to Sign Language Generator" project by JS-Coderr (2024), available on GitHub.

## License

This project uses open-source components and datasets. Please refer to individual component licenses for specific terms.

## Support

For technical issues or questions about the application, please refer to the project documentation or contact the development team.

---

**Note**: This application is designed for educational and accessibility purposes. For critical communication needs, professional sign language interpretation is recommended.