The project uses a multi-layer neural network to classify emails as spam or legitimate. The classifier achieves 98.6% accuracy and a 97% F1 score.
- Text preprocessing with NLTK for cleaning and normalization
- TF-IDF vectorization for text representation
- Multi-layer neural network using PyTorch
- REST API for email classification
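The preprocessing and vectorization steps listed above can be sketched as follows. This is an illustrative sketch, not the project's code. It assumes NLTK's `PorterStemmer` for normalization, a small hypothetical stop-word set in place of NLTK's stopwords corpus (that corpus needs a one-time `nltk.download("stopwords")`), and scikit-learn's `TfidfVectorizer`. The source does not say which TF-IDF implementation the project uses, and `max_features=5000` is an arbitrary example value.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stop-word set; a real pipeline might use nltk.corpus.stopwords instead.
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "is", "for", "on", "you", "your"}

_stemmer = PorterStemmer()


def clean_email(text: str) -> str:
    """Lowercase, strip URLs and non-letters, drop stop words, and stem each token."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(_stemmer.stem(t) for t in tokens)


emails = [
    "WIN a FREE prize now!!! Visit http://example.com",
    "Meeting moved to 3pm, see the agenda attached.",
]
cleaned = [clean_email(e) for e in emails]

# Fit TF-IDF on the cleaned text; max_features is an illustrative cap, not the project's setting.
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(cleaned)  # sparse matrix of shape (n_emails, n_terms)
print(cleaned)
print(X.shape)
```

The width of `X` (the vocabulary size) is what the network's input layer has to match.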
The network architecture consists of:
- An input layer sized to the TF-IDF feature dimension
- Three hidden layers with ReLU activation and Batch Normalization
- Dropout regularization to prevent overfitting
- Output layer with sigmoid activation for binary classification
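A minimal PyTorch sketch of this architecture is shown below. The hidden-layer widths `(512, 256, 128)`, the dropout rate of 0.3, placing BatchNorm before ReLU, and the class name `SpamClassifier` are all hypothetical choices. The source only specifies three hidden layers with ReLU, Batch Normalization, and dropout, followed by a sigmoid output.

```python
import torch
from torch import nn


class SpamClassifier(nn.Module):
    """Hypothetical sketch: 3 hidden blocks (Linear -> BatchNorm -> ReLU -> Dropout) + sigmoid output."""

    def __init__(self, input_dim: int, hidden_dims=(512, 256, 128), dropout: float = 0.3):
        super().__init__()
        layers = []
        prev = input_dim  # input width matches the TF-IDF feature dimension
        for h in hidden_dims:
            layers += [nn.Linear(prev, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(dropout)]
            prev = h
        layers += [nn.Linear(prev, 1), nn.Sigmoid()]  # one spam probability per email
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Drop the trailing size-1 dimension so the output has shape (batch,)
        return self.net(x).squeeze(-1)


model = SpamClassifier(input_dim=5000)
model.eval()  # use BatchNorm running statistics and disable dropout for inference
with torch.no_grad():
    probs = model(torch.rand(4, 5000))
print(probs.shape)
```

Because the output already passes through a sigmoid, this sketch would be trained with `nn.BCELoss`. A common alternative is to drop the final `nn.Sigmoid()` and train with `nn.BCEWithLogitsLoss`, which is more numerically stable. Note that `BatchNorm1d` in training mode needs batches with more than one example.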
The model's performance is illustrated by the following visualizations:
The confusion matrix shows:
- True Negatives (top left): Correctly identified non-spam emails
- False Positives (top right): Non-spam emails incorrectly flagged as spam
- False Negatives (bottom left): Spam emails that were missed
- True Positives (bottom right): Correctly identified spam emails
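The accuracy and F1 figures quoted at the top both derive from these four counts. The sketch below shows that arithmetic in plain Python, treating spam as the positive class (label `1`) and assuming labels are `0`/`1`. The project may compute these values with a library such as scikit-learn instead, whose `confusion_matrix` uses the same `[[TN, FP], [FN, TP]]` layout for labels `[0, 1]`.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TN, FP, FN, TP) with spam = 1 as the positive class; labels must be 0 or 1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 0 and p == 0:
            tn += 1  # legitimate email kept
        elif t == 0 and p == 1:
            fp += 1  # legitimate email flagged as spam
        elif t == 1 and p == 0:
            fn += 1  # spam missed
        else:
            tp += 1  # spam caught
    return tn, fp, fn, tp


def accuracy_and_f1(tn: int, fp: int, fn: int, tp: int) -> Tuple[float, float]:
    """Accuracy over all emails, and F1 as the harmonic mean of precision and recall on spam."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 1, 1, 3)
print(accuracy_and_f1(*counts))  # (0.75, 0.75)
```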
The probability histogram shows how the model's predicted spam probabilities are distributed for spam and non-spam emails.
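The per-class counts behind such a histogram can be computed as sketched below. This is an assumption about how the plot was produced, not the project's code, and the function name `probability_histograms` is hypothetical. It bins the predicted probabilities separately for non-spam (`0`) and spam (`1`) emails over fixed edges spanning [0, 1].

```python
import numpy as np


def probability_histograms(probs, labels, bins=10):
    """Bin predicted spam probabilities separately for non-spam (0) and spam (1) emails."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    edges = np.linspace(0.0, 1.0, bins + 1)  # shared bin edges so the two classes are comparable
    ham_counts, _ = np.histogram(probs[labels == 0], bins=edges)
    spam_counts, _ = np.histogram(probs[labels == 1], bins=edges)
    return edges, ham_counts, spam_counts


probs = [0.02, 0.10, 0.35, 0.97, 0.88, 0.99]
labels = [0, 0, 0, 1, 1, 1]
edges, ham, spam = probability_histograms(probs, labels, bins=4)
print(ham.tolist(), spam.tolist())  # [2, 1, 0, 0] [0, 0, 0, 3]
```

A well-separated model puts most non-spam mass near 0 and most spam mass near 1. With matplotlib, the same data can be drawn by calling `plt.hist` once per class with `bins=edges`.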
To start the project's REST API, run:

```bash
python app/api.py
```

Emails to be classified can then be sent in a POST request to the `/predict` endpoint, with the email provided in the request body.
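A minimal client sketch using only the Python standard library is shown below. The host and port (`localhost:5000`), the JSON body shape `{"email": "..."}`, and the helper names `build_predict_request` and `classify` are all hypothetical, since the exact request schema is not documented here. Adjust them to match `app/api.py`.

```python
import json
import urllib.request

# Assumed address; the actual host/port depend on how app/api.py starts its server.
API_URL = "http://localhost:5000/predict"


def build_predict_request(email_text: str, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request for /predict (the "email" field name is an assumption)."""
    body = json.dumps({"email": email_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(email_text: str, url: str = API_URL) -> dict:
    """Send the email to the running API and return its decoded JSON response."""
    with urllib.request.urlopen(build_predict_request(email_text, url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `classify("Congratulations, you have won a prize!")` returns whatever JSON the endpoint produces. The response format is likewise not specified here.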

