# Support Ticket Classifier

Live Demo • API Docs
A production-grade AI/ML system that automatically classifies, prioritizes, and analyzes customer support tickets using supervised machine learning. Built for enterprise scalability and deployed via Docker on AWS.
## Table of Contents
- Business Impact
- Features
- Tech Stack
- Quick Start
- Project Structure
- API Documentation
- Model Details
- Deployment
- Testing
- Contributing
## Business Impact

Customer support teams face overwhelming ticket volumes, leading to:

- Slow response times from manual triage
- Misrouted tickets causing customer frustration
- Inconsistent prioritization missing critical issues
- High labor costs for manual classification
This platform provides instant, AI-powered ticket classification that:

- Reduces triage time by 90% - from minutes to milliseconds
- Achieves 85%+ accuracy in category prediction
- Enables automatic routing to specialized teams
- Provides confidence scores for human-in-the-loop workflows
- Cuts operational costs by automating repetitive tasks
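Confidence scores make human-in-the-loop routing straightforward: auto-route predictions the model is sure about and queue the rest for an agent. A minimal sketch - the `route_ticket` function, the 0.75 threshold, and the queue names are illustrative assumptions, not part of the project's API:

```python
# Illustrative human-in-the-loop routing based on classifier confidence.
# The threshold and queue names are assumptions, not the project's API.
def route_ticket(prediction: dict, threshold: float = 0.75) -> str:
    """Return the queue a classified ticket should be sent to."""
    if prediction["confidence_category"] >= threshold:
        # Confident enough to route straight to the specialized team.
        return f"auto:{prediction['category']}"
    # Low confidence: fall back to manual triage by a human agent.
    return "manual-review"

print(route_ticket({"category": "Billing", "confidence_category": 0.91}))  # auto:Billing
print(route_ticket({"category": "Account", "confidence_category": 0.52}))  # manual-review
```

Tickets at or above the threshold go straight to a team queue; everything else lands in a review queue, so a human always sees the cases the model is unsure about.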
For a support team handling 1,000 tickets/day:
| Metric | Before | After | Impact |
|---|---|---|---|
| Avg. Triage Time | 2 min | 0.1 sec | 99.9% faster |
| Mis-routing Rate | 25% | 5% | 80% reduction |
| Agent Efficiency | 50 tickets/day | 75 tickets/day | 50% increase |
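The headline numbers above are easy to sanity-check. Taking the table's own figures (1,000 tickets/day, 2 minutes of manual triage before, 0.1 seconds after):

```python
# Back-of-the-envelope check of the triage-time figures in the table above.
tickets_per_day = 1000
before_s = 2 * 60   # 2 minutes of manual triage per ticket, in seconds
after_s = 0.1       # 0.1 seconds of automated triage per ticket

saved_hours = tickets_per_day * (before_s - after_s) / 3600
pct_faster = (before_s - after_s) / before_s * 100

print(round(saved_hours, 1), round(pct_faster, 1))  # ~33.3 agent-hours saved/day, 99.9% faster
```

Roughly 33 agent-hours of triage work are freed up per day, and the per-ticket speedup matches the table's 99.9% figure.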
## Features

- Category Classification: Billing, Technical, Account, Feature Request, General Inquiry
- Priority Prediction: Critical, High, Medium, Low
- Confidence Scores: Probability distribution for all classes
- Batch Processing: Classify multiple tickets in one API call
- Real-time Metrics: Model performance monitoring
- RESTful API with OpenAPI/Swagger documentation
- Docker containerized for consistent deployments
- AWS-ready with EC2/ECS deployment guides
- Comprehensive testing with pytest
- Visualization suite for data analysis
## Tech Stack

| Category | Technology |
|---|---|
| ML/Data Science | scikit-learn, Pandas, NumPy, NLTK |
| API Framework | FastAPI, Uvicorn, Pydantic |
| Database | SQLite (dev), PostgreSQL (prod-ready) |
| Visualization | Matplotlib, Seaborn |
| Containerization | Docker, Docker Compose |
| Cloud | AWS EC2, ECS, ECR |
| Testing | pytest, httpx |
## Quick Start

### Prerequisites

- Python 3.11+
- pip or conda
- Docker (optional, for containerized deployment)
```bash
# Clone the repository
git clone https://github.com/yourusername/ticket-classifier.git
cd ticket-classifier

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab'); nltk.download('stopwords'); nltk.download('wordnet')"
```

```bash
# Run the training pipeline
python scripts/train.py
```

This will:
- Generate 500 synthetic tickets
- Store them in SQLite database
- Preprocess text with NLP pipeline
- Train TF-IDF + Logistic Regression model
- Generate visualizations
- Save the trained model
Start the API server:

```bash
# Development mode with hot reload
uvicorn src.api.main:app --reload --port 8000

# Production mode
uvicorn src.api.main:app --host 0.0.0.0 --port 8000
```

Test the endpoints:

```bash
# Health check
curl http://localhost:8000/health

# Predict ticket category
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "Cannot login to my account",
    "description": "I have been trying to login for an hour but keep getting invalid credentials error"
  }'
```

## Project Structure

```
ticket-classifier/
├── src/
│   ├── api/                  # FastAPI application
│   │   ├── main.py           # API endpoints
│   │   └── schemas.py        # Pydantic models
│   ├── data/                 # Data processing
│   │   ├── generator.py      # Synthetic data generation
│   │   ├── database.py       # SQLite operations
│   │   └── preprocessing.py  # NLP preprocessing
│   ├── models/               # ML models
│   │   ├── classifier.py     # TF-IDF + LogReg classifier
│   │   └── serialized/       # Saved model files
│   └── visualizations/       # Matplotlib plots
│       └── plots.py
├── data/                     # Dataset storage
├── scripts/                  # Utility scripts
│   └── train.py              # Training pipeline
├── tests/                    # Test suite
├── docs/                     # Documentation
│   └── deployment.md         # AWS deployment guide
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md
```
## API Documentation

Once running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| GET | `/metrics` | Model performance metrics |
| POST | `/predict` | Classify single ticket |
| POST | `/predict/batch` | Classify multiple tickets |
| GET | `/categories` | List all categories |
| GET | `/priorities` | List all priorities |
### Example Response

```json
{
  "ticket_text": "Cannot login to my account I have been trying...",
  "category": "Account",
  "priority": "High",
  "confidence_category": 0.847,
  "confidence_priority": 0.623,
  "category_probabilities": {
    "Account": 0.847,
    "Technical": 0.089,
    "Billing": 0.032,
    "General Inquiry": 0.021,
    "Feature Request": 0.011
  },
  "priority_probabilities": {
    "High": 0.623,
    "Medium": 0.241,
    "Critical": 0.098,
    "Low": 0.038
  }
}
```

## Model Details

We use supervised learning because:
- Labeled data available: Historical tickets have known categories and priorities
- Clear class definitions: 5 categories and 4 priority levels are well-defined
- Interpretability: Logistic Regression coefficients show which words influence predictions
- Production-ready: Fast inference (<10ms) suitable for a real-time API
### Architecture

| Component | Purpose | Configuration |
|---|---|---|
| TF-IDF Vectorizer | Convert text to numerical features | max_features=5000, ngram_range=(1,2) |
| Logistic Regression | Multi-class classification | C=1.0, class_weight='balanced' |
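Wired together with scikit-learn, the two components above form a compact pipeline. A sketch using the configuration from the table; the tiny training corpus is illustrative only, not the project's data:

```python
# Sketch of the TF-IDF + Logistic Regression architecture with the
# configuration listed above. The toy corpus is for illustration only.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ("clf", LogisticRegression(C=1.0, class_weight="balanced", max_iter=1000)),
])

texts = [
    "I was charged twice on my invoice",
    "refund my last payment please",
    "the app crashes with a server error",
    "getting a timeout error from the server",
    "cannot reset my account password",
    "my login is locked out of the account",
]
labels = ["Billing", "Billing", "Technical", "Technical", "Account", "Account"]

pipeline.fit(texts, labels)
print(pipeline.predict(["server keeps timing out"])[0])
```

`predict_proba` on the fitted pipeline yields the per-class probability distributions returned by the API's confidence fields.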
### Preprocessing Pipeline

- Lowercase: Normalize case
- URL/Email Removal: Strip non-content elements
- Tokenization: Split into words (NLTK)
- Stopword Removal: Remove common words + domain-specific terms
- Lemmatization: Reduce words to base form (WordNet)
- Length Filter: Remove tokens < 2 characters
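The steps above can be sketched as a single function. To stay self-contained, this sketch uses a regex tokenizer and a small inline stopword list instead of the full NLTK pipeline that the project uses in `src/data/preprocessing.py`:

```python
# Simplified sketch of the preprocessing steps described above.
# Uses a regex tokenizer and a tiny stopword list so it runs without
# NLTK downloads; the project itself uses NLTK + WordNet lemmatization.
import re

STOPWORDS = {"the", "a", "an", "is", "to", "my", "i", "for", "and", "of"}

def preprocess(text: str) -> list[str]:
    text = text.lower()                                 # 1. lowercase
    text = re.sub(r"https?://\S+|\S+@\S+", " ", text)   # 2. strip URLs/emails
    tokens = re.findall(r"[a-z]+", text)                # 3. tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]  # 4. remove stopwords
    # 5. lemmatization omitted in this sketch; NLTK's WordNetLemmatizer
    #    would reduce e.g. "errors" -> "error" here
    return [t for t in tokens if len(t) >= 2]           # 6. length filter

print(preprocess("I can't login to my account, see https://example.com or mail help@x.io"))
```

The output is the cleaned token list that feeds the TF-IDF vectorizer.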
### Performance

| Target | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Category | ~85% | ~84% | ~85% | ~84% |
| Priority | ~70% | ~68% | ~70% | ~68% |
Note: Priority prediction is harder due to the subjective nature of urgency assessment.
## Deployment

### Docker

```bash
# Build image
docker build -t ticket-classifier .

# Run container
docker run -d -p 8000:8000 --name ticket-api ticket-classifier

# Or use docker-compose
docker-compose up -d
```

### AWS

See `docs/deployment.md` for detailed guides on:
- EC2: Traditional VM deployment
- ECS Fargate: Serverless container deployment
- Security: SSL/TLS, security groups
- Monitoring: CloudWatch integration
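For reference, a minimal Dockerfile for this layout could look like the following. This is a sketch only - the base image, NLTK step, and commands are assumptions, and the repository's actual `Dockerfile` may differ:

```dockerfile
# Illustrative sketch; the repository's actual Dockerfile may differ.
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first to take advantage of Docker layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# NLTK data needed by the preprocessing pipeline
RUN python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab'); nltk.download('stopwords'); nltk.download('wordnet')"

COPY . .

EXPOSE 8000
CMD ["uvicorn", "src.api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```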
## Testing

```bash
# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html

# Run specific test file
pytest tests/test_api.py -v
```

The suite covers:

- `test_preprocessing.py`: Text cleaning, tokenization, stopwords
- `test_classifier.py`: Model training, prediction, serialization
- `test_api.py`: API endpoints, validation, error handling
## Visualizations

After training, check the `visualizations/` directory for:

- `category_distribution.png`: Bar chart of ticket categories
- `priority_distribution.png`: Pie chart of priority levels
- `confusion_matrix_category.png`: Category prediction accuracy
- `confusion_matrix_priority.png`: Priority prediction accuracy
- `metrics_comparison.png`: Model performance metrics
## Future Enhancements

- BERT/Transformer models for improved accuracy
- Auto-response generation based on category
- Sentiment analysis integration
- Multi-language support
- Active learning for model improvement
- Real-time model retraining pipeline
## Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Author

Ayan Chatterjee

- LinkedIn: Ayan Chatterjee

Made with ❤️ by @Ayan Chatterjee using Python, FastAPI, and scikit-learn