Ayan113/Cloud-Based-Intelligent-Support-Ticket-Classification-Platform

🎫 Cloud-Based Intelligent Support Ticket Classification Platform

Python 3.11+ FastAPI scikit-learn Docker License: MIT

🚀 Live Demo • 📖 API Docs

A production-grade AI/ML system that automatically classifies, prioritizes, and analyzes customer support tickets using supervised machine learning. Built for enterprise scalability and deployed via Docker on AWS.


💼 Business Impact

The Problem

Customer support teams face overwhelming ticket volumes, leading to:

  • ⏰ Slow response times from manual triage
  • 🎯 Misrouted tickets causing customer frustration
  • 📉 Inconsistent prioritization missing critical issues
  • 💰 High labor costs for manual classification

The Solution

This platform provides instant, AI-powered ticket classification that:

  • ⚡ Reduces triage time by over 90%, from minutes to a fraction of a second
  • 🎯 Achieves roughly 85% accuracy in category prediction
  • 🔄 Enables automatic routing to specialized teams
  • 📊 Provides confidence scores for human-in-the-loop workflows
  • 💵 Cuts operational costs by automating repetitive tasks

ROI Example

For a support team handling 1,000 tickets/day:

| Metric | Before | After | Impact |
|---|---|---|---|
| Avg. Triage Time | 2 min | 0.1 sec | 99.9% faster |
| Mis-routing Rate | 25% | 5% | 80% reduction |
| Agent Efficiency | 50 tickets/day | 75 tickets/day | 50% increase |
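
The Impact column follows from simple relative-change arithmetic. The sketch below reproduces it using only the figures in the table; the helper name `relative_change` and the hours-per-day calculation are illustrative additions, not part of the project.

```python
def relative_change(before: float, after: float) -> float:
    """Return the relative change from `before` to `after` as a percentage."""
    return (after - before) / before * 100.0


# Figures taken from the ROI table above.
triage_before_s = 2 * 60   # 2 minutes, in seconds
triage_after_s = 0.1       # 0.1 seconds
tickets_per_day = 1_000

triage_change = relative_change(triage_before_s, triage_after_s)
misroute_change = relative_change(25, 5)
efficiency_change = relative_change(50, 75)

# Manual triage effort per day: 1,000 tickets x 2 minutes each.
manual_hours_per_day = tickets_per_day * triage_before_s / 3600

print(f"Triage time: {triage_change:.1f}%")                           # -99.9%
print(f"Mis-routing rate: {misroute_change:.0f}%")                    # -80%
print(f"Agent efficiency: {efficiency_change:+.0f}%")                 # +50%
print(f"Manual triage effort: {manual_hours_per_day:.1f} hours/day")  # 33.3
```

At 1,000 tickets/day, manual triage alone accounts for roughly 33 agent-hours per day, which is the time the automated classifier is meant to recover.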

✨ Features

Core Capabilities

  • ๐Ÿท๏ธ Category Classification: Billing, Technical, Account, Feature Request, General Inquiry
  • โšก Priority Prediction: Critical, High, Medium, Low
  • ๐Ÿ“Š Confidence Scores: Probability distribution for all classes
  • ๐Ÿ”„ Batch Processing: Classify multiple tickets in one API call
  • ๐Ÿ“ˆ Real-time Metrics: Model performance monitoring

Technical Features

  • 🔌 RESTful API with OpenAPI/Swagger documentation
  • 🐳 Docker containerized for consistent deployments
  • ☁️ AWS-ready with EC2/ECS deployment guides
  • 🧪 Comprehensive testing with pytest
  • 📊 Visualization suite for data analysis

🛠️ Tech Stack

| Category | Technology |
|---|---|
| ML/Data Science | scikit-learn, Pandas, NumPy, NLTK |
| API Framework | FastAPI, Uvicorn, Pydantic |
| Database | SQLite (dev), PostgreSQL (prod-ready) |
| Visualization | Matplotlib, Seaborn |
| Containerization | Docker, Docker Compose |
| Cloud | AWS EC2, ECS, ECR |
| Testing | pytest, httpx |

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • pip or conda
  • Docker (optional, for containerized deployment)

Installation

# Clone the repository
git clone https://github.com/yourusername/ticket-classifier.git
cd ticket-classifier

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab'); nltk.download('stopwords'); nltk.download('wordnet')"

Train the Model

# Run the training pipeline
python scripts/train.py

This will:

  1. Generate 500 synthetic tickets
  2. Store them in SQLite database
  3. Preprocess text with NLP pipeline
  4. Train TF-IDF + Logistic Regression model
  5. Generate visualizations
  6. Save the trained model
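
Steps 1 and 2 can be sketched with the standard library alone. The templates, the table schema, and the function names below are hypothetical; the project's actual `generator.py` and `database.py` may differ.

```python
import random
import sqlite3

# Hypothetical templates: the project's real generator may use different text.
TEMPLATES = {
    "Billing": ["I was charged twice for my subscription", "Please refund my last invoice"],
    "Technical": ["The app crashes when I upload a file", "API requests time out"],
    "Account": ["Cannot login to my account", "I need to reset my password"],
    "Feature Request": ["Please add dark mode", "It would be great to export reports as CSV"],
    "General Inquiry": ["What are your support hours", "Where can I find the user guide"],
}
PRIORITIES = ["Critical", "High", "Medium", "Low"]


def generate_tickets(n: int, seed: int = 42) -> list[tuple[str, str, str]]:
    """Return n (text, category, priority) tuples drawn from the templates."""
    rng = random.Random(seed)
    tickets = []
    for _ in range(n):
        category = rng.choice(list(TEMPLATES))
        text = rng.choice(TEMPLATES[category])
        priority = rng.choice(PRIORITIES)
        tickets.append((text, category, priority))
    return tickets


def store_tickets(conn: sqlite3.Connection, tickets) -> None:
    """Create the (assumed) tickets table and insert all rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tickets ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "text TEXT NOT NULL, category TEXT NOT NULL, priority TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO tickets (text, category, priority) VALUES (?, ?, ?)", tickets
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
store_tickets(conn, generate_tickets(500))
row_count = conn.execute("SELECT COUNT(*) FROM tickets").fetchone()[0]
print(row_count)  # 500
```

Seeding the random generator keeps the synthetic dataset reproducible between training runs, which makes metric changes easier to attribute to model changes.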

Start the API

# Development mode with hot reload
uvicorn src.api.main:app --reload --port 8000

# Production mode
uvicorn src.api.main:app --host 0.0.0.0 --port 8000

Test the API

# Health check
curl http://localhost:8000/health

# Predict ticket category
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "Cannot login to my account",
    "description": "I have been trying to login for an hour but keep getting invalid credentials error"
  }'
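
The same prediction call can be made from Python. This is a minimal sketch using only the standard library; it assumes the API is running locally on port 8000 and accepts the `subject` and `description` fields shown in the curl example. The function names are illustrative.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # assumes the API is running locally


def build_predict_request(subject: str, description: str) -> urllib.request.Request:
    """Build the POST request for /predict using the fields from the curl example."""
    payload = json.dumps({"subject": subject, "description": description}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(subject: str, description: str, timeout: float = 5.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(subject, description)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_predict_request(
    "Cannot login to my account",
    "I have been trying to login for an hour but keep getting invalid credentials error",
)
print(request.get_method(), request.full_url)
# result = predict("Cannot login to my account", "...")  # needs a running API
```

Building the request separately from sending it makes the payload easy to inspect or unit-test without a live server.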

📁 Project Structure

ticket-classifier/
├── src/
│   ├── api/                    # FastAPI application
│   │   ├── main.py             # API endpoints
│   │   └── schemas.py          # Pydantic models
│   ├── data/                   # Data processing
│   │   ├── generator.py        # Synthetic data generation
│   │   ├── database.py         # SQLite operations
│   │   └── preprocessing.py    # NLP preprocessing
│   ├── models/                 # ML models
│   │   ├── classifier.py       # TF-IDF + LogReg classifier
│   │   └── serialized/         # Saved model files
│   └── visualizations/         # Matplotlib plots
│       └── plots.py
├── data/                       # Dataset storage
├── scripts/                    # Utility scripts
│   └── train.py                # Training pipeline
├── tests/                      # Test suite
├── docs/                       # Documentation
│   └── deployment.md           # AWS deployment guide
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md

📖 API Documentation

Interactive Docs

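Once the API is running, FastAPI serves the interactive documentation (by default, Swagger UI at http://localhost:8000/docs and ReDoc at http://localhost:8000/redoc).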

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /metrics | Model performance metrics |
| POST | /predict | Classify single ticket |
| POST | /predict/batch | Classify multiple tickets |
| GET | /categories | List all categories |
| GET | /priorities | List all priorities |

Example Response

{
  "ticket_text": "Cannot login to my account I have been trying...",
  "category": "Account",
  "priority": "High",
  "confidence_category": 0.847,
  "confidence_priority": 0.623,
  "category_probabilities": {
    "Account": 0.847,
    "Technical": 0.089,
    "Billing": 0.032,
    "General Inquiry": 0.021,
    "Feature Request": 0.011
  },
  "priority_probabilities": {
    "High": 0.623,
    "Medium": 0.241,
    "Critical": 0.098,
    "Low": 0.038
  }
}
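
The confidence fields make it possible to send only uncertain predictions to a human reviewer. The sketch below shows one way to do that; the 0.7 threshold and the `route_ticket` helper are assumptions for illustration, not part of the API.

```python
# Trimmed copy of the example response above.
response = {
    "category": "Account",
    "priority": "High",
    "confidence_category": 0.847,
    "confidence_priority": 0.623,
}

# Assumed cut-off: tune it against your own validation data.
CONFIDENCE_THRESHOLD = 0.7


def route_ticket(prediction: dict, threshold: float = CONFIDENCE_THRESHOLD) -> dict:
    """Auto-accept fields the model is confident about; flag the rest for review."""
    needs_review = [
        field
        for field in ("category", "priority")
        if prediction[f"confidence_{field}"] < threshold
    ]
    return {
        "queue": prediction["category"],
        "priority": prediction["priority"],
        "needs_review": needs_review,
    }


decision = route_ticket(response)
print(decision)
# {'queue': 'Account', 'priority': 'High', 'needs_review': ['priority']}
```

Here the category (0.847) is accepted automatically, while the priority (0.623) falls below the assumed threshold and is flagged for an agent to confirm.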

🧠 Model Details

Why Supervised Learning?

We use supervised learning because:

  1. Labeled data available: Historical tickets have known categories and priorities
  2. Clear class definitions: 5 categories and 4 priority levels are well-defined
  3. Interpretability: Logistic Regression coefficients show which words influence predictions
  4. Production-ready: Fast inference (<10ms), suitable for a real-time API

Algorithm: TF-IDF + Logistic Regression

| Component | Purpose | Configuration |
|---|---|---|
| TF-IDF Vectorizer | Convert text to numerical features | max_features=5000, ngram_range=(1,2) |
| Logistic Regression | Multi-class classification | C=1.0, class_weight='balanced' |
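
A minimal sketch of this configuration as a scikit-learn pipeline follows. The six training texts are made up for illustration, and `max_iter=1000` is an added assumption to avoid convergence warnings; the project's `classifier.py` may be organized differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny illustrative dataset; the real pipeline trains on the synthetic tickets.
texts = [
    "charged twice for my subscription",
    "refund my last invoice",
    "app crashes when uploading a file",
    "api requests keep timing out",
    "cannot login to my account",
    "reset my account password",
]
labels = ["Billing", "Billing", "Technical", "Technical", "Account", "Account"]

# Hyperparameters copied from the configuration table above;
# max_iter is an assumption added for this sketch.
model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ("clf", LogisticRegression(C=1.0, class_weight="balanced", max_iter=1000)),
])
model.fit(texts, labels)

query = ["I cannot login to my account"]
predicted = model.predict(query)[0]
# predict_proba yields one probability per class, in the order of model.classes_.
probabilities = dict(zip(model.classes_, model.predict_proba(query)[0]))
print(predicted, probabilities)
```

Wrapping both steps in a single `Pipeline` means the vectorizer vocabulary and the classifier are fitted and serialized together, so inference always uses the same features as training.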

Preprocessing Pipeline

  1. Lowercase: Normalize case
  2. URL/Email Removal: Strip non-content elements
  3. Tokenization: Split into words (NLTK)
  4. Stopword Removal: Remove common words + domain-specific terms
  5. Lemmatization: Reduce words to base form (WordNet)
  6. Length Filter: Remove tokens < 2 characters
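
The six steps above can be sketched without external dependencies. This is a simplified stand-in, not the project's implementation: the stopword set and the lemma lookup table are tiny hypothetical substitutes for NLTK's stopword corpus and WordNet lemmatizer, and a regular expression replaces the NLTK tokenizer.

```python
import re

# Small illustrative lists; the project uses NLTK's stopword corpus and
# WordNet lemmatizer, plus its own domain-specific terms.
STOPWORDS = {"i", "me", "to", "my", "the", "a", "an", "for", "but",
             "have", "been", "at", "is", "please"}
LEMMAS = {"trying": "try", "getting": "get", "errors": "error"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(text: str) -> list[str]:
    """Apply the six steps above with simplified stand-ins for NLTK."""
    text = text.lower()                                  # 1. lowercase
    text = URL_RE.sub(" ", text)                         # 2. strip URLs
    text = EMAIL_RE.sub(" ", text)                       #    and email addresses
    tokens = TOKEN_RE.findall(text)                      # 3. tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]   # 4. drop stopwords
    tokens = [LEMMAS.get(t, t) for t in tokens]          # 5. lemmatize (lookup)
    return [t for t in tokens if len(t) >= 2]            # 6. length filter


tokens = preprocess(
    "I have been trying to login at https://example.com but keep getting errors, "
    "please email me at user@example.com"
)
print(tokens)
# ['try', 'login', 'keep', 'get', 'error', 'email']
```

URLs and email addresses are removed before tokenization so their fragments (such as `example` or `com`) do not leak into the TF-IDF vocabulary.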

Performance Metrics

| Target | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Category | ~85% | ~84% | ~85% | ~84% |
| Priority | ~70% | ~68% | ~70% | ~68% |

Note: Priority prediction is harder because urgency assessment is subjective.
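
For reference, the four metrics can be computed as in the sketch below. The table does not state how precision, recall, and F1 are averaged across classes, so macro averaging is an assumption here, and the labels are made up for illustration rather than taken from the project's results.

```python
from collections import Counter


def classification_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up labels for illustration only; not results from the project.
y_true = ["High", "High", "Low", "Low", "Medium", "Medium"]
y_pred = ["High", "Medium", "Low", "Low", "Medium", "High"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy example, four of six predictions are correct, and every metric comes out to two thirds because the errors are spread symmetrically between High and Medium.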


☁️ Deployment

Docker Deployment

# Build image
docker build -t ticket-classifier .

# Run container
docker run -d -p 8000:8000 --name ticket-api ticket-classifier

# Or use docker-compose
docker-compose up -d

AWS Deployment

See docs/deployment.md for detailed guides on:

  • EC2: Traditional VM deployment
  • ECS Fargate: Serverless container deployment
  • Security: SSL/TLS, security groups
  • Monitoring: CloudWatch integration

🧪 Testing

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html

# Run specific test file
pytest tests/test_api.py -v

Test Coverage

  • test_preprocessing.py: Text cleaning, tokenization, stopwords
  • test_classifier.py: Model training, prediction, serialization
  • test_api.py: API endpoints, validation, error handling

📊 Visualizations

After training, check the visualizations/ directory for:

  • category_distribution.png: Bar chart of ticket categories
  • priority_distribution.png: Pie chart of priority levels
  • confusion_matrix_category.png: Category prediction accuracy
  • confusion_matrix_priority.png: Priority prediction accuracy
  • metrics_comparison.png: Model performance metrics

🔮 Future Enhancements

  • BERT/Transformer models for improved accuracy
  • Auto-response generation based on category
  • Sentiment analysis integration
  • Multi-language support
  • Active learning for model improvement
  • Real-time model retraining pipeline

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👤 Author

Ayan Chatterjee


Made with ❤️ by @Ayan Chatterjee using Python, FastAPI, and scikit-learn

About

🎫 AI-powered customer support ticket classification using TF-IDF + Logistic Regression. Features interactive web UI, FastAPI REST API, Docker deployment, and AWS-ready infrastructure.
