
Mielone2Good/real-time-fraud-detection-mlops


Real-Time Fraud Detection System (E2E, MLOps)

Production-style streaming fraud detection system with automated retraining and model registry.


⚡ ~303 tx/sec · ⏱️ ~1.9 ms avg prediction · 🔁 Auto-retraining · ☁️ Deployed on AWS EC2

System Overview (At a Glance)


Kafka → Fraud Detection Service → PostgreSQL
↘ MLflow (Model Registry)
↘ Streamlit Dashboard

📌 About The Project

An end-to-end, production-style fraud detection system processing credit card transactions in real time.

Transactions are streamed through Kafka, scored by an XGBoost model, stored in PostgreSQL, monitored via dashboards, and automatically retrained once enough labeled data is available.
The entire stack is containerized with Docker and deployed on AWS EC2.

🧠 Fraud Detection Logic (High-Level)

Incoming transactions are scored with a probabilistic fraud model.
Predictions, probabilities, and metadata are persisted for monitoring and retraining.

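The model is trained on a highly imbalanced dataset, so it is tuned for precision and recall rather than raw accuracy: a model that labels every transaction as legitimate would still score very high accuracy while catching no fraud.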

Key principles:

  • Streaming inference (Kafka consumer)
  • Cost-aware evaluation metrics
  • Continuous model improvement via retraining
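The streaming inference step can be sketched as a small scoring loop. This is a minimal illustration, not the project's actual service: it assumes messages arrive as JSON-encoded bytes, uses a hypothetical 0.5 decision threshold (the README does not state one), and replaces the XGBoost model with a toy scoring function. In a real deployment `messages` would be fed by a Kafka consumer and `predict_proba` would wrap the trained model.

```python
import json
from typing import Callable, Iterable, Iterator

FRAUD_THRESHOLD = 0.5  # hypothetical cutoff; the README does not state one


def score_stream(
    messages: Iterable[bytes],
    predict_proba: Callable[[dict], float],
    threshold: float = FRAUD_THRESHOLD,
) -> Iterator[dict]:
    """Decode each message, score it, and yield a record ready to persist."""
    for raw in messages:
        tx = json.loads(raw)              # assumed message format: JSON
        proba = float(predict_proba(tx))  # fraud probability in [0, 1]
        yield {
            "transaction": tx,
            "fraud_probability": proba,
            "is_fraud": proba >= threshold,
        }


def toy_model(tx: dict) -> float:
    """Placeholder scorer for the demo (NOT the project's XGBoost model)."""
    return 0.9 if tx["Amount"] > 1000 else 0.1


# Two fake messages stand in for a Kafka topic.
stream = [b'{"Amount": 25.0}', b'{"Amount": 4200.0}']
results = list(score_stream(stream, toy_model))
flags = [r["is_fraud"] for r in results]
```

Keeping the message source and the model as injected arguments lets the same loop run against a live consumer or a list of test messages.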

🐳 Docker Deployment

  • A single docker compose up command spins up the full stack
  • Stateless services, reproducible environment
  • Clear separation between inference, retraining, and monitoring

🚀 Performance

Streaming test results:

  • ⚡ Throughput: ~303 transactions/sec
  • ⏱️ Avg processing time: 1.92 ms
  • 🧾 Transactions processed: 9,692
  • 🚨 Fraud detected: 1,529 (15.78%)

Retraining metrics:

  • Precision: 1.00
  • Recall: 0.95
  • F1-score: 0.97
  • Average Precision: 0.99
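As a quick check on how these numbers relate, the sketch below computes precision, recall, F1, and accuracy from confusion-matrix counts. The counts are hypothetical, chosen only to reproduce a precision of 1.00 and a recall of 0.95; they are not the project's actual evaluation data. Average Precision is not covered here because it needs the full ranked list of scores.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute basic metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts: 100 fraud cases among 10,000 transactions.
model = classification_metrics(tp=95, fp=0, fn=5, tn=9900)

# A "model" that never flags fraud, on the same hypothetical data.
baseline = classification_metrics(tp=0, fp=0, fn=100, tn=9900)

print(round(model["f1"], 2))           # harmonic mean of 1.00 and 0.95
print(round(baseline["accuracy"], 2))  # high accuracy despite zero recall
```

The baseline shows why accuracy alone is a poor target on imbalanced data: it stays high even when no fraud is caught.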

🧪 MLflow with Automated Retraining

  • Model versioning
  • Retraining triggered when ≥500 labeled transactions are available
  • Class imbalance handled via weighting
  • New model registered and promoted to Production in MLflow
  • Inference service hot-reloads the model (no downtime)
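The retraining flow above can be sketched in plain Python. Several things here are assumptions rather than details from this repository: the weighting is shown as a negatives-to-positives ratio (the value commonly passed to XGBoost's `scale_pos_weight` parameter), the MLflow registry is replaced by a small dictionary, and `ModelHolder` is a hypothetical helper name. Only the 500-transaction threshold comes from the README.

```python
import threading
from typing import Any, Callable, Sequence

RETRAIN_THRESHOLD = 500  # from the README: retrain at >= 500 labeled transactions


def should_retrain(n_labeled: int, threshold: int = RETRAIN_THRESHOLD) -> bool:
    return n_labeled >= threshold


def positive_class_weight(labels: Sequence[int]) -> float:
    """Ratio of legitimate (0) to fraud (1) labels, used to up-weight fraud."""
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("no fraud examples; cannot weight the positive class")
    return (len(labels) - positives) / positives


class ModelHolder:
    """Holds the active model and swaps it when a newer version appears."""

    def __init__(self, latest_version: Callable[[], int], load_model: Callable[[int], Any]):
        self._latest_version = latest_version
        self._load_model = load_model
        self._lock = threading.Lock()
        self.version = latest_version()
        self.model = load_model(self.version)

    def refresh(self) -> bool:
        newest = self._latest_version()
        if newest == self.version:
            return False
        model = self._load_model(newest)  # load outside the lock so scoring is not blocked
        with self._lock:
            self.version, self.model = newest, model
        return True

    def current(self) -> Any:
        with self._lock:
            return self.model


# Demo with stand-ins: a dict plays the registry, strings play the models.
registry = {"version": 1}
holder = ModelHolder(lambda: registry["version"], lambda v: f"model-v{v}")

labels = [0] * 490 + [1] * 10  # made-up labels for illustration
if should_retrain(len(labels)):
    weight = positive_class_weight(labels)
    registry["version"] += 1  # stand-in for training and promoting a new model
reloaded = holder.refresh()
```

The inference loop would call `holder.refresh()` periodically and `holder.current()` for each prediction, so a promoted model is picked up without restarting the service.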

📊 Monitoring Dashboard


Live metrics:

  • Total transactions
  • Fraud rate
  • Average fraud probability
  • Throughput (tx/sec)
  • Inference latency
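The live metrics listed above can all be derived from the stored prediction records. The sketch below shows one way to aggregate them; the `Prediction` fields are hypothetical and do not reflect the project's actual PostgreSQL schema, and the sample rows are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Prediction:
    timestamp: float          # seconds, when the transaction was scored
    fraud_probability: float
    is_fraud: bool
    latency_ms: float


def live_metrics(rows: Sequence[Prediction]) -> dict:
    """Aggregate the dashboard metrics over a window of predictions."""
    n = len(rows)
    if n == 0:
        return {"total": 0, "fraud_rate": 0.0, "avg_probability": 0.0,
                "throughput_tps": 0.0, "avg_latency_ms": 0.0}
    elapsed = max(r.timestamp for r in rows) - min(r.timestamp for r in rows)
    return {
        "total": n,
        "fraud_rate": sum(r.is_fraud for r in rows) / n,
        "avg_probability": sum(r.fraud_probability for r in rows) / n,
        # Rough rate: records in the window divided by the window length.
        "throughput_tps": n / elapsed if elapsed > 0 else 0.0,
        "avg_latency_ms": sum(r.latency_ms for r in rows) / n,
    }


rows = [
    Prediction(0.0, 0.10, False, 2.0),
    Prediction(0.5, 0.90, True, 1.0),
    Prediction(1.0, 0.20, False, 3.0),
    Prediction(2.0, 0.80, True, 2.0),
]
metrics = live_metrics(rows)
```

In practice the same aggregates could be computed with a SQL query over a recent time window instead of in Python.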

🎥 Demo

Demo video: New.mp4

📊 Dataset Used

Credit Card Fraud Detection dataset (Kaggle): https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
Download the CSV file and save it as data/raw_data/creditcard.csv.
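After downloading, a quick sanity check is to count the class labels. The dataset's `Class` column marks fraud as 1 and legitimate transactions as 0. The sketch below uses only the standard library and runs on a tiny made-up sample so it works without the real file; `class_balance` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def class_balance(lines: Iterable[str]) -> Counter:
    """Count rows per value of the 'Class' column (1 = fraud, 0 = legitimate)."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        counts[int(row["Class"])] += 1
    return counts


# Made-up rows with a reduced header; not real data from the dataset.
sample = io.StringIO('"Time","Amount","Class"\n0,10.00,"0"\n1,20.00,"0"\n2,30.00,"1"\n')
counts = class_balance(sample)
fraud_share = counts[1] / sum(counts.values())

# With the real file in place:
# with open("data/raw_data/creditcard.csv", newline="") as f:
#     print(class_balance(f))
```

On the real file the fraud share should come out very small, which is the imbalance the training step has to account for.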

🎯 Use Cases

  • Fraud detection systems
  • Real-time ML inference
  • MLOps & retraining pipelines
  • Streaming analytics platforms
