Production-style streaming fraud detection system with automated retraining and model registry.
⚡ ~303 tx/sec · ⏱️ ~1.9 ms avg prediction · 🔁 Auto-retraining · ☁️ Deployed on AWS EC2
Kafka → Fraud Detection Service → PostgreSQL
↘ MLflow (Model Registry)
↘ Streamlit Dashboard
An end-to-end, production-style fraud detection system processing credit card transactions in real time.
Transactions are streamed through Kafka, scored by an XGBoost model, stored in PostgreSQL, monitored via dashboards, and automatically retrained once enough labeled data is available.
The entire stack is containerized with Docker and deployed on AWS EC2.
Incoming transactions are scored with a probabilistic fraud model.
Predictions, probabilities, and metadata are persisted for monitoring and retraining.
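A minimal sketch of the scoring step described above. It assumes a 0.5 decision threshold and uses a stand-in `toy_model` function in place of the trained XGBoost model; the function and field names (`score_transaction`, `fraud_probability`, `latency_ms`) are illustrative and not taken from the repository.

```python
import time
from typing import Callable, Dict, List


def score_transaction(
    features: List[float],
    predict_proba: Callable[[List[float]], float],
    threshold: float = 0.5,  # assumed threshold, not stated in this README
) -> Dict[str, object]:
    """Score one transaction and build the record that would be persisted."""
    start = time.perf_counter()
    probability = float(predict_proba(features))
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {
        "fraud_probability": probability,
        "is_fraud": probability >= threshold,
        "latency_ms": latency_ms,
        "scored_at": time.time(),
    }


def toy_model(features: List[float]) -> float:
    """Hypothetical stand-in model: maps the first feature to [0, 1]."""
    return min(1.0, max(0.0, features[0] / 1000.0))


record = score_transaction([250.0], toy_model)
print(record["is_fraud"], round(record["fraud_probability"], 2))
```

Keeping the probability alongside the binary prediction means the threshold can be revisited later without re-scoring stored transactions.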
The model is trained on a highly imbalanced dataset and optimized for precision and recall rather than raw accuracy.
Key principles:
- Streaming inference (Kafka consumer)
- Cost-aware evaluation metrics
- Continuous model improvement via retraining
- Single `docker compose up` spins up the full stack
- Stateless services, reproducible environment
- Clear separation between inference, retraining, and monitoring
Streaming test results:
- ⚡ Throughput: ~303 transactions/sec
- ⏱️ Avg processing time: 1.92 ms
- 🧾 Transactions processed: 9,692
- 🚨 Fraud detected: 1,529 (15.78%)
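The figures above can be cross-checked with a few lines of arithmetic. The sketch below only uses the reported numbers; the `scoring_share` estimate additionally assumes a single consumer that scores transactions one at a time, which this README does not state.

```python
transactions = 9_692
fraud_flagged = 1_529
throughput_tx_per_sec = 303  # approximate, as reported
avg_processing_ms = 1.92

# Share of processed transactions flagged as fraud.
fraud_rate_pct = 100 * fraud_flagged / transactions

# Approximate wall-clock duration of the streaming test.
run_duration_s = transactions / throughput_tx_per_sec

# Assumption: one sequential consumer. Fraction of each second spent processing.
scoring_share = throughput_tx_per_sec * avg_processing_ms / 1000

print(f"fraud rate: {fraud_rate_pct:.2f}%")
print(f"test duration: {run_duration_s:.0f} s")
print(f"time spent processing: {scoring_share:.0%}")
```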
Retraining metrics:
- Precision: 1.00
- Recall: 0.95
- F1-score: 0.97
- Average Precision: 0.99
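As a consistency check, the F1-score is the harmonic mean of precision and recall, so the reported value can be reproduced from the two numbers above. The helper name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(precision=1.00, recall=0.95)
print(round(f1, 2))  # 0.97
```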
Retraining pipeline:
- Model versioning
- Retraining triggered when ≥500 labeled transactions are available
- Class imbalance handled via weighting
- New model registered and promoted to Production in MLflow
- Inference service hot-reloads the model (no downtime)
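A minimal sketch of the retraining trigger and class weighting, under stated assumptions. The 500-label threshold comes from the list above; the weighting formula (negatives divided by positives) is a common choice for XGBoost's `scale_pos_weight` parameter, but this README does not say which weighting scheme the project actually uses. The function names are hypothetical.

```python
from typing import Sequence

MIN_LABELED = 500  # threshold stated above


def should_retrain(labeled_count: int, min_labeled: int = MIN_LABELED) -> bool:
    """Return True once enough labeled transactions are available."""
    return labeled_count >= min_labeled


def positive_class_weight(labels: Sequence[int]) -> float:
    """Assumed weighting: ratio of negative (0) to positive (1) examples."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("cannot weight classes without any positive labels")
    return negatives / positives


labels = [0] * 475 + [1] * 25  # illustrative labels, not real data
if should_retrain(len(labels)):
    weight = positive_class_weight(labels)
    print(weight)  # 19.0
```

The resulting weight would typically be passed to the classifier at training time so that missed fraud cases cost more than false alarms.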
Live metrics:
- Total transactions
- Fraud rate
- Average fraud probability
- Throughput (tx/sec)
- Inference latency
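The live metrics listed above can all be derived from the stored prediction records. The sketch below assumes each record is a dict with `is_fraud`, `fraud_probability`, and `latency_ms` keys; these names and the `summarize` helper are illustrative, not the project's actual schema.

```python
from typing import Dict, List


def summarize(records: List[Dict[str, float]], window_seconds: float) -> Dict[str, float]:
    """Aggregate dashboard metrics over records collected in a time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = len(records)
    if total == 0:
        return {
            "total_transactions": 0,
            "fraud_rate": 0.0,
            "avg_fraud_probability": 0.0,
            "throughput_tx_per_sec": 0.0,
            "avg_latency_ms": 0.0,
        }
    fraud = sum(1 for r in records if r["is_fraud"])
    return {
        "total_transactions": total,
        "fraud_rate": fraud / total,
        "avg_fraud_probability": sum(r["fraud_probability"] for r in records) / total,
        "throughput_tx_per_sec": total / window_seconds,
        "avg_latency_ms": sum(r["latency_ms"] for r in records) / total,
    }


# Illustrative records, not real results.
records = [
    {"is_fraud": False, "fraud_probability": 0.1, "latency_ms": 2.0},
    {"is_fraud": True, "fraud_probability": 0.9, "latency_ms": 1.0},
    {"is_fraud": False, "fraud_probability": 0.2, "latency_ms": 3.0},
    {"is_fraud": False, "fraud_probability": 0.0, "latency_ms": 2.0},
]
summary = summarize(records, window_seconds=2.0)
print(summary["total_transactions"], summary["fraud_rate"])
```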
Dataset: the Kaggle Credit Card Fraud Detection dataset (https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud). Download it as a CSV file and save it as data/raw_data/creditcard.csv.
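A quick way to confirm the file is in place and to see the class imbalance is to count the values of the dataset's `Class` column (1 for fraud, 0 otherwise). The sketch below uses only the standard library and runs on a tiny in-memory sample; the helper name `class_counts` and the sample rows are illustrative.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def class_counts(csv_file: TextIO) -> Counter:
    """Count rows per label in the 'Class' column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["Class"]) for row in reader)


# Tiny made-up sample with the same column style as the real file.
sample = io.StringIO(
    '"Time","Amount","Class"\n'
    '0,149.62,"0"\n'
    '1,2.69,"0"\n'
    '2,378.66,"1"\n'
)
counts = class_counts(sample)
print(counts[0], counts[1])  # 2 1
```

To run it against the downloaded file, open it with `open("data/raw_data/creditcard.csv", newline="")` and pass the file object to `class_counts`.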
- Fraud detection systems
- Real-time ML inference
- MLOps & retraining pipelines
- Streaming analytics platforms