This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Dynamic Risk Assessment System - An ML pipeline for predicting client attrition risk. The system includes automated model monitoring, retraining on drift detection, and a Flask API for serving predictions.
pip install -r requirements.txt
# or with pipenv:
pipenv install && pipenv shell

python ingestion.py # Ingest and merge CSV data
python training.py # Train logistic regression model
python scoring.py # Score model, compute F1
python deployment.py # Deploy model to production
python diagnostics.py # Run diagnostics
python reporting.py # Generate confusion matrix

python app.py # Start Flask server on http://127.0.0.1:8000
python apicalls.py # Test all API endpoints (requires server running)

python fullprocess.py # Run complete pipeline with drift detection

All paths are controlled via config.json:
- input_folder_path: Source data directory (practicedata or sourcedata)
- output_folder_path: Ingested data output (ingesteddata)
- test_data_path: Test dataset location (testdata)
- output_model_path: Model artifacts (practicemodels or models)
- prod_deployment_path: Production deployment (production_deployment)
Switch between practice and production by editing these paths.
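
As a rough illustration of how a script might read these settings, the sketch below loads config.json and turns each folder into a path. The helper name `load_paths` and the rule that relative folders resolve against the config file's directory are assumptions made for this example, not something the repository is known to do.

```python
import json
from pathlib import Path

# Keys documented above; their values live in config.json.
CONFIG_KEYS = (
    "input_folder_path",
    "output_folder_path",
    "test_data_path",
    "output_model_path",
    "prod_deployment_path",
)


def load_paths(config_file="config.json", base_dir=None):
    """Return a dict mapping each config key to a Path (hypothetical helper).

    Folders are joined onto base_dir, which defaults to the directory that
    contains the config file. That resolution rule is an assumption.
    """
    config_file = Path(config_file)
    with config_file.open() as fh:
        config = json.load(fh)

    missing = [key for key in CONFIG_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing keys: {missing}")

    root = Path(base_dir) if base_dir is not None else config_file.parent
    return {key: root / config[key] for key in CONFIG_KEYS}
```

Calling `load_paths()["output_folder_path"] / "finaldata.csv"` would then give the location of the ingested dataset under whichever set of paths config.json currently holds.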
- ingestion.py - Merges CSVs from input_folder_path → output_folder_path/finaldata.csv
- training.py - Trains on finaldata.csv → saves trainedmodel.pkl + encoder.pkl
- scoring.py - Evaluates on test data → writes F1 to latestscore.txt
- deployment.py - Copies model, encoder, score, and ingestion record to prod_deployment_path
- fullprocess.py - Orchestrates pipeline with model drift detection (compares F1 scores)
common_functions.py:preprocess_data() handles feature engineering:
- One-hot encodes corporation column
- Separates target column exited
- Used by training, scoring, and diagnostics modules
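
To make the two preprocessing steps concrete, here is a standard-library-only sketch of one-hot encoding the corporation column and splitting off the exited target. It is not the repository's `preprocess_data()`: the function name `preprocess_rows`, the row-of-dicts input, and the example column `lastmonth_activity` are all hypothetical. Passing a fixed `categories` list stands in for reusing a fitted encoder such as the saved encoder.pkl, which is also an assumption about how that file is used.

```python
def preprocess_rows(rows, categories=None):
    """Split rows into feature vectors X and targets y (illustrative only).

    rows: list of dicts holding a 'corporation' string, an 'exited' label,
    and numeric feature columns.
    categories: fixed ordering of corporation values; inferred from the
    rows (sorted) when None.
    """
    if categories is None:
        categories = sorted({row["corporation"] for row in rows})
    if not rows:
        return [], [], list(categories)

    # Every column other than the categorical one and the target is numeric.
    numeric_cols = sorted(
        key for key in rows[0] if key not in ("corporation", "exited")
    )

    X, y = [], []
    for row in rows:
        # One indicator per known category; unseen categories become all zeros.
        one_hot = [1.0 if row["corporation"] == c else 0.0 for c in categories]
        numeric = [float(row[col]) for col in numeric_cols]
        X.append(numeric + one_hot)
        y.append(int(row["exited"]))
    return X, y, list(categories)
```

Keeping the category ordering from training and passing it back in at scoring time ensures that the feature columns line up between the two runs.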
- POST /prediction - Get predictions for a dataset (JSON body: {"dataset_path": "filename.csv"})
- GET /scoring - Get model F1 score
- GET /summarystats - Get dataset statistics (mean, median, std)
- GET /diagnostics - Get execution times, missing data %, outdated packages
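
The endpoints above can be exercised from Python with only the standard library. The sketch below is a minimal client, assuming the server runs at the address shown earlier and accepts a JSON body with a `Content-Type: application/json` header. The helper names are hypothetical, and the response format is not assumed: bodies are returned as raw text.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # address the Flask server is started on


def build_prediction_request(dataset_path, base_url=BASE_URL):
    """Build (but do not send) the POST /prediction request."""
    body = json.dumps({"dataset_path": dataset_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prediction",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(request_or_url, timeout=10):
    """Send a request (or GET a URL) and return the response body as text."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def call_all_endpoints(dataset_path, base_url=BASE_URL):
    """Call every documented endpoint; requires the server to be running."""
    results = {"prediction": fetch(build_prediction_request(dataset_path, base_url))}
    for endpoint in ("scoring", "summarystats", "diagnostics"):
        results[endpoint] = fetch(f"{base_url}/{endpoint}")
    return results
```

With `python app.py` running in another terminal, `call_all_endpoints("filename.csv")` returns one response body per endpoint.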
For drift detection, fullprocess.py compares the deployed model's F1 score against the F1 score on new data. It retrains and redeploys only if performance degrades.
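
The retrain decision can be sketched as a single comparison. This is an illustration under stated assumptions rather than the logic in fullprocess.py: it assumes latestscore.txt contains one floating-point number, and it treats any drop in F1 as degradation. The function names are hypothetical.

```python
def read_score(path):
    """Read an F1 score from a text file assumed to hold a single number."""
    with open(path) as fh:
        return float(fh.read().strip())


def should_retrain(deployed_f1, new_f1):
    """Return True when the model scores lower on new data than when deployed."""
    return new_f1 < deployed_f1
```

For example, a deployed F1 of 0.80 and a new-data F1 of 0.70 would trigger retraining, while a new-data F1 of 0.85 would leave the deployed model in place.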