CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Dynamic Risk Assessment System - An ML pipeline for predicting client attrition risk. The system includes automated model monitoring, retraining on drift detection, and a Flask API for serving predictions.

Commands

Setup

pip install -r requirements.txt
# or with pipenv:
pipenv install && pipenv shell

ML Pipeline (run in order)

python ingestion.py      # Ingest and merge CSV data
python training.py       # Train logistic regression model
python scoring.py        # Score model, compute F1
python deployment.py     # Deploy model to production
python diagnostics.py    # Run diagnostics
python reporting.py      # Generate confusion matrix

API Server

python app.py            # Start Flask server on http://127.0.0.1:8000
python apicalls.py       # Test all API endpoints (requires server running)

Full Automation

python fullprocess.py    # Run complete pipeline with drift detection

Architecture

Configuration

All paths are controlled via config.json:

  • input_folder_path: Source data directory (practicedata or sourcedata)
  • output_folder_path: Ingested data output (ingesteddata)
  • test_data_path: Test dataset location (testdata)
  • output_model_path: Model artifacts (practicemodels or models)
  • prod_deployment_path: Production deployment (production_deployment)

Switch between practice and production by editing these paths.
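A practice-mode config.json would look like this (key names and directory names are taken from the list above):

```json
{
  "input_folder_path": "practicedata",
  "output_folder_path": "ingesteddata",
  "test_data_path": "testdata",
  "output_model_path": "practicemodels",
  "prod_deployment_path": "production_deployment"
}
```

For production, point input_folder_path at sourcedata and output_model_path at models.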

Data Flow

  1. ingestion.py - Merges CSVs from input_folder_path → output_folder_path/finaldata.csv
  2. training.py - Trains on finaldata.csv → saves trainedmodel.pkl + encoder.pkl
  3. scoring.py - Evaluates on test data → writes F1 to latestscore.txt
  4. deployment.py - Copies model, encoder, score, and ingestion record to prod_deployment_path
  5. fullprocess.py - Orchestrates pipeline with model drift detection (compares F1 scores)
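The ingestion step (step 1) can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name is hypothetical, and deduplication of merged rows is an assumption — the real ingestion.py also records which source files were ingested so deployment.py can copy that record.

```python
import os

import pandas as pd


def merge_input_csvs(input_dir: str, output_dir: str) -> pd.DataFrame:
    """Sketch of ingestion: merge every CSV in input_dir, drop
    duplicate rows, and write output_dir/finaldata.csv."""
    frames = [
        pd.read_csv(os.path.join(input_dir, name))
        for name in sorted(os.listdir(input_dir))
        if name.endswith(".csv")
    ]
    merged = pd.concat(frames, ignore_index=True).drop_duplicates()
    merged.to_csv(os.path.join(output_dir, "finaldata.csv"), index=False)
    return merged
```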

Preprocessing

common_functions.py:preprocess_data() handles feature engineering:

  • One-hot encodes corporation column
  • Separates target column exited
  • Used by training, scoring, and diagnostics modules
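Based on the bullets above, preprocess_data() behaves roughly like this sketch (the exact column handling and return signature in common_functions.py may differ):

```python
import pandas as pd


def preprocess_data(df: pd.DataFrame):
    """Sketch: one-hot encode the corporation column and split off
    the exited target, returning (features, target)."""
    y = df["exited"]
    X = df.drop(columns=["exited"])
    X = pd.get_dummies(X, columns=["corporation"])
    return X, y
```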

API Endpoints (app.py)

  • POST /prediction - Get predictions for a dataset (JSON body: {"dataset_path": "filename.csv"})
  • GET /scoring - Get model F1 score
  • GET /summarystats - Get dataset statistics (mean, median, std)
  • GET /diagnostics - Get execution times, missing data %, outdated packages
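With the server running, the endpoints can be exercised like this (a condensed sketch of what apicalls.py does; the dataset filename in the /prediction body is illustrative):

```python
import requests


def call_endpoints(base_url: str) -> dict:
    """Hit each endpoint once and collect the raw text responses."""
    return {
        "prediction": requests.post(
            f"{base_url}/prediction",
            json={"dataset_path": "testdata.csv"},
        ).text,
        "scoring": requests.get(f"{base_url}/scoring").text,
        "summarystats": requests.get(f"{base_url}/summarystats").text,
        "diagnostics": requests.get(f"{base_url}/diagnostics").text,
    }
```

For example, `call_endpoints("http://127.0.0.1:8000")` returns one response per endpoint; apicalls.py additionally writes the combined responses to a file.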

Model Drift Detection (fullprocess.py)

Compares deployed model F1 against new data F1. Retrains and redeploys only if performance degrades.
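The drift check reduces to a comparison of the two F1 scores. This sketch assumes a strict "raw comparison" test (any drop in F1 counts as drift); the function name is hypothetical and fullprocess.py may apply a different tolerance:

```python
def model_has_drifted(deployed_f1: float, new_f1: float) -> bool:
    """Drift check sketch: the new score on fresh data is strictly
    worse than the deployed model's recorded score."""
    return new_f1 < deployed_f1
```

If this returns True, fullprocess.py reruns training, scoring, and deployment on the newly ingested data.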