CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Dynamic Risk Assessment System - An ML pipeline for predicting client attrition risk. The system includes automated model monitoring, retraining on drift detection, and a Flask API for serving predictions.

Commands

Setup

pip install -r requirements.txt
# or with pipenv:
pipenv install && pipenv shell

ML Pipeline (run in order)

python ingestion.py      # Ingest and merge CSV data
python training.py       # Train logistic regression model
python scoring.py        # Score model, compute F1
python deployment.py     # Deploy model to production
python diagnostics.py    # Run diagnostics
python reporting.py      # Generate confusion matrix

API Server

python app.py            # Start Flask server on http://127.0.0.1:8000
python apicalls.py       # Test all API endpoints (requires server running)

Full Automation

python fullprocess.py    # Run complete pipeline with drift detection

Architecture

Configuration

All paths are controlled via config.json:

  • input_folder_path: Source data directory (practicedata or sourcedata)
  • output_folder_path: Ingested data output (ingesteddata)
  • test_data_path: Test dataset location (testdata)
  • output_model_path: Model artifacts (practicemodels or models)
  • prod_deployment_path: Production deployment (production_deployment)

Switch between practice and production by editing these paths.
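A practice-mode config.json would look like this (key names and directory names are taken from the list above):

```json
{
  "input_folder_path": "practicedata",
  "output_folder_path": "ingesteddata",
  "test_data_path": "testdata",
  "output_model_path": "practicemodels",
  "prod_deployment_path": "production_deployment"
}
```

For production, point input_folder_path at sourcedata and output_model_path at models.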

Data Flow

  1. ingestion.py - Merges CSVs from input_folder_path → output_folder_path/finaldata.csv
  2. training.py - Trains on finaldata.csv → saves trainedmodel.pkl + encoder.pkl
  3. scoring.py - Evaluates on test data → writes F1 to latestscore.txt
  4. deployment.py - Copies model, encoder, score, and ingestion record to prod_deployment_path
  5. fullprocess.py - Orchestrates pipeline with model drift detection (compares F1 scores)
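The ingestion step (step 1) can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name is hypothetical, and deduplication of merged rows is an assumption — the real ingestion.py also records which source files were ingested so deployment.py can copy that record.

```python
import os

import pandas as pd


def merge_input_csvs(input_dir: str, output_dir: str) -> pd.DataFrame:
    """Sketch of ingestion: merge every CSV in input_dir, drop
    duplicate rows, and write output_dir/finaldata.csv."""
    frames = [
        pd.read_csv(os.path.join(input_dir, name))
        for name in sorted(os.listdir(input_dir))
        if name.endswith(".csv")
    ]
    merged = pd.concat(frames, ignore_index=True).drop_duplicates()
    merged.to_csv(os.path.join(output_dir, "finaldata.csv"), index=False)
    return merged
```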

Preprocessing

common_functions.py:preprocess_data() handles feature engineering:

  • One-hot encodes corporation column
  • Separates target column exited
  • Used by training, scoring, and diagnostics modules
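Based on the bullets above, preprocess_data() behaves roughly like this sketch (the exact column handling and return signature in common_functions.py may differ):

```python
import pandas as pd


def preprocess_data(df: pd.DataFrame):
    """Sketch: one-hot encode the corporation column and split off
    the exited target, returning (features, target)."""
    y = df["exited"]
    X = df.drop(columns=["exited"])
    X = pd.get_dummies(X, columns=["corporation"])
    return X, y
```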

API Endpoints (app.py)

  • POST /prediction - Get predictions for a dataset (JSON body: {"dataset_path": "filename.csv"})
  • GET /scoring - Get model F1 score
  • GET /summarystats - Get dataset statistics (mean, median, std)
  • GET /diagnostics - Get execution times, missing data %, outdated packages
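With the server running, the endpoints can be exercised like this (a condensed sketch of what apicalls.py does; the dataset filename in the /prediction body is illustrative):

```python
import requests


def call_endpoints(base_url: str) -> dict:
    """Hit each endpoint once and collect the raw text responses."""
    return {
        "prediction": requests.post(
            f"{base_url}/prediction",
            json={"dataset_path": "testdata.csv"},
        ).text,
        "scoring": requests.get(f"{base_url}/scoring").text,
        "summarystats": requests.get(f"{base_url}/summarystats").text,
        "diagnostics": requests.get(f"{base_url}/diagnostics").text,
    }
```

For example, `call_endpoints("http://127.0.0.1:8000")` returns one response per endpoint; apicalls.py additionally writes the combined responses to a file.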

Model Drift Detection (fullprocess.py)

Compares deployed model F1 against new data F1. Retrains and redeploys only if performance degrades.
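The drift check reduces to a comparison of the two F1 scores. This sketch assumes a strict "raw comparison" test (any drop in F1 counts as drift); the function name is hypothetical and fullprocess.py may apply a different tolerance:

```python
def model_has_drifted(deployed_f1: float, new_f1: float) -> bool:
    """Drift check sketch: the new score on fresh data is strictly
    worse than the deployed model's recorded score."""
    return new_f1 < deployed_f1
```

If this returns True, fullprocess.py reruns training, scoring, and deployment on the newly ingested data.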