---
title: "Insurance Fraud in R"
date: 2020-05-27
header:
  image: /images/scope2.jpg
  teaser: /images/wordcloud2.png
excerpt: "Fraud Detection, Statistical Analysis, K-Nearest Neighbor, R"
mathjax: true
---

EDA and Classification Prediction

Summary

This was my first data science project, completed for a course focused on statistical analysis in R. The goal was to use statistical analysis to identify significant features in fraudulent insurance claim transactions and to design a classification model that predicts whether fraud was reported on a claim. K-Nearest Neighbor was the model tested. The paper walks through the steps of the project.
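As a rough illustration of the classification step, here is a minimal K-Nearest Neighbor sketch. It runs on synthetic data, not the project's claims file, and the column names (`weeks_before_incident`, `claim_amount`, `fraud_reported`), the 70/30 split, and `k = 5` are all assumptions made for the example. It uses `knn()` from the `class` package, which may or may not be the function used in the full project.

```r
library(class)  # provides knn(); included with standard R distributions

set.seed(42)

# Synthetic stand-in for the claims data (hypothetical columns and values)
n <- 200
claims <- data.frame(
  weeks_before_incident = c(rnorm(n / 2, mean = 60, sd = 10),
                            rnorm(n / 2, mean = 20, sd = 10)),
  claim_amount = c(rnorm(n / 2, mean = 5000, sd = 1000),
                   rnorm(n / 2, mean = 9000, sd = 1000)),
  fraud_reported = factor(rep(c("N", "Y"), each = n / 2))
)

# 70/30 train/test split (the split ratio is an assumption)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
predictor_cols <- c("weeks_before_incident", "claim_amount")

# KNN is distance based, so scale the predictors.
# The test set is scaled with the training set's mean and sd.
train_x <- scale(claims[train_idx, predictor_cols])
test_x <- scale(claims[-train_idx, predictor_cols],
                center = attr(train_x, "scaled:center"),
                scale = attr(train_x, "scaled:scale"))
train_y <- claims$fraud_reported[train_idx]
test_y <- claims$fraud_reported[-train_idx]

# Classify each test row by majority vote of its 5 nearest training rows
predicted <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Confusion matrix and overall accuracy
confusion <- table(actual = test_y, predicted = predicted)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)
```

On real claims data the classes will be far less cleanly separated than in this toy example, so `k` would normally be tuned rather than fixed.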

Libraries

Data

claims{:target="_blank"}

Models / Methods / Metrics

K-Nearest Neighbor
Correlation and Partial Correlation
Multicollinearity: Logistic Regression and vif() function
Feature selection: Variable coefficients and odds ratio
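
The statistical methods listed above can be sketched in base R roughly as follows. The data is synthetic and the variable names (`injury_claim`, `total_claim`, `weeks_owned`) are hypothetical. The VIF here is computed by hand from the textbook definition, 1 / (1 - R²), as a stand-in for the `vif()` function named above (presumably from the `car` package, which is an assumption); its values may not match that function's output exactly for a logistic model.

```r
set.seed(1)

# Synthetic data (hypothetical names); claim amounts are in thousands of dollars.
# total_claim is deliberately built to be collinear with injury_claim.
n <- 500
injury_claim <- rnorm(n, mean = 5, sd = 1.5)
total_claim <- 3 * injury_claim + rnorm(n, mean = 0, sd = 2)
weeks_owned <- rnorm(n, mean = 50, sd = 15)
log_odds <- -0.5 + 0.2 * (injury_claim - 5) - 0.05 * (weeks_owned - 50)
fraud <- rbinom(n, size = 1, prob = plogis(log_odds))
claims <- data.frame(fraud, injury_claim, total_claim, weeks_owned)

# Pairwise (Pearson) correlation between all variables
cor_matrix <- cor(claims)

# Partial correlation of fraud and weeks_owned, controlling for the claim
# amounts: correlate the residuals left after regressing each on the controls
res_fraud <- resid(lm(fraud ~ injury_claim + total_claim, data = claims))
res_weeks <- resid(lm(weeks_owned ~ injury_claim + total_claim, data = claims))
partial_cor <- cor(res_fraud, res_weeks)

# Logistic regression of fraud on all predictors
fit <- glm(fraud ~ injury_claim + total_claim + weeks_owned,
           data = claims, family = binomial)

# Variance inflation factor per predictor: regress it on the other
# predictors and compute 1 / (1 - R^2); large values flag multicollinearity
predictors <- c("injury_claim", "total_claim", "weeks_owned")
vif_values <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = claims))$r.squared
  1 / (1 - r2)
})

# Odds ratios: exponentiated logistic regression coefficients
odds_ratios <- exp(coef(fit))

print(round(cor_matrix, 2))
print(partial_cor)
print(round(vif_values, 2))
print(round(odds_ratios, 3))
```

An odds ratio below 1 means the odds of fraud fall as the predictor increases, and one above 1 means they rise. A predictor with a high VIF is a candidate for removal before interpreting the coefficients.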

Exploratory Data Analysis Preview

The EDA showed that there are distinctions between the fraudulent records and the non-fraudulent records.

Hobbies

The claimant's hobbies show some variation in fraud cases.
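
One way to check a categorical feature like this is to compare the share of fraud within each category. The sketch below uses synthetic data with placeholder hobby labels and hypothetical column names (`insured_hobbies`, `fraud_reported`); the elevated fraud rate for one category is built in purely for illustration and says nothing about the real claims data.

```r
set.seed(7)

# Synthetic data with placeholder hobby labels (not taken from the project)
n <- 400
hobbies <- c("hobby_a", "hobby_b", "hobby_c", "hobby_d")
claims <- data.frame(
  insured_hobbies = sample(hobbies, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Give one hobby a higher fraud probability, for illustration only
p_fraud <- ifelse(claims$insured_hobbies == "hobby_a", 0.7, 0.2)
claims$fraud_reported <- ifelse(rbinom(n, size = 1, prob = p_fraud) == 1, "Y", "N")

# Cross-tabulate hobby against fraud status
counts <- table(claims$insured_hobbies, claims$fraud_reported)

# Row-wise proportions: share of fraudulent claims within each hobby
fraud_share <- prop.table(counts, margin = 1)[, "Y"]

# Chi-squared test of independence between hobby and fraud status
independence_test <- chisq.test(counts)

print(counts)
print(round(fraud_share, 2))
print(independence_test$p.value)
```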

Weeks Before Incident

The number of weeks the policy was owned before the claim shows some variation in fraud cases.
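
For a numeric feature such as this one, a simple comparison is the median by fraud status plus a rank-based test. This is a sketch on synthetic data with hypothetical column names (`weeks_owned`, `fraud_reported`); the difference between the groups is simulated and is not a result from the project.

```r
set.seed(11)

# Synthetic data: fraud flag and weeks the policy was owned before the claim
n <- 300
fraud_reported <- factor(sample(c("N", "Y"), n, replace = TRUE,
                                prob = c(0.75, 0.25)))

# Simulated so that fraudulent claims tend to occur sooner (illustration only)
weeks_owned <- ifelse(fraud_reported == "Y",
                      rexp(n, rate = 1 / 30),
                      rexp(n, rate = 1 / 60))
claims <- data.frame(fraud_reported, weeks_owned)

# Median weeks owned within each fraud group
medians <- tapply(claims$weeks_owned, claims$fraud_reported, median)

# Wilcoxon rank-sum test: do the two groups differ in location?
rank_test <- wilcox.test(weeks_owned ~ fraud_reported, data = claims)

print(round(medians, 1))
print(rank_test$p.value)
```

Calling `boxplot(weeks_owned ~ fraud_reported, data = claims)` on the same data frame would show the two distributions side by side.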

The Complete Project: here{:target="_blank"}.
