This project analyzes insurance premium data and builds predictive models to estimate insurance costs based on various factors.
- Clone the repository:
git clone https://github.com/yourusername/Kaggle-Insurance-Premium.git
- Navigate to the project directory:
cd Kaggle-Insurance-Premium
- Create and activate a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install the required packages:
pip install -r requirements.txt
- Prepare the dataset:
- Ensure you have the dataset in the correct format and place it in the `data` directory.
- Run the data preprocessing script:
python preprocess.py
- Train the model:
python train.py
- Evaluate the model:
python evaluate.py
This project is licensed under the MIT License. See the LICENSE file for more details.
The following large CSV files are tracked using Git LFS. You can download them directly using the links below:
kaggle-insurance-premiums
├── data
│ ├── train.csv # Training dataset with features and target variable
│ ├── test.csv # Test dataset for predictions
│ └── sample_submission.csv # Template for submission format
├── notebooks
│ └── exploratory_data_analysis.ipynb # Jupyter notebook for EDA
├── src
│ ├── data_preprocessing.py # Data preprocessing functions and classes
│ ├── model_training.py # Model training implementation
│ └── model_evaluation.py # Model evaluation functions
├── submissions
│ └── submission.csv # Final submission file with predictions
├── requirements.txt # Project dependencies
├── README.md # Project documentation
└── .gitignore # Files to ignore in Git
- train.csv: Contains the training dataset with features and the continuous target variable, Premium Amount.
- test.csv: Contains the test dataset for which predictions of the Premium Amount need to be made.
- sample_submission.csv: Provides a template for the submission format.
- exploratory_data_analysis.ipynb: This Jupyter notebook is used for exploratory data analysis (EDA) on the datasets. It includes visualizations and insights derived from the training data.
- data_preprocessing.py: Functions and classes for preprocessing the data, including handling missing values, encoding categorical variables, and scaling features.
- model_training.py: Implementation of the model training process, defining the model architecture, compiling the model, and fitting it to the training data.
- model_evaluation.py: Functions for evaluating the model's performance using metrics such as Root Mean Squared Logarithmic Error (RMSLE) and generating predictions on the test set.
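For readers unfamiliar with the metric, RMSLE is the square root of the mean squared difference between log(1 + prediction) and log(1 + actual). The following is a minimal standard-library sketch of that formula, not the code in model_evaluation.py; the function name `rmsle` and its input checks are assumptions made for illustration.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for non-negative values.

    Hypothetical helper for illustration; the project's own
    implementation may differ (e.g. it may use NumPy or scikit-learn).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    if any(v < 0 for v in y_true) or any(v < 0 for v in y_pred):
        raise ValueError("RMSLE is undefined for negative values")

    # log1p(x) computes log(1 + x) accurately, including for small x.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Because the error is computed on a log scale, the metric penalizes relative rather than absolute differences, which suits a skewed target such as a premium amount.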
The final submission file containing the predicted Premium Amounts for the test dataset is located in the submissions directory.
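As a rough illustration of how such a file could be produced, the sketch below writes an ID column and a prediction column as CSV. The column names `id` and `Premium Amount`, the function name `write_submission`, and the sample values are assumptions; check sample_submission.csv for the exact header the competition expects.

```python
import csv
import io


def write_submission(ids, predictions, out_file, target_column="Premium Amount"):
    """Write (id, prediction) rows as CSV to an open text file.

    Hypothetical helper: the header names are assumed, not taken
    from the project's code.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out_file)
    writer.writerow(["id", target_column])
    for row_id, prediction in zip(ids, predictions):
        writer.writerow([row_id, prediction])


# Demo with made-up values, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([0, 1], [1102.5, 987.0], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("submissions/submission.csv", "w", newline="")`; the `newline=""` argument prevents blank lines between rows on Windows.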
To install the required dependencies, run:
pip install -r requirements.txt
- Preprocess the data using data_preprocessing.py.
- Train the model using model_training.py.
- Evaluate the model using model_evaluation.py.
- Generate predictions and format them according to the submission requirements.
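The preprocessing step covers three operations: handling missing values, encoding categorical variables, and scaling features. The sketch below shows one common way to do each using only the standard library. It is an illustration under stated assumptions, not the contents of data_preprocessing.py, which may well rely on pandas or scikit-learn; the function names and the sample `ages` and `smoker` columns are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 list per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Made-up sample columns for demonstration only.
ages = [20, None, 40]
age_scaled = standardize(impute_median(ages))
smoker = one_hot(["yes", "no", "yes"])
```

In practice the median, mean, and standard deviation should be computed on the training set only and then reused for the test set, so that no test information leaks into the model.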