This repository contains the code for my submission to the Kaggle Playground Series - Season 4, Episode 5 competition. The goal of the competition is to predict the occurrence of floods based on various environmental and geographical features.
Flooding is a significant natural disaster that can cause extensive damage to property and loss of life. The objective of this competition is to develop a model that accurately predicts the likelihood of floods based on historical data. The dataset provided includes various features such as rainfall, river water levels, soil moisture, and more.
Participants build a machine learning model that predicts the target variable (the probability of a flood) and are evaluated on the R² score of their predictions on the test set.
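For readers unfamiliar with the metric, the sketch below computes R² by hand as 1 - SS_res / SS_tot. The function name `r_squared` and the sample values are illustrative assumptions, not part of this repository or the competition data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [0.40, 0.50, 0.60, 0.50]
y_pred = [0.42, 0.48, 0.57, 0.53]
score = r_squared(y_true, y_pred)
print(f"R^2 = {score:.2f}")
```

A score of 1.0 means perfect predictions, while 0.0 means the model does no better than always predicting the mean of the target.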
The project is divided into two main parts:
- Data Preprocessing
- Model Training and Prediction
The data preprocessing script handles the following tasks:
- Loading the dataset
- Handling missing values
- Imputing and scaling features
- Splitting the data into training and testing sets
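The preprocessing steps listed above could look roughly like the sketch below. It is not the contents of `data_preprocessing.py`: it assumes pandas and scikit-learn are available, the helper `preprocess`, the column names, and the synthetic data frame are all hypothetical, and median imputation with standard scaling is one reasonable choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df, target_col, test_size=0.2, seed=42):
    """Split first, then impute and scale using training statistics only."""
    X = df.drop(columns=[target_col])
    y = df[target_col]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    imputer = SimpleImputer(strategy="median")  # fill missing values
    scaler = StandardScaler()                   # zero mean, unit variance
    X_train = scaler.fit_transform(imputer.fit_transform(X_train))
    X_test = scaler.transform(imputer.transform(X_test))
    return X_train, X_test, y_train.to_numpy(), y_test.to_numpy()


# Synthetic stand-in for the competition data (hypothetical columns).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=100),
    "feature_b": rng.normal(size=100),
})
demo["target"] = 0.5 + 0.1 * demo["feature_a"]
demo.iloc[::10, 0] = np.nan  # inject missing values into feature_a

X_train, X_test, y_train, y_test = preprocess(demo, "target")
print(X_train.shape, X_test.shape)
```

The sketch splits before fitting the imputer and scaler so that statistics from the held-out rows do not leak into training. The repository's script may order these steps differently.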
The model training script includes:
- Loading the preprocessed data
- Training a Linear Regression model
- Evaluating the model using the R² score
- Making predictions on the test dataset
Result: a simple Linear Regression model achieved an R² score of 0.8449 (84.49%) on the test dataset.
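As a rough illustration of the training and evaluation steps above, the sketch below fits scikit-learn's `LinearRegression` and scores it with `r2_score`. It is not the contents of `model_training.py`: the data is synthetic, the variable names are assumptions, and the score it prints has no relation to the result reported for this project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the preprocessed features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
true_coef = np.array([0.05, 0.04, 0.03, 0.02, 0.01])
y = 0.5 + X @ true_coef + rng.normal(scale=0.02, size=500)

# Hold out 20% of the rows for evaluation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the held-out rows and score with R^2.
val_pred = model.predict(X_val)
val_r2 = r2_score(y_val, val_pred)
print(f"Validation R^2: {val_r2:.4f}")
```

With real data, the same `model.predict` call would be applied to the competition's test features to produce the submission values.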
- Clone this repository:
      git clone https://github.com/yourusername/flood-prediction.git
      cd flood-prediction
- Install the required packages:
      pip install -r requirements.txt
- Run the data preprocessing script:
      python data_preprocessing.py
- Run the model training and prediction script:
      python model_training.py
This project demonstrates a basic approach to flood prediction using a Linear Regression model. The model explains roughly 84% of the variance in the target (R² of 0.8449), and there is room for improvement through more complex models and additional feature engineering.
For more details, please refer to the Kaggle competition page.