This project implements a Multiple Linear Regression model to automate salary estimations for an HR department. The model predicts a candidate's salary based on three key factors: experience, written test score, and personal interview score.
The goal is to provide a data-driven approach to hiring by analyzing historical recruitment statistics. This helps ensure fair and consistent salary offerings for future candidates.
- experience: Years of professional experience, recorded either as digits or as number words (e.g., "five").
- test_score: Score out of 10 from a technical written test.
- interview_score: Score out of 10 from the personal interview.
- salary($): The target variable the model aims to predict.
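
As a sketch of what the model fits, assuming the standard multiple linear regression form over the three features above (the symbols below are notation introduced here for illustration, not names from the project's code):

$$
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
$$

Here $\hat{y}$ is the predicted salary, $x_1$ is experience, $x_2$ is the test score, $x_3$ is the interview score, $\beta_0$ is the intercept, and $\beta_1, \beta_2, \beta_3$ are the coefficients learned from the historical data.
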
- Language: Python
- Libraries:
  - Pandas: For data manipulation and cleaning.
  - Scikit-Learn: For building and training the Linear Regression model.
  - Word2Number: To convert text-based experience (e.g., "five") into integers.
The project follows a standard machine learning pipeline:
- Data Cleaning: Handling missing values. Missing experience is filled with "zero," and missing test scores are filled with the column median.
- Data Transformation: Converting word-based numbers into numerical data.
- Model Training: Fitting a Linear Regression model to the historical data.
- Prediction: Generating salary estimates for new candidate profiles.
The model was tested with two sample candidates:
| Candidate | Experience | Test Score | Interview Score | Predicted Salary |
|---|---|---|---|---|
| New Hire 1 | 2 Years | 9 / 10 | 6 / 10 | $53,205.97 |
| New Hire 2 | 12 Years | 10 / 10 | 10 / 10 | $92,002.18 |
- Clone this repository.
- Ensure you have `hiring (1).csv` in the same directory.
- Install dependencies: `pip install pandas scikit-learn word2number`
- Run the Jupyter Notebook or Python script to see the results.