X Education sells online courses to industry professionals. The company gets a lot of leads, but its lead conversion rate is very poor. For example, if it acquires 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the most promising leads, also known as ‘Hot Leads’. If it successfully identifies this set of leads, the lead conversion rate should go up, because the sales team will focus on communicating with the potential leads rather than making calls to everyone.

Build a logistic regression model to assign a lead score between 0 and 100 to each lead, which the company can use to target potential leads. A higher score means that the lead is hot, i.e. most likely to convert, whereas a lower score means that the lead is cold and will mostly not get converted.

The company has presented some more problems that the model should be able to adjust to if the company's requirements change in the future, so these need to be handled as well. These problems are provided in a separate doc file. Please fill it in based on the logistic regression model you got in the first step, and make sure you include it in your final PPT, where you'll make recommendations.
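
To make the lead score concrete, here is a minimal sketch of how a logistic regression probability can be scaled to a score between 0 and 100. The feature names, coefficients and intercept are hypothetical values chosen for illustration; they are not taken from the fitted model in the notebook.

```python
import math


def conversion_probability(features, coefficients, intercept):
    """Logistic regression: sigmoid of the linear combination of the features."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def lead_score(probability):
    """Scale a probability in [0, 1] to an integer lead score in [0, 100]."""
    return round(probability * 100)


# Hypothetical coefficients, for illustration only (not from the fitted model).
coefficients = {"total_time_on_site": 1.0, "is_working_professional": 2.0}
intercept = -2.0

# One hypothetical lead, with features already encoded as numbers.
lead = {"total_time_on_site": 0.0, "is_working_professional": 1.0}

probability = conversion_probability(lead, coefficients, intercept)
score = lead_score(probability)
print(f"probability={probability:.2f}, lead score={score}")
```

A lead whose score is above a chosen cutoff would be treated as hot. Where to put that cutoff is a separate decision that depends on how many calls the sales team can make.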
- Reading Data
- Cleaning Data
- EDA
- Converting yes/no categorical values to 0/1 and creating dummy variables
- Splitting data into train and test set
- Building Model
- Model Evaluation
- ROC Curve
- Precision-Recall
- Making Predictions
- Prediction on test set
- Lead Score

- Lead+Scoring+Case+Study.ipynb : The Python notebook containing the code and data analysis
- Assignment Subjective Questions.pdf : Some subjective questions answered
- LEAD SCORE CASE STUDY.pdf : Final Presentation
- Summary.pdf : Summary of what is done in the notebook
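
The evaluation steps listed above (Model Evaluation, Precision-Recall, Making Predictions) come down to choosing a score cutoff and checking how well it separates converted from non-converted leads. The following is a dependency-free sketch of that trade-off. The labels and scores are made-up illustration data, not results from the notebook, and the `precision_recall` helper is a hypothetical name. In practice a library such as scikit-learn would typically compute these metrics.

```python
def precision_recall(actual, scores, cutoff):
    """Precision and recall when leads scoring >= cutoff are treated as hot."""
    predicted = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up data for illustration: 1 = converted, 0 = not converted.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [92, 35, 78, 40, 55, 10, 66, 20]

for cutoff in (30, 50, 70):
    precision, recall = precision_recall(actual, scores, cutoff)
    print(f"cutoff={cutoff}: precision={precision:.2f}, recall={recall:.2f}")
```

Raising the cutoff makes the hot-lead list smaller and more precise but misses more leads that would have converted. Lowering it does the opposite. This is the kind of adjustment the company may ask for if its requirements change.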