diff --git a/03_02/03_02 Read from CSV [Begin].ipynb b/03_02/03_02 Read from CSV [Begin].ipynb new file mode 100644 index 0000000..f036f17 --- /dev/null +++ b/03_02/03_02 Read from CSV [Begin].ipynb @@ -0,0 +1,135 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Credit Card Retention Analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataset Description" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- A manager at the bank is disturbed with more and more customers leaving their credit card services. They would really like to understand what characteristics lend themselves to someone who is going to churn so they can proactively go to the customer to provide them better services and turn customers' decisions in the opposite direction.\n", + "\n", + "- This dataset consists of 10,000 customers mentioning their age, salary, marital_status, credit card limit, credit card category, etc. There are nearly 18 features.\n", + "\n", + "- 16.07% of customers have churned.\n", + "\n", + "- [Dataset link](https://www.kaggle.com/datasets/whenamancodes/credit-card-customers-prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "***" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "import numpy as np\n", + "import plotly.graph_objs as go\n", + "from plotly.offline import iplot\n", + "sns.set()\n", + "pd.options.display.max_columns = 999" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reading in Dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1) `data.shape` \n", + "\n", + "2) `data.head()` \n", + "\n", + "3) `data.columns` " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/03_02/03_02 Read from CSV.ipynb b/03_02/03_02 Read from CSV.ipynb new file mode 100644 index 0000000..cdf3712 --- /dev/null +++ b/03_02/03_02 Read from CSV.ipynb @@ -0,0 +1,427 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Credit Card Retention Analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataset Description" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- A manager at the bank is disturbed with more and more customers leaving their credit card services. They would really like to understand what characteristics lend themselves to someone who is going to churn so they can proactively go to the customer to provide them better services and turn customers' decisions in the opposite direction.\n", + "\n", + "- This dataset consists of 10,000 customers mentioning their age, salary, marital_status, credit card limit, credit card category, etc. 
There are nearly 18 features.\n", + "\n", + "- 16.07% of customers have churned.\n", + "\n", + "- [Dataset link](https://www.kaggle.com/datasets/whenamancodes/credit-card-customers-prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "***" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To read a CSV file into Python, we will use the pandas library. Any package we want to use in Python needs an import statement. In addition to pandas, which we import with `import pandas as pd`, we also import matplotlib, seaborn, and plotly (libraries used for visualization) and numpy (a library for array manipulation)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "import numpy as np\n", + "import plotly.graph_objs as go\n", + "from plotly.offline import iplot\n", + "sns.set()\n", + "pd.options.display.max_columns = 999" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reading in Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will use the `pd.read_csv` function, whose name signals that it reads a comma-separated values file. You can confirm this by running `help(pd.read_csv)`: the separator defaults to `,`, alongside other helpful defaults such as `header='infer'`. Read through the rest of the help text to get familiar with the parameters you can override when your file differs from the defaults.\n", + "\n", + "If you type `pd.read` and then press `tab`, you will see the other readers pandas provides out of the box to read in files. 
Examples: `pd.read_excel`, `pd.read_pickle`, `pd.read_json`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "help(pd.read_csv)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# pd.read" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before reading a CSV file, you need to know where it sits relative to your notebook. In this case, I've created a folder called `data/`, one level above this notebook, to store any input data files. To read the file, I pass its relative path to `pd.read_csv` inside the parentheses and assign the result to `data` so we can inspect it." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "data = pd.read_csv('../data/BankChurners_v2.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After reading in a file, I always take these next steps:\n", + "\n", + "1) `data.shape` to see the size of the dataset. The size helps me decide how to work with the dataset if it happens to be large. Here the dataset has **10,127** rows of customer data and **23** columns describing the behavior of those customers.\n", + "\n", + "2) `data.head()` to see the top of the dataset and spot changes to make, such as renaming columns. By default it shows the first 5 rows, but you can pass any number you like (10, 25, etc.).\n", + "\n", + "3) `data.columns` to see all the column names.\n", + "\n", + "Let's do that here." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(10127, 23)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "<thead>\n",
+ "<tr><th></th><th>CLIENTNUM</th><th>Attrition_Flag</th><th>Customer_Age</th><th>Gender</th><th>Dependent_count</th><th>Education_Level</th><th>Marital_Status</th><th>Income_Category</th><th>Card_Category</th><th>Months_on_book</th><th>Total_Relationship_Count</th><th>Months_Inactive_12_mon</th><th>Contacts_Count_12_mon</th><th>Credit_Limit</th><th>Total_Revolving_Bal</th><th>Avg_Open_To_Buy</th><th>Total_Amt_Chng_Q4_Q1</th><th>Total_Trans_Amt</th><th>Total_Trans_Ct</th><th>Total_Ct_Chng_Q4_Q1</th><th>Avg_Utilization_Ratio</th><th>Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1</th><th>Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2</th></tr>\n",
+ "</thead>\n", + "<tbody>\n",
+ "<tr><th>0</th><td>90032</td><td>Existing Customer</td><td>45</td><td>M</td><td>3</td><td>High School</td><td>Married</td><td>$60K - $80K</td><td>Blue</td><td>39</td><td>5</td><td>1</td><td>3</td><td>12691.0</td><td>777</td><td>11914.0</td><td>1.335</td><td>1144</td><td>42</td><td>1.625</td><td>0.061</td><td>0.000093</td><td>0.99991</td></tr>\n",
+ "<tr><th>1</th><td>90033</td><td>Existing Customer</td><td>49</td><td>F</td><td>5</td><td>Graduate</td><td>Single</td><td>Less than $40K</td><td>Blue</td><td>44</td><td>6</td><td>1</td><td>2</td><td>8256.0</td><td>864</td><td>7392.0</td><td>1.541</td><td>1291</td><td>33</td><td>3.714</td><td>0.105</td><td>0.000057</td><td>0.99994</td></tr>\n",
+ "<tr><th>2</th><td>90034</td><td>Existing Customer</td><td>51</td><td>M</td><td>3</td><td>Graduate</td><td>Married</td><td>$80K - $120K</td><td>Blue</td><td>36</td><td>4</td><td>1</td><td>0</td><td>3418.0</td><td>0</td><td>3418.0</td><td>2.594</td><td>1887</td><td>20</td><td>2.333</td><td>0.000</td><td>0.000021</td><td>0.99998</td></tr>\n",
+ "<tr><th>3</th><td>90035</td><td>Existing Customer</td><td>40</td><td>F</td><td>4</td><td>High School</td><td>NaN</td><td>Less than $40K</td><td>Blue</td><td>34</td><td>3</td><td>4</td><td>1</td><td>3313.0</td><td>2517</td><td>796.0</td><td>1.405</td><td>1171</td><td>20</td><td>2.333</td><td>0.760</td><td>0.000134</td><td>0.99987</td></tr>\n",
+ "<tr><th>4</th><td>90036</td><td>Existing Customer</td><td>40</td><td>M</td><td>3</td><td>Uneducated</td><td>Married</td><td>$60K - $80K</td><td>Blue</td><td>21</td><td>5</td><td>1</td><td>0</td><td>4716.0</td><td>0</td><td>4716.0</td><td>2.175</td><td>816</td><td>28</td><td>2.500</td><td>0.000</td><td>0.000022</td><td>0.99998</td></tr>\n",
+ "</tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " CLIENTNUM Attrition_Flag Customer_Age Gender Dependent_count \\\n", + "0 90032 Existing Customer 45 M 3 \n", + "1 90033 Existing Customer 49 F 5 \n", + "2 90034 Existing Customer 51 M 3 \n", + "3 90035 Existing Customer 40 F 4 \n", + "4 90036 Existing Customer 40 M 3 \n", + "\n", + " Education_Level Marital_Status Income_Category Card_Category \\\n", + "0 High School Married $60K - $80K Blue \n", + "1 Graduate Single Less than $40K Blue \n", + "2 Graduate Married $80K - $120K Blue \n", + "3 High School NaN Less than $40K Blue \n", + "4 Uneducated Married $60K - $80K Blue \n", + "\n", + " Months_on_book Total_Relationship_Count Months_Inactive_12_mon \\\n", + "0 39 5 1 \n", + "1 44 6 1 \n", + "2 36 4 1 \n", + "3 34 3 4 \n", + "4 21 5 1 \n", + "\n", + " Contacts_Count_12_mon Credit_Limit Total_Revolving_Bal Avg_Open_To_Buy \\\n", + "0 3 12691.0 777 11914.0 \n", + "1 2 8256.0 864 7392.0 \n", + "2 0 3418.0 0 3418.0 \n", + "3 1 3313.0 2517 796.0 \n", + "4 0 4716.0 0 4716.0 \n", + "\n", + " Total_Amt_Chng_Q4_Q1 Total_Trans_Amt Total_Trans_Ct Total_Ct_Chng_Q4_Q1 \\\n", + "0 1.335 1144 42 1.625 \n", + "1 1.541 1291 33 3.714 \n", + "2 2.594 1887 20 2.333 \n", + "3 1.405 1171 20 2.333 \n", + "4 2.175 816 28 2.500 \n", + "\n", + " Avg_Utilization_Ratio \\\n", + "0 0.061 \n", + "1 0.105 \n", + "2 0.000 \n", + "3 0.760 \n", + "4 0.000 \n", + "\n", + " Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1 \\\n", + "0 0.000093 \n", + "1 0.000057 \n", + "2 0.000021 \n", + "3 0.000134 \n", + "4 0.000022 \n", + "\n", + " Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2 \n", + "0 0.99991 \n", + "1 0.99994 \n", + "2 0.99998 \n", + "3 0.99987 \n", + "4 0.99998 " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.head()" + ] + } + ], + 
"metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}