From b182c598ae42bb75cec32e5dd5b93d1341488286 Mon Sep 17 00:00:00 2001 From: Sarah Nabelsi Date: Mon, 23 Jan 2023 10:05:57 -0800 Subject: [PATCH 1/4] adding 03_02 files --- 03_02/03_02 Read from CSV.ipynb | 434 ++++++++++++++++++++++++++++++++ 1 file changed, 434 insertions(+) create mode 100644 03_02/03_02 Read from CSV.ipynb diff --git a/03_02/03_02 Read from CSV.ipynb b/03_02/03_02 Read from CSV.ipynb new file mode 100644 index 0000000..704c6e6 --- /dev/null +++ b/03_02/03_02 Read from CSV.ipynb @@ -0,0 +1,434 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Credit Card Retention Analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataset Description" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- A manager at the bank is disturbed by more and more customers leaving their credit card services. They would really like to understand which characteristics indicate that a customer is going to churn, so they can proactively reach out to those customers with better services and turn their decisions in the opposite direction.\n", + "\n", + "- This dataset consists of 10,000 customers, including their age, salary, marital status, credit card limit, credit card category, etc. There are nearly 18 features.\n", + "\n", + "- 16.07% of customers have churned.\n", + "\n", + "- [Dataset link](https://www.kaggle.com/datasets/whenamancodes/credit-card-customers-prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "***" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In order to read in a csv into python, we will be leveraging the Pandas library. Any package we want to use in Python will need an import statement. In addition to pandas which we will import using `import pandas as pd`, we will also import matplotlib and seaborn (libraries used for visualization) and numpy (a library for array manipulation)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "import numpy as np\n", + "import plotly.graph_objs as go\n", + "from plotly.offline import iplot\n", + "sns.set()\n", + "pd.options.display.max_columns = 999" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reading in Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The method we will be using is `pd.read_csv`, which implies we will be reading a comma-separated value file. You can see this in the defaults for this method: type `help(pd.read_csv)` and you will see that the separator is set to `,`, along with other helpful defaults like `header='infer'`. You can read through the rest to get familiar with parameters you may need to set differently from the defaults. \n", + "\n", + "If you type `pd.read` and then press `tab`, you will see the other methods available to you out of the box to read in files. Examples: `pd.read_excel`, `pd.read_pickle`, `pd.read_json`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "help(pd.read_csv)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# pd.read" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next step is to know where the csv file is located relative to your Python script. In this case, I've created a folder called `data/` that I will use to store any input data files. To read in the file, I will just pass the file path into the parentheses and take a look at the output."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "data = pd.read_csv('../data/BankChurners_v2.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next steps I always take after reading in my file are:\n", + "\n", + "1) `data.shape` to see the size of the dataset. The size will help me decide on how to manage working with the dataset if it happens to be large. Here we see this dataset has **10K+** rows of customer data and **23** columns describing the behavior of those customers.\n", + "\n", + "2) `data.head()` to see the top of the dataset and make any changes like renaming column names. The default will show the top 5 rows, but you can pass through any number you like (10, 25, etc.)\n", + "\n", + "3) `data.columns` to see all of the column names\n", + "\n", + "Let's do that here." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(10127, 23)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CLIENTNUMAttrition_FlagCustomer_AgeGenderDependent_countEducation_LevelMarital_StatusIncome_CategoryCard_CategoryMonths_on_bookTotal_Relationship_CountMonths_Inactive_12_monContacts_Count_12_monCredit_LimitTotal_Revolving_BalAvg_Open_To_BuyTotal_Amt_Chng_Q4_Q1Total_Trans_AmtTotal_Trans_CtTotal_Ct_Chng_Q4_Q1Avg_Utilization_RatioNaive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2
090032Existing Customer45M3High SchoolMarried$60K - $80KBlue3951312691.077711914.01.3351144421.6250.0610.0000930.99991
190033Existing Customer49F5GraduateSingleLess than $40KBlue446128256.08647392.01.5411291333.7140.1050.0000570.99994
290034Existing Customer51M3GraduateMarried$80K - $120KBlue364103418.003418.02.5941887202.3330.0000.0000210.99998
390035Existing Customer40F4High SchoolNaNLess than $40KBlue343413313.02517796.01.4051171202.3330.7600.0001340.99987
490036Existing Customer40M3UneducatedMarried$60K - $80KBlue215104716.004716.02.175816282.5000.0000.0000220.99998
\n", + "
" + ], + "text/plain": [ + " CLIENTNUM Attrition_Flag Customer_Age Gender Dependent_count \\\n", + "0 90032 Existing Customer 45 M 3 \n", + "1 90033 Existing Customer 49 F 5 \n", + "2 90034 Existing Customer 51 M 3 \n", + "3 90035 Existing Customer 40 F 4 \n", + "4 90036 Existing Customer 40 M 3 \n", + "\n", + " Education_Level Marital_Status Income_Category Card_Category \\\n", + "0 High School Married $60K - $80K Blue \n", + "1 Graduate Single Less than $40K Blue \n", + "2 Graduate Married $80K - $120K Blue \n", + "3 High School NaN Less than $40K Blue \n", + "4 Uneducated Married $60K - $80K Blue \n", + "\n", + " Months_on_book Total_Relationship_Count Months_Inactive_12_mon \\\n", + "0 39 5 1 \n", + "1 44 6 1 \n", + "2 36 4 1 \n", + "3 34 3 4 \n", + "4 21 5 1 \n", + "\n", + " Contacts_Count_12_mon Credit_Limit Total_Revolving_Bal Avg_Open_To_Buy \\\n", + "0 3 12691.0 777 11914.0 \n", + "1 2 8256.0 864 7392.0 \n", + "2 0 3418.0 0 3418.0 \n", + "3 1 3313.0 2517 796.0 \n", + "4 0 4716.0 0 4716.0 \n", + "\n", + " Total_Amt_Chng_Q4_Q1 Total_Trans_Amt Total_Trans_Ct Total_Ct_Chng_Q4_Q1 \\\n", + "0 1.335 1144 42 1.625 \n", + "1 1.541 1291 33 3.714 \n", + "2 2.594 1887 20 2.333 \n", + "3 1.405 1171 20 2.333 \n", + "4 2.175 816 28 2.500 \n", + "\n", + " Avg_Utilization_Ratio \\\n", + "0 0.061 \n", + "1 0.105 \n", + "2 0.000 \n", + "3 0.760 \n", + "4 0.000 \n", + "\n", + " Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1 \\\n", + "0 0.000093 \n", + "1 0.000057 \n", + "2 0.000021 \n", + "3 0.000134 \n", + "4 0.000022 \n", + "\n", + " Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2 \n", + "0 0.99991 \n", + "1 0.99994 \n", + "2 0.99998 \n", + "3 0.99987 \n", + "4 0.99998 " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.head()" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 5e41a933c0678188519d4ebdd16fd76675735a7b Mon Sep 17 00:00:00 2001 From: Sarah Nabelsi Date: Mon, 23 Jan 2023 10:49:08 -0800 Subject: [PATCH 2/4] adding beginning file' --- 03_02/03_02 Read from CSV [Begin].ipynb | 162 ++++++++++++++++++++++++ 03_02/03_02 Read from CSV.ipynb | 7 - 2 files changed, 162 insertions(+), 7 deletions(-) create mode 100644 03_02/03_02 Read from CSV [Begin].ipynb diff --git a/03_02/03_02 Read from CSV [Begin].ipynb b/03_02/03_02 Read from CSV [Begin].ipynb new file mode 100644 index 0000000..24312ad --- /dev/null +++ b/03_02/03_02 Read from CSV [Begin].ipynb @@ -0,0 +1,162 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Credit Card Retention Analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataset Description" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- A manager at the bank is disturbed with more and more customers leaving their credit card services. They would really like to understand what characteristics lend themselves to someone who is going to churn so they can proactively go to the customer to provide them better services and turn customers' decisions in the opposite direction.\n", + "\n", + "- This dataset consists of 10,000 customers mentioning their age, salary, marital_status, credit card limit, credit card category, etc. 
There are nearly 18 features.\n", + "\n", + "- 16.07% of customers have churned.\n", + "\n", + "- [Dataset link](https://www.kaggle.com/datasets/whenamancodes/credit-card-customers-prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "***" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In order to read in a csv into python, we will be leveraging the Pandas library. Any package we want to use in Python will need an import statement. In addition to pandas which we will import using `import pandas as pd`, we will also import matplotlib and seaborn (libraries used for visualization) and numpy (a library for array manipulation)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "import numpy as np\n", + "import plotly.graph_objs as go\n", + "from plotly.offline import iplot\n", + "sns.set()\n", + "pd.options.display.max_columns = 999" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reading in Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The method we will be using is `pd.read_csv`, which implies we will be reading a comma-separated value file. You can see this in the defaults for this method: type `help(pd.read_csv)` and you will see that the separator is set to `,`, along with other helpful defaults like `header='infer'`. You can read through the rest to get familiar with parameters you may need to set differently from the defaults. \n", + "\n", + "If you type `pd.read` and then press `tab`, you will see the other methods available to you out of the box to read in files. 
Examples: `pd.read_excel`, `pd.read_pickle`, `pd.read_json`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next step is to know where the csv file is located relative to your Python script. In this case, I've created a folder called `data/` that I will use to store any input data files. To read in the file, I will just pass the file path into the parentheses and take a look at the output." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next steps I always take after reading in my file are:\n", + "\n", + "1) `data.shape` to see the size of the dataset. The size will help me decide on how to manage working with the dataset if it happens to be large. Here we see this dataset has **10K+** rows of customer data and **23** columns describing the behavior of those customers.\n", + "\n", + "2) `data.head()` to see the top of the dataset and make any changes like renaming column names. The default will show the top 5 rows, but you can pass through any number you like (10, 25, etc.)\n", + "\n", + "3) `data.columns` to see all of the column names\n", + "\n", + "Let's do that here." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/03_02/03_02 Read from CSV.ipynb b/03_02/03_02 Read from CSV.ipynb index 704c6e6..cdf3712 100644 --- a/03_02/03_02 Read from CSV.ipynb +++ b/03_02/03_02 Read from CSV.ipynb @@ -401,13 +401,6 @@ "source": [ "data.head()" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { From a2de0404cf2abd8c39d26a3320747780e973eb35 Mon Sep 17 00:00:00 2001 From: Sarah Nabelsi Date: Tue, 24 Jan 2023 05:39:34 -0800 Subject: [PATCH 3/4] modify final --- 03_02/03_02 Read from CSV [Begin].ipynb | 33 +++---------------------- 1 file changed, 3 insertions(+), 30 deletions(-) diff --git a/03_02/03_02 Read from CSV [Begin].ipynb b/03_02/03_02 Read from CSV [Begin].ipynb index 24312ad..f036f17 100644 --- a/03_02/03_02 Read from CSV [Begin].ipynb +++ b/03_02/03_02 Read from CSV [Begin].ipynb @@ -34,13 +34,6 @@ "***" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In order to read in a csv into python, we will be leveraging the Pandas library. Any package we want to use in Python will need an import statement. In addition to pandas which we will import using `import pandas as pd`, we will also import matplotlib and seaborn (libraries used for visualization) and numpy (a library for array manipulation)." 
- ] - }, { "cell_type": "markdown", "metadata": {}, @@ -71,15 +64,6 @@ "## Reading in Dataset" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The method we will be using is `pd.read_csv`, which implies we will be reading a comma-separated value file. You can see this in the defaults for this method: type `help(pd.read_csv)` and you will see that the separator is set to `,`, along with other helpful defaults like `header='infer'`. You can read through the rest to get familiar with parameters you may need to set differently from the defaults. \n", - "\n", - "If you type `pd.read` and then press `tab`, you will see the other methods available to you out of the box to read in files. Examples: `pd.read_excel`, `pd.read_pickle`, `pd.read_json`" - ] - }, { "cell_type": "code", "execution_count": null, @@ -94,13 +78,6 @@ "outputs": [], "source": [] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The next step is to know where the csv file is located relative to your Python script. In this case, I've created a folder called `data/` that I will use to store any input data files. To read in the file, I will just pass the file path into the parentheses and take a look at the output." - ] - }, { "cell_type": "code", "execution_count": null, @@ -112,15 +89,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The next steps I always take after reading in my file are:\n", - "\n", - "1) `data.shape` to see the size of the dataset. The size will help me decide on how to manage working with the dataset if it happens to be large. Here we see this dataset has **10K+** rows of customer data and **23** columns describing the behavior of those customers.\n", - "\n", - "2) `data.head()` to see the top of the dataset and make any changes like renaming column names. 
The default will show the top 5 rows, but you can pass through any number you like (10, 25, etc.)\n", + "1) `data.shape` \n", "\n", - "3) `data.columns` to see all of the column names\n", + "2) `data.head()` \n", "\n", - "Let's do that here." + "3) `data.columns` " ] }, { From 3cf3dca3a9cc477c4698e2957ce45ef0bc3f59c4 Mon Sep 17 00:00:00 2001 From: MAhsan89 <152721796+MAhsan89@users.noreply.github.com> Date: Fri, 30 Aug 2024 02:54:03 +0500 Subject: [PATCH 4/4] Update 03_02 Read from CSV.ipynb
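Taken together, the notebooks in this patch series walk through a single workflow: `pd.read_csv` to load the file, then `data.shape`, `data.head()`, and `data.columns` for a first look. A minimal, self-contained sketch of that workflow is below. The inline CSV is a hypothetical stand-in for `../data/BankChurners_v2.csv`, so its columns and row count are illustrative only, not the real 10,127 x 23 dataset:

```python
import io

import pandas as pd

# Inline stand-in for '../data/BankChurners_v2.csv' (illustrative
# columns and values only, not the real Kaggle data).
csv_text = """CLIENTNUM,Attrition_Flag,Customer_Age,Gender
90032,Existing Customer,45,M
90033,Existing Customer,49,F
90034,Attrited Customer,51,M
"""

# pd.read_csv accepts a file path or any file-like object; its defaults
# (sep=',', header='infer') match an ordinary comma-separated file.
data = pd.read_csv(io.StringIO(csv_text))

# The three first-look steps from the notebooks:
print(data.shape)          # (rows, columns)
print(data.head(2))        # top rows; head() defaults to 5
print(list(data.columns))  # all column names
```

Reading from `io.StringIO` goes through the same parsing path as reading from a path on disk, so swapping in the real `'../data/BankChurners_v2.csv'` changes nothing else in the snippet.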