Skip to content

imanchalsingh/analysing-netflix-data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Analysing Netflix Data Cleaning

Python Pandas Matplotlib Jupyter

A data cleaning and preprocessing project focused on preparing the Netflix dataset for analysis using Python, Pandas, and Jupyter Notebook.

This project demonstrates real-world data cleaning workflows including handling missing values, fixing mixed data types, and transforming raw data into analysis-ready datasets.


Project Overview

Raw datasets are rarely clean.
This project focuses on transforming messy Netflix content data into a structured and usable format suitable for analytics and visualization.

Key objectives:

  • Identify and handle missing values
  • Fix inconsistent/mixed-type columns
  • Convert date columns into datetime format
  • Create derived analytical features
  • Prepare clean dataset for further analysis

Tech Stack

  • Python
  • Pandas
  • Matplotlib
  • Jupyter Notebook

Project Structure

Analysing-Netflix-Data-Cleaning/
│
├── data/
│ ├── netflix_titles.csv
│ └── cleaned-data.csv
│
├── notebook/
│ └── netflix_data_cleaning.ipynb
│
└── README.md

Data Cleaning Workflow

flowchart LR
    A[Raw Netflix Dataset] --> B[Data Inspection]
    B --> C[Handle Missing Values]
    C --> D[Fix Mixed-Type Columns]
    D --> E[Convert Date Columns]
    E --> F[Feature Engineering]
    F --> G[Cleaned Dataset Ready]

    style A fill:#1f77b4,color:#fff
    style B fill:#9467bd,color:#fff
    style C fill:#2ca02c,color:#fff
    style D fill:#ff7f0e,color:#fff
    style E fill:#17becf,color:#fff
    style F fill:#e377c2,color:#fff
    style G fill:#d62728,color:#fff
Loading

Cleaning Steps Performed

1️Missing Values Handling

  • Filled categorical columns using "Unknown" or mode values
  • Verified null counts column-wise
  • Ensured dataset consistency after imputation

Mixed-Type Column Fix

  • Cleaned the duration column by splitting into:
    • Numeric duration value
    • Duration type (Minutes / Seasons)
  • Standardized data types for analysis readiness

Datetime Conversion

  • Converted date_added into proper datetime format
  • Extracted new analytical features:
    • year_added
    • month_added
    • month_name

Data Validation

  • Verified column datatypes
  • Removed inconsistencies and formatting issues
  • Saved a fully cleaned dataset for downstream analysis

Output

A cleaned and structured dataset ready for:

  • Exploratory Data Analysis (EDA)
  • Data Visualization
  • Dashboard Creation
  • Business Insights

Project Source

This project is inspired by the learning project from roadmap.sh:

</> https://roadmap.sh/projects/cleaning-netflix-dataset

Implementation and analysis were completed independently as part of learning real-world data analytics workflows.


Future Improvements

  • Content trend analysis
  • Genre popularity insights
  • Dashboard using Power BI / Tableau
  • Time-series visualization

Learning Outcome

This project strengthened understanding of:

  • Real-world data preprocessing
  • Pandas data transformation
  • Analytical thinking
  • Structuring analytics projects for GitHub portfolios

Connect

If you have feedback or suggestions, feel free to connect or open an issue in this repository!

⭐ If you found this project helpful, consider giving it a star!

About

Data cleaning and preprocessing of Netflix dataset using Python, Pandas, and Jupyter Notebook. Demonstrates real-world data preparation workflows for analytics projects.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors