A data cleaning and preprocessing project focused on preparing the Netflix dataset for analysis using Python, Pandas, and Jupyter Notebook.
This project demonstrates real-world data cleaning workflows including handling missing values, fixing mixed data types, and transforming raw data into analysis-ready datasets.
Raw datasets are rarely clean.
This project focuses on transforming messy Netflix content data into a structured and usable format suitable for analytics and visualization.
Key objectives:
- Identify and handle missing values
- Fix inconsistent/mixed-type columns
- Convert date columns into datetime format
- Create derived analytical features
- Prepare a clean dataset for further analysis
- Python
- Pandas
- Matplotlib
- Jupyter Notebook
Analysing-Netflix-Data-Cleaning/
│
├── data/
│ ├── netflix_titles.csv
│ └── cleaned-data.csv
│
├── notebook/
│ └── netflix_data_cleaning.ipynb
│
└── README.md
flowchart LR
A[Raw Netflix Dataset] --> B[Data Inspection]
B --> C[Handle Missing Values]
C --> D[Fix Mixed-Type Columns]
D --> E[Convert Date Columns]
E --> F[Feature Engineering]
F --> G[Cleaned Dataset Ready]
style A fill:#1f77b4,color:#fff
style B fill:#9467bd,color:#fff
style C fill:#2ca02c,color:#fff
style D fill:#ff7f0e,color:#fff
style E fill:#17becf,color:#fff
style F fill:#e377c2,color:#fff
style G fill:#d62728,color:#fff
- Filled categorical columns using "Unknown" or mode values
- Verified null counts column-wise
- Ensured dataset consistency after imputation
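The imputation steps above might look roughly like the sketch below. It uses a tiny stand-in DataFrame rather than the real file, and the column names (`director`, `rating`) and the choice of which column gets "Unknown" versus the mode are assumptions for illustration, not necessarily what the notebook does.

```python
import pandas as pd

# Tiny stand-in for netflix_titles.csv; the column names are assumed.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["Jane Doe", None, None, "John Roe"],
    "rating": ["TV-MA", "TV-MA", None, "PG"],
})

# Column-wise null counts before imputation.
nulls_before = df.isna().sum()

# Free-text categorical column: fill gaps with a placeholder label.
df["director"] = df["director"].fillna("Unknown")

# Low-cardinality categorical column: fill gaps with the most frequent value.
df["rating"] = df["rating"].fillna(df["rating"].mode()[0])

# Re-check null counts to confirm the imputation covered every gap.
nulls_after = df.isna().sum()
print(nulls_before, nulls_after, sep="\n\n")
```

A placeholder such as "Unknown" keeps the rows without inventing values, while mode imputation is only reasonable when one category clearly dominates.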
- Cleaned the `duration` column by splitting it into:
  - Numeric duration value
  - Duration type (Minutes / Seasons)
- Standardized data types for analysis readiness
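One way to split a mixed-type duration column is a single regular-expression extract, sketched below. The raw value formats (`"90 min"`, `"2 Seasons"`) and the new column names `duration_value` and `duration_type` are assumptions for illustration.

```python
import pandas as pd

# Stand-in values; the raw strings in the real dataset are assumed to look like this.
df = pd.DataFrame({"duration": ["90 min", "2 Seasons", "1 Season", None]})

# Capture the leading number and the unit word that follows it.
parts = df["duration"].str.extract(r"^(?P<value>\d+)\s*(?P<unit>\w+)$")

# Nullable integer dtype keeps missing durations as <NA> instead of forcing floats.
df["duration_value"] = pd.to_numeric(parts["value"]).astype("Int64")

# Normalise the unit labels to the two duration types.
df["duration_type"] = parts["unit"].str.lower().map(
    {"min": "Minutes", "season": "Seasons", "seasons": "Seasons"}
)
print(df)
```

Rows that do not match the pattern end up as missing values in both new columns, which makes them easy to spot with `isna()`.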
- Converted `date_added` into proper datetime format
- Extracted new analytical features:
  - `year_added`
  - `month_added`
  - `month_name`
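A minimal sketch of the date conversion and the three derived features follows. The date format (`"September 25, 2021"`) and the presence of stray leading whitespace are assumptions about the raw data.

```python
import pandas as pd

# Stand-in values; the real date strings are assumed to follow "Month D, YYYY".
df = pd.DataFrame({"date_added": ["September 25, 2021", " August 1, 2019", None]})

# Strip stray whitespace first; errors="coerce" turns unparseable values into NaT.
df["date_added"] = pd.to_datetime(
    df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

# Derived features; the nullable Int64 dtype keeps rows with missing dates as <NA>.
df["year_added"] = df["date_added"].dt.year.astype("Int64")
df["month_added"] = df["date_added"].dt.month.astype("Int64")
df["month_name"] = df["date_added"].dt.month_name()
print(df)
```

Passing an explicit `format` avoids ambiguous parsing, at the cost of coercing any row that deviates from it to `NaT`.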
- Verified column datatypes
- Removed inconsistencies and formatting issues
- Saved a fully cleaned dataset for downstream analysis
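The final check and export could be wrapped in a small helper like the one sketched here. `check_and_save` is a hypothetical name, and the single dtype check is only an example of the kind of validation meant above. The demo writes to an in-memory buffer; in the project the target would presumably be `data/cleaned-data.csv`.

```python
import io

import pandas as pd


def check_and_save(df: pd.DataFrame, target) -> None:
    """Hypothetical helper: verify key dtypes, then write the frame as CSV."""
    if not pd.api.types.is_datetime64_any_dtype(df["date_added"]):
        raise TypeError("date_added should be datetime before saving")
    # index=False keeps the row index out of the output file.
    df.to_csv(target, index=False)


cleaned = pd.DataFrame({
    "title": ["A"],
    "date_added": pd.to_datetime(["2021-09-25"]),
})

buffer = io.StringIO()  # stand-in for a real file path
check_and_save(cleaned, buffer)
print(cleaned.dtypes)
```

CSV does not store dtypes, so when loading the cleaned file later, `pd.read_csv(..., parse_dates=["date_added"])` restores the datetime column.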
A cleaned and structured dataset ready for:
- Exploratory Data Analysis (EDA)
- Data Visualization
- Dashboard Creation
- Business Insights
This project is inspired by the learning project from roadmap.sh:
https://roadmap.sh/projects/cleaning-netflix-dataset
Implementation and analysis were completed independently as part of learning real-world data analytics workflows.
- Content trend analysis
- Genre popularity insights
- Dashboard using Power BI / Tableau
- Time-series visualization
This project strengthened understanding of:
- Real-world data preprocessing
- Pandas data transformation
- Analytical thinking
- Structuring analytics projects for GitHub portfolios
If you have feedback or suggestions, feel free to connect or open an issue in this repository!
⭐ If you found this project helpful, consider giving it a star!