This project performs a comprehensive exploratory data analysis (EDA) on the Zomato dataset. The goal is to uncover patterns and insights related to restaurant distributions, user ratings, online delivery services, and geographical variations.
This analysis uses two datasets:
- zomato.csv: Contains detailed information about restaurants, including location, cuisines, cost, and ratings.
- Country-Code.xlsx: A mapping file that links country codes to their respective country names.
The primary objective is to clean, merge, and analyze this data to answer key business questions, such as:
- What is the geographical spread of Zomato's listings?
- What do user ratings tell us about customer satisfaction?
- Which countries and cities have the highest concentration of restaurants?
- Where is the online delivery service most prevalent?
- Data Loading: Imported the zomato.csv and Country-Code.xlsx files using Pandas.
- Data Cleaning & Preprocessing:
  - Merged the two datasets on "Country Code" to add full country names to the main dataframe.
  - Checked for missing values. Found and visualized 9 missing entries in the 'Cuisines' column using a Seaborn heatmap.
- Exploratory Data Analysis (EDA):
  - Analyzed the geographical distribution of restaurants by country and city.
  - Investigated the relationship between 'Aggregate rating', 'Rating color', and 'Rating text'.
  - Examined the distribution and origin of zero ratings.
  - Identified which countries offer online delivery.
  - Mapped currencies to their respective countries.
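The loading, merging, and inspection steps above can be sketched roughly as follows. This is a minimal illustration on made-up toy rows, not the notebook's actual code: the helper names (merge_with_countries, missing_counts, rating_summary), the 'Country' column name, and the latin-1 encoding mentioned in the comment are assumptions.

```python
import pandas as pd


def merge_with_countries(restaurants, countries):
    """Left-join country names onto restaurant rows via "Country Code"."""
    return restaurants.merge(countries, on="Country Code", how="left")


def missing_counts(df):
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isnull().sum()
    return counts[counts > 0]


def rating_summary(df):
    """Count restaurants for each (rating, color, text) combination."""
    return (
        df.groupby(["Aggregate rating", "Rating color", "Rating text"])
        .size()
        .reset_index(name="Rating Count")
    )


# Toy stand-ins for the real files (all values are made up).
# With the actual data the loading step would look something like:
#   restaurants = pd.read_csv("zomato.csv", encoding="latin-1")  # encoding is an assumption
#   countries = pd.read_excel("Country-Code.xlsx")               # relies on openpyxl
restaurants = pd.DataFrame({
    "Country Code": [1, 1, 216, 1],
    "Cuisines": ["North Indian", None, "American", "Chinese, Thai"],
    "Aggregate rating": [0.0, 3.1, 4.2, 3.1],
    "Rating color": ["White", "Orange", "Green", "Orange"],
    "Rating text": ["Not Rated", "Average", "Very Good", "Average"],
})
countries = pd.DataFrame({
    "Country Code": [1, 216],
    "Country": ["India", "United States"],
})

merged = merge_with_countries(restaurants, countries)
missing = missing_counts(merged)
summary = rating_summary(merged)

print(missing)
print(summary)
```

The left join keeps every restaurant row even if a country code has no match in the mapping file, which makes unmatched codes show up as missing 'Country' values instead of silently disappearing. The boolean mask from merged.isnull() is also what a Seaborn heatmap of missing values would be drawn from.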
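- Top 3 Countries: Among the three countries with the most listings, India dominates with 94.39%, followed by the United States (4.73%) and the United Kingdom (0.87%). These figures sum to roughly 100%, so they appear to be shares within the top three rather than shares of the full dataset.
- Top 5 Cities: The five cities with the most restaurant listings are all within the National Capital Region (NCR) of India. The percentages below likewise sum to 100% across these five cities:
  - New Delhi (68.87%)
  - Gurgaon (14.07%)
  - Noida (13.59%)
  - Faridabad (3.16%)
  - Ghaziabad (0.31%)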
- Rating Distribution: A significant number of restaurants (2,148) have a rating of 0.0 ("Not Rated"). This suggests many new listings or a low rate of user feedback for certain entries.
- Common Ratings: The most frequent actual ratings fall in the "Average" category (Orange color), specifically between 2.5 and 3.4.
- Zero Ratings: The overwhelming majority of "Not Rated" (0.0) entries originate from India (2,139 out of 2,148).
- Online Delivery: The online delivery service is primarily available in India and the UAE. Other countries in this dataset do not appear to have this feature enabled.
- Currency: The analysis successfully maps various currencies (like Botswana Pula, Brazilian Real, Dollar, Pounds, and Indian Rupees) to their corresponding countries.
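A rough sketch of how figures like the ones above could be computed is shown below. It runs on made-up toy rows, so the numbers it prints are not the dataset's. The helper names and the 'Country' and 'Has Online delivery' column names (with "Yes"/"No" values) are assumptions rather than confirmed details of the notebook.

```python
import pandas as pd


def top_n_shares(values, n):
    """Percentage share of each of the n most frequent values, relative to those n only."""
    counts = values.value_counts().head(n)
    return (counts / counts.sum() * 100).round(2)


def zero_ratings_by_country(df):
    """Number of 0.0-rated restaurants per country, largest first."""
    zero = df[df["Aggregate rating"] == 0.0]
    return zero.groupby("Country").size().sort_values(ascending=False)


def delivery_countries(df):
    """Sorted list of countries with at least one listing offering online delivery."""
    mask = df["Has Online delivery"] == "Yes"
    return sorted(df.loc[mask, "Country"].unique())


# Toy data (made up): 12 India, 5 United States, 3 United Kingdom, 2 UAE rows.
df = pd.DataFrame({
    "Country": ["India"] * 12 + ["United States"] * 5
               + ["United Kingdom"] * 3 + ["UAE"] * 2,
    "Aggregate rating": [0.0] * 4 + [3.2] * 8      # India
                        + [0.0] + [4.1] * 4        # United States
                        + [3.9] * 3                # United Kingdom
                        + [4.3] * 2,               # UAE
    "Has Online delivery": ["Yes"] * 5 + ["No"] * 7  # India
                           + ["No"] * 5              # United States
                           + ["No"] * 3              # United Kingdom
                           + ["Yes", "No"],          # UAE
})

shares = top_n_shares(df["Country"], 3)
overall = (df["Country"].value_counts(normalize=True) * 100).round(2)
zeros = zero_ratings_by_country(df)
with_delivery = delivery_countries(df)

print(shares)         # shares within the top 3 only
print(overall)        # shares of the full toy dataset
print(zeros)
print(with_delivery)
```

Comparing shares with overall shows why the denominator matters: a share computed within the top N is always at least as large as the same country's share of the whole dataset.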
- Python
- Pandas: For data manipulation and analysis.
- NumPy: For numerical operations.
- Matplotlib: For basic data visualization.
- Seaborn: For advanced statistical visualization.
- Jupyter Notebook: As the development environment.
- Clone the repository:
git clone https://github.com/your-username/your-repository-name.git
- Install the required libraries:
pip install pandas numpy matplotlib seaborn jupyterlab openpyxl
- Launch Jupyter Notebook or Jupyter Lab:
jupyter lab
- Open the .ipynb file and run the cells sequentially.
The notebook identifies a next step which can be implemented:
- Analyze the 'Cuisines' column to find and visualize the top 10 most popular cuisines across the dataset.
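One possible way to approach this step is sketched below. It assumes that each 'Cuisines' entry is a comma-separated string (for example "Chinese, Thai") and that rows with a missing value can simply be skipped. The helper name top_cuisines and the sample rows are hypothetical.

```python
import pandas as pd


def top_cuisines(cuisines, n=10):
    """Return the n most frequent individual cuisines from comma-separated entries."""
    return (
        cuisines.dropna()        # skip rows with no cuisine listed
        .str.split(",")          # "Chinese, Thai" -> ["Chinese", " Thai"]
        .explode()               # one row per individual cuisine
        .str.strip()             # remove the whitespace left by the split
        .value_counts()
        .head(n)
    )


# Made-up sample standing in for the real 'Cuisines' column.
sample = pd.Series([
    "North Indian, Chinese",
    "Chinese",
    None,
    "Chinese, Thai",
    "North Indian",
])

top = top_cuisines(sample, n=2)
print(top)

# For the visualization, a horizontal bar chart is one option (requires Matplotlib):
#   top.sort_values().plot(kind="barh")
```

Counting individual cuisines rather than whole strings avoids treating "North Indian, Chinese" and "Chinese, North Indian" as two unrelated categories.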