An end-to-end Data Analytics project demonstrating real-world data modeling, ETL, SQL analytics, Python reporting, and Metabase dashboarding on a large sales dataset.
This project simulates how raw transactional data is transformed into business-ready insights using modern analytics tools.
The goal of this project is to build a Sales Analytics System that enables decision-makers to analyze revenue, customer behavior, product performance, and regional trends.
- Dataset Size: ~1,000,000 sales records
- Architecture: Star Schema (Fact & Dimension tables)
- Analytics Layers: SQL, Python, Metabase Dashboard
Raw CSV Data
↓
Staging Table (PostgreSQL)
↓
Dimension Tables
(Customer, Product, Store, Date, Employee)
↓
Fact Table (Sales)
↓
Analytics Layer
(SQL Queries & Python Scripts)
↓
BI Dashboards (Metabase)
- Database: PostgreSQL
- Query Language: SQL
- Programming: Python (Pandas, Matplotlib, SQLAlchemy)
- BI Tool: Metabase
- Version Control: Git & GitHub
sales-data-warehouse/
│
├── data/
│ └── (dataset excluded due to size)
│
├── sql/
│ ├── create_tables.sql
│ ├── load_dimensions.sql
│ ├── load_fact.sql
│ └── analytics_queries.sql
│
├── notebooks/
│ ├── etl_load_sales_data.py
│ └── analytics_visuals.py
│
├── reports/
│ ├── total_revenue.txt
│ ├── monthly_revenue_trend.png
│ ├── top_5_products.png
│ └── revenue_by_region.png
│
└── README.md
- Loaded raw CSV data into a staging table
- Cleaned and standardized data (dates, nulls, duplicates)
- Built dimension tables with deduplication logic
- Loaded fact table using join-based inserts to maintain referential integrity
- Validated data consistency across SQL and Python analytics
- Total Revenue calculation
- Monthly revenue trends
- Revenue by region
- Top products by revenue
- Customer segmentation analysis
- Automated SQL querying using Pandas
- Revenue aggregation and trend analysis
- Generated and saved charts using Matplotlib
- KPI card for Total Revenue
- Interactive Monthly Revenue Trend
- Revenue by Region bar chart
- Top Products performance analysis
- Total Revenue: ₹5.68+ Billion
- Clear seasonal revenue patterns identified
- Regional performance comparison enabled
- High-value products and customer segments identified
- Star schema data modeling
- ETL pipeline design and debugging
- Handling foreign keys and duplicates
- SQL joins and aggregations at scale
- Python-based analytics automation
- BI dashboard creation for business users
- Create PostgreSQL database
- Run SQL scripts in
/sqlfolder
python notebooks/analytics_visuals.pyjava -jar metabase.jar
http://localhost:3000