andiachmad/medallion-warehouse


Personal Data Warehouse & Analytics

This repository contains my personal data warehouse project. Raw e-commerce data is cleaned, transformed, and structured into fact tables and data marts. The goal is to produce datasets that serve both BI dashboards and machine learning models, with insights into sales performance, top products, and seller activity across cities and categories.

So far, the project has reached the gold layer, and a customer/order-centric data mart (mart_sales_performance) has been built. It aggregates:

  • Total sales per seller, city, and product category
  • Average order value and number of items
  • Weekly trends for better time-based analysis
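
To make the aggregation above concrete, here is a minimal Python sketch of how such a mart could be derived from order-item rows. It is not the project's dbt model: the column names (`seller_id`, `seller_city`, `product_category`, `price`, `order_date`), the sample rows, and the choice of Monday as the start of a week are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical order-item rows; the column names are assumptions, not the project's schema.
ORDER_ITEMS = [
    {"order_id": "o1", "seller_id": "s1", "seller_city": "city a",
     "product_category": "toys", "price": 50.0, "order_date": date(2018, 1, 1)},
    {"order_id": "o1", "seller_id": "s1", "seller_city": "city a",
     "product_category": "toys", "price": 30.0, "order_date": date(2018, 1, 1)},
    {"order_id": "o2", "seller_id": "s1", "seller_city": "city a",
     "product_category": "toys", "price": 20.0, "order_date": date(2018, 1, 3)},
    {"order_id": "o3", "seller_id": "s1", "seller_city": "city a",
     "product_category": "toys", "price": 40.0, "order_date": date(2018, 1, 8)},
]


def week_start(d):
    """Return the Monday of the week containing d (assumed week boundary)."""
    return d - timedelta(days=d.weekday())


def build_mart(rows):
    """Aggregate order items per seller, city, category, and week."""
    groups = defaultdict(lambda: {"total_sales": 0.0, "item_count": 0, "orders": set()})
    for r in rows:
        key = (r["seller_id"], r["seller_city"], r["product_category"],
               week_start(r["order_date"]))
        g = groups[key]
        g["total_sales"] += r["price"]
        g["item_count"] += 1
        g["orders"].add(r["order_id"])

    mart = []
    for key in sorted(groups):
        g = groups[key]
        mart.append({
            "seller_id": key[0],
            "seller_city": key[1],
            "product_category": key[2],
            "week_start": key[3],
            "total_sales": g["total_sales"],
            "item_count": g["item_count"],
            # Average order value = total sales divided by distinct orders in the group.
            "avg_order_value": g["total_sales"] / len(g["orders"]),
        })
    return mart


mart = build_mart(ORDER_ITEMS)
for row in mart:
    print(row)
```

Grouping by the week's start date rather than a week number keeps the weekly trend sortable across year boundaries.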

This project applies the Medallion architecture, which organizes data into three layers:

  1. Bronze Layer → raw ingested data, minimal transformations
  2. Silver Layer → cleaned and harmonized data ready for analytics
  3. Gold Layer → aggregated, high-quality data ready for data marts, BI dashboards, and ML datasets
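
The three layers can be sketched as three small functions, each consuming the output of the previous one. This is a toy illustration in plain Python rather than the project's DuckDB and dbt pipeline; the sample CSV, the column names, and the specific cleaning rules (trim and lowercase text, drop rows with a missing price, remove exact duplicates) are assumptions.

```python
import csv
import io

# Hypothetical raw extract; in the project the source is a set of local CSV files.
RAW_CSV = """order_id,product_category,seller_city,price
o1,Toys, City A ,50.0
o1,Toys, City A ,50.0
o2,toys,city a,
o3,Books,City B,20.5
"""


def load_bronze(text):
    """Bronze: ingest rows as-is, every value still a raw string."""
    return list(csv.DictReader(io.StringIO(text)))


def to_silver(bronze):
    """Silver: clean and harmonize (assumed rules, see the note above)."""
    seen = set()
    silver = []
    for row in bronze:
        if not row["price"].strip():
            continue  # drop rows with a missing price
        clean = {
            "order_id": row["order_id"].strip(),
            "product_category": row["product_category"].strip().lower(),
            "seller_city": row["seller_city"].strip().lower(),
            "price": float(row["price"]),
        }
        key = tuple(clean.values())
        if key in seen:
            continue  # remove exact duplicate rows
        seen.add(key)
        silver.append(clean)
    return silver


def to_gold(silver):
    """Gold: aggregate cleaned rows into total sales per product category."""
    totals = {}
    for row in silver:
        category = row["product_category"]
        totals[category] = totals.get(category, 0.0) + row["price"]
    return totals


bronze = load_bronze(RAW_CSV)
silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), gold)
```

Keeping bronze untouched means any cleaning rule in the silver step can be changed and re-run without re-ingesting the source files.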

Tools & Data Used

  1. Dataset Source: Kaggle (extracted to local CSV files)
  2. Initial Data Profiling & Insights: Jupyter Notebook
  3. Temporary Database / Local Engine: DuckDB
  4. Data Transformation & Quality Checks: dbt
  5. Querying & Data Integrity Checks: DBeaver

Architecture

Here's the data warehouse architecture I've designed: (image: architecture diagram)

Star Schema

The star schema that consolidates all fact and dimension tables: (image: DW-Personal-Project drawio)
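
As a rough illustration of how a star schema is queried, the sketch below joins one fact table to two dimension tables and aggregates a measure. It uses Python's built-in `sqlite3` module purely as a stand-in engine so the example runs without extra dependencies; the project itself uses DuckDB. The table and column names (`fact_order_items`, `dim_seller`, `dim_product`, and their keys) are hypothetical and may not match the diagram.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the project's DuckDB engine.
conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table referencing two dimensions.
conn.executescript("""
CREATE TABLE dim_seller (
    seller_key INTEGER PRIMARY KEY,
    seller_city TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_category TEXT
);
CREATE TABLE fact_order_items (
    order_id TEXT,
    seller_key INTEGER REFERENCES dim_seller(seller_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    price REAL
);
INSERT INTO dim_seller VALUES (1, 'city a'), (2, 'city b');
INSERT INTO dim_product VALUES (1, 'toys'), (2, 'books');
INSERT INTO fact_order_items VALUES
    ('o1', 1, 1, 50.0),
    ('o2', 1, 2, 20.0),
    ('o3', 2, 1, 30.0);
""")

# Measures come from the fact table; descriptive attributes come from the dimensions.
rows = conn.execute("""
SELECT s.seller_city, p.product_category, SUM(f.price) AS total_sales
FROM fact_order_items AS f
JOIN dim_seller AS s ON s.seller_key = f.seller_key
JOIN dim_product AS p ON p.product_key = f.product_key
GROUP BY s.seller_city, p.product_category
ORDER BY s.seller_city, p.product_category
""").fetchall()

for row in rows:
    print(row)
```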
