To enable the analytics team to keep finding insights in what books their customers order, we will build an ETL pipeline that extracts data from Postgres, stages it in BigQuery, and transforms it into a set of dimensional tables. The orders data is modeled as a star schema, with order_lines as the central fact table and books, authors, customers, and time as dimension tables optimized for order analysis. The resulting database supports fast queries about customers, the books they buy, and the time periods when they visit the ecommerce site.
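The extract-stage-transform flow above can be sketched in plain Python. This is a minimal in-memory illustration, not the production pipeline: the function names and sample row are hypothetical, and a real run would use a Postgres driver for extraction and a BigQuery load job for staging.

```python
def extract(source_rows):
    """Simulate pulling raw, denormalized order rows out of Postgres."""
    return list(source_rows)

def stage(rows):
    """Simulate landing rows unchanged in a BigQuery staging dataset."""
    return rows

def transform(rows):
    """Split staged rows into a fact table and a deduplicated dimension."""
    fact = [
        {"order_id": r["order_id"], "customer_id": r["customer_id"],
         "book_id": r["book_id"], "price": r["price"]}
        for r in rows
    ]
    # Keyed dict deduplicates books that appear on many order lines.
    dim_books = {
        r["book_id"]: {"book_id": r["book_id"], "title": r["title"]}
        for r in rows
    }
    return fact, list(dim_books.values())

# Hypothetical sample row standing in for the Postgres source.
source = [{"order_id": 1, "customer_id": 7, "book_id": 10,
           "price": 9.99, "title": "Dune"}]
fact, dim_books = transform(stage(extract(source)))
```

The key design point is that the transform step fans one wide staged row out into a narrow fact record plus deduplicated dimension records.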
Please see a description of the sample bookstore dataset we are using here. The webpage gives you descriptions of the tables and the database diagram. This dataset mimics the normalized data models you will encounter in production.
You will transform the data into a star schema with one fact table and several dimension tables enriching the dataset. You can read more about fact tables here.
- order_lines - records associated with individual book orders (fact table)
  - order_id, order_date, customer_id, price, book_id, author_id
- customers - customers of the website
  - user_id, first_name, last_name, email, country
- books - books sold
  - book_id, title, author_id, year, publisher, num_pages
- authors - authors of the books
  - author_id, name
- time - timestamps of records in the orders table broken down into specific units
  - order_date, hour, day, week, month, year, weekday
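The time dimension listed above can be derived entirely from the order timestamp. A minimal sketch using Python's standard library, assuming the week column is the ISO week number and weekday is stored as a name (both representation choices are assumptions, not taken from the source):

```python
from datetime import datetime

def time_dimension_row(order_date: datetime) -> dict:
    """Break an order timestamp into the columns of the time table."""
    return {
        "order_date": order_date,
        "hour": order_date.hour,
        "day": order_date.day,
        "week": order_date.isocalendar()[1],   # ISO week number
        "month": order_date.month,
        "year": order_date.year,
        "weekday": order_date.strftime("%A"),  # e.g. "Friday"
    }

row = time_dimension_row(datetime(2024, 3, 15, 14, 30))
```

Because every column is a pure function of order_date, the time table can be rebuilt at any point from the fact table's timestamps, which keeps the dimension cheap to regenerate.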