@@ -0,0 +1,18 @@
---
id: a198f6959c
question: How do I structure a layered data warehouse (raw, clean, analytics) for
a batch pipeline?
sort_order: 15
---

- Raw layer: store ingested data exactly as received
- Clean layer: filter invalid records and enforce basic constraints
- Data quality (dq) layer: validate completeness and consistency (e.g. flag or exclude records with missing timestamps)
- Analytics layer: build aggregated views
- Mart layer: expose final business metrics

Flow: raw → clean → dq → analytics → mart

Each layer is implemented as SQL transformations in PostgreSQL and BigQuery.
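
As a rough illustration of the flow above, here is a minimal sketch that chains one SQL transformation per layer. It rests on assumptions that are not part of the original answer: SQLite (via Python's built-in `sqlite3`) stands in for PostgreSQL/BigQuery so the example runs anywhere, and every table name, column name, and validation rule (`raw_events`, `amount >= 0`, and so on) is hypothetical.

```python
import sqlite3

# Hypothetical layer definitions: each layer reads only from the layer before it.
# SQLite is used as a stand-in engine; the SQL would need dialect adjustments
# for PostgreSQL or BigQuery.
LAYERS = [
    # Clean layer: drop invalid records and enforce basic constraints.
    ("clean_events", """
        CREATE TABLE clean_events AS
        SELECT event_id, user_id, amount, event_ts
        FROM raw_events
        WHERE user_id IS NOT NULL AND amount >= 0
    """),
    # DQ layer: keep only records that pass the completeness check.
    ("dq_events", """
        CREATE TABLE dq_events AS
        SELECT * FROM clean_events
        WHERE event_ts IS NOT NULL
    """),
    # Analytics layer: aggregate validated events per user and day.
    ("analytics_daily", """
        CREATE TABLE analytics_daily AS
        SELECT user_id, date(event_ts) AS day,
               SUM(amount) AS total_amount, COUNT(*) AS n_events
        FROM dq_events
        GROUP BY user_id, date(event_ts)
    """),
    # Mart layer: expose the final business metric.
    ("mart_daily_revenue", """
        CREATE TABLE mart_daily_revenue AS
        SELECT day, SUM(total_amount) AS revenue
        FROM analytics_daily
        GROUP BY day
    """),
]


def run_pipeline(conn, raw_rows):
    """Load the raw layer unchanged, build each later layer, return row counts."""
    conn.execute(
        "CREATE TABLE raw_events "
        "(event_id INTEGER, user_id TEXT, amount REAL, event_ts TEXT)"
    )
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?, ?)", raw_rows)
    counts = {"raw_events": len(raw_rows)}
    for name, sql in LAYERS:
        conn.execute(sql)
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
    return counts


# Made-up sample batch for illustration only.
raw_rows = [
    (1, "u1", 10.0, "2024-01-01 09:00:00"),
    (2, "u1", 5.0, "2024-01-01 17:30:00"),
    (3, None, 7.0, "2024-01-01 10:00:00"),   # removed in clean: no user_id
    (4, "u2", -3.0, "2024-01-02 11:00:00"),  # removed in clean: negative amount
    (5, "u2", 4.0, None),                    # removed in dq: missing timestamp
    (6, "u2", 8.0, "2024-01-02 12:00:00"),
]

conn = sqlite3.connect(":memory:")
layer_counts = run_pipeline(conn, raw_rows)
mart = conn.execute(
    "SELECT day, revenue FROM mart_daily_revenue ORDER BY day"
).fetchall()
print(layer_counts)
print(mart)
```

Recording a row count per layer is one simple way to see where records drop out: in this sample the raw layer keeps all 6 rows, the clean layer keeps 4, and the dq layer keeps 3. In a real warehouse each layer would more likely live in its own schema or dataset than in a single database.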

This separation makes a failure easier to trace to a specific layer, lets each transformation be tested in isolation, and ensures that analytical outputs are built only on validated data.