Course
data-engineering-zoomcamp
Question
How do I structure a layered data warehouse (raw, clean, analytics) for a batch pipeline?
Answer
You can structure your warehouse into separate layers to isolate responsibilities and improve data reliability.
A common approach is:
- Raw layer: store ingested data exactly as received
- Clean layer: filter invalid records and enforce basic constraints
- Data quality layer: validate completeness and consistency (e.g. missing timestamps)
- Analytics layer: build aggregated views
- Mart layer: expose final business metrics
This separation helps with debugging, testing, and ensuring that analytical outputs are built on validated data.
In this project, the flow is:
raw → clean → dq → analytics → mart
Each layer is implemented as SQL transformations in PostgreSQL and BigQuery.
Checklist