diff --git a/_questions/data-engineering-zoomcamp/module-3/015_a198f6959c_layered-data-warehouse-batch-pipeline.md b/_questions/data-engineering-zoomcamp/module-3/015_a198f6959c_layered-data-warehouse-batch-pipeline.md
new file mode 100644
index 0000000..edd5daf
--- /dev/null
+++ b/_questions/data-engineering-zoomcamp/module-3/015_a198f6959c_layered-data-warehouse-batch-pipeline.md
@@ -0,0 +1,18 @@
+---
+id: a198f6959c
+question: How do I structure a layered data warehouse (raw, clean, analytics) for
+  a batch pipeline?
+sort_order: 15
+---
+
+- Raw layer: store ingested data exactly as received
+- Clean layer: filter invalid records and enforce basic constraints
+- Data quality (dq) layer: validate completeness and consistency (e.g. missing timestamps)
+- Analytics layer: build aggregated views
+- Mart layer: expose final business metrics
+
+Flow: raw → clean → dq → analytics → mart
+
+Each layer is implemented as SQL transformations in PostgreSQL and BigQuery.
+
+This separation helps with debugging, testing, and ensuring that analytical outputs are built on validated data.
\ No newline at end of file
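
The raw → clean → dq → analytics → mart flow described in this answer could be sketched roughly as follows. This is a minimal illustration, not the course's implementation: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL or BigQuery so that it runs without a server, and the table, view, and column names (`raw_trips`, `clean_trips`, `dq_report`, `dq_trips`, `analytics_daily`, `mart_daily_avg_fare`) as well as the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample input: (trip_id, pickup_ts, fare), all kept as text in the raw layer.
SAMPLE_ROWS = [
    ("1", "2024-01-01 08:00:00", "10.0"),
    ("2", "2024-01-01 09:30:00", "20.0"),
    ("3", None, "15.0"),                   # missing timestamp: flagged by the dq layer
    ("4", "2024-01-02 10:00:00", "-5.0"),  # negative fare: dropped by the clean layer
    ("5", "2024-01-02 11:00:00", "8.0"),
]


def build_layers(conn, raw_rows):
    """Create the five layers as one table plus SQL views (SQLite stand-in)."""
    # Raw layer: store ingested data exactly as received, with no constraints.
    conn.execute("CREATE TABLE raw_trips (trip_id TEXT, pickup_ts TEXT, fare TEXT)")
    conn.executemany("INSERT INTO raw_trips VALUES (?, ?, ?)", raw_rows)

    conn.executescript("""
        -- Clean layer: cast types and filter records that break basic constraints.
        CREATE VIEW clean_trips AS
        SELECT CAST(trip_id AS INTEGER) AS trip_id,
               pickup_ts,
               CAST(fare AS REAL) AS fare
        FROM raw_trips
        WHERE trip_id IS NOT NULL
          AND CAST(fare AS REAL) >= 0;

        -- DQ layer (report): count completeness problems such as missing timestamps.
        CREATE VIEW dq_report AS
        SELECT COUNT(*) AS total_rows,
               SUM(CASE WHEN pickup_ts IS NULL OR pickup_ts = '' THEN 1 ELSE 0 END)
                   AS missing_ts
        FROM clean_trips;

        -- DQ layer (validated rows): only rows that pass the checks flow downstream.
        CREATE VIEW dq_trips AS
        SELECT * FROM clean_trips
        WHERE pickup_ts IS NOT NULL AND pickup_ts <> '';

        -- Analytics layer: aggregated view per calendar day.
        CREATE VIEW analytics_daily AS
        SELECT substr(pickup_ts, 1, 10) AS trip_date,
               COUNT(*) AS trips,
               SUM(fare) AS total_fare
        FROM dq_trips
        GROUP BY substr(pickup_ts, 1, 10);

        -- Mart layer: final business metric exposed to consumers.
        CREATE VIEW mart_daily_avg_fare AS
        SELECT trip_date, total_fare / trips AS avg_fare
        FROM analytics_daily;
    """)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_layers(conn, SAMPLE_ROWS)
    print(conn.execute("SELECT * FROM dq_report").fetchone())
    # prints (4, 1): four clean rows, one of them missing a timestamp
    print(conn.execute(
        "SELECT * FROM mart_daily_avg_fare ORDER BY trip_date").fetchall())
    # prints [('2024-01-01', 15.0), ('2024-01-02', 8.0)]
```

Because each layer reads only from the one before it, a suspicious mart value can be traced back by querying `analytics_daily`, `dq_trips`, `clean_trips`, and `raw_trips` in turn. In PostgreSQL or BigQuery the same idea would more likely use separate schemas or datasets per layer, which this sketch flattens into name prefixes.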