[FAQ] How to structure a layered data warehouse (raw, clean, analytics) in a batch pipeline?

### Course

data-engineering-zoomcamp

### Question

How do I structure a layered data warehouse (raw, clean, analytics) for a batch pipeline?

### Answer

You can structure your warehouse into separate layers to isolate responsibilities and improve data reliability.

A common approach is:

- Raw layer: store ingested data exactly as received
- Clean layer: filter invalid records and enforce basic constraints
- Data quality layer: validate completeness and consistency (e.g. missing timestamps)
- Analytics layer: build aggregated views
- Mart layer: expose final business metrics

This separation helps with debugging, testing, and ensuring that analytical outputs are built on validated data.

In this project, the flow is:
raw → clean → dq → analytics → mart

Each layer is implemented as SQL transformations in PostgreSQL and BigQuery.

### Checklist

- [x] I have searched existing FAQs and this question is not already answered
- [x] The answer provides accurate, helpful information
- [ ] I have included any relevant code examples or links

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FAQ] How to structure a layered data warehouse (raw, clean, analytics) in a batch pipeline? #252

Course

Question

Answer

Checklist

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[FAQ] How to structure a layered data warehouse (raw, clean, analytics) in a batch pipeline? #252

Description

Course

Question

Answer

Checklist

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions