Skip to content

[FAQ] How to structure a layered data warehouse (raw, clean, analytics) in a batch pipeline? #252

@AsherJD-io

Description

@AsherJD-io

Course

data-engineering-zoomcamp

Question

How do I structure a layered data warehouse (raw, clean, analytics) for a batch pipeline?

Answer

You can structure your warehouse into separate layers to isolate responsibilities and improve data reliability.

A common approach is:

  • Raw layer: store ingested data exactly as received
  • Clean layer: filter invalid records and enforce basic constraints
  • Data quality layer: validate completeness and consistency (e.g. missing timestamps)
  • Analytics layer: build aggregated views
  • Mart layer: expose final business metrics

This separation helps with debugging, testing, and ensuring that analytical outputs are built on validated data.

In this project, the flow is:
raw → clean → dq → analytics → mart

Each layer is implemented as SQL transformations in PostgreSQL and BigQuery.

Checklist

  • I have searched existing FAQs and this question is not already answered
  • The answer provides accurate, helpful information
  • I have included any relevant code examples or links

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions