scd2

Here are 18 public repositories matching this topic...

This is a data engineering pipeline built on Databricks + Delta Lake + PySpark that ingests travel booking and customer master data, applies SCD Type 2 logic, and delivers analytics-ready tables. It includes data quality enforcement, dimension versioning, fact aggregation, and performance tuning.

  • Updated Oct 8, 2025
  • Jupyter Notebook
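
For readers new to the topic, the SCD Type 2 logic mentioned above can be sketched in plain Python. This is a minimal illustration, not code from any repository listed here: the `DimRow` fields, the `apply_scd2` function, and the sample customer data are all hypothetical. When a tracked attribute changes, the current row is closed out and a new current row is appended, so history is preserved rather than overwritten.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    # Hypothetical customer dimension; real schemas will differ.
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # None means the version is still open
    is_current: bool


def apply_scd2(dim: List[DimRow], incoming: Dict[int, str], as_of: date) -> List[DimRow]:
    """Return a new dimension with `incoming` (customer_id -> city) applied as SCD Type 2."""
    out: List[DimRow] = []
    current_keys = set()
    for row in dim:
        if row.is_current:
            current_keys.add(row.customer_id)
        new_city = incoming.get(row.customer_id)
        if row.is_current and new_city is not None and new_city != row.city:
            # Attribute changed: close the old version, then open a new one.
            out.append(replace(row, valid_to=as_of, is_current=False))
            out.append(DimRow(row.customer_id, new_city, as_of, None, True))
        else:
            # Historical rows and unchanged current rows pass through untouched.
            out.append(row)
    # Keys with no current version are inserted as brand-new current rows.
    for customer_id, city in incoming.items():
        if customer_id not in current_keys:
            out.append(DimRow(customer_id, city, as_of, None, True))
    return out


dim = [DimRow(1, "Oslo", date(2024, 1, 1), None, True)]
updated = apply_scd2(dim, {1: "Bergen", 2: "Paris"}, date(2025, 1, 1))
for row in updated:
    print(row)
```

Re-applying the same `incoming` batch leaves the dimension unchanged, because an unchanged current row is passed through as-is. On Delta Lake or a warehouse, the same close-and-insert pattern is usually expressed as a merge statement rather than a Python loop.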

Production-grade CDC pipeline: MySQL → Debezium → Kinesis → S3 → AWS Glue (PySpark) → Redshift + Postgres + OpenSearch. Multi-sink fanout with SCD2, idempotency tracking, and 13 modular Terraform modules.

  • Updated Apr 21, 2026
  • Python

Modern data stack reference: dbt + BigQuery + Airflow (Cloud Composer) with medallion layering, SCD2 snapshots, exposures, freshness SLAs, and 45× cost reduction via partition + cluster + incremental tuning.

  • Updated Apr 21, 2026
  • Python
