AndreaBozzo/LakehouseStarterKit


🧩 Open Lakehouse Starter

[Screenshot: screen_01]

Open source lakehouse environment for small teams and startups, designed to be:

  • lightweight,
  • extensible,
  • scalable (ready for S3, Spark, orchestrators, Databricks, etc.).

Structure

  • dlt/: Python ETL pipeline for data ingestion
  • dbt/: SQL models and transformations
  • superset/: dashboard & visualizations
  • minio: S3-compatible object storage (runs as a Docker service)

Prerequisites

  • Docker and Docker Compose
  • Python 3.9 or higher
  • pip (Python package manager)

Installation

1. Clone the repository

git clone <your-repo-url>
cd LakehouseStarterKit

2. Create and activate a virtual environment

python -m venv .venv
# On Windows:
.venv\Scripts\activate
# On Linux/Mac:
source .venv/bin/activate

3. Install Python dependencies

pip install -r requirements.txt

4. Start Docker services

docker-compose up -d

This starts the containers defined in docker-compose.yml, including Superset and MinIO.

Wait a few minutes for Superset to complete its initialization.
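For orientation, here is a hedged sketch of the shape such a compose file typically takes; the actual service names, images, and port mappings live in docker-compose.yml and may differ:

```yaml
# Illustrative sketch only; consult docker-compose.yml for the real definitions.
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # web console
  superset:
    image: apache/superset
    ports:
      - "8088:8088"   # Superset UI
```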

Quick Start

Run the data pipeline

# Load data from API to DuckDB using dlt
python dlt/pipelines/example_api.py

# Transform data using dbt
cd dbt
dbt run
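The pipeline step above fetches records from a public API and loads them into DuckDB via dlt. To illustrate the extract-load idea without third-party dependencies, here is a stdlib-only sketch that uses sqlite3 as a stand-in for DuckDB and inline sample records instead of a live API call (all names here are illustrative, not taken from example_api.py):

```python
import json
import sqlite3

# Sample records standing in for an API response (dlt would fetch these live).
records = [
    {"id": 1, "name": "alpha", "value": 10},
    {"id": 2, "name": "beta", "value": 20},
]

def load(rows, db_path=":memory:"):
    """Extract-load step: persist raw JSON records into a relational table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_example (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    # Idempotent upsert keyed on id, mirroring dlt's merge-style loading.
    conn.executemany(
        "INSERT OR REPLACE INTO raw_example (id, payload) VALUES (?, ?)",
        [(r["id"], json.dumps(r)) for r in rows],
    )
    conn.commit()
    return conn

conn = load(records)
print(conn.execute("SELECT COUNT(*) FROM raw_example").fetchone()[0])  # prints 2
```

In the real pipeline, dlt handles schema inference, incremental state, and the DuckDB destination for you; this sketch only shows the raw-landing pattern the dbt models then build on.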

Access the tools

Once the containers are healthy, open Superset and the MinIO console in your browser. The exact ports are set in docker-compose.yml; Superset defaults to 8088 and the MinIO console to 9001 unless remapped.

Project Structure

.
├── dlt/                    # Data ingestion with dlt
│   ├── pipelines/
│   │   └── example_api.py  # Example pipeline fetching public APIs data
│   └── dlt.config.toml     # dlt configuration
├── dbt/                    # Data transformation with dbt
│   ├── models/
│   │   └── staging/
│   │       ├── sources.yml        # Source definitions
│   │       ├── schema.yml         # Model documentation
│   │       └── example_model.sql  # Example transformation
│   ├── dbt_project.yml
│   └── profiles.yml
├── superset/               # Superset configuration
│   └── superset_config.py
├── docker-compose.yml      # Docker services definition
├── requirements.txt        # Python dependencies
└── .env                    # Environment variables
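The staging model referenced above (dbt/models/staging/example_model.sql) presumably selects from the dlt-loaded source tables. A hedged sketch of what such a model commonly looks like; the source, table, and column names here are assumptions, not the repository's actual definitions:

```sql
-- Illustrative dbt staging model; real names live in sources.yml and example_model.sql.
with source as (
    select * from {{ source('raw', 'example_api') }}
)

select
    id,
    name,
    value
from source
```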

Next Steps

  • Explore the data in DuckDB at openlakehouse_demo.duckdb
  • Create your own dlt pipelines in dlt/pipelines/
  • Add dbt models in dbt/models/
  • Connect Superset to DuckDB and create dashboards
  • Scale up by connecting to S3 (MinIO), adding Spark, or integrating orchestrators
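For the Superset-to-DuckDB step, Superset expects a SQLAlchemy connection URI. Assuming the duckdb-engine driver is available in the Superset environment, the URI takes this shape (the path is illustrative; point it at the actual database file):

```
duckdb:///path/to/openlakehouse_demo.duckdb
```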

License

MIT License. See the LICENSE file for details.

About

Lakehouse starter kit for small teams (1-5). Extract data with dlt, transform with dbt, visualize with Superset. S3-compatible storage with MinIO. Easy to scale.
