AGENTS.md — modern-data-platform

Global guidance for agentic coding agents working in this repo. Sub-folder AGENTS.md files override these rules for their specific context.

Project overview

modern-data-platform composes free, open-source Modern Data Stack tools into a self-contained, versioned, production-ready, and LLM-friendly data warehouse.

The intended workflow is daily or hourly batch refresh (not real-time streaming).

Tech stack

Layer	Tool	Status
Environment	uv	Active
Database	DuckDB	Active
Transformation	SQLMesh	Active
Orchestration	Dagster	Planned
Ingestion	dlt	Planned
Storage layer	DuckLake	Planned
Reporting	Evidence	Planned

Python >=3.13 (pyproject.toml), managed with uv.

Repository structure

root/
├── AGENTS.md              # This file — global rules and conventions
├── .agents/               # Agent config, SKILL and Persona definitions
│   ├── data-engineer.md
│   └── analyst.md
├── analytics/             # Analytics / ML module (future)
│   └── __init__.py
├── ingestion/             # dlt pipelines (future)
├── reporting/             # Dashboards (future)
├── transformation/        # SQLMesh pipeline (DuckDB backend)
│   ├── config.yaml
│   ├── models/
│   │   ├── a_raw/         # Raw layer: ingestion models
│   │   ├── b_staging/     # Staging layer: cleaning & normalization
│   │   └── c_marts/       # Marts layer: business-ready tables
│   ├── seeds/             # CSV seed files for reference data
│   ├── audits/
│   ├── macros/
│   └── tests/
├── data/                  # Local DuckDB database (git-ignored)
├── pyproject.toml         # Python deps, ruff config, build
└── uv.lock

Setup

uv sync  # Create venv and install dependencies

Common commands

Python (uv)

uv sync                         # Install/sync dependencies
uv venv --clear && uv sync      # Recreate venv from scratch

SQLMesh pipeline (`transformation/`)

cd transformation && sqlmesh plan --auto-apply   # Plan + auto-apply changes
cd transformation && sqlmesh run                 # Run pipeline
cd transformation && sqlmesh plan -m <model_name> --auto-apply  # Single model

Model naming convention: a_raw.*, b_staging.*, c_marts.*.

DuckDB

duckdb -ui data/db.duckdb       # Open DuckDB UI

Lint / format

Python (ruff)

uv run ruff format .            # Format
uv run ruff check --fix .       # Lint + auto-fix

SQL (SQLMesh)

cd transformation && sqlmesh lint

Rules configured in transformation/config.yaml.

Testing

cd transformation && sqlmesh lint             # SQLMesh lint
uv run ruff check . && cd transformation && sqlmesh lint  # ruff + sqlmesh lint

No Python unit test suite yet (transformation/tests/.gitkeep).

Code style

General

Prefer small, reviewable PRs: isolate behavior changes.
Keep side effects explicit: separate data fetching, transformation, and persistence.
Do not write to data/ at import time; do it in functions/entrypoints.

Python imports

Prefer explicit imports over import *.
Use collections.abc for typing iterables.
Use explicit from typing import X imports. Do not use import typing as t.

Types and annotations

Use type hints for public functions and non-trivial helpers.
Prefer collections.abc types: Iterator, Iterable, Sequence, etc.
Use typing.Any only at integration boundaries.

Naming conventions

SQLMesh models: a_raw.*, b_staging.*, c_marts.*.
Column names: snake_case.

Error handling and logging

Avoid bare except:; catch Exception at boundaries only.
Prefer structured logging (logging module) over print for libraries.
Include enough context for debugging (what operation failed, exception message).
Re-raise if failure should abort the run.

Polars usage

Prefer Polars for DataFrame operations in Python models.
Keep column naming consistent: snake_case.

SQL style (DuckDB / SQLMesh)

Keep SELECT lists explicit; avoid SELECT * in staging and marts.
Use consistent casing for SQL keywords (match surrounding code).
Prefer CTEs for multi-step transforms.

Python SQLMesh models

Conventions for Python-based SQLMesh models (transformation/models/):

Column types in @model decorator: use lowercase ("date", "text", "float").
Reference data (seeds): static reference data lives in CSV seed files under transformation/seeds/. Python models read from seed tables at runtime via context.resolve_table() + context.fetchdf(). Do not hardcode reference data inline in model files.
Returning empty DataFrames: Python models may not return an empty DataFrame. If your model could possibly return an empty DataFrame, conditionally yield it or return an empty generator instead:
```
if df.is_empty():
    yield from ()
else:
    yield df
```
Error handling: catch Exception at the top level, log with logger.exception(...) for full traceback context, then re-raise so the pipeline fails visibly on persistent errors.

Repo-specific notes

Local DB path: data/db.duckdb (see transformation/config.yaml).
data/ and logs/ are local artifacts, git-ignored.

Data Analyst Best Practices:

Query from c_marts.* tables whenever possible; avoid querying raw or staging
Document assumptions and methodology in reports.
Prefer explicit SELECT lists over SELECT *.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AGENTS.md — modern-data-platform

Project overview

Tech stack

Repository structure

Setup

Common commands

Python (uv)

SQLMesh pipeline (`transformation/`)

DuckDB

Lint / format

Python (ruff)

SQL (SQLMesh)

Testing

Code style

General

Python imports

Types and annotations

Naming conventions

Error handling and logging

Polars usage

SQL style (DuckDB / SQLMesh)

Python SQLMesh models

Repo-specific notes

FilesExpand file tree

AGENTS.md

Latest commit

History

AGENTS.md

File metadata and controls

AGENTS.md — modern-data-platform

Project overview

Tech stack

Repository structure

Setup

Common commands

Python (uv)

SQLMesh pipeline (transformation/)

DuckDB

Lint / format

Python (ruff)

SQL (SQLMesh)

Testing

Code style

General

Python imports

Types and annotations

Naming conventions

Error handling and logging

Polars usage

SQL style (DuckDB / SQLMesh)

Python SQLMesh models

Repo-specific notes

SQLMesh pipeline (`transformation/`)