- UV - Fast Python package manager (manages Python versions automatically)
1. Install UV (if not already installed):

   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. Clone the repository:

   ```bash
   git clone <repository-url>
   cd igh-data-transform
   ```

3. Install Python and dependencies:

   ```bash
   # UV will automatically install Python 3.12 if needed
   uv sync
   ```
You can run the CLI tool without activating the virtual environment using `uv run`:

```bash
# Show available commands
uv run igh-transform --help
```

Alternatively, activate the virtual environment first:

```bash
source .venv/bin/activate   # On Linux/Mac
# or
.venv\Scripts\activate      # On Windows

# Then run normally
igh-transform --help
```

Transform raw Bronze layer data to a cleaned Silver layer:
```bash
uv run igh-transform bronze-to-silver --bronze-db ./data/bronze.db --silver-db ./data/silver.db
```

This applies cleanup transformations:

- Drops columns that are entirely null (preserves `valid_from`/`valid_to`)
- Normalizes whitespace in text fields
- Ready for table-specific column renames and value mappings
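The cleanup steps above can be sketched roughly as follows. This is a minimal illustration using the standard library, not igh-transform's actual implementation; the `clean_table` helper and the example table are hypothetical.

```python
import re
import sqlite3

def _squash_ws(value):
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return re.sub(r"\s+", " ", value).strip() if isinstance(value, str) else value

def clean_table(conn, table):
    """Drop all-null columns (keeping valid_from/valid_to) and normalize text."""
    conn.create_function("squash_ws", 1, _squash_ws)
    for _, name, col_type, *_ in conn.execute(f"PRAGMA table_info({table})").fetchall():
        if name in ("valid_from", "valid_to"):
            continue  # SCD2 validity columns are always preserved
        # COUNT(col) counts only non-null values, so 0 means the column is empty
        if conn.execute(f"SELECT COUNT({name}) FROM {table}").fetchone()[0] == 0:
            # Requires SQLite >= 3.35 for ALTER TABLE ... DROP COLUMN
            conn.execute(f"ALTER TABLE {table} DROP COLUMN {name}")
        elif "TEXT" in (col_type or "").upper():
            conn.execute(f"UPDATE {table} SET {name} = squash_ws({name})")

# Demo on a hypothetical table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (name TEXT, unused TEXT, valid_from TEXT, valid_to TEXT)")
conn.execute("INSERT INTO contact VALUES ('  Ada   Lovelace ', NULL, '2024-01-01', NULL)")
clean_table(conn, "contact")
```

After the call, `unused` is gone, `valid_to` survives despite being all-null, and `name` reads `Ada Lovelace`.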
Transform the Silver layer to a star-schema Gold layer (dimensions, facts, bridges):

```bash
uv run igh-transform silver-to-gold --silver-db ./data/silver.db --gold-db ./data/star_schema.db
```

Two wrapper scripts run the full pipeline and copy the resulting star schema database to the backend.
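To illustrate the dimension/fact/bridge shape mentioned above, here is a generic star-schema sketch. The table and column names (`dim_site`, `fact_measurement`, `bridge_site_group`) are hypothetical placeholders, not the actual tables produced in `star_schema.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: descriptive attributes keyed by a surrogate key
    CREATE TABLE dim_site (
        site_key  INTEGER PRIMARY KEY,
        site_name TEXT
    );
    -- Fact: measurements pointing at dimensions via foreign keys
    CREATE TABLE fact_measurement (
        measurement_key INTEGER PRIMARY KEY,
        site_key        INTEGER REFERENCES dim_site(site_key),
        value           REAL
    );
    -- Bridge: resolves many-to-many links between facts and dimensions
    CREATE TABLE bridge_site_group (
        site_key  INTEGER REFERENCES dim_site(site_key),
        group_key INTEGER
    );
""")
conn.execute("INSERT INTO dim_site VALUES (1, 'Site A')")
conn.execute("INSERT INTO fact_measurement VALUES (1, 1, 42.0)")

# Typical star-schema query: join the fact to its dimension
row = conn.execute("""
    SELECT d.site_name, f.value
    FROM fact_measurement f JOIN dim_site d USING (site_key)
""").fetchone()
```

Queries against the Gold layer follow this pattern: filter and group by dimension attributes, aggregate over fact columns.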
`sync-and-run-etl.sh` syncs data from Dataverse, then runs Bronze -> Silver -> Gold -> Backend:

```bash
# Fresh sync (default) - deletes the existing bronze DB and syncs from scratch
./sync-and-run-etl.sh

# Incremental sync - keeps the existing bronze DB
./sync-and-run-etl.sh --update

# Skip sync entirely - use an existing bronze DB
./sync-and-run-etl.sh --skip-sync

# Use a custom .env file for Dataverse credentials
./sync-and-run-etl.sh --env-file /path/to/.env
```

`run-etl.sh` runs the transformation pipeline on an existing bronze DB (no Dataverse sync):
```bash
# Use the default bronze DB path (data/dataverse_complete_raw.db)
./run-etl.sh

# Use a custom bronze DB path
./run-etl.sh /path/to/bronze.db
```

Both scripts produce `star_schema.db` and copy it to `../backend/` and `../backend/tests/`.
This project uses igh-data-sync to pull data from Microsoft Dataverse before applying transformations.
Setup:

1. Configure environment variables - Create a `.env` file with your Dataverse credentials:

   ```bash
   CLIENT_ID=your-azure-client-id
   CLIENT_SECRET=your-azure-client-secret
   SCOPE=https://your-org.crm.dynamics.com/.default
   API_URL=https://your-org.api.crm.dynamics.com/api/data/v9.2/
   SQLITE_DB_PATH=./data/dataverse.db
   ```

2. Run the sync - Pull data from Dataverse into local SQLite:

   ```bash
   uv run sync-dataverse
   ```

3. Verify the data (optional) - Check foreign key integrity:

   ```bash
   uv run sync-dataverse --verify
   ```
The synced data will be stored in a SQLite database with SCD2 (Slowly Changing Dimension Type 2) versioning for historical tracking.
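Under SCD2 versioning, every change to a record produces a new row with its own validity window rather than overwriting the old one. The sketch below shows the standard query patterns over such a table; the `contact` table and its columns are hypothetical, assuming the `valid_from`/`valid_to` convention mentioned earlier (where a NULL `valid_to` marks the current version).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contact (
        id TEXT, email TEXT,
        valid_from TEXT, valid_to TEXT   -- NULL valid_to = current version
    )
""")
# Two historical versions of the same logical record
conn.executemany("INSERT INTO contact VALUES (?, ?, ?, ?)", [
    ("c1", "old@example.com", "2024-01-01", "2024-06-01"),
    ("c1", "new@example.com", "2024-06-01", None),
])

# Current state: rows whose validity window is still open
current = conn.execute(
    "SELECT email FROM contact WHERE valid_to IS NULL"
).fetchall()

# Point-in-time state: the window that contained a given date
as_of = conn.execute(
    "SELECT email FROM contact "
    "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
    ("2024-03-15", "2024-03-15"),
).fetchall()
```

Here `current` contains only the new email, while `as_of` recovers the old one, since the March date falls inside the first version's window.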
The project uses UV for dependency management. Common commands:

- Add a dependency: `uv add <package-name>`
- Add a dev dependency: `uv add --dev <package-name>`
- Update dependencies: `uv sync`
- Run commands without activating the venv: `uv run <command>`
- Run unit tests: `uv run pytest`
- Run e2e tests: `E2E_BRONZE_DB_PATH=/path/to/bronze.db uv run pytest --e2e -v`
- Run all tests: `E2E_BRONZE_DB_PATH=/path/to/bronze.db uv run pytest --all -v`
- Run tests with coverage: `uv run pytest --cov=igh_data_transform --cov-report=term-missing`
- Run linter: `uv run ruff check src/ tests/`
- Adding Transformations - Guide for data analysts on how to add new data transformations