Commit 22ebdc6

feat!: add join support (#13)
* added join
* changed backend pass to process
* docs updated
* version update
* version
1 parent 7f357bb commit 22ebdc6

19 files changed

Lines changed: 1164 additions & 303 deletions
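
The breaking change in this commit ("changed backend pass to process") moves backend selection from the `TransformPlan` constructor to the `process()`, `validate()`, and `dry_run()` calls. The pattern can be sketched roughly as below. This is a toy illustration, not transformplan's implementation: `Plan`, `RowBackend`, `ColumnBackend`, and the `apply` method are hypothetical names, and only two operations are modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


class Backend(Protocol):
    """Hypothetical interface: apply one recorded operation to some data."""

    def apply(self, data: Any, op: str, params: dict) -> Any: ...


def _rename(mapping: dict, old: str, new: str) -> dict:
    # Rebuild the dict so the renamed key keeps its position.
    return {(new if key == old else key): value for key, value in mapping.items()}


class RowBackend:
    """Toy backend: data is a list of row dicts."""

    def apply(self, data, op, params):
        if op == "col_rename":
            return [_rename(row, params["column"], params["new_name"]) for row in data]
        if op == "col_drop":
            return [{k: v for k, v in row.items() if k != params["column"]} for row in data]
        raise ValueError(f"unsupported operation: {op}")


class ColumnBackend:
    """Toy backend: data is a dict mapping column name to a list of values."""

    def apply(self, data, op, params):
        if op == "col_rename":
            return _rename(data, params["column"], params["new_name"])
        if op == "col_drop":
            return {k: v for k, v in data.items() if k != params["column"]}
        raise ValueError(f"unsupported operation: {op}")


@dataclass
class Plan:
    """Backend-agnostic recipe: it only records operations."""

    ops: list = field(default_factory=list)

    def col_rename(self, column: str, new_name: str) -> "Plan":
        self.ops.append(("col_rename", {"column": column, "new_name": new_name}))
        return self

    def col_drop(self, column: str) -> "Plan":
        self.ops.append(("col_drop", {"column": column}))
        return self

    def process(self, data: Any, backend: Optional[Backend] = None) -> Any:
        # The backend is supplied per call; fall back to a default when omitted.
        backend = backend if backend is not None else RowBackend()
        for op, params in self.ops:
            data = backend.apply(data, op, params)
        return data


# One plan, two executions with different backends.
plan = Plan().col_rename(column="ID", new_name="id").col_drop(column="temp")
rows_out = plan.process([{"ID": 1, "temp": 0}])
cols_out = plan.process({"ID": [1, 2], "temp": [0, 0]}, backend=ColumnBackend())
```

Because the plan object holds no reference to a backend, the same instance can be reused against differently shaped inputs, which is the property the documentation changes below describe.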

README.md

Lines changed: 3 additions & 2 deletions
@@ -97,14 +97,15 @@ from transformplan.backends.duckdb import DuckDBBackend
 con = duckdb.connect()
 rel = con.sql("SELECT * FROM 'patients.parquet'")
 
+# Same plan — backend chosen at execution time
 plan = (
-    TransformPlan(backend=DuckDBBackend(con))
+    TransformPlan()
     .col_rename(column="PatientID", new_name="patient_id")
     .rows_filter(Col("age") >= 18)
     .math_round(column="score", decimals=2)
 )
 
-result, protocol = plan.process(rel)
+result, protocol = plan.process(rel, backend=DuckDBBackend(con))
 ```
 
 ## Available Operations

docs/api/backends.md

Lines changed: 17 additions & 16 deletions
@@ -9,23 +9,22 @@ The backend determines how data is stored and transformed:
 - **PolarsBackend** (default): Operates on Polars DataFrames using native Polars expressions
 - **DuckDBBackend** (optional): Operates on DuckDB relations using SQL generation
 
-All operations, validation, dry-run, and serialization work identically regardless of backend. Pipelines serialized with one backend can be loaded and executed with another.
+A `TransformPlan` is a pure, backend-agnostic recipe of operations. The backend is chosen at execution time by passing it to `process()`, `validate()`, or `dry_run()`. If no backend is specified, `PolarsBackend` is used by default. Pipelines serialized with one backend can be loaded and executed with another.
 
 ```python
 from transformplan import TransformPlan
 
-# Default — uses PolarsBackend
-plan = TransformPlan()
+# Build a plan — no backend needed
+plan = TransformPlan().col_drop("temp").math_add("age", 1)
 
-# Explicit Polars backend
-from transformplan.backends.polars import PolarsBackend
-plan = TransformPlan(backend=PolarsBackend())
+# Execute with default PolarsBackend
+result, protocol = plan.process(polars_df)
 
-# DuckDB backend
+# Execute with DuckDB backend
 import duckdb
 from transformplan.backends.duckdb import DuckDBBackend
 con = duckdb.connect()
-plan = TransformPlan(backend=DuckDBBackend(con))
+result, protocol = plan.process(duckdb_rel, backend=DuckDBBackend(con))
 ```
 
 ## Backend ABC
@@ -96,38 +95,40 @@ con = duckdb.connect()
 rel = con.sql("SELECT * FROM 'data.parquet'")
 
 plan = (
-    TransformPlan(backend=DuckDBBackend(con))
+    TransformPlan()
     .col_rename(column="ID", new_name="id")
     .rows_filter(Col("age") >= 18)
     .math_standardize(column="score", new_column="z_score")
 )
 
-result, protocol = plan.process(rel)
+result, protocol = plan.process(rel, backend=DuckDBBackend(con))
 ```
 
 ## Cross-Backend Serialization
 
-Pipelines are backend-agnostic when serialized. You can build a pipeline with one backend and execute it with another:
+Pipelines are inherently backend-agnostic. The same serialized plan can be executed with any backend:
 
 ```python
-import polars as pl
 import duckdb
 from transformplan import TransformPlan, Col
 from transformplan.backends.duckdb import DuckDBBackend
 
-# Build and serialize with Polars (default)
+# Build and serialize
 plan = (
     TransformPlan()
     .col_rename(column="ID", new_name="id")
     .rows_filter(Col("age") >= 18)
 )
 plan.to_json("pipeline.json")
 
-# Load and execute with DuckDB
+# Load and execute with Polars (default)
+restored = TransformPlan.from_json("pipeline.json")
+result, protocol = restored.process(polars_df)
+
+# Or execute with DuckDB
 con = duckdb.connect()
 rel = con.sql("SELECT * FROM 'data.parquet'")
-plan_duckdb = TransformPlan.from_json("pipeline.json", backend=DuckDBBackend(con))
-result, protocol = plan_duckdb.process(rel)
+result, protocol = restored.process(rel, backend=DuckDBBackend(con))
 ```
 
 ## Type System
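
The serialization hunk above drops the `backend` argument from `from_json`, which only works if a saved plan carries no backend information. A rough sketch of what such a payload could look like is given here. The JSON layout, the `plan_to_json` and `plan_from_json` helpers, and the encoding of the filter are assumptions for illustration; the actual format written by `to_json` is not shown in this diff.

```python
import json
from typing import Any

# Hypothetical on-disk shape: a version tag plus an ordered list of operations.
Operation = dict[str, Any]


def plan_to_json(operations: list[Operation]) -> str:
    """Serialize the recorded operations; no backend information is written."""
    return json.dumps({"version": 1, "operations": operations}, indent=2)


def plan_from_json(text: str) -> list[Operation]:
    """Restore the operation list; the caller picks a backend afterwards."""
    payload = json.loads(text)
    return payload["operations"]


operations = [
    {"op": "col_rename", "params": {"column": "ID", "new_name": "id"}},
    {"op": "rows_filter", "params": {"column": "age", "operator": ">=", "value": 18}},
]

text = plan_to_json(operations)
restored = plan_from_json(text)
```

Since the restored list is plain data, whichever backend is passed at execution time is free to interpret each operation in its own way (native expressions for one, generated SQL for another).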

docs/api/plan.md

Lines changed: 9 additions & 5 deletions
@@ -4,7 +4,7 @@ The main class for building and executing transformation pipelines.
 
 ## Overview
 
-`TransformPlan` uses a deferred execution model: operations are registered via method chaining, then executed together when you call `process()`, `validate()`, or `dry_run()`. An optional `backend` parameter selects the execution engine (defaults to `PolarsBackend`).
+`TransformPlan` uses a deferred execution model: operations are registered via method chaining, then executed together when you call `process()`, `validate()`, or `dry_run()`. The plan itself is backend-agnostic — the backend is chosen at execution time (defaults to `PolarsBackend`).
 
 ```python
 from transformplan import TransformPlan, Col
@@ -22,15 +22,19 @@ df_result, protocol = plan.process(df)
 
 ## Backend Selection
 
+The backend is passed at execution time, not at construction:
+
 ```python
+from transformplan.backends.duckdb import DuckDBBackend
+
+plan = TransformPlan().col_drop("temp").math_add("age", 1)
+
 # Default (Polars)
-plan = TransformPlan()
+result, protocol = plan.process(polars_df)
 
 # DuckDB
-import duckdb
-from transformplan.backends.duckdb import DuckDBBackend
 con = duckdb.connect()
-plan = TransformPlan(backend=DuckDBBackend(con))
+result, protocol = plan.process(duckdb_rel, backend=DuckDBBackend(con))
 ```
 
 See [Backends](backends.md) for details on each backend.

docs/api/validation.md

Lines changed: 3 additions & 3 deletions
@@ -69,7 +69,7 @@ if not result.is_valid:
 
 ## DuckDB Validation
 
-Validation works identically with DuckDB relations:
+Validation works identically with DuckDB relations — pass the backend at validation time:
 
 ```python
 import duckdb
@@ -80,12 +80,12 @@ con = duckdb.connect()
 rel = con.sql("SELECT 'Alice' AS name, 25 AS age, 50000 AS salary")
 
 plan = (
-    TransformPlan(backend=DuckDBBackend(con))
+    TransformPlan()
     .col_drop("age")
     .rows_filter(Col("age") > 18)  # Error: age was dropped!
 )
 
-result = plan.validate(rel, backend=None)
+result = plan.validate(rel, backend=DuckDBBackend(con))
 # ValidationResult(valid=False, errors=1)
 ```
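
The validation example above fails because `age` is dropped before the filter reads it. One way such a check can work without touching any data is to replay the operations against the set of column names alone. The sketch below assumes that approach; `validate`, the tuple encoding of operations, and this minimal `ValidationResult` are hypothetical stand-ins, not the library's own classes.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Minimal stand-in: collects error messages found during validation."""

    errors: list = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.errors


def validate(ops, columns):
    """Replay operations against column names only; no rows are read."""
    schema = set(columns)
    result = ValidationResult()
    for step, (op, params) in enumerate(ops, start=1):
        column = params["column"]
        if column not in schema:
            result.errors.append(f"step {step} ({op}): unknown column {column!r}")
            continue
        if op == "col_drop":
            schema.discard(column)
        elif op == "col_rename":
            schema.discard(column)
            schema.add(params["new_name"])
        # rows_filter only reads the column, so the schema is unchanged.
    return result


ops = [
    ("col_drop", {"column": "age"}),
    ("rows_filter", {"column": "age"}),  # refers to a column dropped in step 1
]
result = validate(ops, ["name", "age", "salary"])
```

Because only column names are inspected, a check of this kind depends on the backend solely for reading the input schema, which fits with `validate()` accepting the backend as a call-time argument.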

docs/getting-started/quickstart.md

Lines changed: 7 additions & 5 deletions
@@ -74,7 +74,7 @@ print(df_result)
 
 ## Using the DuckDB Backend
 
-TransformPlan supports DuckDB as an alternative backend. All 88 operations, validation, and dry-run work identically — only the data type changes from Polars DataFrames to DuckDB relations.
+TransformPlan supports DuckDB as an alternative backend. All 88 operations, validation, and dry-run work identically — the same plan works with both Polars DataFrames and DuckDB relations. Simply pass the backend at execution time:
 
 ```python
 import duckdb
@@ -89,18 +89,20 @@ rel = con.sql("""
     UNION ALL SELECT 'Diana', 'Sales', 70000, 2
 """)
 
+# Same plan as before — no backend in constructor
 plan = (
-    TransformPlan(backend=DuckDBBackend(con))
+    TransformPlan()
     .col_rename(column="name", new_name="employee")
     .math_multiply(column="salary", value=1.05)
     .math_round(column="salary", decimals=0)
     .rows_filter(Col("years") >= 3)
 )
 
-# Validate and execute — same API as Polars
-result = plan.validate(rel)
+# Pass backend at execution time
+backend = DuckDBBackend(con)
+result = plan.validate(rel, backend=backend)
 if result.is_valid:
-    df_result, protocol = plan.process(rel)
+    df_result, protocol = plan.process(rel, backend=backend)
 ```
 
 ## Viewing the Audit Protocol

docs/index.md

Lines changed: 3 additions & 2 deletions
@@ -96,14 +96,15 @@ from transformplan.backends.duckdb import DuckDBBackend
 con = duckdb.connect()
 rel = con.sql("SELECT * FROM 'patients.parquet'")
 
+# Same plan — backend chosen at execution time
 plan = (
-    TransformPlan(backend=DuckDBBackend(con))
+    TransformPlan()
     .col_rename(column="PatientID", new_name="patient_id")
     .rows_filter(Col("age") >= 18)
     .math_round(column="score", decimals=2)
 )
 
-result, protocol = plan.process(rel)
+result, protocol = plan.process(rel, backend=DuckDBBackend(con))
 ```
 
 ## Available Operations

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "transformplan"
-version = "0.1.2"
+version = "0.1.3"
 description = "Safe, reproducible data transformations with built-in auditing and validation"
 readme = "README.md"
 requires-python = ">=3.10"
