Commit 67e8712

Merge pull request #7 from limebit/func_cat_refactoring
Func cat refactoring
2 parents 0cdddca + 00e4881 commit 67e8712

15 files changed

Lines changed: 1271 additions & 583 deletions

README.md

Lines changed: 2 additions & 3 deletions
@@ -90,12 +90,11 @@ Total time: 0.0247s
 | Category | Description | Examples |
 |----------|-------------|----------|
 | **col_** | Column operations | `col_rename`, `col_drop`, `col_cast`, `col_add`, `col_select` |
-| **math_** | Arithmetic operations | `math_add`, `math_multiply`, `math_clamp`, `math_round`, `math_abs` |
+| **math_** | Arithmetic & scaling | `math_add`, `math_multiply`, `math_standardize`, `math_minmax`, `math_clamp` |
 | **rows_** | Row filtering & reshaping | `rows_filter`, `rows_drop_nulls`, `rows_sort`, `rows_unique`, `rows_pivot` |
 | **str_** | String operations | `str_lower`, `str_upper`, `str_strip`, `str_replace`, `str_split` |
 | **dt_** | Datetime operations | `dt_year`, `dt_month`, `dt_parse`, `dt_age_years`, `dt_diff_days` |
-| **map_** | Value mapping | `map_values`, `map_discretize`, `map_case`, `map_from_column` |
-| **enc_** | Categorical encoding | `enc_onehot`, `enc_ordinal`, `enc_label` |
+| **map_** | Value mapping & encoding | `map_values`, `map_discretize`, `map_onehot`, `map_ordinal` |

 ## Installation

docs/api/index.md

Lines changed: 7 additions & 0 deletions
@@ -82,6 +82,13 @@ All TransformPlan operations at a glance. Click method names for detailed docume
 | [`math_percent_of`](ops/math.md) | Calculate percentage of one column relative to another |
 | [`math_cumsum`](ops/math.md) | Calculate cumulative sum (optionally grouped) |
 | [`math_rank`](ops/math.md) | Calculate rank of values |
+| [`math_standardize`](ops/math.md) | Z-score standardization (mean=0, std=1) |
+| [`math_minmax`](ops/math.md) | Min-max normalization to a range |
+| [`math_robust_scale`](ops/math.md) | Robust scaling using median and IQR |
+| [`math_log`](ops/math.md) | Logarithmic transform |
+| [`math_sqrt`](ops/math.md) | Square root transform |
+| [`math_power`](ops/math.md) | Power transform |
+| [`math_winsorize`](ops/math.md) | Clip values to percentiles or bounds |

 ### Row Operations

docs/api/ops/encoding.md

Lines changed: 0 additions & 187 deletions
This file was deleted.

docs/api/ops/map.md

Lines changed: 55 additions & 2 deletions
@@ -1,10 +1,10 @@
 # Map Operations

-Value mapping, discretization, and transformation operations.
+Value mapping, discretization, encoding, and transformation operations.

 ## Overview

-Map operations transform column values using dictionaries, bins, or other columns. They're useful for categorization, value replacement, and data normalization.
+Map operations transform column values using dictionaries, bins, or encoding schemes. They're useful for categorization, value replacement, data normalization, and ML feature preparation.

 ```python
 from transformplan import TransformPlan
@@ -13,6 +13,7 @@ plan = (
     TransformPlan()
     .map_values("status", {"A": "Active", "I": "Inactive"})
     .map_discretize("age", bins=[18, 35, 55], labels=["Young", "Adult", "Senior"])
+    .map_onehot("color", categories=["red", "green", "blue"], drop="first")
 )
 ```
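The overview example passes an explicit `categories` list and `drop="first"` to `map_onehot`. As a rough, library-independent illustration of what those arguments imply, the sketch below uses plain Python rather than TransformPlan. The helpers `onehot`, `ordinal`, and `label` are hypothetical names, the `<column>_<category>` output naming follows the comments in this commit's docs, and returning `None` for values missing from `categories` is an assumption rather than documented behavior.

```python
def onehot(values, column, categories, drop=None):
    """Return {new_column_name: [0/1, ...]} for an explicit category list."""
    # drop="first" removes the indicator for the first listed category.
    kept = categories[1:] if drop == "first" else list(categories)
    return {
        f"{column}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in kept
    }


def ordinal(values, categories):
    """Map each value to its position in `categories` (None if absent)."""
    index = {cat: i for i, cat in enumerate(categories)}
    return [index.get(v) for v in values]


def label(values):
    """Assign integer codes to the distinct values in sorted order."""
    index = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [index[v] for v in values]


colors = ["red", "blue", "green", "red"]
full = onehot(colors, "color", ["red", "green", "blue"])
reduced = onehot(colors, "color", ["red", "green", "blue"], drop="first")
sizes = ordinal(["small", "large", "medium"], ["small", "medium", "large"])
codes = label(["Sales", "Engineering", "HR"])
```

With an explicit `categories` list the set of output columns does not depend on which values happen to appear in a given batch, which is presumably why the documented examples spell the categories out.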

@@ -29,6 +30,9 @@ plan = (
 - map_null_to_value
 - map_value_to_null
 - map_from_column
+- map_onehot
+- map_ordinal
+- map_label

 ## Examples

@@ -155,3 +159,52 @@ plan = TransformPlan().map_value_to_null("score", -999)
 # Replace null with default
 plan = TransformPlan().map_null_to_value("category", "Uncategorized")
 ```
+
+### One-Hot Encoding
+
+```python
+# Basic one-hot encoding
+plan = TransformPlan().map_onehot(
+    column="color",
+    categories=["red", "green", "blue"]
+)
+# Creates columns: color_red, color_green, color_blue
+
+# Drop first category to avoid multicollinearity (for regression models)
+plan = TransformPlan().map_onehot(
+    column="color",
+    categories=["red", "green", "blue"],
+    drop="first"
+)
+# Creates columns: color_green, color_blue (drops color_red)
+```
+
+### Ordinal Encoding
+
+```python
+# Ordinal encoding with meaningful order
+plan = TransformPlan().map_ordinal(
+    column="size",
+    categories=["small", "medium", "large"]
+)
+# Maps: small -> 0, medium -> 1, large -> 2
+```
+
+### Label Encoding
+
+```python
+# Label encoding (alphabetically sorted by default)
+plan = TransformPlan().map_label(column="department")
+# Maps alphabetically: Engineering -> 0, HR -> 1, Sales -> 2
+```
+
+### ML Feature Preparation
+
+```python
+# One-hot encode categorical features, dropping first to avoid multicollinearity
+plan = (
+    TransformPlan()
+    .map_onehot("color", categories=["red", "green", "blue"], drop="first")
+    .map_ordinal("quality", categories=["low", "medium", "high"])
+)
+```

docs/api/ops/math.md

Lines changed: 56 additions & 0 deletions
@@ -39,6 +39,13 @@ plan = (
 - math_percent_of
 - math_cumsum
 - math_rank
+- math_standardize
+- math_minmax
+- math_robust_scale
+- math_log
+- math_sqrt
+- math_power
+- math_winsorize

 ## Examples

@@ -128,3 +135,52 @@ plan = TransformPlan().math_rank(
     group_by="category"
 )
 ```
+
+### Scaling Operations
+
+```python
+# Z-score standardization (explicit params for reproducibility)
+plan = TransformPlan().math_standardize("income", mean=50000, std=25000)
+
+# Derive from data
+plan = TransformPlan().math_standardize("income")
+
+# Min-max normalization to [0, 1]
+plan = TransformPlan().math_minmax("age", min_val=0, max_val=100)
+
+# Custom range
+plan = TransformPlan().math_minmax("score", min_val=0, max_val=100, feature_range=(0, 10))
+
+# Robust scaling (resistant to outliers)
+plan = TransformPlan().math_robust_scale("salary", median=60000, iqr=30000)
+```
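The scaling calls above take explicit parameters, and the `math_standardize` example also shows a form that derives them from the data. The underlying arithmetic can be sketched in plain Python as follows. This is not TransformPlan's implementation: the function names are hypothetical, and both the population standard deviation and the use of `statistics.quantiles` for the IQR are assumptions made for the sketch.

```python
import statistics


def standardize(values, mean=None, std=None):
    """Z-score: (x - mean) / std, deriving missing parameters from the data."""
    mean = statistics.fmean(values) if mean is None else mean
    # Assumption: population standard deviation; a library may use the sample form.
    std = statistics.pstdev(values) if std is None else std
    return [(v - mean) / std for v in values]


def minmax(values, min_val=None, max_val=None, feature_range=(0.0, 1.0)):
    """Linearly map [min_val, max_val] onto feature_range."""
    lo = min(values) if min_val is None else min_val
    hi = max(values) if max_val is None else max_val
    a, b = feature_range
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def robust_scale(values, median=None, iqr=None):
    """(x - median) / IQR, which is less sensitive to outliers than a z-score."""
    median = statistics.median(values) if median is None else median
    if iqr is None:
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
    return [(v - median) / iqr for v in values]


z = standardize([25000, 50000, 100000], mean=50000, std=25000)
scaled = minmax([0, 50, 100], min_val=0, max_val=100, feature_range=(0, 10))
robust = robust_scale([30000, 60000, 90000], median=60000, iqr=30000)
print(z, scaled, robust)
```

Fixing the parameters up front, as the "explicit params for reproducibility" comment suggests, means the same input value always maps to the same output regardless of which rows are in the frame.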
+
+### Transform Operations
+
+```python
+# Natural log
+plan = TransformPlan().math_log("price")
+
+# Log base 10
+plan = TransformPlan().math_log("price", base=10)
+
+# Log with offset for zeros
+plan = TransformPlan().math_log("count", offset=1)  # log(x + 1)
+
+# Square root
+plan = TransformPlan().math_sqrt("variance")
+
+# Power transform
+plan = TransformPlan().math_power("value", exponent=2)  # square
+plan = TransformPlan().math_power("value", exponent=0.5)  # sqrt
+```
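To make the `offset` and `exponent` comments above concrete, here is a small plain-Python sketch of the same arithmetic. The helper names are hypothetical and the code does not use TransformPlan; how the library treats zeros or negative inputs is not shown in this diff, so the error behavior here is only that of Python's `math` module.

```python
import math


def log_transform(values, base=None, offset=0.0):
    """log(x + offset), natural by default or in the given base."""
    shifted = [v + offset for v in values]
    if base is None:
        return [math.log(v) for v in shifted]
    return [math.log(v, base) for v in shifted]


def power_transform(values, exponent):
    """Raise each value to `exponent`; 0.5 gives the square root."""
    return [v ** exponent for v in values]


counts = [0, 9, 99]
log1p_like = log_transform(counts, offset=1)           # log(x + 1), defined at x = 0
log10_vals = log_transform(counts, base=10, offset=1)  # log10(x + 1)
roots = power_transform([4.0, 9.0], exponent=0.5)

# Without an offset, a zero input has no logarithm; math.log raises ValueError.
try:
    log_transform([0])
    zero_ok = True
except ValueError:
    zero_ok = False
```

The offset form matters for count-like columns, where zeros are common and a bare logarithm is undefined.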
+
+### Outlier Handling
+
+```python
+# Winsorize by percentiles
+plan = TransformPlan().math_winsorize("salary", lower=0.05, upper=0.95)
+
+# Winsorize by explicit values
+plan = TransformPlan().math_winsorize("salary", lower_value=20000, upper_value=200000)
+```
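The two winsorize forms above differ only in where the clipping bounds come from. The sketch below shows that idea in plain Python, without TransformPlan. The helper names and the linear-interpolation percentile are assumptions (the diff does not say which percentile method the library uses), and the example uses quartile cutoffs instead of 0.05/0.95 so the effect is visible on five values.

```python
import math


def percentile(sorted_vals, q):
    """Linear-interpolation percentile for q in [0, 1] (an assumed method)."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def winsorize(values, lower=None, upper=None, lower_value=None, upper_value=None):
    """Clip values to explicit bounds, or to bounds derived from percentiles."""
    ordered = sorted(values)
    if lower_value is None:
        lower_value = percentile(ordered, lower) if lower is not None else ordered[0]
    if upper_value is None:
        upper_value = percentile(ordered, upper) if upper is not None else ordered[-1]
    return [min(max(v, lower_value), upper_value) for v in values]


salaries = [10000, 50000, 80000, 120000, 500000]
by_value = winsorize(salaries, lower_value=20000, upper_value=200000)
by_pct = winsorize(salaries, lower=0.25, upper=0.75)
```

Unlike dropping outliers, clipping keeps the row count unchanged, so downstream steps that rely on row alignment are unaffected.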

docs/index.md

Lines changed: 1 addition & 2 deletions
@@ -93,8 +93,7 @@ Total time: 0.0247s
 | **rows_** | Row filtering & reshaping | `rows_filter`, `rows_drop_nulls`, `rows_sort`, `rows_unique`, `rows_pivot` |
 | **str_** | String operations | `str_lower`, `str_upper`, `str_strip`, `str_replace`, `str_split` |
 | **dt_** | Datetime operations | `dt_year`, `dt_month`, `dt_parse`, `dt_age_years`, `dt_diff_days` |
-| **map_** | Value mapping | `map_values`, `map_discretize`, `map_case`, `map_from_column` |
-| **enc_** | Categorical encoding | `enc_onehot`, `enc_ordinal`, `enc_label` |
+| **map_** | Value mapping & encoding | `map_values`, `map_discretize`, `map_onehot`, `map_ordinal` |


 ## Getting Started

mkdocs.yml

Lines changed: 0 additions & 1 deletion
@@ -91,4 +91,3 @@ nav:
 - String Operations: api/ops/string.md
 - Datetime Operations: api/ops/datetime.md
 - Map Operations: api/ops/map.md
-- Encoding Operations: api/ops/encoding.md
