[SPARK-53440][PYTHON] Allow Column.transform() to accept SQL lambda expression strings #54965

Draft
xiaoxuandev wants to merge 1 commit into apache:master from xiaoxuandev:fix-53440

@xiaoxuandev
Contributor

What changes were proposed in this pull request?

Extend Column.transform to accept a SQL lambda expression string (e.g. 'x -> x * 2') in addition to the existing Python callable support.

For Classic mode, the SQL lambda is parsed by CatalystSqlParser, the lambda body's parameter variable is replaced with the actual column expression, and the result is returned directly — no intermediate array wrapping.
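The substitution step described for Classic mode can be illustrated with a toy, string-level analogue. This is only a sketch: the PR performs the replacement on the parsed Catalyst expression tree, not on text, and the helper names `split_lambda` and `inline_param` are hypothetical and do not come from the patch.

```python
import re
from typing import Tuple

# Hypothetical helpers for illustration; the actual patch works on Catalyst
# expression trees produced by CatalystSqlParser, not on raw strings.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_lambda(sql_lambda: str) -> Tuple[str, str]:
    """Split '<param> -> <body>' into (param, body), validating both parts."""
    param, arrow, body = sql_lambda.partition("->")
    param, body = param.strip(), body.strip()
    if not arrow or not body:
        raise ValueError(f"expected '<param> -> <body>', got {sql_lambda!r}")
    if not _IDENT.fullmatch(param):
        raise ValueError(f"invalid lambda parameter name: {param!r}")
    return param, body


def inline_param(sql_lambda: str, column_sql: str) -> str:
    """Replace whole-word occurrences of the lambda parameter with the column.

    A textual replacement like this would also rewrite matches inside string
    literals, which is one reason a real implementation uses the parsed tree.
    """
    param, body = split_lambda(sql_lambda)
    pattern = rf"\b{re.escape(param)}\b"
    # Use a function as the replacement so column_sql is inserted literally.
    return re.sub(pattern, lambda _m: f"({column_sql})", body)
```

Under these assumptions, `inline_param('x -> x * 2', 'value')` yields an expression over the column itself, with no array construction or element lookup involved.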

For Connect mode, the SQL lambda string is sent to the server via SQLExpression and evaluated using transform(array(col), lambda)[0], since the client has no local Catalyst parser.
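The Connect-mode rewrite amounts to wrapping the column in a one-element array, applying the lambda with the SQL `transform` function, and taking the single element back out. A minimal sketch of building that SQL text follows; the helper name `wrap_for_connect` is hypothetical, and how the patch actually assembles the `SQLExpression` is not shown in the description.

```python
def wrap_for_connect(column_sql: str, sql_lambda: str) -> str:
    """Build the SQL text for applying a lambda to a single column value.

    Hypothetical helper: it does not validate the lambda, on the assumption
    that the server-side parser rejects malformed input.
    """
    # array(col) lifts the scalar into a one-element array, transform applies
    # the lambda to that element, and [0] extracts the single result.
    return f"transform(array({column_sql}), {sql_lambda.strip()})[0]"
```

For example, `wrap_for_connect('value', 'x -> x * 2')` produces `transform(array(value), x -> x * 2)[0]`, which the server can parse without the client needing a local Catalyst parser.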

Why are the changes needed?

Currently, Column.transform accepts only Python callables. In some situations it is preferable to express the transformation in SQL syntax (e.g. 'x -> x + 1'), because the string is simply parsed and no Python introspection of a callable takes place. This was requested in SPARK-53440.

Does this PR introduce any user-facing change?

Yes. Column.transform now accepts a str argument in addition to Callable:

df.value.transform(lambda c: c * 2)   # existing — Python callable
df.value.transform('x -> x * 2')      # new — SQL lambda string

How was this patch tested?

Added 8 new tests in test_column.py covering SQL lambda arithmetic, function calls, conditional logic, null handling, chaining (SQL-only and mixed SQL+Python), and error paths (missing arrow, invalid param name).

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.


 @dispatch_col_method
-def transform(self, f: Callable[["Column"], "Column"]) -> "Column":
+def transform(self, f: Union[Callable[["Column"], "Column"], str]) -> "Column":
Member


Hm, can't people just use it like expr("transform(c, 'x -> x * 2')")? These APIs are supposed to be Python-friendly.
