feat: support string to numeric coercion for arithmetic operators #23050

Open
dangdat1111 wants to merge 2 commits into apache:main from dangdat1111:support-string-to-numeric-coercion

@dangdat1111

Which issue does this PR close?

Rationale for this change

Comparison operators already coerce a string operand to the other operand's numeric type (via string_numeric_coercion), so col < '5' works numerically. Arithmetic operators did not, so 1 + '1' failed during planning with:

Error during planning: Cannot coerce arithmetic expression Int64 + Utf8 to valid types

This change makes arithmetic consistent with comparison and aligns the engine with its own documentation for non-ANSI mode, which states that implicit casts between types are allowed (e.g. string to integer when possible).

What changes are included in this PR?

  • In BinaryTypeCoercer::signature_inner, add a string_numeric_coercion fallback to the Plus | Minus | Multiply | Divide | Modulo branch. The string operand is coerced to the numeric type of the other operand (e.g. Int64 + Utf8 -> both Int64, Utf8 + Float64 -> both Float64). The type coercion analyzer then inserts the cast on the string operand.
  • Scope is intentionally limited:
    • string + string (e.g. '1' + '2') still fails at plan time because the target type is ambiguous; this matches PostgreSQL.
    • Temporal/string pairs (e.g. Timestamp + Utf8, Interval + Utf8) are unaffected, since those types are not numeric and so string_numeric_coercion does not apply.
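The rule described in the list above can be sketched in isolation. This is a minimal illustration and not the code in this PR: `Ty` is a hypothetical, reduced stand-in for Arrow's `DataType`, and `string_numeric_fallback` is an illustrative name rather than DataFusion's actual `string_numeric_coercion` signature. A result of `None` means only that this fallback does not apply, so whatever the existing arithmetic rules decide still stands.

```rust
// Hypothetical, reduced stand-in for Arrow's DataType (assumption, not the real API).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int64,
    Float64,
    Utf8,
    Timestamp,
}

impl Ty {
    fn is_numeric(self) -> bool {
        matches!(self, Ty::Int64 | Ty::Float64)
    }

    fn is_string(self) -> bool {
        matches!(self, Ty::Utf8)
    }
}

/// Illustrative fallback: when exactly one operand is a string and the
/// other is numeric, both sides are coerced to the numeric type.
/// Returns None when the fallback does not apply.
fn string_numeric_fallback(lhs: Ty, rhs: Ty) -> Option<Ty> {
    match (lhs, rhs) {
        // numeric op string: the string side takes the numeric type
        (n, s) if n.is_numeric() && s.is_string() => Some(n),
        // string op numeric: same rule, operands swapped
        (s, n) if s.is_string() && n.is_numeric() => Some(n),
        // string op string, temporal op string, and so on: not handled here
        _ => None,
    }
}
```

Under these assumptions, `string_numeric_fallback(Ty::Int64, Ty::Utf8)` yields `Some(Ty::Int64)`, while `Utf8 + Utf8` and `Timestamp + Utf8` both yield `None`.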

Are these changes tested?

Yes.

  • Unit tests in datafusion/expr-common/src/type_coercion/binary/tests/arithmetic.rs: the new test_type_coercion_arithmetic_string_numeric covers all 5 operators across Utf8/LargeUtf8/Utf8View, both operand orders, result-type checks, and the Utf8 + Utf8 error case. test_coercion_error now uses Boolean + Boolean, since Float32 + Utf8 is valid after this change.
  • sqllogictests in datafusion/sqllogictest/test_files/string_numeric_coercion.slt: a new arithmetic section covering 1 + '1', '1' + 1, all 5 operators over an integer and a float column, arrow_typeof of the results, an EXPLAIN showing the string literal is cast to the numeric type, a runtime cast error for a non-numeric string, and the plan-time error for '1' + '2'.
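The sqllogictests distinguish two failure phases: '1' + '2' is rejected at plan time, whereas a non-numeric string only fails when the cast runs. The sketch below models that split for integer addition. It is a toy model under stated assumptions, not DataFusion code: `Operand`, `AddError`, `cast_to_i64`, and `add` are hypothetical names, the error messages are made up for illustration, and the overflow check is this sketch's own choice.

```rust
// Hypothetical operand model (assumption): an integer value or a string value.
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    Int(i64),
    Str(String),
}

// Hypothetical error type separating the two phases.
#[derive(Debug, PartialEq)]
enum AddError {
    Plan(String),    // rejected before any data is touched
    Runtime(String), // rejected while evaluating the inserted cast
}

/// Models the cast inserted on the string operand.
fn cast_to_i64(op: &Operand) -> Result<i64, AddError> {
    match op {
        Operand::Int(v) => Ok(*v),
        Operand::Str(s) => s
            .parse::<i64>()
            .map_err(|_| AddError::Runtime(format!("cannot cast '{}' to Int64", s))),
    }
}

/// Models planning followed by evaluation for `lhs + rhs`.
fn add(lhs: &Operand, rhs: &Operand) -> Result<i64, AddError> {
    // Planning: two strings give no numeric target type, so fail early.
    if let (Operand::Str(_), Operand::Str(_)) = (lhs, rhs) {
        return Err(AddError::Plan(
            "string + string has no numeric target type".to_string(),
        ));
    }
    // Evaluation: the cast may still fail on a non-numeric string.
    let a = cast_to_i64(lhs)?;
    let b = cast_to_i64(rhs)?;
    // Overflow handling here is the sketch's own choice.
    a.checked_add(b)
        .ok_or_else(|| AddError::Runtime("Int64 overflow".to_string()))
}
```

In this model, `add(&Operand::Int(1), &Operand::Str("1".to_string()))` returns `Ok(2)`, a right-hand side of "abc" produces an `AddError::Runtime`, and two string operands produce an `AddError::Plan`.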

Are there any user-facing changes?

Yes. Arithmetic expressions that mix a numeric and a string operand now plan and execute (the string is cast to the numeric type) instead of failing during planning. There are no breaking changes to public APIs.

🤖 Generated with Claude Code

## Which issue does this PR close?

- Closes apache#23041.

## Rationale for this change

Comparison operators already coerce a string operand to the other
operand's numeric type (via `string_numeric_coercion`), so `col < '5'`
works numerically. Arithmetic operators did not, so `1 + '1'` failed with
"Cannot coerce arithmetic expression Int64 + Utf8 to valid types". This
aligns arithmetic with comparison and with the documented non-ANSI mode
behavior that allows implicit string-to-numeric casts.

## What changes are included in this PR?

- In `BinaryTypeCoercer::signature_inner`, add a `string_numeric_coercion`
  fallback to the `Plus | Minus | Multiply | Divide | Modulo` branch. The
  string operand is coerced to the numeric type of the other operand
  (e.g. `Int64 + Utf8` -> both `Int64`, `Utf8 + Float64` -> both `Float64`).
- `string + string` remains unsupported as the target type is ambiguous,
  and temporal/string pairs (e.g. `Timestamp + Utf8`) are unaffected since
  those types are not numeric.

## Are these changes tested?

Yes.
- Unit tests in `binary/tests/arithmetic.rs`: new
  `test_type_coercion_arithmetic_string_numeric` and an updated
  `test_coercion_error` (now uses `Boolean + Boolean`, since `Float32 + Utf8`
  is valid).
- sqllogictests in `string_numeric_coercion.slt`: arithmetic with string
  literals/columns, result types, EXPLAIN, and runtime/plan-time errors.

## Are there any user-facing changes?

Yes. Arithmetic expressions mixing a numeric and a string operand now plan
and execute (the string is cast to the numeric type) instead of failing
during planning.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
github-actions (bot) added the logical-expr (Logical plan and expressions) and sqllogictest (SQL Logic Tests (.slt)) labels on Jun 19, 2026
Development

Successfully merging this pull request may close these issues.

Support string to numeric coercion