Skip to content
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 3 additions & 8 deletions datafusion/optimizer/src/simplify_expressions/regex.rs
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ const ANY_CHAR_REGEX_PATTERN: &str = ".*";
/// - partial anchored regex patterns (e.g. `^foo`) to `LIKE 'foo%'`
/// - combinations (alternatives) of the above, will be concatenated with `OR` or `AND`
/// - `EQ .*` to NotNull
/// - `NE .*` means IS EMPTY
/// - `NE .*` to false (.* matches non-empty and empty strings, and NULL !~ '.*' results in NULL so this can never be true)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is only false in the context of filters (and this code can currently be used for both filters and regular expressions)

I think null !~ '.*' is actually null (not false)

I think a correct rewrite is

CASE 
  WHEN col IS NOT NULL THEN FALSE 
  ELSE NULL
END

Copy link
Copy Markdown
Member Author

@petern48 petern48 Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thanks, i didn't realize this code applied to cases outside of filters. I've fixed it and updated the PR title and description.

note: the optimizer would further rewrite the CASE expr into IS NOT NULL AND NULL when I ran the tests, so I rewrote it directly to that instead

edit: should've been IS NULL AND NULL 😬

///
/// Dev note: unit tests of this function are in `expr_simplifier.rs`, case `test_simplify_regex`.
pub fn simplify_regex_expr(
Expand Down Expand Up @@ -68,13 +68,8 @@ pub fn simplify_regex_expr(
// Handle the special case for ".*" pattern
if pattern == ANY_CHAR_REGEX_PATTERN {
let new_expr = if mode.not {
// not empty
let empty_lit = Box::new(string_scalar.to_expr(""));
Expr::BinaryExpr(BinaryExpr {
left,
op: Operator::Eq,
right: empty_lit,
})
// Always false.
lit(false)
} else {
// not null
left.is_not_null()
Expand Down
16 changes: 8 additions & 8 deletions datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -883,17 +883,17 @@ mod tests {
"
)?;

// Test `!= ".*"` transforms to checking if the column is empty
// Test `!~ ".*"` transforms to false
Comment thread
petern48 marked this conversation as resolved.
Outdated
let plan = LogicalPlanBuilder::from(table_scan.clone())
.filter(binary_expr(col("a"), Operator::RegexNotMatch, lit(".*")))?
.build()?;

assert_optimized_plan_equal!(
plan,
@ r#"
Filter: test.a = Utf8("")
@ r"
Filter: Boolean(false)
TableScan: test
"#
"
)?;

// Test case-insensitive versions
Expand All @@ -911,17 +911,17 @@ mod tests {
"
)?;

// Test `!~ ".*"` (case-insensitive) transforms to checking if the column is empty
// Test `!~* ".*"` (case-insensitive) transforms to false
let plan = LogicalPlanBuilder::from(table_scan.clone())
.filter(binary_expr(col("a"), Operator::RegexNotIMatch, lit(".*")))?
.build()?;

assert_optimized_plan_equal!(
plan,
@ r#"
Filter: test.a = Utf8("")
@ r"
Filter: Boolean(false)
TableScan: test
"#
"
)
}

Expand Down
8 changes: 2 additions & 6 deletions datafusion/sqllogictest/test_files/simplify_expr.slt
Original file line number Diff line number Diff line change
Expand Up @@ -43,12 +43,8 @@ physical_plan
query TT
explain select b from t where b !~ '.*'
----
logical_plan
01)Filter: t.b = Utf8View("")
02)--TableScan: t projection=[b]
physical_plan
01)FilterExec: b@0 =
02)--DataSourceExec: partitions=1, partition_sizes=[1]
logical_plan EmptyRelation: rows=0
physical_plan EmptyExec

query T
select b from t where b ~ '.*'
Expand Down
Loading