sql-parser: Fix IS [NOT] DISTINCT FROM precedence vs AND/OR#37061
Conversation
The right-hand side of `IS [NOT] DISTINCT FROM` was parsed at `Precedence::Zero`, which greedily pulled a trailing `AND`/`OR` into the RHS. As a result `a IS DISTINCT FROM b AND c` parsed as `a IS DISTINCT FROM (b AND c)` instead of `(a IS DISTINCT FROM b) AND c`, producing wrong results for any predicate that combined `IS [NOT] DISTINCT FROM` with a following boolean connective. Parse the RHS at the precedence of the `IS` operator instead, so that `AND`/`OR` are left for the surrounding expression while comparison/arithmetic (which bind tighter) are still consumed. This matches PostgreSQL. Adds parser unit tests for the AND/OR/arithmetic cases and a sqllogictest regression test covering the query results from the report. Fixes SQL-236
97b6b57 to
87c2d10
Compare
def-
left a comment
There was a problem hiding this comment.
Do we need a catalog migration for potentially already persisted objects?
We normally don't store parsed results but store the SQL for objects, so I'd say we should be fine, plus a migration is a bit heavy. But if you have more concrete concerns I could look into it. |
There was a problem hiding this comment.
Nit: extract construct to a variable to avoid the deep indentation here.
There was a problem hiding this comment.
Done in 8144698 — extracted the IsExprConstruct into a let binding so the match arms are no longer nested inside the struct literal.
def-
left a comment
There was a problem hiding this comment.
I tried, but actually looks fine when upgrading, so no problem from QA side:
diff --git a/misc/python/materialize/checks/all_checks/materialized_views.py b/misc/python/materialize/checks/all_checks/materialized_views.py
index 6066d1af45..455e737028 100644
--- a/misc/python/materialize/checks/all_checks/materialized_views.py
+++ b/misc/python/materialize/checks/all_checks/materialized_views.py
@@ -352,3 +352,61 @@ class MaterializedViewReplacement(Check):
# `validate` remains idempotent.
> DROP MATERIALIZED VIEW mv_replacement_replacement
"""))
+
+
+class MaterializedViewIsNotDistinctFromPrecedence(Check):
+ def initialize(self) -> Testdrive:
+ return Testdrive(dedent("""
+ > CREATE TABLE precedence_table (flag boolean, active boolean);
+ > INSERT INTO precedence_table VALUES (true, true), (false, true), (NULL, true), (true, false);
+ > CREATE VIEW precedence_view1 AS SELECT flag, active FROM precedence_table WHERE flag IS DISTINCT FROM false OR active = false
+ > CREATE MATERIALIZED VIEW precedence_mv1 AS SELECT flag, active FROM precedence_table WHERE flag IS DISTINCT FROM false OR active = false
+ """))
+
+ def manipulate(self) -> list[Testdrive]:
+ return [
+ Testdrive(dedent(s))
+ for s in [
+ """
+ > CREATE VIEW precedence_view2 AS SELECT flag, active FROM precedence_table WHERE flag IS DISTINCT FROM false OR active = false
+ > CREATE MATERIALIZED VIEW precedence_mv2 AS SELECT flag, active FROM precedence_table WHERE flag IS DISTINCT FROM false OR active = false
+ """,
+ """
+ > CREATE VIEW precedence_view3 AS SELECT flag, active FROM precedence_table WHERE flag IS DISTINCT FROM false OR active = false
+ > CREATE MATERIALIZED VIEW precedence_mv3 AS SELECT flag, active FROM precedence_table WHERE flag IS DISTINCT FROM false OR active = false
+ """,
+ ]
+ ]
+
+ def validate(self) -> Testdrive:
+ return Testdrive(dedent("""
+ > SELECT * FROM precedence_view1
+ <null> true
+ true false
+ true true
+
+ > SELECT * FROM precedence_view2
+ <null> true
+ true false
+ true true
+
+ > SELECT * FROM precedence_view3
+ <null> true
+ true false
+ true true
+
+ > SELECT * FROM precedence_mv1
+ <null> true
+ true false
+ true true
+
+ > SELECT * FROM precedence_mv2
+ <null> true
+ true false
+ true true
+
+ > SELECT * FROM precedence_mv3
+ <null> true
+ true false
+ true true
+ """))
@def- you want that test addition in there as well? Or that was just a one-off check from your side? |
Address review nit: compute the IsExprConstruct in a let binding before building Expr::IsExpr, so the match arms are no longer nested inside the struct literal. Pure refactor, no behavior change.
|
Just a one-off check. Since it found no bug we don't need it as a regression test. |
Motivation
Fixes SQL-236.
The right-hand side of
IS [NOT] DISTINCT FROMwas parsed atPrecedence::Zero, the lowest precedence, so the parser greedily pulled atrailing
AND/ORinto the RHS.a IS DISTINCT FROM b AND cparsed asa IS DISTINCT FROM (b AND c)instead of(a IS DISTINCT FROM b) AND c,silently producing wrong results for any predicate combining
IS [NOT] DISTINCT FROMwith a following boolean connective.Description
Parse the RHS of
IS [NOT] DISTINCT FROMat the precedence of theISoperator (
self.parse_subexpr(precedence)) rather thanparse_expr()(
Precedence::Zero). With that precedence:AND/OR(lower precedence) are left for the surrounding expression, sothey are no longer absorbed into the RHS.
into the RHS, unchanged.
This matches PostgreSQL, where
IS DISTINCT FROMbinds tighter thanAND/ORbut looser than comparison/arithmetic.Verification
mz-sql-parserdatadriven cases (src/sql-parser/tests/testdata/scalar)assert the parse tree for
a IS DISTINCT FROM b AND c,a IS NOT DISTINCT FROM b OR c, anda IS DISTINCT FROM b + c.test/sqllogictest/distinct_from.sltcover the exact queries from the report and assert the unparenthesized forms
now match the explicitly-parenthesized ones.
return the same rows there).