Skip to content

fix: SanityCheckPlan error with window functions and NVL filter#20231

Open
EeshanBembi wants to merge 2 commits intoapache:mainfrom
EeshanBembi:fix/sanity-check-nvl-equivalence-20194
Open

fix: SanityCheckPlan error with window functions and NVL filter#20231
EeshanBembi wants to merge 2 commits intoapache:mainfrom
EeshanBembi:fix/sanity-check-nvl-equivalence-20194

Conversation

@EeshanBembi
Copy link
Contributor

Which issue does this PR close?

Closes #20194

Rationale for this change

A query with ROW_NUMBER() OVER (... ORDER BY CASE WHEN col='0' THEN 1 ELSE 0 END) combined with a filter nvl(t2.value_2_3,'0')='0' fails with a SanityCheckPlan error. This worked in 50.3.0 but broke in 52.1.0.

What changes are included in this PR?

Root cause: collect_columns_from_predicate_inner was extracting equality pairs where neither side was a Column (e.g. nvl(col, '0') = '0'), creating equivalence classes between complex expressions and literals. normalize_expr's deep traversal would then replace the literal '0' inside unrelated sort/window CASE WHEN expressions with the complex NVL expression, corrupting the sort ordering and causing a mismatch between SortExec's reported output ordering and BoundedWindowAggExec's expected ordering.

Fix (two changes in filter.rs):

  1. collect_columns_from_predicate_inner: Only extract equality pairs where at least one side is a Column reference. This matches the function's documented intent ("Column-Pairs") and prevents complex-expression-to-literal equivalence classes from being created.
  2. extend_constants: Recognize Literal expressions as inherently constant (previously only checked is_expr_constant on the input's equivalence properties, which doesn't know about literals). This ensures constant propagation still works for complex_expr = literal predicates — e.g. nvl(col, '0') is properly marked as constant after the filter.

How was this tested?

  • Unit test test_collect_columns_skips_non_column_pairs verifying the filtering logic
  • Sqllogictest reproducing the exact query from the issue
  • Full test suites: equivalence tests (51 passed), physical-plan tests (1255 passed), physical-optimizer tests (20 passed)
  • Manual verification with datafusion-cli running the reproduction query

Test plan

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) physical-plan Changes to the physical-plan crate labels Feb 9, 2026
@EeshanBembi EeshanBembi force-pushed the fix/sanity-check-nvl-equivalence-20194 branch from 5543f1f to 244be39 Compare February 9, 2026 10:57
@EeshanBembi EeshanBembi marked this pull request as ready for review February 9, 2026 14:12
@kosiew
Copy link
Contributor

kosiew commented Feb 26, 2026

hi @EeshanBembi
I would love to review this PR after you resolve the merge conflict.

…he#20194)

`collect_columns_from_predicate_inner` was extracting equality pairs
where neither side was a Column (e.g. `nvl(col, '0') = '0'`), creating
equivalence classes between complex expressions and literals.
`normalize_expr`'s deep traversal would then replace the literal inside
unrelated sort/window expressions with the complex expression,
corrupting the sort ordering and triggering SanityCheckPlan failures.

Fix by restricting `collect_columns_from_predicate_inner` to only
extract pairs where at least one side is a Column reference, matching
the function's documented intent. Also update `extend_constants` to
recognize Literal expressions as inherently constant, so constant
propagation still works for `complex_expr = literal` predicates.

Closes apache#20194
@EeshanBembi EeshanBembi force-pushed the fix/sanity-check-nvl-equivalence-20194 branch from 244be39 to ce9919c Compare February 26, 2026 13:07
Copy link
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @EeshanBembi for working on this.

Comment on lines +365 to +374
let left_const =
input_eqs.is_expr_constant(binary.left()).or_else(|| {
binary
.left()
.as_any()
.downcast_ref::<Literal>()
.map(|l| AcrossPartitions::Uniform(Some(l.value().clone())))
});
let right_const =
input_eqs.is_expr_constant(binary.right()).or_else(|| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current logic repeats nearly identical is_expr_constant(...).or_else(literal...) blocks for left/right operands.

Could we extract an expr_constant_or_literal(expr, input_eqs) helper here? It would remove duplication and centralize literal/const semantics used by filter equivalence inference.

// replace those occurrences with the complex expression, corrupting
// sort orderings. Constant propagation for such pairs is handled
// separately by `extend_constants`.
let has_column = binary.left().as_any().downcast_ref::<Column>().is_some()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_column currently means "one side is directly a Column expression," not "expression contains a column."

Would has_direct_column_operand (or similar) be clearer here? It matches the intentional exclusion of complex_expr=literal pairs and avoids misreading this as recursive column detection.

/// `normalize_expr`'s deep traversal would replace the literal inside
/// unrelated expressions (e.g. sort keys) with the complex expression.
#[tokio::test]
async fn test_collect_columns_skips_non_column_pairs() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_collect_columns_skips_non_column_pairs does not use async behavior.

Could this be a plain #[test]? Keeping it sync makes scope a bit clearer since no async execution is involved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SanityCheckPlan caused by Error during planning:

2 participants