feat: implement IsNotNull expression in vortex expression library #6969

Open
xiaoxuandev wants to merge 1 commit into vortex-data:develop from xiaoxuandev:fix-6040

@xiaoxuandev

Summary

Closes: #6040

Add a first-class IsNotNull scalar function, replacing the previous Not(IsNull(...)) composition pattern. This simplifies the expression tree and enables direct stat_falsification for zone map pruning.
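To illustrate why a dedicated node simplifies the tree, here is a minimal, self-contained sketch. The `Expr`, `Value`, `eval`, and `node_count` names are hypothetical toy definitions for this example only; they are not the Vortex expression types. Under those assumptions, it shows that `Not(IsNull(x))` and `IsNotNull(x)` evaluate identically while the direct form has one fewer node.

```rust
/// Toy expression tree (hypothetical; not the Vortex `Expression` type).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reads the single input column.
    Column,
    IsNull(Box<Expr>),
    Not(Box<Expr>),
    IsNotNull(Box<Expr>),
}

/// Per-row value: a nullable integer or a non-nullable boolean.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int(Option<i64>),
    Bool(bool),
}

fn is_null(v: Value) -> bool {
    matches!(v, Value::Int(None))
}

/// Evaluates the expression against one row of the input column.
fn eval(expr: &Expr, row: Option<i64>) -> Value {
    match expr {
        Expr::Column => Value::Int(row),
        Expr::IsNull(child) => Value::Bool(is_null(eval(child, row))),
        Expr::IsNotNull(child) => Value::Bool(!is_null(eval(child, row))),
        Expr::Not(child) => match eval(child, row) {
            Value::Bool(b) => Value::Bool(!b),
            other => panic!("Not expects a boolean, got {:?}", other),
        },
    }
}

/// Counts the nodes in the expression tree.
fn node_count(expr: &Expr) -> usize {
    match expr {
        Expr::Column => 1,
        Expr::IsNull(c) | Expr::Not(c) | Expr::IsNotNull(c) => 1 + node_count(c),
    }
}
```

In this toy model, a pruning rule written for `IsNotNull` matches a single node, whereas the composed form requires recognizing a `Not` wrapped around an `IsNull`.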

Changes:

New is_not_null.rs with ScalarFnVTable implementation, including stat_falsification using is_constant && null_count > 0 (with TODO for future RowCount stat)
Updated all integration points: DataFusion, DuckDB, Python/Substrait to use is_not_null(...) directly
Replaced the Not(IsNull(...)) fallback in erased.rs validity with IsNotNull
Registered IsNotNull in ScalarFnSession and ExprBuiltins/ArrayBuiltins
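
The falsification rule described in the list above can be sketched in isolation. The `ZoneStats` struct and both function names below are assumptions made for illustration; the actual PR builds a pruning `Expression` from a `StatsCatalog` rather than evaluating a struct. The sketch contrasts the approximation used here (`is_constant && null_count > 0`) with the more general check that a row-count stat would allow.

```rust
/// Hypothetical per-zone statistics; `None` means the stat is unavailable.
#[derive(Debug, Clone, Copy)]
struct ZoneStats {
    /// Assumed for illustration: the PR notes this stat does not exist yet.
    row_count: Option<u64>,
    null_count: Option<u64>,
    is_constant: Option<bool>,
}

/// Approximation from the PR: a constant zone that contains any null
/// must consist entirely of nulls, so `is_not_null` is false for every row.
fn falsified_approx(stats: &ZoneStats) -> bool {
    matches!(
        (stats.is_constant, stats.null_count),
        (Some(true), Some(nulls)) if nulls > 0
    )
}

/// General rule that a row-count stat would enable:
/// every row is null exactly when `null_count == row_count`.
fn falsified_with_row_count(stats: &ZoneStats) -> bool {
    matches!(
        (stats.null_count, stats.row_count),
        (Some(nulls), Some(rows)) if nulls == rows
    )
}
```

Both checks are conservative: they return `true` only when the stats prove that no row can satisfy the predicate. The approximation additionally misses all-null zones whose `is_constant` stat is missing or was not computed.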

AI Assistance Disclosure

This PR was developed with AI assistance (Kiro). AI was used for code review, implementing stat_falsification, writing tests, and drafting the PR description. All output was reviewed and validated by the author.

API Changes
New public APIs:

vortex_array::expr::is_not_null(child) — creates an IsNotNull expression
Expression::is_not_null() / ArrayRef::is_not_null() via ExprBuiltins/ArrayBuiltins traits
Python: vortex._lib.expr.is_not_null(child)

Testing

9 unit tests covering: return dtype, child replacement, mixed/all-valid/all-invalid evaluation, struct field access, display formatting, null sensitivity, and stat falsification pruning expression generation.

Add a first-class IsNotNull scalar function instead of composing
Not(IsNull(...)). This simplifies the expression tree, enables direct
stat_falsification for zone map pruning, and updates all integration
points (DataFusion, DuckDB, Python/Substrait).

The stat_falsification uses is_constant && null_count > 0 as an
approximation since there is no RowCount stat yet.

Closes: vortex-data#6040
// Since there is no RowCount stat in the zone map, we approximate using IsConstant:
// if the zone is constant and has any nulls, then all values must be null.
//
// TODO(vortex-6040): Add a RowCount stat to enable the more general falsification:

who is this?

    catalog: &dyn StatsCatalog,
) -> Option<Expression> {
    // is_not_null is falsified when ALL values are null, i.e. null_count == row_count.
    // Since there is no RowCount stat in the zone map, we approximate using IsConstant:

This is the len?

Comment on lines +86 to +89
if let Some(scalar) = child.as_constant() {
    return Ok(ConstantArray::new(!scalar.is_null(), args.row_count()).into_array());
}

This is unneeded; the validity will do this.
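
One possible reading of this comment, sketched with toy types: if the result is always derived from the array's validity, a constant input needs no dedicated branch. `ToyArray`, `Validity`, and this `is_not_null` function are hypothetical stand-ins, not the Vortex API, and the sketch materializes a `Vec<bool>` where a real implementation could return a constant array.

```rust
/// Toy validity summary (hypothetical; not the Vortex `Validity` type).
#[derive(Debug, Clone, PartialEq)]
enum Validity {
    AllValid,
    AllInvalid,
    /// One flag per row; `true` means the row is non-null.
    Mask(Vec<bool>),
}

/// Toy array with a constant encoding and a plain encoding.
enum ToyArray {
    Constant { value: Option<i64>, len: usize },
    Plain(Vec<Option<i64>>),
}

impl ToyArray {
    fn len(&self) -> usize {
        match self {
            ToyArray::Constant { len, .. } => *len,
            ToyArray::Plain(values) => values.len(),
        }
    }

    /// A constant array reports all-valid or all-invalid validity
    /// without inspecting individual rows.
    fn validity(&self) -> Validity {
        match self {
            ToyArray::Constant { value: Some(_), .. } => Validity::AllValid,
            ToyArray::Constant { value: None, .. } => Validity::AllInvalid,
            ToyArray::Plain(values) => {
                Validity::Mask(values.iter().map(|v| v.is_some()).collect())
            }
        }
    }
}

/// Derives the result purely from validity; the constant case falls out
/// of the `AllValid` / `AllInvalid` arms with no special handling.
fn is_not_null(array: &ToyArray) -> Vec<bool> {
    match array.validity() {
        Validity::AllValid => vec![true; array.len()],
        Validity::AllInvalid => vec![false; array.len()],
        Validity::Mask(bits) => bits,
    }
}
```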

Development

Successfully merging this pull request may close these issues.

Add an IsNotNull expression
