feat: Support Spark `array_contains` builtin function by comphead · Pull Request #20685 · apache/datafusion

comphead · 2026-03-03T22:17:41Z

Which issue does this PR close?

Closes #20611 (Create Spark builtin function array_contains).

Rationale for this change

What changes are included in this PR?

The Spark function is a wrapper on top of the `array_has` function. After the result is produced, the null mask is set for the output indices that correspond to input rows containing nulls.
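A minimal sketch of this wrapping approach, assuming plain `Vec`/`Option` values instead of Arrow arrays. `array_has_like` is a hypothetical stand-in for DataFusion's `array_has` (NULL when the list or needle is NULL, otherwise a plain membership test over the non-null elements), and `spark_array_contains` then nullifies the rows whose answer was `false` but whose list held a NULL element. Neither function is DataFusion's actual API.

```rust
/// Hypothetical stand-in for `array_has`: `None` when the list or the needle
/// is NULL, otherwise whether any non-null element equals the needle.
fn array_has_like(list: Option<&[Option<i64>]>, needle: Option<i64>) -> Option<bool> {
    let (list, needle) = (list?, needle?);
    Some(list.iter().any(|e| *e == Some(needle)))
}

/// Sketch of the wrapper: compute the base result, then nullify rows where the
/// answer was `false` but the list contained a NULL element.
fn spark_array_contains(
    lists: &[Option<Vec<Option<i64>>>],
    needles: &[Option<i64>],
) -> Vec<Option<bool>> {
    lists
        .iter()
        .zip(needles)
        .map(|(list, needle)| {
            let base = array_has_like(list.as_deref(), *needle);
            // Nullify mask: a `false` answer is unknown if the list holds a NULL.
            let has_null_elem = list
                .as_ref()
                .map_or(false, |l| l.iter().any(|e| e.is_none()));
            match base {
                Some(false) if has_null_elem => None,
                other => other,
            }
        })
        .collect()
}
```

With the inputs from the Spark REPL session in this thread (`array(1, null, 2)`, `array(1, 2, null)` and `array(1, null)`, each searched for `2`), this sketch yields `[Some(true), Some(true), None]`: a `true` answer is kept even when the list contains NULLs.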

Are these changes tested?

Are there any user-facing changes?

mbutrovich

I don't love the needle and haystack naming. I'm not sure if that idiom generalizes to all contributors.

mbutrovich · 2026-03-04T21:21:59Z

datafusion/sqllogictest/test_files/spark/array/array_contains.slt

+2 NULL NULL
+3 false false
+4 NULL NULL
+5 true true


Is that what Spark does? It short-circuits and returns true even if there are NULLs in the array? What if the match comes after the NULL? Would it return NULL? I thought that if any element of the array was NULL, the result of the expression would be NULL.

scala> spark.sql("select array_contains(array(1, null, 2), 2)").show(false)
+------------------------------------+
|array_contains(array(1, NULL, 2), 2)|
+------------------------------------+
|true                                |
+------------------------------------+

scala> spark.sql("select array_contains(array(1, 2, null), 2)").show(false)
+------------------------------------+
|array_contains(array(1, 2, NULL), 2)|
+------------------------------------+
|true                                |
+------------------------------------+

scala> spark.sql("select array_contains(array(1, null), 2)").show(false)
+---------------------------------+
|array_contains(array(1, NULL), 2)|
+---------------------------------+
|null                             |
+---------------------------------+

I think the explanation in https://issues.apache.org/jira/browse/SPARK-55749 describes this behavior accurately.

Running Spark on this data gives:

=== Test Data ===
+----+----------------+----+
|col1|col2            |col3|
+----+----------------+----+
|1   |[1, 2, 3]       |10  |
|2   |[4, null, 6]    |5   |
|3   |[7, 8, 9]       |10  |
|4   |null            |1   |
|5   |[10, null, null]|10  |
+----+----------------+----+

=== array_contains(col2, col3) and array_contains(col2, 10) ===
+----+--------------------------+------------------------+
|col1|array_contains(col2, col3)|array_contains(col2, 10)|
+----+--------------------------+------------------------+
|1   |false                     |false                   |
|2   |null                      |null                    |
|3   |false                     |false                   |
|4   |null                      |null                    |
|5   |true                      |true                    |
+----+--------------------------+------------------------+
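The results in these tables are consistent with SQL three-valued logic: each element comparison `elem = needle` is NULL when the element is NULL, and the row result is the Kleene OR of those comparisons, so a single TRUE wins regardless of where the NULLs sit, while FALSE plus any NULL gives NULL. A minimal per-row sketch, assuming `Option<i64>` elements and hypothetical function names (this is not Spark's or DataFusion's code):

```rust
/// Kleene (three-valued) OR: TRUE dominates, then NULL, then FALSE.
fn kleene_or(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Per-row evaluation of `array_contains(list, needle)` under three-valued logic.
fn array_contains_row(list: Option<&[Option<i64>]>, needle: Option<i64>) -> Option<bool> {
    // A NULL array or a NULL needle makes the whole result NULL.
    let (list, needle) = (list?, needle?);
    list.iter()
        // A NULL element compares as unknown (None); otherwise plain equality.
        .map(|e| e.map(|x| x == needle))
        .fold(Some(false), kleene_or)
}
```

The `fold` does not stop at the first TRUE; an early-exit loop would give the same answers with less work. Applied to the five test rows above, both `array_contains(col2, col3)` and `array_contains(col2, 10)` evaluate to `false, NULL, false, NULL, true`.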

comphead · 2026-03-04T23:40:40Z

I don't love the needle and haystack naming. I'm not sure if that idiom generalizes to all contributors.

It is what is currently used in `array_has*`, `position`, and some other functions. But I agree, I'm not sure this is the most idiomatic approach.

mbutrovich

Thanks @comphead!

Jefffrey · 2026-03-05T02:34:17Z

datafusion/spark/src/function/array/array_contains.rs

+    let haystack = match haystack_arg {
+        ColumnarValue::Array(arr) => Arc::clone(arr),
+        ColumnarValue::Scalar(s) => s.to_array_of_size(result.len())?,
+    };


Suggested change

-    let haystack = match haystack_arg {
-        ColumnarValue::Array(arr) => Arc::clone(arr),
-        ColumnarValue::Scalar(s) => s.to_array_of_size(result.len())?,
-    };
+    let haystack = haystack_arg.to_array_of_size(result.len())?;
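The suggestion works because the scalar-to-array broadcast can live in a single helper instead of a hand-written `match` at each call site. A minimal sketch of that idea, using a hypothetical `Columnar` enum rather than DataFusion's `ColumnarValue`; the row-count check on the array arm is an assumption added for illustration:

```rust
/// Hypothetical, simplified columnar argument: either a full column or a
/// single scalar that applies to every row.
#[derive(Debug, Clone)]
enum Columnar<T> {
    Array(Vec<T>),
    Scalar(T),
}

impl<T: Clone> Columnar<T> {
    /// Materialize the argument as `num_rows` values: an array is cloned
    /// (after checking its length), a scalar is repeated `num_rows` times.
    fn to_array_of_size(&self, num_rows: usize) -> Result<Vec<T>, String> {
        match self {
            Columnar::Array(arr) if arr.len() == num_rows => Ok(arr.clone()),
            Columnar::Array(arr) => Err(format!(
                "expected {} rows, got {}",
                num_rows,
                arr.len()
            )),
            Columnar::Scalar(s) => Ok(vec![s.clone(); num_rows]),
        }
    }
}
```

Callers then write one line, such as `let haystack = arg.to_array_of_size(n)?;`, regardless of whether the argument arrived as a column or a literal.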

Jefffrey · 2026-03-05T02:43:21Z

datafusion/spark/src/function/array/array_contains.rs

+    let old_validity = match result.nulls() {
+        Some(n) => n.inner().clone(),
+        None => BooleanBuffer::new_set(result.len()),
+    };
+    let new_validity = &old_validity & &(!&nullify_mask);


Suggested change

-    let old_validity = match result.nulls() {
-        Some(n) => n.inner().clone(),
-        None => BooleanBuffer::new_set(result.len()),
-    };
-    let new_validity = &old_validity & &(!&nullify_mask);
+    let new_validity = match result.nulls() {
+        Some(n) => n.inner() & &(!&nullify_mask),
+        None => !&nullify_mask,
+    };
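The two versions compute the same validity bitmap: when the result has no null buffer, every row starts out valid, and `true AND NOT m` is just `NOT m`, so the all-set buffer never needs to be allocated. A sketch of that equivalence, assuming `Vec<bool>` as a stand-in for Arrow's `BooleanBuffer` (with `true` meaning valid) and hypothetical function names:

```rust
/// Original shape: materialize an all-valid bitmap when there is no null
/// buffer, then AND it with the negated nullify mask.
fn validity_old(nulls: Option<&[bool]>, nullify_mask: &[bool]) -> Vec<bool> {
    let old: Vec<bool> = match nulls {
        Some(n) => n.to_vec(),
        None => vec![true; nullify_mask.len()],
    };
    old.iter().zip(nullify_mask).map(|(v, m)| *v && !*m).collect()
}

/// Suggested shape: only AND when a null buffer exists; otherwise the negated
/// mask alone is the validity, with no extra allocation for an all-set buffer.
fn validity_new(nulls: Option<&[bool]>, nullify_mask: &[bool]) -> Vec<bool> {
    match nulls {
        Some(n) => n.iter().zip(nullify_mask).map(|(v, m)| *v && !*m).collect(),
        None => nullify_mask.iter().map(|m| !*m).collect(),
    }
}
```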

Jefffrey · 2026-03-05T02:44:17Z

datafusion/spark/src/function/array/array_contains.rs

+            let buf = builder.finish();
+            Some(mask_with_list_nulls(buf, list.nulls()))
+        }
+        _ => None,


Probably better to just error/panic here to make it clear this None path is unreachable?
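One way to act on this suggestion is to make the impossible branch an explicit error (or an `unreachable!` panic) instead of a silent `None` that a caller could mistake for "nothing to mask". A minimal sketch, assuming a hypothetical `Input` enum and an illustrative `null_element_mask` helper rather than the PR's actual types:

```rust
/// Hypothetical, simplified input kinds; only `List` is expected here because
/// the function signature already restricts the argument type.
enum Input {
    List(Vec<Option<Vec<Option<i64>>>>),
    Other(&'static str),
}

/// Per-row mask of "list contains a NULL element". Non-list input is surfaced
/// as an explicit internal error rather than returned as `None`.
fn null_element_mask(input: &Input) -> Result<Vec<bool>, String> {
    match input {
        Input::List(rows) => Ok(rows
            .iter()
            .map(|row| row.as_ref().map_or(false, |l| l.iter().any(Option::is_none)))
            .collect()),
        Input::Other(ty) => Err(format!(
            "internal error: expected a list argument, got {}",
            ty
        )),
    }
}
```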

comphead · 2026-03-05T15:56:52Z

Thanks @Jefffrey and @mbutrovich for the review

## Which issue does this PR close?

- Closes apache#20611.

## Rationale for this change

## What changes are included in this PR?

The Spark function is a wrapper on top of the `array_has` function. After the result is produced, the null mask is set for the output indices that correspond to input rows containing nulls.

## Are these changes tested?

## Are there any user-facing changes?

(cherry picked from commit 953bdf4)

github-actions bot added the documentation, development-process, sqllogictest, and spark labels Mar 3, 2026

feat: Support Spark array_contains

267f9c9

comphead force-pushed the spark branch from 6fcba6f to 267f9c9 March 3, 2026 22:22

github-actions bot removed the documentation and development-process labels Mar 3, 2026

comphead mentioned this pull request Mar 3, 2026

Create Spark builtin function array_contains #20611

Closed


feat: Support Spark array_contains

c9d9e0d

This was referenced Mar 4, 2026

Use datafusion-spark SparkArrayContains for three-valued NULL semantics apache/datafusion-comet#3630

Draft

fix: Correct array_contains behavior for Spark-style null semantics apache/datafusion-comet#3196

Draft

mbutrovich self-requested a review March 4, 2026 21:15

mbutrovich reviewed Mar 4, 2026


feat: Support Spark array_contains

43a5410

comphead requested a review from mbutrovich March 4, 2026 23:35

mbutrovich approved these changes Mar 5, 2026


Jefffrey reviewed Mar 5, 2026


feat: Support Spark array_contains

35e80c0

Jefffrey approved these changes Mar 5, 2026


comphead added this pull request to the merge queue Mar 5, 2026

Merged via the queue into apache:main with commit 953bdf4 Mar 5, 2026
30 checks passed

comphead mentioned this pull request Mar 5, 2026

[branch 53]: Backport Spark array_contains builtin function (#20685) #20727

Open

comphead merged 4 commits into apache:main from comphead:spark

4 participants
