Add any_value aggregate function #23043

Open: Kevin-Li-2025 wants to merge 2 commits into apache:main from Kevin-Li-2025:kevin/add-any-value-aggregate

@Kevin-Li-2025


Which issue does this PR close?

Rationale for this change

any_value is a common aggregate in SQL engines for queries that need one representative non-null value from each group without imposing an ordering requirement. DataFusion currently has first_value, but that aggregate is order-sensitive, so exposing any_value gives users the intended arbitrary-value semantics directly.

What changes are included in this PR?

  • Adds an any_value(expression) aggregate UDF and registers it with the default aggregate functions.
  • Reuses the existing trivial first-value accumulator with nulls ignored, so evaluation short-circuits after the first non-null value.
  • Marks the aggregate as order-insensitive and preserves the input field metadata/type in the return field.
  • Adds sqllogictest coverage for scalar, grouped, all-null, empty-input, and string return-type cases.
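The accumulator behaviour described in the list above (ignore nulls, keep the first non-null value, stop looking once one is found) can be sketched in plain Rust. This is a minimal illustration only: `AnyValueSketch` and its methods are hypothetical names, nulls are modelled as `None`, and nothing here is DataFusion's actual accumulator type or API.

```rust
/// Hypothetical stand-in for the accumulator described in the PR.
/// Not DataFusion's real type; nulls are modelled as `None`.
#[derive(Debug)]
pub struct AnyValueSketch<T> {
    value: Option<T>,
}

impl<T: Clone> AnyValueSketch<T> {
    pub fn new() -> Self {
        Self { value: None }
    }

    /// Feeds one batch of input and returns how many elements were inspected.
    /// Once a non-null value is held, later batches are skipped entirely.
    pub fn update_batch(&mut self, batch: &[Option<T>]) -> usize {
        if self.value.is_some() {
            return 0; // short-circuit: a value was already chosen
        }
        for (i, item) in batch.iter().enumerate() {
            if let Some(v) = item {
                self.value = Some(v.clone());
                return i + 1; // stop at the first non-null value
            }
        }
        batch.len() // every element was null
    }

    /// Returns the chosen value, or `None` for all-null or empty input.
    pub fn evaluate(&self) -> Option<T> {
        self.value.clone()
    }
}
```

In this sketch an all-null or empty input leaves the state at `None`, which corresponds to the NULL results the all-null and empty-input test cases are meant to cover.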

Are these changes tested?

Yes. I ran:

cargo fmt --all
cargo test -p datafusion-functions-aggregate
cargo test -p datafusion-sqllogictest --test sqllogictests -- aggregate_any_value.slt
cargo clippy --all-targets --all-features -- -D warnings

Are there any user-facing changes?

Yes. This adds a new SQL aggregate function, any_value.

I used AI assistance to help inspect the codebase and run validation, and I reviewed the resulting implementation and tests.

@github-actions bot added the labels sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) on Jun 19, 2026

@Jefffrey (Contributor) left a comment

Comment on lines +27 to +39
query I
SELECT any_value(column2) FROM any_value_test;
----
10

query IIT rowsort
SELECT column1, any_value(column2), any_value(column3)
FROM any_value_test
GROUP BY column1;
----
1 10 first
2 NULL NULL
3 30 third


Are these two tests technically deterministic?
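One way to reason about this question, assuming the first-non-null behaviour described in the PR: the result for a group cannot depend on row order when that group holds at most one distinct non-null value. The sketch below checks that condition. The function name is hypothetical, and because the table contents are not shown in this excerpt, any rows passed to it are illustrative rather than the PR's data.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: returns true when every group has at most one
/// distinct non-null value, so an arbitrary-value aggregate over that
/// column gives the same answer regardless of row order.
/// Rows are `(group_key, value)` pairs; nulls are modelled as `None`.
pub fn any_value_is_order_independent(rows: &[(i64, Option<i64>)]) -> bool {
    let mut distinct: BTreeMap<i64, BTreeSet<i64>> = BTreeMap::new();
    for (key, value) in rows {
        if let Some(v) = value {
            // Record each non-null value seen for this group.
            distinct.entry(*key).or_default().insert(*v);
        }
    }
    // Groups that are all-null never get an entry and are trivially fine.
    distinct.values().all(|vals| vals.len() <= 1)
}
```

For an ungrouped query, the whole table acts as a single group. Judging by the grouped output quoted above, column2 contains at least 10 and 30, so under this check the ungrouped expectation of 10 would rely on input order rather than on the aggregate's contract.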

# under the License.

statement ok
CREATE TABLE any_value_test AS VALUES


Maybe let's add a test for an all-null column.
