Skip to content

perf: Optimize comparison on nested types#20716

Open
neilconway wants to merge 4 commits intoapache:mainfrom
neilconway:neilc/optimize-struct-cmp-op
Open

perf: Optimize comparison on nested types#20716
neilconway wants to merge 4 commits intoapache:mainfrom
neilconway:neilc/optimize-struct-cmp-op

Conversation

@neilconway
Copy link
Contributor

@neilconway neilconway commented Mar 5, 2026

Which issue does this PR close?

N/A

Rationale for this change

BooleanBuffer::collect_bool() is faster than map().collect(). Per discussion on #20694; originally suggested by @Dandandan.

What changes are included in this PR?

  • Implement optimization
  • Add benchmark for nested struct comparison

Are these changes tested?

Yes, covered by existing tests.

Are there any user-facing changes?

No.

AI usage

Multiple AI tools were used to iterate on this PR. I have reviewed and understand the resulting code.

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Mar 5, 2026
@neilconway
Copy link
Contributor Author

Benchmarks:

  group                          base                                   target
  -----                          ----                                   ------
  compare_nested array_array     1.17     52.5±0.78µs        ? ?/sec    1.00     44.8±0.27µs        ? ?/sec
  compare_nested array_scalar    1.12     50.1±0.38µs        ? ?/sec    1.00     44.6±0.30µs        ? ?/sec

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thanks @neilconway

I do wonder why we can't update collect to call collect_bool internally 🤔

@alamb alamb added the performance Make DataFusion faster label Mar 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Make DataFusion faster physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants