feat(array): push struct validity into children #8589

Open
miniex wants to merge 1 commit into vortex-data:develop from miniex:feat/push-struct-validity-into-children

miniex commented Jun 25, 2026

Contributor

Summary

`push_validity_into_children` masks each field with the struct's top-level validity, so a row that is null at the struct level becomes null in every field (for example, `[{a: 1, b: 2}, NULL]` becomes `[{a: 1, b: 2}, {a: NULL, b: NULL}]`), mirroring Arrow's `StructArray::flatten`. If `remove_struct_validity` is set, the top-level validity is dropped and the result is non-nullable; otherwise the top-level validity is kept. A struct with no top-level nulls is returned unchanged.
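To make the semantics above concrete, here is a minimal sketch on a toy model. `ToyStruct` and its fields are hypothetical stand-ins built from plain `Vec`s; this is not the Vortex `StructArray` API and not the code in this PR. It assumes every field has the same length as the validity vector, and that the no-nulls case returns the input unchanged regardless of the flag, as the summary states.

```rust
/// Toy stand-in for a nullable struct column. This is NOT Vortex's `StructArray`.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyStruct {
    /// One column per field; `None` marks a field-level null.
    pub fields: Vec<Vec<Option<i64>>>,
    /// Top-level validity: `None` means non-nullable; in `Some(v)`, `v[i] == true` means row `i` is valid.
    pub validity: Option<Vec<bool>>,
}

impl ToyStruct {
    /// Null out every field value in rows where the struct itself is null.
    pub fn push_validity_into_children(&self, remove_struct_validity: bool) -> ToyStruct {
        // Fast path (assumption taken from the summary): with no top-level
        // nulls there is nothing to push down, so return the input unchanged.
        let validity = match &self.validity {
            Some(v) if v.iter().any(|&valid| !valid) => v,
            _ => return self.clone(),
        };

        // Intersect each field's own nulls with the top-level validity.
        let fields = self
            .fields
            .iter()
            .map(|column| {
                column
                    .iter()
                    .zip(validity.iter())
                    .map(|(value, &valid)| if valid { *value } else { None })
                    .collect()
            })
            .collect();

        ToyStruct {
            fields,
            // Either drop the top-level validity (non-nullable) or keep it as-is.
            validity: if remove_struct_validity {
                None
            } else {
                Some(validity.clone())
            },
        }
    }
}
```

In this model, a field that is already null in a valid row stays null, so field-level and struct-level nulls intersect rather than overwrite each other.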

Each field is masked lazily via a mask expression, following @gatesn's note on the issue, rather than with the eager `compute::mask` of #5826. Open question: should this be a `StructArray` method, or a standalone mask expression in the new operator world?
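The difference between an eager mask and a mask expression can be sketched as follows. `ToyExpr` is a hypothetical type invented for illustration, not a Vortex expression or operator type: wrapping a column in `Mask` only records the intent, and the values are computed when `evaluate` is called. Here `true` in `validity` means "keep the value", which is an assumption of this sketch.

```rust
/// Hypothetical lazy expression tree; NOT a Vortex type.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyExpr {
    /// A materialized column; `None` marks a null.
    Column(Vec<Option<i64>>),
    /// A deferred mask: rows where `validity[i]` is false become null on evaluation.
    Mask { child: Box<ToyExpr>, validity: Vec<bool> },
}

impl ToyExpr {
    /// Wrap `self` in a mask node without touching any values.
    pub fn masked(self, validity: Vec<bool>) -> ToyExpr {
        ToyExpr::Mask { child: Box::new(self), validity }
    }

    /// Materialize the expression; this is where the masking work happens.
    pub fn evaluate(&self) -> Vec<Option<i64>> {
        match self {
            ToyExpr::Column(values) => values.clone(),
            ToyExpr::Mask { child, validity } => {
                let values = child.evaluate();
                assert_eq!(values.len(), validity.len(), "mask length must match column length");
                values
                    .into_iter()
                    .zip(validity.iter())
                    .map(|(value, &valid)| if valid { value } else { None })
                    .collect()
            }
        }
    }
}
```

An eager approach would run the body of the `Mask` arm immediately for every field; the lazy form leaves room for a later stage to fuse or skip the work.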

Closes: #3859

Benchmark

For reference (benchmark not committed), compared with hand-rolling the same masking without the fast path: with no top-level nulls, the fast path is ~5-7x faster (0.26 µs vs 1.2 µs at 4 fields, 0.65 µs vs 4.5 µs at 16 fields); with nulls, the two are equal (~1.7 µs / ~6.3 µs), so the method adds no overhead.
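As a quick check of the speedup range, the ratios can be recomputed from the timings quoted above. The helper names are made up for this sketch; only the four timings come from the PR description.

```rust
/// Ratio of the hand-rolled time to the fast-path time (same unit for both).
pub fn speedup(baseline_us: f64, fast_path_us: f64) -> f64 {
    baseline_us / fast_path_us
}

/// (field count, speedup) pairs for the no-top-level-nulls case,
/// using the timings quoted in the PR description.
pub fn reported_speedups() -> [(usize, f64); 2] {
    [
        (4, speedup(1.2, 0.26)),  // roughly 4.6x
        (16, speedup(4.5, 0.65)), // roughly 6.9x
    ]
}
```

Both ratios are consistent with the rounded "~5-7x" figure.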

Testing

`cargo nextest run -p vortex-array` passes (covering dropped and preserved validity, intersecting field-level nulls, all-invalid input, and the no-nulls fast path); `cargo fmt --all` and `cargo clippy --all-targets --all-features` are clean.


I'm Korean, so sorry if any wording reads a little awkward.

add `StructArray::push_validity_into_children`, which masks each field with the
struct's top-level validity so a row null at the struct level becomes null in
every field. `remove_struct_validity` chooses whether to keep the top-level
validity or drop it to non-nullable.

Closes vortex-data#3859

Signed-off-by: Han Damin <miniex@daminstudio.net>
@miniex miniex requested a review from a team June 25, 2026 01:48

codspeed-hq Bot commented Jun 25, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
❌ 4 regressed benchmarks
✅ 1581 untouched benchmarks
⏩ 4 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | copy_nullable[65536] | 1 ms | 1.4 ms | -24.28% |
| Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 168.8 µs | 205.5 µs | -17.84% |
| Simulation | copy_non_nullable[65536] | 908.3 µs | 1,089.3 µs | -16.62% |
| Simulation | compact_sliced[(4096, 90)] | 750 ns | 837.5 ns | -10.45% |
| Simulation | chunked_bool_canonical_into[(1000, 10)] | 27.1 µs | 16.6 µs | +63.22% |
| Simulation | chunked_varbinview_canonical_into[(100, 100)] | 259.5 µs | 224.4 µs | +15.66% |
| Simulation | chunked_varbinview_into_canonical[(100, 100)] | 306.6 µs | 271.2 µs | +13.02% |
| Simulation | rebuild_naive | 109.3 µs | 98.6 µs | +10.82% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing miniex:feat/push-struct-validity-into-children (2ecabe7) with develop (15cec3b)

Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Development

Successfully merging this pull request may close these issues.

Add a method to push struct validity into children

1 participant