feat: use aligned slice access during bulk append in SparkUnsafeArray#4672
feat: use aligned slice access during bulk append in SparkUnsafeArray#4672sandugood wants to merge 8 commits into
Conversation
|
When running CI pipeline, got few UB. Switched to a less optimal way, which still provided boost in performance. By materializing
|
|
Could somebody potentially trigger the CI run? Thanks in advance |
|
Thanks for picking this up. The shape of the change is exactly what #4626 was asking for: keep the runtime alignment check as load-bearing, and replace the per-element null branch with A few things to look at, the first one is a correctness concern.
I'll approve the CI workflows so the matrix actually exercises the change. Out of curiosity, what was the UB you hit earlier? Worth knowing whether item 1 above was the cause or there's a separate hazard. Disclosure: this review was assisted by an AI tool (Anthropic Claude via Claude Code). I read the diff, the linked issue, and the prior closed PR's review feedback before forming the comments above, and I take responsibility for them. |
Which issue does this PR close?
Closes #4626
Rationale for this change
Currently, there is a bottleneck in performance during bulk append in the
SparkUnsafeArrayimplementation (inmacroand respectiveBuildertypes forbool,date32andtimestamp). If the array isNULLABLEthere is a hotspot:Self::is_null_in_bitset()which is suboptimal (see the benchmarking results below)What changes are included in this PR?
With this PR we change the flow of execution for nullable arrays using native arrow methods:
&[bool]null bitmaskHow are these changes tested?
All of the basic tests pass.