perf: Add `append_with` to string builders, use in `replace` #22029
Open
neilconway wants to merge 1 commit into apache:main from
Other places where these APIs should be useful:
If we make the builders accessible outside the current crate, some of the Spark functions could use these APIs, as well as
My initial plan was to have an API where the closure is passed a caller-sized byte slice. That has two shortcomings:
That seemed like a footgun, so I started with these safer APIs instead.
Which issue does this PR close?
- `StringViewArrayBuilder::map` to avoid duplication #21997 (potentially).

Rationale for this change
This PR adds two new APIs to `GenericStringArrayBuilder` and `StringViewArrayBuilder`:

- `append_with` appends a row whose bytes are produced by invoking a closure that is passed a `StringWriter`.
- `append_byte_map` appends a row whose bytes are produced by mapping each byte of the input with a byte-to-byte map closure.

For `StringViewArrayBuilder`, `StringWriter` is an append-only string writer that automatically switches between writing to a new inline view (for short strings) and writing to the in-progress data block. For `GenericStringArrayBuilder`, `StringWriter` just appends to the value buffer directly.

(We need two new APIs because `append_byte_map` vectorizes much better than `append_with`, so callers that fit the byte-to-byte map pattern should prefer it.)

Both of these new APIs allow string UDFs to avoid creating an intermediate copy of the data in many cases. To illustrate this, this PR adopts the new APIs in `replace`.

Benchmarks (Arm64):
ASCII single-byte from (byte-map path)
Multi-byte from, StringArray (Writer general path)
Multi-byte from, StringViewArray (Writer general path)
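The two APIs described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SimpleStringBuilder`, `replace_into`, and this `StringWriter` are hypothetical stand-ins, the method signatures are assumptions, and only the offsets-based `GenericStringArrayBuilder` case (a value buffer plus offsets) is modeled. The dispatch rule in `replace_into` (single ASCII byte to single ASCII byte takes the byte-map path) is also an assumption based on the benchmark labels.

```rust
// Hypothetical, simplified model of an offsets-based string builder
// (value buffer + offsets). Not DataFusion's GenericStringArrayBuilder.
struct SimpleStringBuilder {
    values: Vec<u8>,     // concatenated UTF-8 bytes of all rows
    offsets: Vec<usize>, // row i spans values[offsets[i]..offsets[i + 1]]
}

// Append-only writer handed to the `append_with` closure. In this sketch it
// writes straight into the builder's value buffer, so no per-row String is
// allocated.
struct StringWriter<'a> {
    values: &'a mut Vec<u8>,
}

impl StringWriter<'_> {
    fn write_str(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
    }
}

impl SimpleStringBuilder {
    fn new() -> Self {
        Self { values: Vec::new(), offsets: vec![0] }
    }

    // Appends one row whose bytes are produced by the closure `f`.
    fn append_with<F: FnOnce(&mut StringWriter<'_>)>(&mut self, f: F) {
        let mut writer = StringWriter { values: &mut self.values };
        f(&mut writer);
        self.offsets.push(self.values.len());
    }

    // Appends one row by mapping each input byte through `f`.
    // Assumed precondition: `f` maps ASCII to ASCII and leaves bytes >= 0x80
    // unchanged, so the output remains valid UTF-8.
    fn append_byte_map<F: Fn(u8) -> u8>(&mut self, input: &str, f: F) {
        // A simple per-byte loop over a slice, which compilers can often vectorize.
        self.values.extend(input.as_bytes().iter().map(|&b| f(b)));
        self.offsets.push(self.values.len());
    }

    fn value(&self, i: usize) -> &str {
        let bytes = &self.values[self.offsets[i]..self.offsets[i + 1]];
        std::str::from_utf8(bytes).expect("rows are valid UTF-8")
    }
}

// Hypothetical `replace(haystack, from, to)` kernel that dispatches between
// the two paths.
fn replace_into(b: &mut SimpleStringBuilder, haystack: &str, from: &str, to: &str) {
    let (fb, tb) = (from.as_bytes(), to.as_bytes());
    if fb.len() == 1 && tb.len() == 1 && fb[0].is_ascii() && tb[0].is_ascii() {
        // Single ASCII byte -> single ASCII byte: byte-to-byte map path.
        let (from_b, to_b) = (fb[0], tb[0]);
        b.append_byte_map(haystack, |c| if c == from_b { to_b } else { c });
    } else {
        // General path: copy the pieces between matches directly into the builder.
        b.append_with(|w| {
            let mut last = 0;
            for (start, m) in haystack.match_indices(from) {
                w.write_str(&haystack[last..start]);
                w.write_str(to);
                last = start + m.len();
            }
            w.write_str(&haystack[last..]);
        });
    }
}

fn main() {
    let mut b = SimpleStringBuilder::new();
    replace_into(&mut b, "a-b-c", "-", "+"); // byte-map path
    replace_into(&mut b, "héllo wörld", "ö", "oe"); // general path
    assert_eq!(b.value(0), "a+b+c");
    assert_eq!(b.value(1), "héllo woerld");
    println!("{} | {}", b.value(0), b.value(1));
}
```

In both paths the row's bytes go straight into the builder's buffer, which is the intermediate copy the rationale says these APIs avoid. The sketch does not model `StringViewArrayBuilder`, whose `StringWriter` additionally has to choose between an inline view and the in-progress data block.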
What changes are included in this PR?
- `append_byte_map` and `append_with` added to both of the bulk-NULL string builders
- `replace` adopts the new APIs

Are these changes tested?
Yes; new tests added.
Are there any user-facing changes?
No.