perf: Add append_with to string builders, use in replace #22029

Open
neilconway wants to merge 1 commit into apache:main from neilconway:neilc/perf-builder-append-writer

@neilconway
Contributor

Which issue does this PR close?

Rationale for this change

This PR adds two new APIs to GenericStringArrayBuilder and StringViewArrayBuilder:

  1. append_with appends a row whose bytes are produced by invoking a closure that is passed a StringWriter
  2. append_byte_map appends a row whose bytes are produced by mapping each byte of the input with a byte-to-byte map closure.

For StringViewArrayBuilder, StringWriter is an append-only string writer that automatically switches between writing to a new inline view (for short strings) and writing to the in-progress data block. For GenericStringArrayBuilder, StringWriter just appends to the value buffer directly.
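
To make the inline-vs-data-block switch concrete, here is a minimal sketch of one way such a writer could behave. It is not the code in this PR: `ToyView`, `ToyViewWriter`, and the spill strategy are hypothetical names and assumptions for illustration. The only fact relied on is that Arrow string views store values of up to 12 bytes inline.

```rust
/// Arrow string views keep values of up to 12 bytes inline in the view itself.
const MAX_INLINE: usize = 12;

/// Hypothetical, simplified view: either inline bytes or a range in the data block.
#[derive(Debug)]
pub enum ToyView {
    Inline { len: u8, bytes: [u8; MAX_INLINE] },
    OutOfLine { offset: usize, len: usize },
}

/// Hypothetical append-only writer (an assumption, not the PR's StringWriter).
/// It starts in "inline" mode and spills to the data block on overflow.
pub struct ToyViewWriter<'a> {
    inline: [u8; MAX_INLINE],
    inline_len: usize,
    block: &'a mut Vec<u8>,
    /// Start offset in `block` once the row has spilled out of line.
    spill_start: Option<usize>,
}

impl<'a> ToyViewWriter<'a> {
    pub fn new(block: &'a mut Vec<u8>) -> Self {
        Self { inline: [0; MAX_INLINE], inline_len: 0, block, spill_start: None }
    }

    pub fn write(&mut self, bytes: &[u8]) {
        if self.spill_start.is_none() {
            let end = self.inline_len + bytes.len();
            if end <= MAX_INLINE {
                // Still short enough: keep accumulating inline.
                self.inline[self.inline_len..end].copy_from_slice(bytes);
                self.inline_len = end;
                return;
            }
            // Too long for an inline view: move what we have to the data block.
            self.spill_start = Some(self.block.len());
            self.block.extend_from_slice(&self.inline[..self.inline_len]);
        }
        self.block.extend_from_slice(bytes);
    }

    pub fn finish(self) -> ToyView {
        match self.spill_start {
            None => ToyView::Inline { len: self.inline_len as u8, bytes: self.inline },
            Some(offset) => ToyView::OutOfLine { offset, len: self.block.len() - offset },
        }
    }
}
```

In this sketch a short row never touches the data block, and a long row is copied at most once (the first 12 bytes or fewer) when it spills.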

(We need two new APIs because append_byte_map vectorizes a lot better than append_with, so callers that fit the byte-to-byte map pattern should prefer it.)

Both of these new APIs allow string UDFs to avoid creating an intermediate data copy in many cases. To illustrate this, this PR adopts the new APIs in replace.
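
For readers who want a feel for the shape of the two APIs, here is a rough sketch on a toy offset-based builder. The method names `append_with` and `append_byte_map` come from this PR; everything else (`ToyStringBuilder`, `ToyWriter`, `write_str`, `value`, `replace_into`, and the exact signatures) is a hypothetical stand-in and may not match the real code.

```rust
/// Hypothetical writer handed to the closure; appends straight to the value buffer.
pub struct ToyWriter<'a> {
    buf: &'a mut Vec<u8>,
}

impl ToyWriter<'_> {
    pub fn write_str(&mut self, s: &str) {
        self.buf.extend_from_slice(s.as_bytes());
    }
}

/// Hypothetical stand-in for an offset-based string builder (no NULL handling).
pub struct ToyStringBuilder {
    values: Vec<u8>,
    offsets: Vec<usize>,
}

impl ToyStringBuilder {
    pub fn new() -> Self {
        Self { values: Vec::new(), offsets: vec![0] }
    }

    /// Append one row whose bytes are produced by `f` writing into the buffer.
    pub fn append_with<F>(&mut self, f: F)
    where
        F: FnOnce(&mut ToyWriter<'_>),
    {
        let mut w = ToyWriter { buf: &mut self.values };
        f(&mut w);
        self.offsets.push(self.values.len());
    }

    /// Append one row by mapping each input byte to exactly one output byte.
    /// Assumption: the caller's map keeps the row valid UTF-8 (e.g. ASCII to ASCII).
    pub fn append_byte_map<F>(&mut self, input: &[u8], map: F)
    where
        F: Fn(u8) -> u8,
    {
        self.values.extend(input.iter().map(|&b| map(b)));
        self.offsets.push(self.values.len());
    }

    /// Bytes of row `i`.
    pub fn value(&self, i: usize) -> &[u8] {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// A replace-style caller: writes the result directly into the builder,
/// with no intermediate `String` per row.
pub fn replace_into(builder: &mut ToyStringBuilder, s: &str, from: &str, to: &str) {
    builder.append_with(|w| {
        let mut last = 0;
        for (idx, m) in s.match_indices(from) {
            w.write_str(&s[last..idx]);
            w.write_str(to);
            last = idx + m.len();
        }
        w.write_str(&s[last..]);
    });
}
```

With this sketch, `replace_into(&mut b, "foo.bar.baz", ".", "::")` takes the general writer path, while a single-byte ASCII replacement such as `b.append_byte_map(b"a-b-c", |c| if c == b'-' { b'_' } else { c })` is a plain loop over bytes with a fixed output length, which is the kind of loop a compiler can usually vectorize.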

Benchmarks (Arm64):

ASCII single-byte from (byte-map path)

  • size=1024 str_len=32 nulls=0 : 15.2 µs -> 12.7 µs (−16.4%)
  • size=1024 str_len=32 nulls=0.2 : 13.8 µs -> 12.0 µs (−13.1%)
  • size=1024 str_len=128 nulls=0 : 10.8 µs -> 8.0 µs (−26.6%)
  • size=1024 str_len=128 nulls=0.2 : 10.6 µs -> 7.7 µs (−27.0%)
  • size=4096 str_len=32 nulls=0 : 59.7 µs -> 48.4 µs (−18.9%)
  • size=4096 str_len=32 nulls=0.2 : 53.0 µs -> 46.1 µs (−13.0%)
  • size=4096 str_len=128 nulls=0 : 40.7 µs -> 30.7 µs (−24.6%)
  • size=4096 str_len=128 nulls=0.2 : 38.8 µs -> 28.0 µs (−27.9%)

Multi-byte from, StringArray (Writer general path)

  • size=1024 str_len=32 nulls=0 : 24.4 µs -> 20.9 µs (−14.5%)
  • size=1024 str_len=32 nulls=0.2 : 19.0 µs -> 16.6 µs (−12.7%)
  • size=1024 str_len=128 nulls=0 : 39.8 µs -> 34.5 µs (−13.4%)
  • size=1024 str_len=128 nulls=0.2 : 31.2 µs -> 28.0 µs (−10.1%)
  • size=4096 str_len=32 nulls=0 : 99.4 µs -> 83.6 µs (−15.9%)
  • size=4096 str_len=32 nulls=0.2 : 78.2 µs -> 67.6 µs (−13.5%)
  • size=4096 str_len=128 nulls=0 : 180.9 µs -> 160.3 µs (−11.4%)
  • size=4096 str_len=128 nulls=0.2 : 137.4 µs -> 124.3 µs (−9.5%)

Multi-byte from, StringViewArray (Writer general path)

  • size=1024 str_len=32 nulls=0 : 24.7 µs -> 21.2 µs (−14.0%)
  • size=1024 str_len=32 nulls=0.2 : 19.4 µs -> 17.0 µs (−12.3%)
  • size=1024 str_len=128 nulls=0 : 39.6 µs -> 34.7 µs (−12.6%)
  • size=1024 str_len=128 nulls=0.2 : 31.9 µs -> 28.3 µs (−11.0%)
  • size=4096 str_len=32 nulls=0 : 100.1 µs -> 84.0 µs (−16.1%)
  • size=4096 str_len=32 nulls=0.2 : 79.9 µs -> 69.7 µs (−12.9%)
  • size=4096 str_len=128 nulls=0 : 177.5 µs -> 158.1 µs (−10.9%)
  • size=4096 str_len=128 nulls=0.2 : 139.3 µs -> 127.3 µs (−8.6%)

What changes are included in this PR?

  • Add append_byte_map and append_with to both of the bulk-NULL string builders
  • Add unit tests
  • Adopt the new APIs in replace

Are these changes tested?

Yes; new tests added.

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the functions Changes to functions implementation label May 5, 2026
@neilconway
Contributor Author

neilconway commented May 5, 2026

Other places where these APIs should be useful:

  • initcap
  • lower, upper: at least for the Unicode code path; for ASCII, we might not beat the hand-optimized code added in perf: Optimize lower, upper for ASCII inputs #21980
  • translate
  • reverse (might need a slightly different API)
  • to_char (might need a small API extension)
  • lpad, rpad (needs a closer look)

If we make the builders accessible outside the current crate, some of the Spark functions could use these APIs, as well as || for Utf8View values.

@neilconway
Contributor Author

neilconway commented May 5, 2026

My initial plan was to have an API where the closure is passed a caller-sized byte slice. That has two shortcomings:

  1. caller needs to size the byte-slice in advance
  2. for efficiency, we can't initialize the contents of the slice, so (a) this needs unsafe code, and (b) the closure must be careful to write EXACTLY the specified number of bytes, no more and no less.

That seemed like a footgun, so I started with these safer APIs instead.
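
As an illustration of the first shortcoming, here is a sketch of what a caller-sized slice API might look like. `append_sized` and `replace_sized` are hypothetical names, not something from this PR. To stay in safe Rust the sketch zero-initializes the slice, which is exactly the extra work the real design would have needed unsafe code to avoid.

```rust
/// Hypothetical "caller-sized slice" API: the caller states the row length up
/// front and the closure fills the slice. Zero-initialized here for safety.
pub fn append_sized<F>(values: &mut Vec<u8>, len: usize, fill: F)
where
    F: FnOnce(&mut [u8]),
{
    let start = values.len();
    values.resize(start + len, 0);
    fill(&mut values[start..]);
}

/// A replace-style caller needs an extra pass just to compute the output size.
pub fn replace_sized(values: &mut Vec<u8>, s: &str, from: &str, to: &str) {
    // Pass 1: count matches so the output length is known in advance.
    let n = s.matches(from).count();
    let len = s.len() - n * from.len() + n * to.len();

    // Pass 2: fill the slice; the closure must write exactly `len` bytes.
    append_sized(values, len, |out| {
        let src = s.as_bytes();
        let (mut pos, mut last) = (0, 0);
        for (idx, m) in s.match_indices(from) {
            let chunk = &src[last..idx];
            out[pos..pos + chunk.len()].copy_from_slice(chunk);
            pos += chunk.len();
            out[pos..pos + to.len()].copy_from_slice(to.as_bytes());
            pos += to.len();
            last = idx + m.len();
        }
        let tail = &src[last..];
        out[pos..pos + tail.len()].copy_from_slice(tail);
    });
}
```

Even in this safe form the caller has to track `pos` by hand and scan the input twice. With an uninitialized slice, any mismatch between `len` and the bytes actually written would expose uninitialized memory.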

Development

Successfully merging this pull request may close these issues.

Introduce StringViewArrayBuilder::map to avoid duplication