Presize array copy consumers by He-Pin · Pull Request #823 · databricks/sjsonnet

He-Pin · 2026-05-05T08:56:21Z

Motivation

#822 gives consumers a cheap Eval copy API, but std.flattenArrays and array-separator std.join can still pay ArrayBuilder growth/copy costs when the outer array has a modest number of large child arrays.

This PR adds a guarded two-pass pre-size path for those consumers. The goal is to remove avoidable intermediate allocation in few-large-array workloads without regressing many-small-array workloads.

Constraints:

- do not force element values
- avoid many-small-array regressions
- guard total length before allocation
- keep hot paths as straight indexed loops
- keep this PR narrowly stacked on #822 (Add array eval copy API)

Modification

Stacked on #822.

Use Arr.copyEvalTo to presize high-volume array-copy consumers:

- std.flattenArrays
- array-separator std.join

The pre-sized path uses two linear scans only when the outer part count is modest (<= 1024). Large outer arrays fall back to the one-pass ArrayBuilder + copyEvalTo path from #822.
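The guarded two-pass shape described above can be sketched roughly as follows. This is an illustrative simplification, not the code in this PR: `Eval`, `Lazy`, and `flatten` are hypothetical stand-ins, and `System.arraycopy` plus a plain `ArrayBuilder` append are swapped in where the real change goes through `Arr.copyEvalTo`, whose signature is not shown here. Only the `PresizedCopyMaxParts = 1024` threshold is taken from the description.

```scala
import scala.collection.mutable.ArrayBuilder

object PresizedFlattenSketch {
  // Hypothetical stand-in for a lazily evaluated array element.
  trait Eval { def value: Any }
  final class Lazy(compute: () => Any) extends Eval {
    lazy val value: Any = compute()
  }

  // Threshold named in the PR description.
  val PresizedCopyMaxParts = 1024

  def flatten(parts: Array[Array[Eval]]): Array[Eval] = {
    if (parts.length <= PresizedCopyMaxParts) {
      // Pass 1: sum child lengths as Long so the total cannot wrap around.
      var totalLen = 0L
      var i = 0
      while (i < parts.length) { totalLen += parts(i).length; i += 1 }
      if (totalLen > Int.MaxValue)
        throw new IllegalArgumentException("flattened array too large")

      // Pass 2: copy references into one exactly sized array.
      // No element is evaluated; only Eval references move.
      val out = new Array[Eval](totalLen.toInt)
      var pos = 0
      i = 0
      while (i < parts.length) {
        val p = parts(i)
        System.arraycopy(p, 0, out, pos, p.length)
        pos += p.length
        i += 1
      }
      out
    } else {
      // Fallback: single pass through a growing builder for many-part inputs.
      val b = new ArrayBuilder.ofRef[Eval]
      var i = 0
      while (i < parts.length) { b ++= parts(i); i += 1 }
      b.result()
    }
  }
}
```

In this sketch, the first branch pays one extra scan over the outer array in exchange for a single allocation, which is cheap when there are few parts. The second branch avoids that extra scan when the outer array is large.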

Result

Verification passed:

./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.Std0150FunctionsTests sjsonnet.ValArrayViewTests
./mill --no-server 'sjsonnet.jvm[3.3.7].test'
./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'
git diff --check

JMH, JVM harness, compared with #822 copy-api baseline:

| Benchmark | Before | After |
| --- | --- | --- |
| `array_copy_views` | 13.002 ms/op | 8.454 ms/op |
| `realistic2` | see Native data | see Native data |

Scala Native hyperfine, compared with #822 copy-api baseline, using Scala Native binaries, not JVM jars:

| Benchmark | Before | After |
| --- | --- | --- |
| `array_copy_views` | 11.9 ms +/- 1.2 ms | 10.5 ms +/- 1.0 ms |
| many-small fallback | 7.0 ms +/- 0.7 ms | 6.6 ms +/- 0.5 ms |
| `realistic2` | 82.6 ms +/- 0.8 ms | 82.5 ms +/- 0.7 ms |

External performance diff, against jrsonnet built from source at 80cd36a with cargo build --release -p jrsonnet (jrsonnet 0.5.0-pre98):

| Benchmark | sjsonnet Scala Native (#823) | source-built jrsonnet | Result |
| --- | --- | --- | --- |
| `array_copy_views` | 9.3 ms +/- 0.2 ms | 14.3 ms +/- 0.4 ms | sjsonnet 1.53 +/- 0.06x faster |
| `realistic2` | 79.9 ms +/- 2.2 ms | 92.9 ms +/- 1.9 ms | sjsonnet 1.16 +/- 0.04x faster |

JIT / GC review:

- The second pass copies Eval references into one preallocated Array[Eval]; it does not force element values.
- totalLen is accumulated as Long and checked before allocating the final Array[Eval].
- PresizedCopyMaxParts = 1024 avoids turning many-small arrays into an always-two-pass workload.
- The fallback path preserves the #822 (Add array eval copy API) behavior for large outer arrays.
- The hot path is simple counted while-loops plus copyEvalTo, so it stays friendly to JIT inlining and Scala Native codegen.
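To illustrate why the total is accumulated as Long before allocation, here is a minimal sketch of such a guard. `checkedTotalLen` is a hypothetical helper that works on plain lengths rather than real arrays, and the `Int.MaxValue` limit is an assumption; the actual check in this PR may use a different bound or error type.

```scala
object TotalLenGuardSketch {
  // Sums part lengths in a Long and fails before any allocation is attempted
  // if the result cannot be used as an array length.
  def checkedTotalLen(lens: Array[Int]): Int = {
    var total = 0L
    var i = 0
    while (i < lens.length) { total += lens(i); i += 1 }
    if (total > Int.MaxValue)
      throw new IllegalArgumentException(s"total length $total exceeds Int.MaxValue")
    total.toInt
  }
}
```

Summing the same lengths in an Int would wrap silently: `Int.MaxValue + 1` evaluates to a negative number, so the size check has to happen on the Long value before narrowing.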

Rollback boundary:

- This PR only changes fully-consumed array-copy consumers.
- It does not change string join, renderer, sort, callback invocation, or global array view semantics.
- If a workload shows a many-small regression, the threshold can be lowered or the affected consumer can use the one-pass path from #822 (Add array eval copy API).

References

Builds on #822 (Add array eval copy API).

Motivation: Avoid copying large array slices and remove/removeAt intermediates after the lazy-array work. This follows jrsonnet's indexed slice-view idea while keeping JVM retention under control for small sub-slices.

Modifications:
- add Val.Arr.sliced and SliceArr for large or compact-source slices
- route array slicing and std.remove/removeAt through slice/concat views
- let large concat decisions use total length, with overflow protection
- add correctness coverage and a slice/remove benchmark resource

Results:
- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server bench.checkFormat
- JMH runRegressions: lazy_array_slice_remove 5.890 -> 1.089 ms/op
- hyperfine macro slice/remove: 498.6 ms -> 335.5 ms

Motivation: Several stdlib consumers fully copy array elements after the lazy-array work. Centralizing that path avoids repeated directBackingArray/range/view branches and lets concat, repeat, slice, range, and byte arrays expose cheap bulk Eval copies without forcing Val values.

Modifications:
- add Arr.copyEvalTo overloads for ArrayBuilder and preallocated Array[Eval]
- teach concat materialization/eager concat to copy through the new API
- add specialized copy implementations for repeat, slice, reversed lazy views, range, and byte arrays
- route std.flattenArrays, array flatMap, and array-separator std.join through the API
- add correctness coverage and an array_copy_views regression benchmark

Results:
- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server bench.checkFormat
- ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'
- JMH runRegressions vs slice baseline: array_copy_views 16.871 -> 13.937 ms/op
- Scala Native hyperfine vs slice baseline: array_copy_views 26.1 ms -> 10.9 ms, 2.39x faster

Motivation: After adding Arr.copyEvalTo, high-volume consumers can avoid ArrayBuilder growth by counting output length first and copying into a single Array[Eval]. This targets small outer arrays that contain large view-backed subarrays, while preserving the one-pass builder path for many-small-array workloads.

Modifications:
- presize std.flattenArrays when the outer part count is modest
- presize array-separator std.join when the outer part count is modest
- keep the one-pass ArrayBuilder + copyEvalTo fallback for large part counts

Results:
- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.Std0150FunctionsTests sjsonnet.ValArrayViewTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'
- JMH runRegressions vs copy-api baseline: array_copy_views 13.002 -> 8.454 ms/op
- Scala Native hyperfine vs copy-api baseline: array_copy_views 11.9 ms -> 10.5 ms
- Scala Native hyperfine many-small fallback: 7.0 ms -> 6.6 ms
- Scala Native hyperfine realistic2: 82.6 ms -> 82.5 ms

He-Pin added 3 commits May 5, 2026 16:16

He-Pin marked this pull request as draft May 5, 2026 09:15

This was referenced May 5, 2026:

- Add lazy slice array view #821 (Open)
- Add array eval copy API #822 (Draft)

He-Pin wants to merge 3 commits into databricks:master from He-Pin:perf/presized-array-copy-consumers
