Skip to content

Add array eval copy API#822

Draft
He-Pin wants to merge 2 commits intodatabricks:masterfrom
He-Pin:perf/arr-copy-eval-api
Draft

Add array eval copy API#822
He-Pin wants to merge 2 commits intodatabricks:masterfrom
He-Pin:perf/arr-copy-eval-api

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented May 5, 2026

Motivation

#821 adds slice views, but stdlib consumers still need one narrow way to bulk-copy arrays without each call site reimplementing directBackingArray / range / view branches.

This PR adds a shared Eval copy API that preserves laziness and gives fully-consumed array paths a simple JVM/Scala Native friendly implementation.

Constraints:

  • copy references, not values
  • do not force thunks
  • keep loops simple enough for HotSpot and Scala Native
  • avoid widening the API beyond current consumers
  • keep the PR stacked and reviewable on top of Add lazy slice array view #821

Modification

Stacked on #821.

Add a narrow Arr.copyEvalTo API for bulk copying Eval references without forcing Val values.

This PR routes these consumers through the API:

  • std.flattenArrays
  • array std.flatMap
  • array-separator std.join

Specialized copy paths cover concat, repeat, slice, reversed lazy views, range, and byte arrays.

Result

Verification passed:

  • ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
  • ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
  • ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
  • ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
  • ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
  • ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
  • ./mill --no-server bench.checkFormat
  • ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'

JMH, JVM harness, compared with #821 slice baseline:

Benchmark Before After
array_copy_views 16.871 ms/op 13.937 ms/op
realistic2 41.292 ms/op 41.149 ms/op
lazy_array_slice_remove 1.070 ms/op 1.067 ms/op
large_string_join 0.541 ms/op 0.539 ms/op

Scala Native hyperfine, compared with #821 slice baseline, using Scala Native binaries, not JVM jars:

Benchmark Before After
array_copy_views 26.1 ms +/- 0.4 ms 10.9 ms +/- 0.3 ms

copy-api ran 2.39 +/- 0.08 times faster than slice-baseline.

External performance diff, against jrsonnet built from source at 80cd36a with cargo build --release -p jrsonnet (jrsonnet 0.5.0-pre98):

Benchmark sjsonnet Scala Native (#822) source-built jrsonnet Result
array_copy_views 10.5 ms +/- 0.2 ms 14.2 ms +/- 0.4 ms sjsonnet 1.35 +/- 0.05x faster

JIT / GC review:

  • The API copies Eval, not Val, so laziness and callback/error retry behavior stay intact.
  • copyEvalTo(Array[Eval], offset) lets hot consumers write into one preallocated array with indexed while-loops.
  • Concat materialization now copies through the same API, so concat children can provide cheap copy paths without first materializing themselves.
  • Consumer code no longer repeats polymorphic branches at every call site.

Rollback boundary:

  • This PR is infrastructure plus three consumers.
  • It does not change string join sizing, callback invocation, sort, or renderer behavior.
  • If a consumer regresses, it can move back to the previous local copy path while keeping the API for other consumers.

References

He-Pin added 2 commits May 5, 2026 16:16
Motivation:

Avoid copying large array slices and remove/removeAt intermediates after the lazy-array work. This follows jrsonnet's indexed slice-view idea while keeping JVM retention under control for small sub-slices.

Modifications:

- add Val.Arr.sliced and SliceArr for large or compact-source slices
- route array slicing and std.remove/removeAt through slice/concat views
- let large concat decisions use total length, with overflow protection
- add correctness coverage and a slice/remove benchmark resource

Results:

- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server bench.checkFormat
- JMH runRegressions: lazy_array_slice_remove 5.890 -> 1.089 ms/op
- hyperfine macro slice/remove: 498.6 ms -> 335.5 ms
Motivation:

Several stdlib consumers fully copy array elements after the lazy-array work. Centralizing that path avoids repeated directBackingArray/range/view branches and lets concat, repeat, slice, range, and byte arrays expose cheap bulk Eval copies without forcing Val values.

Modifications:

- add Arr.copyEvalTo overloads for ArrayBuilder and preallocated Array[Eval]
- teach concat materialization/eager concat to copy through the new API
- add specialized copy implementations for repeat, slice, reversed lazy views, range, and byte arrays
- route std.flattenArrays, array flatMap, and array-separator std.join through the API
- add correctness coverage and an array_copy_views regression benchmark

Results:

- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server bench.checkFormat
- ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'
- JMH runRegressions vs slice baseline: array_copy_views 16.871 -> 13.937 ms/op
- Scala Native hyperfine vs slice baseline: array_copy_views 26.1 ms -> 10.9 ms, 2.39x faster
@He-Pin He-Pin marked this pull request as draft May 5, 2026 09:14
@He-Pin He-Pin mentioned this pull request May 5, 2026
stephenamar-db pushed a commit that referenced this pull request May 7, 2026
## Motivation

After #814, sjsonnet has lazy array views for large stdlib arrays, but
array slicing and `std.remove` / `std.removeAt` still had paths that
eagerly allocated new `Array[Eval]` backing arrays.

This PR adds a focused slice view so large slices avoid copying unless
fully materialized, while keeping JVM/Scala Native behavior
conservative.

Constraints:

- keep Jsonnet indexed laziness: `length`, `eval(i)`, and `value(i)`
stay O(1)
- do not force elements while slicing
- keep small slices eager to avoid retaining large sources
- prevent deep concat trees
- keep hot paths JIT/GC friendly

## Modification

Add a lazy `SliceArr` view and route `Arr.sliced(...)` through it when
the slice is large enough or the source is already compact/view-backed.

Changed behavior:

- array slicing can return `SliceArr` instead of eagerly copying
`Array[Eval]`
- `std.remove` and `std.removeAt` reuse slice + concat views
- compact sources (`RangeArr`, `ByteArr`, lazy indexed arrays, repeat,
slice) can slice as O(1) views
- flat/reversed/concat arrays only use a view when the slice is large
enough to justify source retention
- concat still caps tree depth by flattening when either side is already
a concat view

## Result

Verification passed:

- `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.13.18].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.12.21].compile'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly'
sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'`
- `./mill --no-server bench.checkFormat`
- `./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'`

JMH, full JVM regression sweep, compared with `upstream/master` (lower
is better, notable result only):

- Baseline: `upstream/master` at `b4c667d5`
- PR head: `3bde215b`
- Command: `./mill --no-server bench.runRegressions`
- Full sweep covered 45 regression inputs; non-target movement was
noise-level, so only the targeted slice/remove case is listed.

| Benchmark | Before | After | Result |
| --- | ---: | ---: | ---: |
| `lazy_array_slice_remove` | 5.805 ms/op | 1.088 ms/op | 5.34x faster |

Scala Native hyperfine, full regression-input sweep, compared with
`upstream/master` (lower is better, notable result only):

- Binaries: `./mill --no-server show
'sjsonnet.native[3.3.7].nativeLink'`
- Command shape: `hyperfine --warmup 5 --min-runs 20 -N --output=null
...`
- Full sweep covered the same 45 regression inputs; `bench.07` was run
with `ulimit -s 65520` for both sides because the native binary needs a
larger process stack for that input.

| Benchmark | Before | After | Result |
| --- | ---: | ---: | ---: |
| `lazy_array_slice_remove` | 13.2 +/- 0.4 ms | 5.89 +/- 0.32 ms | 2.24x
faster |

External performance diff, against jrsonnet built from source at
`80cd36a` with `cargo build --release -p jrsonnet` (`jrsonnet
0.5.0-pre98`):

| Benchmark | sjsonnet Scala Native (#821) | source-built jrsonnet |
Result |
| --- | ---: | ---: | --- |
| `lazy_array_slice_remove` | 5.8 ms +/- 0.2 ms | 7.0 ms +/- 0.2 ms |
sjsonnet 1.21 +/- 0.05x faster |

JIT / GC review:

- `SliceArr` preserves indexed laziness: `eval(i)` returns an `Eval`;
`value(i)` forces only the requested element.
- Materializing a slice releases the source reference, so long-lived
fully-consumed slices do not keep the original array alive.
- Large slices avoid allocating and copying `Array[Eval]`; small slices
still copy to avoid source-retention overhead.
- `std.remove` / `std.removeAt` reuse slice and concat views, avoiding
large intermediate arrays.
- Concat depth remains bounded.

Rollback boundary:

- This PR only changes slice/remove array representation.
- If a retained-source workload regresses, the slice threshold is the
rollback lever without changing the public API.

## References

- Builds on #814 lazy array architecture.
- Follow-up stack: #822 adds shared `Arr.copyEvalTo`; #823 presizes
selected copy consumers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant