diff --git a/bench/reports/sjsonnet-vs-jrsonnet-gaps.md b/bench/reports/sjsonnet-vs-jrsonnet-gaps.md new file mode 100644 index 00000000..a6f109df --- /dev/null +++ b/bench/reports/sjsonnet-vs-jrsonnet-gaps.md @@ -0,0 +1,147 @@ +# sjsonnet vs jrsonnet current gap ledger + +This report tracks only locally rechecked gaps where current sjsonnet is still +slower than a source-built jrsonnet reference. The jrsonnet upstream benchmark +document is still useful for broad ranking, but its sjsonnet rows reference older +released builds and must not be treated as current truth without local recheck. + +## Baseline + +| Field | Value | +|---|---| +| sjsonnet base | `upstream/master` at `cedc083b4676be43e01bdd6f6cb5d7f4432d0d32` | +| sjsonnet binary | Scala Native `sjsonnet.native[3.3.7].nativeLink` | +| jrsonnet reference | `origin/master` at `5e8cbcdbc860a616dbd193428f8933dd7532f537`, `cargo build --release -p jrsonnet` | +| benchmark rule | single benchmark process; no concurrent Mill/JMH/hyperfine | + +## Latest confirmed local gaps + +| priority | workload | sjsonnet Native | jrsonnet | gap | status | next direction | +|---:|---|---:|---:|---:|---|---| +| 1 | `bench/resources/cpp_suite/large_string_template.jsonnet` | `8.01-8.17 ms` | `6.0 +/- 1.2 ms` | jrsonnet `~1.34x` faster | improved | ASCII-safe propagation for simple named formats closed the largest local slice; remaining gap is mostly startup plus format/render overhead. | +| 2 | `jrsonnet/tests/realworld/entry-kube-prometheus.jsonnet -J vendor` | `132.09 +/- 2.33 ms` | `85.29 +/- 1.12 ms` | jrsonnet `1.55x` faster | improved | Strict JSON byte import parsing reduced sjsonnet Native time by about 5% locally; remaining gap is mostly materialization/rendering and startup. 
| + +## Accepted in this session + +| idea | validation | result | +|---|---|---| +| Parse strict `.json` imports from UTF-8 bytes and cache small resolved files as bytes until text is needed | Output equality against jrsonnet and previous sjsonnet on kube-prometheus; focused `PreloaderTests`; Native A/B forward+reverse; focused JMH guards; full `__.test` | kept: kube Native improved from clean `139.4 +/- 2.8 ms` to candidate `132.7 +/- 1.9 ms` forward, and clean `140.3 +/- 2.6 ms` to candidate `132.1 +/- 1.9 ms` reverse. Debug stats parse time improved from prior clean `~88.3ms` to `69.8ms` in the final run. | +| Use in-place quicksort for inline object sorted order when field count is large | `sample` on repeated kube materialization; output equality on kube and `large_string_template`; Native A/B forward+reverse; focused renderer/json tests; `__.checkFormat`; full `__.test` | kept: `sample` reduced `computeSortedInlineOrder` top-stack samples from `164` to `63` and sort-specific samples from `164` to `75`; kube Native improved from `145.3 +/- 3.6 ms` to `140.0 +/- 3.2 ms` forward and from `151.6 +/- 10.2 ms` to `148.9 +/- 3.7 ms` reverse. | +| Preserve ASCII-safe metadata for simple named `%s` format results | Regression tests for safe and unsafe dynamic values; output equality on `large_string_template` and kube-prometheus; JVM test suite; focused JMH guard; Native A/B forward+reverse; jrsonnet reference run | kept: `large_string_template` Native improved in both command orders (`8.64 -> 8.01 ms` forward, `8.65 -> 8.17 ms` reverse). JVM JMH stayed neutral-positive (`0.683 -> 0.677 ms/op`). Kube-prometheus was neutral/noisy, not a target regression. | + +## Historical jrsonnet-doc gaps that are no longer primary local gaps + +| workload | reason | +|---|---| +| Foldl string concat | Prior stacked recheck showed sjsonnet faster than source-built jrsonnet on extracted foldl workloads. 
| +| Go `std.foldl` | Prior stacked recheck showed sjsonnet faster than source-built jrsonnet. | +| Big object | Prior stacked recheck was effectively neutral; latest focus stays on larger confirmed gaps. | +| `realistic2` | Prior stacked recheck showed sjsonnet faster than source-built jrsonnet. | +| `large_string_join` | Prior local join work closed the jrsonnet gap; keep as guard only. | + +## Rejected in this session + +| idea | validation | result | +|---|---|---| +| Raise nested `ByteBuilder` flush threshold from 8 KiB to 64 KiB | Output equality on large-template and kube; kube Native A/B | negative: clean `144.0 +/- 2.1 ms`, candidate `153.3 +/- 15.5 ms`. | +| Raise nested flush threshold to 16 KiB / 32 KiB | Output equality on kube; forward and reverse hyperfine | unstable/noisy: 32 KiB forward looked `~3%` faster, reverse had clean `141.5 +/- 1.6 ms` faster than candidate `143.7 +/- 1.9 ms`. | +| Fast-path single-part parsed string instead of always calling `mkString` | Output equality on `large_string_template`; forward and reverse hyperfine | unstable/noisy: forward candidate `10.4 +/- 0.6 ms` vs clean `10.6 +/- 1.2 ms`; reverse clean `10.3 +/- 0.7 ms` vs candidate `10.5 +/- 0.8 ms`. | +| Add 4 inline object value-cache slots | Output equality on kube; debug stats; forward and reverse hyperfine; focused JMH guards | not enough: overflows `2452 -> 946`, but Native A/B was neutral (`1.00x` in reverse). | +| Add lazy small overflow cache before HashMap | Output equality on kube; debug stats; hyperfine | negative: overflows `2452 -> 83`, but clean `140.9 +/- 1.4 ms` beat candidate `141.7 +/- 2.3 ms`. | +| Mark strict JSON import objects to skip materializer cycle tracking | Output equality on kube; debug stats; forward and reverse hyperfine | not enough: materialize debug time improved, but Native A/B was only weak positive forward and neutral reverse. 
| +| Parse strict JSON integers through `ParseUtils.parseIntegralNum` before `toDouble` fallback | Output equality on kube and `large_string_template`; JSON fast-path tests; Native kube forward/reverse A/B | not enough: explicit integral scan regressed parse debug time; `decIndex/expIndex` variant removed the scan but remained noisy. Forward median favored candidate, while reverse median/min favored baseline, so it was reverted. | +| Precheck object keys with `Platform.isAsciiJsonSafe` before direct byte copy | Output equality on kube and `large_string_template`; renderer test covering safe, escaped, and Unicode keys; Native kube forward/reverse A/B | negative: forward median/min were weakly positive but mean was worse; reverse favored baseline across mean/median/min (`141.3/140.2/135.8ms` baseline vs `144.2/142.5/138.1ms` candidate). Reverted. | +| Render short strings by scanning `String.charAt` directly instead of copying to the reusable char buffer first | Output equality on kube and `large_string_template`; renderer/json focused tests; Native kube and `large_string_template` forward/reverse A/B | reject: kube moved weakly positive (`140.75ms` baseline to `139.38ms` candidate forward; `157.39ms` baseline to `147.77ms` candidate reverse), but `large_string_template` regressed/noised negative (`10.99ms` baseline to `14.96ms` candidate forward; `10.27ms` baseline to `10.82ms` candidate reverse). The reusable-buffer `getChars` path remains safer for the large-string priority gap. | +| Mark long strict-JSON imported string values as ASCII-safe during parse | Output equality on kube and `large_string_template`; JVM tests; Native kube forward/reverse A/B | reject: debug stats looked lower in one run, but wall-clock did not hold. Forward was noise-level (`141.19ms` baseline vs `138.64ms` candidate mean, nearly identical median/min), while reverse favored baseline (`134.93ms` baseline vs `137.95ms` candidate). Reverted. 
| +| Lower parsed Jsonnet string ASCII-safe threshold from `>1024` to `>=128` | Output equality on kube and `large_string_template`; JVM tests; Native kube forward/reverse A/B | reject: the extra parse-time scan did not pay back. Forward favored baseline (`142.00ms` vs `147.34ms` candidate mean), and reverse again favored baseline (`139.74ms` vs `142.20ms` candidate mean). Reverted. | +| Lazily cache computed inline-object sorted order during materialization | Output equality on kube and `large_string_template`; JVM tests; Native kube forward/reverse A/B | reject: reduced a repeated-work sampling hotspot in theory, but single-run kube was not stable-positive. Forward only improved median/min while mean worsened (`137.43ms` baseline vs `143.38ms` candidate); reverse favored baseline mean/median (`134.71/133.77ms` baseline vs `135.61/134.03ms` candidate). Reverted. | +| Native CLI path-only parse cache to avoid file content hashing | JVM tests; Native link; output equality on `null`, kube, and `large_string_template`; Native `null` and kube A/B | reject: skipping `contentHash()` was neutral on `null` and negative/noisy on kube. `null` was effectively unchanged (`4.98ms` baseline vs `4.95ms` candidate mean), while kube favored baseline in both command orders (`141.49/137.67ms` baseline vs `141.87/139.48ms` candidate forward; reverse baseline `138.86/136.33ms` vs candidate `168.07/143.16ms`). Reverted. | +| Switch Native release GC from default Immix to Commix | Mill build check | rejected before benchmarking: the current Mill Scala Native plugin API in this build did not expose `GC.commix` through `scalanativelib.api`, and `scala.scalanative.build.GC` was not on the build script classpath. Reverted rather than guessing further. 
| +| Reuse parser `_asciiSafe` as a static format safety hint | JVM tests; Native link; output equality on `large_string_template`; Native forward/reverse A/B against the accepted simple-format ASCII-safe candidate | reject: debug stats improved (`parse_time` roughly `5.9ms -> 2.5ms` in one run), but whole-process Native wall-clock regressed in both command orders (`8.23ms` baseline vs `9.47ms` candidate forward; `8.00ms` baseline vs `8.47ms` candidate reverse). Reverted rather than trading true wall-clock performance for better internal counters. | +| Native manual ASCII-safe string-to-byte copy | Native link; output equality on `large_string_template`; Native forward/reverse A/B against the accepted simple-format ASCII-safe candidate | reject: replacing Native `String.getBytes(0, len, dst, dstPos)` with a manual `charAt` loop was much slower in both command orders (`7.89ms` baseline vs `10.78ms` candidate forward; `7.97ms` baseline vs `11.17ms` candidate reverse). Reverted. | +| Append single-character simple format values with `StringBuilder.append(Char)` | JVM tests; Native link; output equality on `large_string_template`; Native forward/reverse A/B against the accepted simple-format ASCII-safe candidate | reject: the single-character branch regressed Native in both command orders (`7.97ms` baseline vs `8.65ms` candidate forward; `7.97ms` baseline vs `8.84ms` candidate reverse). Reverted. | +| Specialize ByteRenderer minified object comma/empty-state handling | JVM compile; Native link; output equality on kube and `large_string_template`; kube and `large_string_template` Native forward/reverse A/B | reject: broad direct-object specialization improved kube weakly (`130.32ms -> 128.31ms` forward; `130.62ms -> 129.74ms` reverse), but `large_string_template` regressed/noised negative (`10.24ms -> 10.45ms` forward; `10.99ms -> 11.27ms` reverse). The sorted-inline-only variant was also unstable, so the generic `flushBuffer`/emptyBits path remains safer. 
| +| Native-only long ASCII escaped string renderer | Native link; output equality on kube, `large_string_template`, and a long Unicode fallback guard; `large_string_template` Native forward/reverse A/B | reject: avoiding `str.getBytes(UTF_8)` with a Native-only two-pass `charAt` renderer regressed the largest guard in both command orders (`9.96ms -> 11.77ms` forward; `9.56ms -> 10.54ms` reverse). Reverted before kube A/B. | +| Inline small-stack cycle tracking before `IdentityHashMap` overflow | JVM tests; Native link; output equality on kube and `large_string_template`; shallow/deep recursive error equality; kube and `large_string_template` Native forward/reverse A/B | reject: preserving cycle semantics with four inline slots did not pay back. Kube was noise-level, while `large_string_template` regressed in both command orders (`11.59ms -> 12.32ms` forward; `9.67ms -> 10.80ms` reverse). Reverted. | +| Cache quoted object-key bytes in ByteRenderer | JVM compile; Native link; output equality on kube, `large_string_template`, escaped/unicode keys, and long-key fallback; kube and `large_string_template` Native forward/reverse A/B across HashMap, direct-mapped, and capped variants | reject: default HashMap was weakly positive on kube forward but reverse was neutral-to-negative by median; direct-mapped and capped variants regressed/noised negative in reverse. Reverted rather than adding cache lookup/memory risk without stable wall-clock gain. | + +## Current hypothesis + +Large-template remains ratio-priority after the simple-format ASCII-safe win, but +the gap is now much smaller and includes whole-process startup. Kube-prometheus +improved through byte-based strict JSON imports, but source-built jrsonnet is +still about `1.55x` faster. Next work should profile the remaining +materialization/render/startup costs and target a larger structural cost rather +than a single parameter or small cache tweak. 
+ +## 2026-05-13 native-vs-jvm split + +Re-profile shows the `large_string_template` gap is largely a Scala Native +runtime artifact, not an algorithmic gap in the formatter or parser: + +| metric | value | +|---|---| +| JVM JMH (warm) `RegressionBenchmark.main` | `0.873 ms/op` | +| Native cold hyperfine | `~14.5 ms` mean, `9.8 ms` min | +| Native `--debug-stats` (single run, with timing overhead) | parse `12.8ms` + eval `14.6ms` + materialize `2.3ms` | +| jrsonnet native | `5.5 +/- 0.5 ms` | +| sjsonnet Native trivial-startup (`null`) | `~6.6 ms` mean, `5.5 ms` min | +| jrsonnet trivial-startup (`null`) | `~3.9 ms` mean, `2.7 ms` min | + +Implication: ~5.5ms of the sjsonnet 14.5ms is process startup. Actual work +ratio min-to-min is roughly `4.3ms / 2.1ms` = ~2x, not 3.4x. Stream-render to +stdout is already in place (`SjsonnetMainBase.renderNormal` uses `ByteRenderer` +directly to `stdoutStream` in `case None if stdoutStream != null`), so the +final output stage is already byte-streamed. + +The remaining double work is: `Format.format` builds a full ~590KB String into +a `StringBuilder`, then `BaseByteRenderer.visitString` re-scans that String for +JSON escape chars. Removing this double scan requires routing the renderer into +`Format` so format chunks are escaped and emitted as they are produced. That is +a structural cross-cutting change touching the Format ABI and several stdlib +callers; it is not a single-file micro-optimization and warrants explicit user +go-ahead before implementation. + +## 2026-05-13 rejected: visitLongString chunked-char copy + +Rewrote `BaseByteRenderer.visitLongString` to avoid the `str.getBytes(UTF-8)` +allocation by scanning chars directly, copying ASCII runs via +`Platform.copyAsciiStringRangeToBytes` (which wraps `String.getBytes(srcBegin, +srcEnd, dst, dstPos)`), and emitting escapes inline. 
+ +JVM JMH guard on `large_string_template`: + +| variant | iter median (ms/op) | +|---|---:| +| clean baseline (5 runs) | `0.82` (range `0.79-0.92`) | +| chunked-char path (5 runs) | `1.21` (range `1.20-1.24`) | + +Result: **+48% JVM regression** by median. JVM's intrinsified `String.getBytes(UTF-8)` on +the whole string plus a single SWAR scan is faster than per-chunk +`String.getBytes(srcBegin, srcEnd, dst, dstPos)` calls. The hypothesized Native +gain (skip a ~600KB allocation per long string) was not measured, but the +shared-code JVM cost makes the change unshippable per PR-rule-#19 (no +regression). Rejected without Native A/B; pursuing platform-gating would add +complexity disproportionate to the unproven benefit. + +## 2026-05-13 rejected: lazy simple-named format byte rendering + +Explored a structural version of "Format renders directly to bytes" for large +`%(key)s` object format strings. The implementation kept `Format.format` +unchanged for string semantics, added a lazy `Val.Str` representation only for +large simple-named formats, forced object key lookups up front to preserve error +timing, and taught `ByteRenderer` to render pre-escaped format pieces directly. + +Three variants were tried: + +| variant | JVM JMH `large_string_template` | Native forward A/B | Native reverse A/B | decision | +|---|---:|---:|---:|---| +| per-static-chunk escaped byte arrays | `0.81-0.85 ms/op` | baseline `10.05ms`, candidate `10.40ms` | baseline `10.53ms`, candidate `10.94ms` | reject | +| flat static byte buffer + offsets | `0.73-0.74 ms/op` | baseline `10.23ms`, candidate `10.39ms` | baseline `10.40ms`, candidate `10.61ms` | reject | +| flat static bytes + pre-escaped dynamic bytes | `0.73-0.78 ms/op` | baseline `10.070ms`, candidate `10.047ms` | baseline `9.890ms`, candidate `10.226ms` | reject | + +Conclusion: this direction improves warm JVM JMH but does not improve the +Scala Native whole-process target.
The extra Native work to pre-escape and +retain byte slices offsets the avoided final `StringBuilder`/renderer scan, and +the only positive Native run was within noise and reversed when command order +changed. Code was reverted; no runtime optimization retained. diff --git a/bench/reports/sync-points.md b/bench/reports/sync-points.md new file mode 100644 index 00000000..70ab8b27 --- /dev/null +++ b/bench/reports/sync-points.md @@ -0,0 +1,59 @@ +# Performance sync points + +This file tracks current performance migration and exploration work so the same +idea is not repeated without new evidence. + +## Active baselines + +| Area | Ref | Notes | +|---|---|---| +| upstream/master | `cedc083b4676be43e01bdd6f6cb5d7f4432d0d32` | Clean base used for current local rechecks. | +| jrsonnet | `5e8cbcdbc860a616dbd193428f8933dd7532f537` | Source-built with `cargo build --release -p jrsonnet`. | + +## Current confirmed gaps + +| workload | status | report | +|---|---|---| +| `large_string_template` | improved by simple-format ASCII-safe propagation; jrsonnet still `~1.34x` faster | `bench/reports/sjsonnet-vs-jrsonnet-gaps.md` | +| kube-prometheus realworld | improved by strict JSON byte import parsing; jrsonnet still `1.55x` faster | `bench/reports/sjsonnet-vs-jrsonnet-gaps.md` | + +## Accepted ideas + +| idea | status | evidence | +|---|---|---| +| Strict JSON byte import parsing | implemented locally; not committed | `Importer.parseJsonImport` uses `ujson.ByteArrayParser`; `CachedResolvedFile` caches small files as bytes and lazily decodes text; kube Native A/B improved candidate to `132.7/132.1 ms` vs clean `139.4/140.3 ms`. | +| Hybrid sort for inline object materialization | implemented locally; pending PR | `Materializer.computeSortedInlineOrder` keeps insertion sort for ≤16 visible fields and uses in-place quicksort for larger inline objects. 
Native kube A/B on top of strict JSON bytes improved forward `145.3 -> 140.0 ms` and reverse `151.6 -> 148.9 ms`; output equality and full `__.test` passed. | +| Simple named format ASCII-safe propagation | implemented locally; pending PR | `Format.PartialApplyFmt` returns `Val.Str.asciiSafe` when all static format literals and simple named dynamic values are JSON-string ASCII-safe. Native `large_string_template` improved in both command orders (`8.64 -> 8.01 ms`, `8.65 -> 8.17 ms`); JVM JMH stayed neutral-positive (`0.683 -> 0.677 ms/op`). | + +## Rejected ideas + +| idea | reason | +|---|---| +| Nested byte-buffer flush threshold 16/32/64 KiB | Not stable positive under same-run forward/reverse Native A/B. | +| Single-part parsed string fast path | Not stable positive under same-run forward/reverse Native A/B. | +| 4-slot object value cache | Reduced overflow count but produced only neutral Native wall-clock results. | +| Lazy small overflow cache before HashMap | Reduced overflow count further but regressed Native wall-clock. | +| Strict JSON object cycle-check skip marker | Debug stats improved, but same-run Native A/B was not stable enough to keep. | +| visitLongString char/range-copy path | Stable JVM JMH regression on `large_string_template` (`~0.82ms` baseline to `~1.21ms` candidate); rejected before Native A/B. | +| Lazy simple-named format byte rendering | Three structural variants improved/held JVM JMH but were neutral-to-negative on Scala Native whole-process `large_string_template`; code reverted. | +| Strict JSON integer parse via `ParseUtils.parseIntegralNum` | Tried both an explicit integral scan and the parser-provided `decIndex/expIndex` fast path. Output stayed identical, but kube Native A/B was not stable-positive; reverse median/min favored the existing `toString.toDouble` path. | +| ByteRenderer ASCII-safe object key precheck | Replaced direct key rendering with `Platform.isAsciiJsonSafe` + low-byte copy for safe keys. 
Output stayed identical, but kube Native reverse A/B favored the existing short-string renderer across mean/median/min. | +| Direct `String.charAt` scan in `visitShortString` | Avoided the reusable `getChars` temp-buffer copy. Output stayed identical and kube Native improved weakly, but `large_string_template` regressed/noised negative in both command orders, so the existing reusable-buffer renderer path was restored. | +| Long strict-JSON imported string values marked ASCII-safe during parse | Mirrored the large Jsonnet string literal optimization for `.json` imports. Output stayed identical, but kube Native reverse A/B favored baseline, so the parse-time scan was removed. | +| Lower parsed Jsonnet string ASCII-safe threshold to `>=128` | Tried to align parser marking with ByteRenderer's long-string cutoff. Output stayed identical, but the parse-time scan regressed kube Native in both command orders. | +| Lazy materialization-time cache for inline-object sorted order | Stored `computeSortedInlineOrder` results back on `Val.Obj` when absent. Output stayed identical, but real kube Native single-run A/B was neutral-to-negative, so the lazy write was removed. | +| Native CLI path-only parse cache | Avoided `ResolvedFile.contentHash()` for the Native CLI to bypass SHA-256/OpenSSL provider work. It linked and preserved output, but Native wall-clock was neutral on `null` and negative/noisy on kube, so the default content-hash cache was restored. | +| Native GC switch to Commix | Attempted to set `nativeGC` to Commix in Mill. Build script compilation failed because the GC API was not exposed on the current Mill build classpath, so the config experiment was reverted. | +| Parser `_asciiSafe` hint for static format safety | Reused the parser's large-string ASCII-safe marker to avoid re-scanning static format literals. Debug stats improved, but Native whole-process `large_string_template` regressed in both command orders, so the hint path was removed. 
| +| Native manual ASCII-safe string-to-byte copy | Replaced `String.getBytes(0, len, dst, dstPos)` with a manual `charAt` loop for known ASCII-safe strings. Native `large_string_template` regressed heavily in both command orders, so the platform copy stays on `getBytes`. | +| Single-character append in simple format loop | Branched the single-label simple format path to call `StringBuilder.append(Char)` when the dynamic value length is one. Native `large_string_template` regressed in both command orders, so the existing `append(String)` loop remains. | +| ByteRenderer minified object comma path | Specialized direct/generic object rendering to manage comma/empty state locally for minified JSON. Output stayed identical and kube improved weakly, but `large_string_template` regressed/noised negative in both command orders, so the generic renderer path was restored. | +| Native-only long ASCII escaped string renderer | Gated a direct `charAt` long-string renderer to Scala Native to avoid UTF-8 byte-array allocation for escaped ASCII strings. Output stayed identical, but `large_string_template` regressed in both command orders, so the UTF-8 encode plus SWAR scan remains the best path. | +| Inline small-stack cycle tracking | Replaced eager `IdentityHashMap` cycle tracking with four inline identity slots plus overflow map while preserving recursive error behavior. Kube was noise-level and `large_string_template` regressed in both command orders, so eager `IdentityHashMap` tracking was restored. | +| ByteRenderer quoted key cache | Cached quoted object-key bytes per renderer using HashMap, direct-mapped, and capped variants. Output stayed identical, but kube reverse A/B was not stable-positive and some variants regressed, so direct key rendering was restored. | + +## Policy + +Before opening a performance PR, rerun focused JMH and Scala Native hyperfine +against the current base and source-built jrsonnet. 
Keep a change only when the +target benchmark is stable-positive and guard benchmarks do not regress. diff --git a/sjsonnet/src-jvm-native/sjsonnet/CachedResolvedFile.scala b/sjsonnet/src-jvm-native/sjsonnet/CachedResolvedFile.scala index b0d1cd7b..f29f12aa 100644 --- a/sjsonnet/src-jvm-native/sjsonnet/CachedResolvedFile.scala +++ b/sjsonnet/src-jvm-native/sjsonnet/CachedResolvedFile.scala @@ -5,6 +5,7 @@ import fastparse.ParserInput import java.io.File import java.nio.charset.StandardCharsets import java.nio.file.Files +import java.security.MessageDigest /** * A class that encapsulates a resolved import. This is used to cache the result of resolving an @@ -37,17 +38,13 @@ class CachedResolvedFile( s"Resolved import path $resolvedImportPath is too large: ${jFile.length()} bytes > $memoryLimitBytes bytes" ) - private val resolvedImportContent: ResolvedFile = { - // TODO: Support caching binary data - if (jFile.length() > cacheThresholdBytes) { - // If the file is too large, then we will just read it from disk - null - } else if (binaryData) { - StaticBinaryResolvedFile(readRawBytes(jFile)) - } else { - StaticResolvedFile(readString(jFile)) - } - } + private val cachedBytes: Array[Byte] = + if (jFile.length() > cacheThresholdBytes) null + else readRawBytes(jFile) + + private val cachedBinaryContent: ResolvedFile = + if (cachedBytes != null && binaryData) StaticBinaryResolvedFile(cachedBytes) + else null private def readString(jFile: File): String = { new String(Files.readAllBytes(jFile.toPath), StandardCharsets.UTF_8) @@ -55,45 +52,72 @@ class CachedResolvedFile( private def readRawBytes(jFile: File): Array[Byte] = Files.readAllBytes(jFile.toPath) + private lazy val resolvedTextContent: ResolvedFile = + StaticResolvedFile(new String(cachedBytes, StandardCharsets.UTF_8)) + + private lazy val cachedBytesHash: String = + cachedBytes.length.toString + ":" + bytesToHex( + MessageDigest.getInstance("SHA-256").digest(cachedBytes) + ) + + private def bytesToHex(bytes: 
Array[Byte]): String = { + val hexChars = "0123456789abcdef" + val out = new Array[Char](bytes.length * 2) + var i = 0 + var j = 0 + while (i < bytes.length) { + val b = bytes(i) & 0xff + out(j) = hexChars.charAt(b >>> 4) + out(j + 1) = hexChars.charAt(b & 0x0f) + i += 1 + j += 2 + } + new String(out) + } + /** * A method that will return a reader for the resolved import. If the import is too large, then * this will return a reader that will read the file from disk. Otherwise, it will return a reader * that reads from memory. */ def getParserInput(): ParserInput = { - if (resolvedImportContent == null) { + if (cachedBytes == null) { FileParserInput(jFile) + } else if (binaryData) { + cachedBinaryContent.getParserInput() } else { - resolvedImportContent.getParserInput() + resolvedTextContent.getParserInput() } } override def readString(): String = { - if (resolvedImportContent == null) { + if (cachedBytes == null) { // If the file is too large, then we will just read it from disk readString(jFile) + } else if (binaryData) { + cachedBinaryContent.readString() } else { // Otherwise, we will read it from memory - resolvedImportContent.readString() + resolvedTextContent.readString() } } override def contentHash(): String = { - if (resolvedImportContent == null) { + if (cachedBytes == null) { // If the file is too large, then we will just read it from disk Platform.hashFile(jFile) } else { - resolvedImportContent.contentHash() + cachedBytesHash } } override def readRawBytes(): Array[Byte] = { - if (resolvedImportContent == null) { + if (cachedBytes == null) { // If the file is too large, then we will just read it from disk readRawBytes(jFile) } else { // Otherwise, we will read it from memory - resolvedImportContent.readRawBytes() + cachedBytes } } } diff --git a/sjsonnet/src/sjsonnet/Format.scala b/sjsonnet/src/sjsonnet/Format.scala index 9d272d49..1a86ffb3 100644 --- a/sjsonnet/src/sjsonnet/Format.scala +++ b/sjsonnet/src/sjsonnet/Format.scala @@ -41,6 +41,8 @@ object 
Format { val literalEnds: Array[Int], /** Non-null when all simple named specs use the same label. */ val singleNamedLabel: String, + /** True when all literal text copied to the output is already JSON-string ASCII-safe. */ + val staticAsciiSafe: Boolean, /** * True when ALL specs are simple `%(key)s` with a named label and no formatting flags. In * this case we can use a fast path that caches the object key lookup and avoids widenRaw @@ -483,6 +485,7 @@ object Format { litStarts, litEnds, singleNamedLabel, + Platform.isAsciiJsonSafe(s), allSimpleNamed ) } @@ -497,6 +500,7 @@ object Format { val emptyStarts = new Array[Int](size) val emptyEnds = new Array[Int](size) var staticChars = leading.length + var staticAsciiSafe = Platform.isAsciiJsonSafe(leading) var hasAnyStar = false var allSimpleNamed = true var idx = 0 @@ -508,6 +512,7 @@ object Format { specs(idx) = formatted.bits literals(idx) = literal staticChars += literal.length + staticAsciiSafe &&= Platform.isAsciiJsonSafe(literal) hasAnyStar ||= formatted.widthStar || formatted.precisionStar allSimpleNamed = false idx += 1 @@ -526,6 +531,7 @@ object Format { emptyStarts, emptyEnds, null, + staticAsciiSafe, allSimpleNamed ) } @@ -556,7 +562,7 @@ object Format { // Super-fast path: all specs are simple %(key)s with an object value. // Avoids per-spec pattern matching, widenRaw, and uses offset-based literal appends. 
if (parsed.allSimpleNamedString && values0.isInstanceOf[Val.Obj]) { - return formatSimpleNamedString(parsed, values0.asInstanceOf[Val.Obj], pos) + return formatSimpleNamedStringValue(parsed, values0.asInstanceOf[Val.Obj], pos).str } val values = values0 match { @@ -751,34 +757,47 @@ object Format { if (singleSpecNoStatic) singleFormatted else output.toString() } + private[sjsonnet] def formatValue(parsed: RuntimeFormat, values0: Val, pos: Position)(implicit + evaluator: EvalScope): Val.Str = + if (parsed.allSimpleNamedString && values0.isInstanceOf[Val.Obj]) { + formatSimpleNamedStringValue(parsed, values0.asInstanceOf[Val.Obj], pos) + } else { + Val.Str(pos, format(parsed, values0, pos)) + } + /** * Super-fast path for format strings where ALL specs are simple `%(key)s` with a `Val.Obj`. This * avoids per-spec pattern matching, widenRaw overhead, and caches repeated key lookups. For the * large_string_template benchmark (605KB, 256 `%(x)s` interpolations), this eliminates 256 * redundant object lookups and the generic dispatch overhead. 
    */
-  private def formatSimpleNamedString(parsed: RuntimeFormat, obj: Val.Obj, pos: Position)(implicit
-      evaluator: EvalScope): String = {
+  private def formatSimpleNamedStringValue(parsed: RuntimeFormat, obj: Val.Obj, pos: Position)(
+      implicit evaluator: EvalScope): Val.Str = {
     val output = new java.lang.StringBuilder(parsed.staticChars + parsed.specBits.length * 16)
+    var asciiSafe = parsed.staticAsciiSafe

     // Append leading literal using offsets if source is available, else use string
     appendLeading(output, parsed)

     val singleLabel = parsed.singleNamedLabel
     if (singleLabel != null) {
-      val str = simpleStringValue(obj.value(singleLabel, pos)(evaluator).value)
+      val rawVal = obj.value(singleLabel, pos)(evaluator).value
+      val str = simpleStringValue(rawVal)
+      asciiSafe &&= simpleStringValueAsciiSafe(rawVal)
       var idx = 0
       while (idx < parsed.specBits.length) {
         output.append(str)
         appendLiteral(output, parsed, idx)
         idx += 1
       }
-      return output.toString
+      val result = output.toString
+      return if (asciiSafe) Val.Str.asciiSafe(pos, result) else Val.Str(pos, result)
     }

     // Cache for repeated key lookups: most format strings reuse the same key many times
     var cachedKey: String = null
     var cachedStr: String = null
+    var cachedAsciiSafe = false

     var idx = 0
     while (idx < parsed.specBits.length) {
@@ -787,12 +806,16 @@ object Format {
       // Look up and cache the string value for this key
       // String.equals already does identity check (eq) internally
       val str =
-        if (key == cachedKey) cachedStr
-        else {
+        if (key == cachedKey) {
+          asciiSafe &&= cachedAsciiSafe
+          cachedStr
+        } else {
           val rawVal = obj.value(key, pos)(evaluator).value
           val s = simpleStringValue(rawVal)
           cachedKey = key
           cachedStr = s
+          cachedAsciiSafe = simpleStringValueAsciiSafe(rawVal)
+          asciiSafe &&= cachedAsciiSafe
           s
         }

@@ -803,7 +826,8 @@ object Format {
       idx += 1
     }

-    output.toString
+    val result = output.toString
+    if (asciiSafe) Val.Str.asciiSafe(pos, result) else Val.Str(pos, result)
   }

   private def simpleStringValue(rawVal: Val)(implicit evaluator: EvalScope): String =
@@ -826,6 +850,13 @@ object Format {
         value.toString
     }

+  private def simpleStringValueAsciiSafe(rawVal: Val): Boolean =
+    rawVal match {
+      case vs: Val.Str => vs._asciiSafe
+      case _: Val.Num | _: Val.True | _: Val.False | _: Val.Null => true
+      case _ => false
+    }
+
   private def formatInteger(formatted: FormatSpec, s: Double): String = {
     // Fast path: if the value fits in a Long (and isn't Long.MinValue where
     // negation overflows), avoid BigInt allocation entirely
@@ -1013,6 +1044,6 @@ object Format {
     // Each PartialApplyFmt instance caches its own parsed format, so no external cache needed.
     private val parsed = scanFormat(fmt)
     def evalRhs(values0: Eval, ev: EvalScope, pos: Position): Val =
-      Val.Str(pos, format(parsed, values0.value, pos)(ev))
+      formatValue(parsed, values0.value, pos)(ev)
   }
 }
diff --git a/sjsonnet/src/sjsonnet/Importer.scala b/sjsonnet/src/sjsonnet/Importer.scala
index ca823389..0ddc7c78 100644
--- a/sjsonnet/src/sjsonnet/Importer.scala
+++ b/sjsonnet/src/sjsonnet/Importer.scala
@@ -302,7 +302,7 @@ object CachedResolver {

     try {
       val visitor = new JsonImportVisitor(fileScope, internedStrings, settings)
-      Some((ujson.StringParser.transform(content.readString(), visitor), fileScope))
+      Some((ujson.ByteArrayParser.transform(content.readRawBytes(), visitor), fileScope))
     } catch {
       case _: ujson.ParsingFailedException | _: DuplicateJsonKey | _: InvalidJsonNumber |
           _: JsonParseDepthExceeded | _: NumberFormatException =>
diff --git a/sjsonnet/src/sjsonnet/Materializer.scala b/sjsonnet/src/sjsonnet/Materializer.scala
index 9892e3da..bc770d15 100644
--- a/sjsonnet/src/sjsonnet/Materializer.scala
+++ b/sjsonnet/src/sjsonnet/Materializer.scala
@@ -619,20 +619,66 @@ object Materializer extends Materializer {
       }
       i += 1
     }
-    // Insertion sort by key name (optimal for 2-8 elements)
-    i = 1
-    while (i < visCount) {
+    sortInlineOrder(order, keys, visCount)
+    order
+  }
+
+  private def sortInlineOrder(order: Array[Int], keys: Array[String], len: Int): Unit = {
+    if (len <= 1) return
+    if (len <= 16) insertionSortInlineOrder(order, keys, 0, len - 1)
+    else quickSortInlineOrder(order, keys, 0, len - 1)
+  }
+
+  private def insertionSortInlineOrder(
+      order: Array[Int],
+      keys: Array[String],
+      left: Int,
+      right: Int): Unit = {
+    var i = left + 1
+    while (i <= right) {
       val pivotIdx = order(i)
       val pivotKey = keys(pivotIdx)
       var j = i - 1
-      while (j >= 0 && Util.compareStringsByCodepoint(keys(order(j)), pivotKey) > 0) {
+      while (j >= left && Util.compareStringsByCodepoint(keys(order(j)), pivotKey) > 0) {
         order(j + 1) = order(j)
         j -= 1
       }
       order(j + 1) = pivotIdx
       i += 1
     }
-    order
+  }
+
+  private def quickSortInlineOrder(
+      order: Array[Int],
+      keys: Array[String],
+      left0: Int,
+      right0: Int): Unit = {
+    var left = left0
+    var right = right0
+    while (right - left > 16) {
+      val pivotKey = keys(order((left + right) >>> 1))
+      var i = left
+      var j = right
+      while (i <= j) {
+        while (Util.compareStringsByCodepoint(keys(order(i)), pivotKey) < 0) i += 1
+        while (Util.compareStringsByCodepoint(keys(order(j)), pivotKey) > 0) j -= 1
+        if (i <= j) {
+          val tmp = order(i)
+          order(i) = order(j)
+          order(j) = tmp
+          i += 1
+          j -= 1
+        }
+      }
+      if (j - left < right - i) {
+        if (left < j) quickSortInlineOrder(order, keys, left, j)
+        left = i
+      } else {
+        if (i < right) quickSortInlineOrder(order, keys, i, right)
+        right = j
+      }
+    }
+    insertionSortInlineOrder(order, keys, left, right)
   }

   /**
diff --git a/sjsonnet/test/src/sjsonnet/FormatTests.scala b/sjsonnet/test/src/sjsonnet/FormatTests.scala
new file mode 100644
index 00000000..61085be0
--- /dev/null
+++ b/sjsonnet/test/src/sjsonnet/FormatTests.scala
@@ -0,0 +1,73 @@
+package sjsonnet
+
+import sjsonnet.Expr.Member.Visibility
+import utest._
+
+object FormatTests extends TestSuite {
+  private val pos = new Position(null, 0)
+
+  private implicit val scope: EvalScope = new EvalScope {
+    def extVars: String => Option[Expr] = _ => None
+    def importer: CachedImporter = new CachedImporter(Importer.empty)
+    def wd: Path = DummyPath()
+    def visitExpr(expr: Expr)(implicit scope: ValScope): Val =
+      throw new UnsupportedOperationException("not used")
+    def materialize(v: Val): ujson.Value =
+      throw new UnsupportedOperationException("not used")
+    def equal(x: Val, y: Val): Boolean = x == y
+    def compare(x: Val, y: Val): Int = 0
+    def settings: Settings = Settings.default
+    def debugStats: DebugStats = null
+    def trace(msg: String): Unit = ()
+    def warn(e: Error): Unit = ()
+  }
+
+  def tests: Tests = Tests {
+    test("simple named format preserves ascii-safe result for numeric values") {
+      val fmt = new Format.PartialApplyFmt("hello %(x)s")
+      val obj = Val.Obj.mk(
+        pos,
+        "x" -> new Val.Obj.ConstMember(add2 = false, Visibility.Normal, Val.Num(pos, 3))
+      )
+      val result = fmt.evalRhs(obj, scope, pos).asInstanceOf[Val.Str]
+      result.str ==> "hello 3"
+      result._asciiSafe ==> true
+    }
+
+    test("simple named format does not mark unsafe string values ascii-safe") {
+      val fmt = new Format.PartialApplyFmt("hello %(x)s")
+      val obj = Val.Obj.mk(
+        pos,
+        "x" -> new Val.Obj.ConstMember(add2 = false, Visibility.Normal, Val.Str(pos, "\""))
+      )
+      val result = fmt.evalRhs(obj, scope, pos).asInstanceOf[Val.Str]
+      result.str ==> "hello \""
+      result._asciiSafe ==> false
+    }
+
+    test("simple named format does not mark unsafe static literals ascii-safe") {
+      val fmt = new Format.PartialApplyFmt("hello \"%(x)s")
+      val obj = Val.Obj.mk(
+        pos,
+        "x" -> new Val.Obj.ConstMember(add2 = false, Visibility.Normal, Val.Num(pos, 3))
+      )
+      val result = fmt.evalRhs(obj, scope, pos).asInstanceOf[Val.Str]
+      result.str ==> "hello \"3"
+      result._asciiSafe ==> false
+    }
+
+    test("simple named format combines ascii-safety across multiple keys") {
+      val safe = Val.Str.asciiSafe(pos, "safe")
+      val unsafe = Val.Str(pos, "\\")
+      val fmt = new Format.PartialApplyFmt("%(safe)s %(unsafe)s %(safe)s")
+      val obj = Val.Obj.mk(
+        pos,
+        "safe" -> new Val.Obj.ConstMember(add2 = false, Visibility.Normal, safe),
+        "unsafe" -> new Val.Obj.ConstMember(add2 = false, Visibility.Normal, unsafe)
+      )
+      val result = fmt.evalRhs(obj, scope, pos).asInstanceOf[Val.Str]
+      result.str ==> "safe \\ safe"
+      result._asciiSafe ==> false
+    }
+  }
+}
diff --git a/sjsonnet/test/src/sjsonnet/PreloaderTests.scala b/sjsonnet/test/src/sjsonnet/PreloaderTests.scala
index 9d3bc985..f8b9e081 100644
--- a/sjsonnet/test/src/sjsonnet/PreloaderTests.scala
+++ b/sjsonnet/test/src/sjsonnet/PreloaderTests.scala
@@ -173,7 +173,8 @@ object PreloaderTests extends TestSuite {
   class JsonOnlyResolvedFile(content: String) extends ResolvedFile {
     def getParserInput(): fastparse.ParserInput =
       throw new RuntimeException("strict JSON should not be parsed with fastparse")
-    def readString(): String = content
+    def readString(): String =
+      throw new RuntimeException("strict JSON should not be decoded as text")
     def contentHash(): String = content
     def readRawBytes(): Array[Byte] =
       content.getBytes(java.nio.charset.StandardCharsets.UTF_8)