Draft
147 changes: 147 additions & 0 deletions bench/reports/sjsonnet-vs-jrsonnet-gaps.md

Large diffs are not rendered by default.

59 changes: 59 additions & 0 deletions bench/reports/sync-points.md
@@ -0,0 +1,59 @@
# Performance sync points

This file tracks in-flight performance migration and exploration work so that the
same idea is not retried without new evidence.

## Active baselines

| Area | Ref | Notes |
|---|---|---|
| upstream/master | `cedc083b4676be43e01bdd6f6cb5d7f4432d0d32` | Clean base used for current local rechecks. |
| jrsonnet | `5e8cbcdbc860a616dbd193428f8933dd7532f537` | Source-built with `cargo build --release -p jrsonnet`. |

## Current confirmed gaps

| workload | status | report |
|---|---|---|
| `large_string_template` | improved by simple-format ASCII-safe propagation; jrsonnet still `~1.34x` faster | `bench/reports/sjsonnet-vs-jrsonnet-gaps.md` |
| kube-prometheus realworld | improved by strict JSON byte import parsing; jrsonnet still `1.55x` faster | `bench/reports/sjsonnet-vs-jrsonnet-gaps.md` |

## Accepted ideas

| idea | status | evidence |
|---|---|---|
| Strict JSON byte import parsing | implemented locally; not committed | `Importer.parseJsonImport` uses `ujson.ByteArrayParser`; `CachedResolvedFile` caches small files as bytes and lazily decodes text; kube Native A/B improved the candidate to `132.7/132.1 ms` vs a clean `139.4/140.3 ms`. |
| Hybrid sort for inline object materialization | implemented locally; pending PR | `Materializer.computeSortedInlineOrder` keeps insertion sort for ≤16 visible fields and uses in-place quicksort for larger inline objects. Native kube A/B on top of strict JSON bytes improved forward `145.3 -> 140.0 ms` and reverse `151.6 -> 148.9 ms`; output equality and full `__.test` passed. |
| Simple named format ASCII-safe propagation | implemented locally; pending PR | `Format.PartialApplyFmt` returns `Val.Str.asciiSafe` when all static format literals and simple named dynamic values are JSON-string ASCII-safe. Native `large_string_template` improved in both command orders (`8.64 -> 8.01 ms`, `8.65 -> 8.17 ms`); JVM JMH stayed neutral-positive (`0.683 -> 0.677 ms/op`). |
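The hybrid-sort row above can be sketched as follows. This is a minimal Java illustration of the cutoff idea, not the actual `Materializer.computeSortedInlineOrder` code; the threshold of 16 and the index/key layout are assumptions taken from the table.

```java
import java.util.Arrays;

public class HybridSort {
    static final int INSERTION_THRESHOLD = 16; // assumed cutoff from the report

    // Sort an index array by its keys: insertion sort for small inputs,
    // in-place quicksort for larger ones.
    static void sortIndicesByKey(int[] idx, String[] keys) {
        if (idx.length <= INSERTION_THRESHOLD) {
            // Insertion sort: cheap for small, often nearly-sorted field lists.
            for (int i = 1; i < idx.length; i++) {
                int cur = idx[i];
                int j = i - 1;
                while (j >= 0 && keys[idx[j]].compareTo(keys[cur]) > 0) {
                    idx[j + 1] = idx[j];
                    j--;
                }
                idx[j + 1] = cur;
            }
        } else {
            quicksort(idx, keys, 0, idx.length - 1);
        }
    }

    static void quicksort(int[] idx, String[] keys, int lo, int hi) {
        if (lo >= hi) return;
        String pivot = keys[idx[(lo + hi) >>> 1]];
        int i = lo, j = hi;
        while (i <= j) {
            while (keys[idx[i]].compareTo(pivot) < 0) i++;
            while (keys[idx[j]].compareTo(pivot) > 0) j--;
            if (i <= j) {
                int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
                i++; j--;
            }
        }
        quicksort(idx, keys, lo, j);
        quicksort(idx, keys, i, hi);
    }

    public static void main(String[] args) {
        String[] keys = {"b", "a", "c"};
        int[] idx = {0, 1, 2};
        sortIndicesByKey(idx, keys); // small input, so the insertion path runs
        System.out.println(Arrays.toString(idx)); // prints [1, 0, 2]
    }
}
```

The point of the split is that insertion sort has no partitioning or recursion overhead, which dominates for the tiny visible-field counts typical of inline objects.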

## Rejected ideas

| idea | reason |
|---|---|
| Nested byte-buffer flush threshold 16/32/64 KiB | Not stable positive under same-run forward/reverse Native A/B. |
| Single-part parsed string fast path | Not stable positive under same-run forward/reverse Native A/B. |
| 4-slot object value cache | Reduced overflow count but produced only neutral Native wall-clock results. |
| Lazy small overflow cache before HashMap | Reduced overflow count further but regressed Native wall-clock. |
| Strict JSON object cycle-check skip marker | Debug stats improved, but same-run Native A/B was not stable enough to keep. |
| visitLongString char/range-copy path | Stable JVM JMH regression on `large_string_template` (`~0.82ms` baseline to `~1.21ms` candidate); rejected before Native A/B. |
| Lazy simple-named format byte rendering | Three structural variants improved/held JVM JMH but were neutral-to-negative on Scala Native whole-process `large_string_template`; code reverted. |
| Strict JSON integer parse via `ParseUtils.parseIntegralNum` | Tried both an explicit integral scan and the parser-provided `decIndex/expIndex` fast path. Output stayed identical, but kube Native A/B was not stable-positive; reverse median/min favored the existing `toString.toDouble` path. |
| ByteRenderer ASCII-safe object key precheck | Replaced direct key rendering with `Platform.isAsciiJsonSafe` + low-byte copy for safe keys. Output stayed identical, but kube Native reverse A/B favored the existing short-string renderer across mean/median/min. |
| Direct `String.charAt` scan in `visitShortString` | Avoided the reusable `getChars` temp-buffer copy. Output stayed identical and kube Native improved weakly, but `large_string_template` regressed/noised negative in both command orders, so the existing reusable-buffer renderer path was restored. |
| Long strict-JSON imported string values marked ASCII-safe during parse | Mirrored the large Jsonnet string literal optimization for `.json` imports. Output stayed identical, but kube Native reverse A/B favored baseline, so the parse-time scan was removed. |
| Lower parsed Jsonnet string ASCII-safe threshold to `>=128` | Tried to align parser marking with ByteRenderer's long-string cutoff. Output stayed identical, but the parse-time scan regressed kube Native in both command orders. |
| Lazy materialization-time cache for inline-object sorted order | Stored `computeSortedInlineOrder` results back on `Val.Obj` when absent. Output stayed identical, but real kube Native single-run A/B was neutral-to-negative, so the lazy write was removed. |
| Native CLI path-only parse cache | Avoided `ResolvedFile.contentHash()` for the Native CLI to bypass SHA-256/OpenSSL provider work. It linked and preserved output, but Native wall-clock was neutral on `null` and negative/noisy on kube, so the default content-hash cache was restored. |
| Native GC switch to Commix | Attempted to set `nativeGC` to Commix in Mill. Build script compilation failed because the GC API was not exposed on the current Mill build classpath, so the config experiment was reverted. |
| Parser `_asciiSafe` hint for static format safety | Reused the parser's large-string ASCII-safe marker to avoid re-scanning static format literals. Debug stats improved, but Native whole-process `large_string_template` regressed in both command orders, so the hint path was removed. |
| Native manual ASCII-safe string-to-byte copy | Replaced `String.getBytes(0, len, dst, dstPos)` with a manual `charAt` loop for known ASCII-safe strings. Native `large_string_template` regressed heavily in both command orders, so the platform copy stays on `getBytes`. |
| Single-character append in simple format loop | Branched the single-label simple format path to call `StringBuilder.append(Char)` when the dynamic value length is one. Native `large_string_template` regressed in both command orders, so the existing `append(String)` loop remains. |
| ByteRenderer minified object comma path | Specialized direct/generic object rendering to manage comma/empty state locally for minified JSON. Output stayed identical and kube improved weakly, but `large_string_template` regressed/noised negative in both command orders, so the generic renderer path was restored. |
| Native-only long ASCII escaped string renderer | Gated a direct `charAt` long-string renderer to Scala Native to avoid UTF-8 byte-array allocation for escaped ASCII strings. Output stayed identical, but `large_string_template` regressed in both command orders, so the UTF-8 encode plus SWAR scan remains the best path. |
| Inline small-stack cycle tracking | Replaced eager `IdentityHashMap` cycle tracking with four inline identity slots plus overflow map while preserving recursive error behavior. Kube was noise-level and `large_string_template` regressed in both command orders, so eager `IdentityHashMap` tracking was restored. |
| ByteRenderer quoted key cache | Cached quoted object-key bytes per renderer using HashMap, direct-mapped, and capped variants. Output stayed identical, but kube reverse A/B was not stable-positive and some variants regressed, so direct key rendering was restored. |
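Many of the accepted and rejected entries hinge on an ASCII-safe precheck. A minimal Java sketch of what such a scan decides, assuming the real `Platform.isAsciiJsonSafe` tests for printable ASCII with no JSON escaping needed (the exact predicate may differ):

```java
public class AsciiSafe {
    // True when the string can be emitted inside a JSON string literal
    // byte-for-byte: printable ASCII, no quote or backslash to escape.
    static boolean isAsciiJsonSafe(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c >= 0x80 || c == '"' || c == '\\') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAsciiJsonSafe("plain text 123"));    // true
        System.out.println(isAsciiJsonSafe("needs \"escaping\"")); // false: quote
        System.out.println(isAsciiJsonSafe("émoji"));              // false: non-ASCII
    }
}
```

A string that passes this check can be copied straight into renderer output without a per-character escaping pass, which is why the flag is worth propagating through formatting.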

## Policy

Before opening a performance PR, rerun focused JMH and Scala Native hyperfine
against the current base and source-built jrsonnet. Keep a change only when the
target benchmark is stable-positive and guard benchmarks do not regress.
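The same-run forward/reverse gate described above can be sketched as a small decision helper. This is an illustrative Java version assuming a median comparison in both command orders; the project's actual hyperfine tooling and thresholds may differ.

```java
import java.util.Arrays;

// Accept a change only when the candidate beats the baseline in BOTH
// command orders of the same run, so ordering noise cannot fake a win.
public class AbGate {
    static double median(double[] xs) {
        double[] s = xs.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    static boolean stablePositive(double[] baseFwd, double[] candFwd,
                                  double[] baseRev, double[] candRev) {
        return median(candFwd) < median(baseFwd)
            && median(candRev) < median(baseRev);
    }
}
```

Under this rule an idea that wins forward but loses reverse (a common pattern in the rejected table) is dropped rather than averaged into a marginal win.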
62 changes: 43 additions & 19 deletions sjsonnet/src-jvm-native/sjsonnet/CachedResolvedFile.scala
@@ -5,6 +5,7 @@ import fastparse.ParserInput
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.security.MessageDigest

/**
* A class that encapsulates a resolved import. This is used to cache the result of resolving an
@@ -37,63 +38,86 @@ class CachedResolvedFile(
s"Resolved import path $resolvedImportPath is too large: ${jFile.length()} bytes > $memoryLimitBytes bytes"
)

-private val resolvedImportContent: ResolvedFile = {
-  // TODO: Support caching binary data
-  if (jFile.length() > cacheThresholdBytes) {
-    // If the file is too large, then we will just read it from disk
-    null
-  } else if (binaryData) {
-    StaticBinaryResolvedFile(readRawBytes(jFile))
-  } else {
-    StaticResolvedFile(readString(jFile))
-  }
-}
+private val cachedBytes: Array[Byte] =
+  if (jFile.length() > cacheThresholdBytes) null
+  else readRawBytes(jFile)
+
+private val cachedBinaryContent: ResolvedFile =
+  if (cachedBytes != null && binaryData) StaticBinaryResolvedFile(cachedBytes)
+  else null

private def readString(jFile: File): String = {
new String(Files.readAllBytes(jFile.toPath), StandardCharsets.UTF_8)
}

private def readRawBytes(jFile: File): Array[Byte] = Files.readAllBytes(jFile.toPath)

private lazy val resolvedTextContent: ResolvedFile =
StaticResolvedFile(new String(cachedBytes, StandardCharsets.UTF_8))

private lazy val cachedBytesHash: String =
cachedBytes.length.toString + ":" + bytesToHex(
MessageDigest.getInstance("SHA-256").digest(cachedBytes)
)

private def bytesToHex(bytes: Array[Byte]): String = {
val hexChars = "0123456789abcdef"
val out = new Array[Char](bytes.length * 2)
var i = 0
var j = 0
while (i < bytes.length) {
val b = bytes(i) & 0xff
out(j) = hexChars.charAt(b >>> 4)
out(j + 1) = hexChars.charAt(b & 0x0f)
i += 1
j += 2
}
new String(out)
}

/**
* A method that will return a reader for the resolved import. If the import is too large, then
* this will return a reader that will read the file from disk. Otherwise, it will return a reader
* that reads from memory.
*/
def getParserInput(): ParserInput = {
-if (resolvedImportContent == null) {
+if (cachedBytes == null) {
FileParserInput(jFile)
} else if (binaryData) {
cachedBinaryContent.getParserInput()
} else {
-resolvedImportContent.getParserInput()
+resolvedTextContent.getParserInput()
}
}

override def readString(): String = {
-if (resolvedImportContent == null) {
+if (cachedBytes == null) {
// If the file is too large, then we will just read it from disk
readString(jFile)
} else if (binaryData) {
cachedBinaryContent.readString()
} else {
// Otherwise, we will read it from memory
-resolvedImportContent.readString()
+resolvedTextContent.readString()
}
}

override def contentHash(): String = {
-if (resolvedImportContent == null) {
+if (cachedBytes == null) {
// If the file is too large, then we will just read it from disk
Platform.hashFile(jFile)
} else {
-resolvedImportContent.contentHash()
+cachedBytesHash
}
}

override def readRawBytes(): Array[Byte] = {
-if (resolvedImportContent == null) {
+if (cachedBytes == null) {
// If the file is too large, then we will just read it from disk
readRawBytes(jFile)
} else {
// Otherwise, we will read it from memory
-resolvedImportContent.readRawBytes()
+cachedBytes
}
}
}
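The caching scheme in the diff above, reduced to a Java sketch: small files keep their raw bytes, text is decoded lazily on first use, and the content hash is a length-prefixed SHA-256 hex digest of the cached bytes. Class and method names here are illustrative, not the sjsonnet API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ByteCachedFile {
    private final byte[] cachedBytes;
    private String decodedText; // decoded lazily on first readString()

    ByteCachedFile(byte[] bytes) { this.cachedBytes = bytes; }

    // Text decode is deferred: a strict-JSON import that is parsed
    // directly from bytes never pays for UTF-8 decoding at all.
    String readString() {
        if (decodedText == null) {
            decodedText = new String(cachedBytes, StandardCharsets.UTF_8);
        }
        return decodedText;
    }

    // Length-prefixed SHA-256 hex, computed from the already-cached bytes
    // instead of re-reading and re-hashing the file on disk.
    String contentHash() {
        final MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always present on the JVM
        }
        byte[] digest = md.digest(cachedBytes);
        char[] hex = "0123456789abcdef".toCharArray();
        StringBuilder sb = new StringBuilder();
        sb.append(cachedBytes.length).append(':');
        for (byte b : digest) {
            sb.append(hex[(b & 0xff) >>> 4]).append(hex[b & 0x0f]);
        }
        return sb.toString();
    }
}
```

Prefixing the digest with the byte length is a cheap extra discriminator: two files can only share a hash string if they also share a length.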
49 changes: 40 additions & 9 deletions sjsonnet/src/sjsonnet/Format.scala
@@ -41,6 +41,8 @@ object Format {
val literalEnds: Array[Int],
/** Non-null when all simple named specs use the same label. */
val singleNamedLabel: String,
/** True when all literal text copied to the output is already JSON-string ASCII-safe. */
val staticAsciiSafe: Boolean,
/**
* True when ALL specs are simple `%(key)s` with a named label and no formatting flags. In
* this case we can use a fast path that caches the object key lookup and avoids widenRaw
@@ -483,6 +485,7 @@
litStarts,
litEnds,
singleNamedLabel,
Platform.isAsciiJsonSafe(s),
allSimpleNamed
)
}
@@ -497,6 +500,7 @@
val emptyStarts = new Array[Int](size)
val emptyEnds = new Array[Int](size)
var staticChars = leading.length
var staticAsciiSafe = Platform.isAsciiJsonSafe(leading)
var hasAnyStar = false
var allSimpleNamed = true
var idx = 0
@@ -508,6 +512,7 @@
specs(idx) = formatted.bits
literals(idx) = literal
staticChars += literal.length
staticAsciiSafe &&= Platform.isAsciiJsonSafe(literal)
hasAnyStar ||= formatted.widthStar || formatted.precisionStar
allSimpleNamed = false
idx += 1
@@ -526,6 +531,7 @@
emptyStarts,
emptyEnds,
null,
staticAsciiSafe,
allSimpleNamed
)
}
@@ -556,7 +562,7 @@
// Super-fast path: all specs are simple %(key)s with an object value.
// Avoids per-spec pattern matching, widenRaw, and uses offset-based literal appends.
if (parsed.allSimpleNamedString && values0.isInstanceOf[Val.Obj]) {
-return formatSimpleNamedString(parsed, values0.asInstanceOf[Val.Obj], pos)
+return formatSimpleNamedStringValue(parsed, values0.asInstanceOf[Val.Obj], pos).str
}

val values = values0 match {
@@ -751,34 +757,47 @@
if (singleSpecNoStatic) singleFormatted else output.toString()
}

private[sjsonnet] def formatValue(parsed: RuntimeFormat, values0: Val, pos: Position)(implicit
evaluator: EvalScope): Val.Str =
if (parsed.allSimpleNamedString && values0.isInstanceOf[Val.Obj]) {
formatSimpleNamedStringValue(parsed, values0.asInstanceOf[Val.Obj], pos)
} else {
Val.Str(pos, format(parsed, values0, pos))
}

/**
* Super-fast path for format strings where ALL specs are simple `%(key)s` with a `Val.Obj`. This
* avoids per-spec pattern matching, widenRaw overhead, and caches repeated key lookups. For the
* large_string_template benchmark (605KB, 256 `%(x)s` interpolations), this eliminates 256
* redundant object lookups and the generic dispatch overhead.
*/
-private def formatSimpleNamedString(parsed: RuntimeFormat, obj: Val.Obj, pos: Position)(implicit
-evaluator: EvalScope): String = {
+private def formatSimpleNamedStringValue(parsed: RuntimeFormat, obj: Val.Obj, pos: Position)(
+implicit evaluator: EvalScope): Val.Str = {
val output = new java.lang.StringBuilder(parsed.staticChars + parsed.specBits.length * 16)
var asciiSafe = parsed.staticAsciiSafe

// Append leading literal using offsets if source is available, else use string
appendLeading(output, parsed)

val singleLabel = parsed.singleNamedLabel
if (singleLabel != null) {
-val str = simpleStringValue(obj.value(singleLabel, pos)(evaluator).value)
+val rawVal = obj.value(singleLabel, pos)(evaluator).value
+val str = simpleStringValue(rawVal)
+asciiSafe &&= simpleStringValueAsciiSafe(rawVal)
var idx = 0
while (idx < parsed.specBits.length) {
output.append(str)
appendLiteral(output, parsed, idx)
idx += 1
}
-return output.toString
+val result = output.toString
+return if (asciiSafe) Val.Str.asciiSafe(pos, result) else Val.Str(pos, result)
}

// Cache for repeated key lookups: most format strings reuse the same key many times
var cachedKey: String = null
var cachedStr: String = null
var cachedAsciiSafe = false

var idx = 0
while (idx < parsed.specBits.length) {
@@ -787,12 +806,16 @@
// Look up and cache the string value for this key
// String.equals already does identity check (eq) internally
val str =
-if (key == cachedKey) cachedStr
-else {
+if (key == cachedKey) {
+asciiSafe &&= cachedAsciiSafe
+cachedStr
+} else {
val rawVal = obj.value(key, pos)(evaluator).value
val s = simpleStringValue(rawVal)
cachedKey = key
cachedStr = s
cachedAsciiSafe = simpleStringValueAsciiSafe(rawVal)
asciiSafe &&= cachedAsciiSafe
s
}

@@ -803,7 +826,8 @@

idx += 1
}
-output.toString
+val result = output.toString
+if (asciiSafe) Val.Str.asciiSafe(pos, result) else Val.Str(pos, result)
}

private def simpleStringValue(rawVal: Val)(implicit evaluator: EvalScope): String =
@@ -826,6 +850,13 @@
value.toString
}

private def simpleStringValueAsciiSafe(rawVal: Val): Boolean =
rawVal match {
case vs: Val.Str => vs._asciiSafe
case _: Val.Num | _: Val.True | _: Val.False | _: Val.Null => true
case _ => false
}

private def formatInteger(formatted: FormatSpec, s: Double): String = {
// Fast path: if the value fits in a Long (and isn't Long.MinValue where
// negation overflows), avoid BigInt allocation entirely
@@ -1013,6 +1044,6 @@
// Each PartialApplyFmt instance caches its own parsed format, so no external cache needed.
private val parsed = scanFormat(fmt)
def evalRhs(values0: Eval, ev: EvalScope, pos: Position): Val =
-Val.Str(pos, format(parsed, values0.value, pos)(ev))
+formatValue(parsed, values0.value, pos)(ev)
}
}
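The ASCII-safe propagation added in this diff can be reduced to a Java sketch: a formatted result is marked safe only when every static literal and every substituted value is individually JSON-string ASCII-safe. Types and names here are illustrative; the real code threads the flag through `Val.Str.asciiSafe` instead of a tagged wrapper.

```java
public class FormatSafety {
    // Same predicate the renderer would use: printable ASCII, nothing to escape.
    static boolean isAsciiJsonSafe(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c >= 0x80 || c == '"' || c == '\\') return false;
        }
        return true;
    }

    static final class Tagged {
        final String value;
        final boolean asciiSafe; // lets a later JSON renderer skip the escaping scan
        Tagged(String value, boolean asciiSafe) {
            this.value = value;
            this.asciiSafe = asciiSafe;
        }
    }

    // Interleave literals and values: literals[0] v0 literals[1] v1 ... literals[n].
    // The safety flag is an AND over every part that reaches the output.
    static Tagged formatSimple(String[] literals, String[] values) {
        boolean safe = true;
        for (String lit : literals) safe &= isAsciiJsonSafe(lit);
        StringBuilder out = new StringBuilder(literals[0]);
        for (int i = 0; i < values.length; i++) {
            safe &= isAsciiJsonSafe(values[i]);
            out.append(values[i]).append(literals[i + 1]);
        }
        return new Tagged(out.toString(), safe);
    }
}
```

The win is that the safety check runs once per part at format time, and a downstream byte renderer can then copy the whole result without re-scanning it.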
2 changes: 1 addition & 1 deletion sjsonnet/src/sjsonnet/Importer.scala
@@ -302,7 +302,7 @@ object CachedResolver {
try {
val visitor =
new JsonImportVisitor(fileScope, internedStrings, settings)
-Some((ujson.StringParser.transform(content.readString(), visitor), fileScope))
+Some((ujson.ByteArrayParser.transform(content.readRawBytes(), visitor), fileScope))
} catch {
case _: ujson.ParsingFailedException | _: DuplicateJsonKey | _: InvalidJsonNumber |
_: JsonParseDepthExceeded | _: NumberFormatException =>