perf: preserve ASCII-safe simple format results#856
Draft
He-Pin wants to merge 11 commits into
Draft
Conversation
Motivation: PR databricks#840 introduced a strict JSON fast path for .json imports but still forces a full UTF-8 string decode for every cached file before handing the text to ujson.StringParser. Real-world workloads (e.g. kube-prometheus) import many .json files; decoding each one twice (once into String for parsing, again as cache content) is pure overhead. Key Design Decision: ujson 4.4.3 ships ByteArrayParser, which parses UTF-8 JSON directly from a byte array without an intermediate String. Cache small resolved files as raw bytes (already what we read from disk) and lazily decode text only when the importstr/parser-input path actually needs it. Preserve parse-cache content identity by hashing the cached bytes with SHA-256 (length + hex digest) so external ParseCache implementations keep the same collision resistance as the old full-string key. Modification: * Importer.scala: CachedResolver.parseJsonImport now calls ujson.ByteArrayParser.transform(content.readRawBytes(), visitor) instead of decoding the whole file to String first. * CachedResolvedFile.scala (JVM/Native): small files are cached as Array[Byte]; getParserInput / readString materialize the String lazily; readRawBytes returns the cached bytes directly; contentHash is length + SHA-256 over the cached bytes; binary imports still use StaticBinaryResolvedFile. * PreloaderTests.scala: tighten the strict-JSON fast-path coverage so it fails if the fast path ever falls back to readString(). Result: * Output equality vs upstream sjsonnet and jrsonnet preserved on kube-prometheus and large_string_template. * Native kube-prometheus hyperfine A/B (forward & reverse): clean 139.4 +/- 2.8 ms -> candidate 132.7 +/- 1.9 ms (forward) candidate 132.1 +/- 1.9 ms vs clean 140.3 +/- 2.6 ms (reverse) * Full ./mill __.test green. References: Follow-up to databricks#840
Motivation: Large inline objects produced by strict JSON imports can exceed the small-object shape that computeSortedInlineOrder was originally tuned for. Native sampling on kube-prometheus showed sorted inline-order computation as a materialization hotspot, and insertion sort becomes quadratic on those wider objects. Modification: Keep insertion sort for small inline objects, and use an in-place quicksort with insertion-sort cleanup for larger visible field sets. Record the accepted benchmark result and rejected parser/key-render micro-routes in the performance ledgers. Result: Kube-prometheus Native A/B improved on top of strict JSON byte imports, with forward mean 145.3ms -> 140.0ms and reverse mean 151.6ms -> 148.9ms. Formatting and the full test suite pass. References: Upstream-base: databricks/sjsonnet@cedc083 Prior optimization: 883fca5 perf: parse strict JSON imports from bytes
Motivation: Keep the performance exploration ledger current so future optimization work does not repeat Native-negative or build-invalid routes. Modification: Record rejected short-string, ASCII-safe, inline sort-cache, path-only parse-cache, and Native GC configuration probes with the validation evidence that ruled them out. Result: No runtime code changes are retained; the branch documents the failed hypotheses and preserves the current accepted optimization stack. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation: large_string_template still spent time re-encoding and re-scanning the huge string produced by simple named format interpolation, even when the final result was known to be JSON-string ASCII-safe. Modification: Track whether compiled format literals are ASCII-safe and return Val.Str.asciiSafe from PartialApplyFmt when every simple named dynamic value is also safe. Add regression coverage for safe numeric values, unsafe string values, unsafe static literals, and mixed-key safety. Result: Native large_string_template improved in both command orders (8.64 -> 8.01 ms forward, 8.65 -> 8.17 ms reverse); JVM JMH stayed neutral-positive (0.683 -> 0.677 ms/op); full __.test and checkFormat pass. References: bench/reports/sjsonnet-vs-jrsonnet-gaps.md Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation: A follow-up probe reused parser ASCII-safe metadata to skip static format literal safety scanning after PR databricks#856, but internal debug counter gains did not translate into Native whole-process speed. Modification: Record the rejected parser `_asciiSafe` format hint experiment in the gap ledger and sync-points file so it is not repeated without materially new evidence. Result: The accepted simple-format ASCII-safe optimization remains unchanged, and the rejected hint path is documented with forward/reverse Native benchmark evidence. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation: A Native-only ASCII-safe byte-copy probe tested whether a manual low-byte loop could improve large formatted string rendering after PR databricks#856. Modification: Record the rejected manual copy-loop experiment in the gap ledger and sync-points file with forward/reverse Native benchmark evidence. Result: The optimized implementation remains on the faster platform `String.getBytes(0, len, dst, dstPos)` path, and the slower manual copy route is documented to avoid repeated work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation: A simple-format loop probe tested whether single-character dynamic values should use `StringBuilder.append(Char)` instead of appending the one-character String. Modification: Record the rejected single-character append experiment in the gap ledger and sync-points file with forward/reverse Native benchmark evidence. Result: The accepted simple-format implementation remains on the faster `append(String)` path, and the slower single-character branch is documented to avoid repeated work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation: Avoid repeating a ByteRenderer object-rendering route that looked promising on kube but failed guard benchmarks. Modification: Record the minified direct-object comma/empty-state specialization in the gap ledger and sync-points rejection table. Result: The implementation remains reverted; future work should not trade a weak kube gain for a large_string_template regression. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation: Document a Native-only long-string renderer probe so future optimization work does not repeat the same regression. Modification: Add the failed direct charAt escaped-ASCII renderer result to the gap ledger and sync-points table. Result: The code remains reverted because large_string_template regressed in both Native command orders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation: Keep the performance ledger complete after testing an allocation-saving cycle-tracking variant. Modification: Record the inline small-stack cycle tracking probe and its Native benchmark result in both ledgers. Result: The code remains reverted because kube was noise-level and large_string_template regressed in both command orders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation: Preserve the outcome of quoted-key cache experiments so future kube work does not repeat unstable variants. Modification: Record HashMap, direct-mapped, and capped ByteRenderer key-cache probes in the performance ledgers. Result: The implementation remains reverted because kube reverse A/B was not stable-positive and some variants regressed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation:
large_string_template still spent time re-encoding and re-scanning the huge string produced by simple named format interpolation, even when the static format literals and dynamic values made the final string JSON-string ASCII-safe. This keeps the focus on the largest remaining local gap from #666-style benchmarking.
Key Design Decision
Track ASCII safety as metadata on the compiled simple named format path instead of adding another renderer scan. The optimization is deliberately limited to all-simple
%(key)sformats, where every emitted dynamic value goes throughsimpleStringValueand can be conservatively classified.Modification:
staticAsciiSafemetadata to compiled format strings.Val.Str.asciiSafefromFormat.PartialApplyFmtwhen static literals and all simple named dynamic values are JSON-string ASCII-safe.Benchmark Results:
JVM JMH (
bench.runRegressions bench/resources/cpp_suite/large_string_template.jsonnet):0.683 ms/op0.677 ms/opScala Native hyperfine,
large_string_template, before vs after:8.64 +/- 0.75 ms->8.01 +/- 0.52 ms8.65 +/- 0.50 ms->8.17 +/- 0.48 msScala Native hyperfine, source-built jrsonnet comparison:
8.01-8.17 ms6.0 +/- 1.2 ms1.34xGuard benchmark, kube-prometheus:
131.74 ms->129.14 ms129.11 msbaseline vs130.94 mscandidateAnalysis:
The existing renderer already has an ASCII-safe direct byte path, but formatted strings lost that metadata and fell back to UTF-8 encoding plus escape scanning. This change preserves the metadata at the producer where the safety condition is known, avoiding an extra whole-string encoding/scan on large simple format outputs. The safety predicate is conservative: unknown complex values, unsafe strings, or unsafe literal text do not get the fast-path marker.
References:
bench/reports/sjsonnet-vs-jrsonnet-gaps.mdbench/reports/sync-points.mdResult:
large_string_templateimproves in both Native command orders, JVM JMH does not regress, output equality holds forlarge_string_templateand kube-prometheus, and./mill --no-server --ticker false --color false -j 1 __.testplus__.checkFormatpass.Follow-up note:
_asciiSafehint reuse after PR creation: it improved debugparse_timein one run but regressed Native whole-processlarge_string_templatein both command orders (8.23msbaseline vs9.47mscandidate forward;8.00msbaseline vs8.47mscandidate reverse). The code was removed and only the ledger entry remains.Follow-up note:
String.getBytes(0, len, dst, dstPos)with a manualcharAtloop preserved output but regressedlarge_string_templatein both command orders (7.89msbaseline vs10.78mscandidate forward;7.97msbaseline vs11.17mscandidate reverse). The code was removed and only the ledger entry remains.Follow-up note:
StringBuilder.append(Char)fast path after PR creation: output stayed identical but Nativelarge_string_templateregressed in both command orders (7.97msbaseline vs8.65mscandidate forward;7.97msbaseline vs8.84mscandidate reverse). The code was removed and only the ledger entry remains.Follow-up note:
130.32ms -> 128.31msforward;130.62ms -> 129.74msreverse), butlarge_string_templateregressed/noised negative in both command orders (10.24ms -> 10.45msforward;10.99ms -> 11.27msreverse). The code was removed and only the ledger entry remains.Follow-up note:
large_string_template, and a long Unicode fallback guard, but Nativelarge_string_templateregressed in both command orders (9.96ms -> 11.77msforward;9.56ms -> 10.54msreverse). The code was removed and only the ledger entry remains.Follow-up note:
large_string_templateregressed in both command orders (11.59ms -> 12.32msforward;9.67ms -> 10.80msreverse). The code was removed and only the ledger entry remains.Follow-up note:
large_string_template, escaped/unicode keys, and long-key fallback, but kube reverse A/B was not stable-positive. HashMap was only weakly positive forward, while direct-mapped and capped variants regressed/noised negative in reverse. The code was removed and only the ledger entry remains.