perf: preserve ASCII-safe simple format results by He-Pin · Pull Request #856 · databricks/sjsonnet

He-Pin · 2026-05-13T14:04:55Z

Motivation:
large_string_template still spent time re-encoding and re-scanning the huge string produced by simple named format interpolation, even when the static format literals and dynamic values made the final string JSON-string ASCII-safe. This keeps the focus on the largest remaining local gap from #666-style benchmarking.

Key Design Decision
Track ASCII safety as metadata on the compiled simple named format path instead of adding another renderer scan. The optimization is deliberately limited to all-simple %(key)s formats, where every emitted dynamic value goes through simpleStringValue and can be conservatively classified.

Modification:

Add staticAsciiSafe metadata to compiled format strings.
Return Val.Str.asciiSafe from Format.PartialApplyFmt when static literals and all simple named dynamic values are JSON-string ASCII-safe.
Keep unsafe strings, unsafe static literals, and mixed-key unsafe values conservative.
Add focused regression tests for safe numeric values, unsafe dynamic strings, unsafe static literals, and mixed-key safety.
Update the gap and sync ledgers.

Benchmark Results:
JVM JMH (bench.runRegressions bench/resources/cpp_suite/large_string_template.jsonnet):

Before: 0.683 ms/op
After: 0.677 ms/op

Scala Native hyperfine, large_string_template, before vs after:

Forward: 8.64 +/- 0.75 ms -> 8.01 +/- 0.52 ms
Reverse: 8.65 +/- 0.50 ms -> 8.17 +/- 0.48 ms

Scala Native hyperfine, source-built jrsonnet comparison:

sjsonnet after: 8.01-8.17 ms
jrsonnet: 6.0 +/- 1.2 ms
Remaining gap: about 1.34x

Guard benchmark, kube-prometheus:

Forward: 131.74 ms -> 129.14 ms
Reverse: 129.11 ms baseline vs 130.94 ms candidate
Interpreted as neutral/noisy, not a target regression.

Analysis:
The existing renderer already has an ASCII-safe direct byte path, but formatted strings lost that metadata and fell back to UTF-8 encoding plus escape scanning. This change preserves the metadata at the producer where the safety condition is known, avoiding an extra whole-string encoding/scan on large simple format outputs. The safety predicate is conservative: unknown complex values, unsafe strings, or unsafe literal text do not get the fast-path marker.

References:

performance optimization #666
bench/reports/sjsonnet-vs-jrsonnet-gaps.md
bench/reports/sync-points.md

Result:
large_string_template improves in both Native command orders, JVM JMH does not regress, output equality holds for large_string_template and kube-prometheus, and ./mill --no-server --ticker false --color false -j 1 __.test plus __.checkFormat pass.

Follow-up note:

Rejected parser _asciiSafe hint reuse after PR creation: it improved debug parse_time in one run but regressed Native whole-process large_string_template in both command orders (8.23ms baseline vs 9.47ms candidate forward; 8.00ms baseline vs 8.47ms candidate reverse). The code was removed and only the ledger entry remains.

Follow-up note:

Rejected Native manual ASCII-safe byte copy after PR creation: replacing String.getBytes(0, len, dst, dstPos) with a manual charAt loop preserved output but regressed large_string_template in both command orders (7.89ms baseline vs 10.78ms candidate forward; 7.97ms baseline vs 11.17ms candidate reverse). The code was removed and only the ledger entry remains.

Follow-up note:

Rejected single-character StringBuilder.append(Char) fast path after PR creation: output stayed identical but Native large_string_template regressed in both command orders (7.97ms baseline vs 8.65ms candidate forward; 7.97ms baseline vs 8.84ms candidate reverse). The code was removed and only the ledger entry remains.

Follow-up note:

Rejected ByteRenderer minified object comma/empty-state specialization after PR creation: output equality held and kube improved weakly (130.32ms -> 128.31ms forward; 130.62ms -> 129.74ms reverse), but large_string_template regressed/noised negative in both command orders (10.24ms -> 10.45ms forward; 10.99ms -> 11.27ms reverse). The code was removed and only the ledger entry remains.

Follow-up note:

Rejected Native-only long ASCII escaped string renderer after PR creation: output equality held for kube, large_string_template, and a long Unicode fallback guard, but Native large_string_template regressed in both command orders (9.96ms -> 11.77ms forward; 9.56ms -> 10.54ms reverse). The code was removed and only the ledger entry remains.

Follow-up note:

Rejected inline small-stack cycle tracking after PR creation: shallow/deep recursive error behavior and normal outputs matched baseline, but kube was noise-level and Native large_string_template regressed in both command orders (11.59ms -> 12.32ms forward; 9.67ms -> 10.80ms reverse). The code was removed and only the ledger entry remains.

Follow-up note:

Rejected ByteRenderer quoted-key cache after PR creation: output equality held for kube, large_string_template, escaped/unicode keys, and long-key fallback, but kube reverse A/B was not stable-positive. HashMap was only weakly positive forward, while direct-mapped and capped variants regressed/noised negative in reverse. The code was removed and only the ledger entry remains.

Motivation: PR databricks#840 introduced a strict JSON fast path for .json imports but still forces a full UTF-8 string decode for every cached file before handing the text to ujson.StringParser. Real-world workloads (e.g. kube-prometheus) import many .json files; decoding each one twice (once into String for parsing, again as cache content) is pure overhead. Key Design Decision: ujson 4.4.3 ships ByteArrayParser, which parses UTF-8 JSON directly from a byte array without an intermediate String. Cache small resolved files as raw bytes (already what we read from disk) and lazily decode text only when the importstr/parser-input path actually needs it. Preserve parse-cache content identity by hashing the cached bytes with SHA-256 (length + hex digest) so external ParseCache implementations keep the same collision resistance as the old full-string key. Modification: * Importer.scala: CachedResolver.parseJsonImport now calls ujson.ByteArrayParser.transform(content.readRawBytes(), visitor) instead of decoding the whole file to String first. * CachedResolvedFile.scala (JVM/Native): small files are cached as Array[Byte]; getParserInput / readString materialize the String lazily; readRawBytes returns the cached bytes directly; contentHash is length + SHA-256 over the cached bytes; binary imports still use StaticBinaryResolvedFile. * PreloaderTests.scala: tighten the strict-JSON fast-path coverage so it fails if the fast path ever falls back to readString(). Result: * Output equality vs upstream sjsonnet and jrsonnet preserved on kube-prometheus and large_string_template. * Native kube-prometheus hyperfine A/B (forward & reverse): clean 139.4 +/- 2.8 ms -> candidate 132.7 +/- 1.9 ms (forward) candidate 132.1 +/- 1.9 ms vs clean 140.3 +/- 2.6 ms (reverse) * Full ./mill __.test green. References: Follow-up to databricks#840

Motivation: Large inline objects produced by strict JSON imports can exceed the small-object shape that computeSortedInlineOrder was originally tuned for. Native sampling on kube-prometheus showed sorted inline-order computation as a materialization hotspot, and insertion sort becomes quadratic on those wider objects. Modification: Keep insertion sort for small inline objects, and use an in-place quicksort with insertion-sort cleanup for larger visible field sets. Record the accepted benchmark result and rejected parser/key-render micro-routes in the performance ledgers. Result: Kube-prometheus Native A/B improved on top of strict JSON byte imports, with forward mean 145.3ms -> 140.0ms and reverse mean 151.6ms -> 148.9ms. Formatting and the full test suite pass. References: Upstream-base: databricks/sjsonnet@cedc083 Prior optimization: 883fca5 perf: parse strict JSON imports from bytes

Motivation: Keep the performance exploration ledger current so future optimization work does not repeat Native-negative or build-invalid routes. Modification: Record rejected short-string, ASCII-safe, inline sort-cache, path-only parse-cache, and Native GC configuration probes with the validation evidence that ruled them out. Result: No runtime code changes are retained; the branch documents the failed hypotheses and preserves the current accepted optimization stack. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Motivation: large_string_template still spent time re-encoding and re-scanning the huge string produced by simple named format interpolation, even when the final result was known to be JSON-string ASCII-safe. Modification: Track whether compiled format literals are ASCII-safe and return Val.Str.asciiSafe from PartialApplyFmt when every simple named dynamic value is also safe. Add regression coverage for safe numeric values, unsafe string values, unsafe static literals, and mixed-key safety. Result: Native large_string_template improved in both command orders (8.64 -> 8.01 ms forward, 8.65 -> 8.17 ms reverse); JVM JMH stayed neutral-positive (0.683 -> 0.677 ms/op); full __.test and checkFormat pass. References: bench/reports/sjsonnet-vs-jrsonnet-gaps.md Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Motivation: A follow-up probe reused parser ASCII-safe metadata to skip static format literal safety scanning after PR databricks#856, but internal debug counter gains did not translate into Native whole-process speed. Modification: Record the rejected parser `_asciiSafe` format hint experiment in the gap ledger and sync-points file so it is not repeated without materially new evidence. Result: The accepted simple-format ASCII-safe optimization remains unchanged, and the rejected hint path is documented with forward/reverse Native benchmark evidence. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Motivation: A Native-only ASCII-safe byte-copy probe tested whether a manual low-byte loop could improve large formatted string rendering after PR databricks#856. Modification: Record the rejected manual copy-loop experiment in the gap ledger and sync-points file with forward/reverse Native benchmark evidence. Result: The optimized implementation remains on the faster platform `String.getBytes(0, len, dst, dstPos)` path, and the slower manual copy route is documented to avoid repeated work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Motivation: A simple-format loop probe tested whether single-character dynamic values should use `StringBuilder.append(Char)` instead of appending the one-character String. Modification: Record the rejected single-character append experiment in the gap ledger and sync-points file with forward/reverse Native benchmark evidence. Result: The accepted simple-format implementation remains on the faster `append(String)` path, and the slower single-character branch is documented to avoid repeated work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Motivation: Avoid repeating a ByteRenderer object-rendering route that looked promising on kube but failed guard benchmarks. Modification: Record the minified direct-object comma/empty-state specialization in the gap ledger and sync-points rejection table. Result: The implementation remains reverted; future work should not trade a weak kube gain for a large_string_template regression. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Motivation: Document a Native-only long-string renderer probe so future optimization work does not repeat the same regression. Modification: Add the failed direct charAt escaped-ASCII renderer result to the gap ledger and sync-points table. Result: The code remains reverted because large_string_template regressed in both Native command orders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Motivation: Keep the performance ledger complete after testing an allocation-saving cycle-tracking variant. Modification: Record the inline small-stack cycle tracking probe and its Native benchmark result in both ledgers. Result: The code remains reverted because kube was noise-level and large_string_template regressed in both command orders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Motivation: Preserve the outcome of quoted-key cache experiments so future kube work does not repeat unstable variants. Modification: Record HashMap, direct-mapped, and capped ByteRenderer key-cache probes in the performance ledgers. Result: The implementation remains reverted because kube reverse A/B was not stable-positive and some variants regressed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

He-Pin and others added 11 commits May 13, 2026 15:43

He-Pin closed this May 13, 2026

He-Pin reopened this May 14, 2026

He-Pin mentioned this pull request May 14, 2026

perf: cache repeated long string rendering #857

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: preserve ASCII-safe simple format results#856

perf: preserve ASCII-safe simple format results#856
He-Pin wants to merge 11 commits into
databricks:masterfrom
He-Pin:perf/simple-format-ascii-safe

He-Pin commented May 13, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

He-Pin commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

He-Pin commented May 13, 2026 •

edited

Loading