Skip to content

Optimize CBOR Codec#1183

Merged
adwsingh merged 3 commits into
mainfrom
adwsingh/cbor-perf-final
May 19, 2026
Merged

Optimize CBOR Codec#1183
adwsingh merged 3 commits into
mainfrom
adwsingh/cbor-perf-final

Conversation

@adwsingh
Copy link
Copy Markdown
Contributor

Issue #, if available:

Description of changes:

  • Replace Sink interface with direct byte[] buffer writes + VarHandle for multi-byte big-endian ops — no virtual dispatch per byte
  • Pool serializer instances via striped AtomicReferenceArray to avoid repeated buffer allocation
  • Pre-compute CBOR-encoded field name bytes at schema load time; write them with a single System.arraycopy
  • Inline UTF-8 encoding with an ASCII fast path: deprecated String.getBytes(int,int,byte[],int) bulk-copies, OR-accumulator detects non-ASCII, rewind path encodes UTF-8 directly into the buffer without allocation
  • Use a long bitmask instead of boolean[] for tracking indefinite-length collection nesting
  • Fuse ensureCapacity for blob writes — single capacity check covers CBOR header + payload, eliminating redundant buffer growth
  • Add Codec.serialize() that returns a ByteBuffer directly from the pooled serializer
Screenshot 2026-05-18 at 1 37 26 PM

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@adwsingh adwsingh requested a review from mtdowling May 18, 2026 20:39
* <ol>
* <li><b>Speculative fast path</b>: check the expected next member via length + Arrays.equals
* (JVM-intrinsified). Fires on every field when CBOR map entries arrive in schema definition
* order (the common case for smithy-to-smithy communication).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I avoided doing this originally because I didn't want adding new fields to have visible performance effects, especially to legacy clients. Adding a new optional field or reordering schema fields should not have a visible performance effect.

@rhernandez35
Copy link
Copy Markdown
Collaborator

Please do some perf testing with clients that have out-of-date schemas and servers that are receiving unknown fields. I'm concerned about adverse impact outside of all but the happiest cases in this PR.

@adwsingh adwsingh force-pushed the adwsingh/cbor-perf-final branch 3 times, most recently from 2ac8255 to db811fa Compare May 19, 2026 01:11
@adwsingh adwsingh force-pushed the adwsingh/cbor-perf-final branch from db811fa to d6f3ecd Compare May 19, 2026 16:28
@adwsingh adwsingh enabled auto-merge (rebase) May 19, 2026 16:28
adwsingh added 3 commits May 19, 2026 12:00
The previous readTimestamp passed the tagged token to readLong, which always
threw SerializationException before reaching Instant.ofEpochSecond, masking
the fact that out-of-range epoch values cause DateTimeException. Now that the
token is correctly stripped, wrap with try-catch to convert the overflow to
SerializationException (zero happy-path cost via exception tables).
@adwsingh adwsingh force-pushed the adwsingh/cbor-perf-final branch from d6f3ecd to 0a6d2ef Compare May 19, 2026 19:00
@adwsingh adwsingh dismissed rhernandez35’s stale review May 19, 2026 19:01

Primary concern addressed, got 1 approval if there are more concerns we can address in a followup.

@adwsingh adwsingh merged commit f72d701 into main May 19, 2026
5 checks passed
@adwsingh adwsingh deleted the adwsingh/cbor-perf-final branch May 19, 2026 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants