perf: bsr fast paths for readCode/PeekCode + pool OmitEmpty slices #67

Merged: xe-nvdk merged 2 commits into v6 from perf/bsr-fastpaths-omitempty-pool on Mar 3, 2026

xe-nvdk commented Mar 3, 2026

Summary

Three optimizations targeting decode throughput and encode GC pressure; the commit message below describes each one.

Benchmark (before → after, Apple M3 Max, 3 iterations)

Benchmark                  Before   After    Change
StructUnmarshal            361 ns   334 ns   -7.5%
StructUnmarshalPartially   247 ns   232 ns   -6.1%
MapIntInt                  328 ns   314 ns   -4.3%
StructManual (roundtrip)   569 ns   544 ns   -4.4%

Test plan

  • go test -count=1 ./... — all pass
  • go test -short -race -count=1 -timeout=5m ./... — no races
  • env GOOS=linux GOARCH=386 go vet ./... — cross-platform check
  • Before/after benchmarks (4.3% to 7.5% faster across the four benchmarks above)

Closes #57, closes #59, closes #58

xe-nvdk added 2 commits March 3, 2026 09:13
Three optimizations targeting decode throughput and encode GC pressure:

1. readCode() bsr fast path (#57): read directly from byte-slice data
   instead of going through interface dispatch on the hottest decode
   function. Handles d.rec recording mode inline.

2. PeekCode() bsr fast path (#59): peek directly at byte-slice data
   avoiding ReadByte+UnreadByte interface dispatch.

3. Pool OmitEmpty filtered field slices (#58): use sync.Pool for the
   []*field slice allocated when struct fields are omitted, returned
   via putFilteredFields after iteration in encodeStructValue.
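
To illustrate items 1 and 2, here is a minimal sketch of what a byte-slice fast path can look like. It is not the code from this PR: the `decoder` and `sliceReader` types, the `fromSlice`/`fromStream` constructors, and the lowercase `peekCode` are hypothetical stand-ins, and only the `bsr` and `rec` field names are borrowed from the description above. The point is that when the input is already a `[]byte`, the next code byte can be read or peeked by indexing, skipping the `ReadByte`/`UnreadByte` interface calls.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// sliceReader is a hypothetical cursor over an in-memory payload.
type sliceReader struct {
	data []byte
	pos  int
}

// decoder is a hypothetical stand-in for the library's Decoder; only the
// fields needed to show the two code paths are modelled.
type decoder struct {
	r   io.ByteScanner // generic reader: every call is an interface dispatch
	bsr *sliceReader   // non-nil when the input is a plain byte slice
	rec []byte         // non-nil while consumed bytes are being recorded
}

func fromSlice(b []byte) *decoder  { return &decoder{bsr: &sliceReader{data: b}} }
func fromStream(b []byte) *decoder { return &decoder{r: bytes.NewReader(b)} }

// readCode consumes and returns the next byte.
func (d *decoder) readCode() (byte, error) {
	if s := d.bsr; s != nil {
		// Fast path: index the slice directly, no interface call.
		if s.pos >= len(s.data) {
			return 0, io.EOF
		}
		c := s.data[s.pos]
		s.pos++
		if d.rec != nil {
			d.rec = append(d.rec, c) // recording mode handled inline
		}
		return c, nil
	}
	// Slow path: go through the io.ByteScanner interface.
	c, err := d.r.ReadByte()
	if err != nil {
		return 0, err
	}
	if d.rec != nil {
		d.rec = append(d.rec, c)
	}
	return c, nil
}

// peekCode returns the next byte without consuming it.
func (d *decoder) peekCode() (byte, error) {
	if s := d.bsr; s != nil {
		// Fast path: look at the byte in place, nothing to undo.
		if s.pos >= len(s.data) {
			return 0, io.EOF
		}
		return s.data[s.pos], nil
	}
	// Slow path: two interface calls, ReadByte followed by UnreadByte.
	c, err := d.r.ReadByte()
	if err != nil {
		return 0, err
	}
	return c, d.r.UnreadByte()
}

func main() {
	payload := []byte{0x92, 0x01, 0x02}
	for _, d := range []*decoder{fromSlice(payload), fromStream(payload)} {
		d.rec = []byte{} // non-nil slice switches recording on
		p, _ := d.peekCode()
		c, _ := d.readCode()
		fmt.Printf("peek=%#x read=%#x recorded=%d byte(s)\n", p, c, len(d.rec))
	}
}
```

Both constructors yield the same observable behaviour; the slice-backed decoder simply avoids the dynamic dispatch on every code byte, which is where a per-call saving on the hottest decode function would come from.
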

Benchmark results (Apple M3 Max, 3 iterations):
  StructUnmarshal:          361 ns → 334 ns  (-7.5%)
  StructUnmarshalPartially: 247 ns → 232 ns  (-6.1%)
  MapIntInt:                328 ns → 314 ns  (-4.3%)
  StructManual:             569 ns → 544 ns  (-4.4%)

Closes #57, closes #59, closes #58
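
For item 3, the pooling pattern can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the `field` struct, `filteredPool`, `getFilteredFields`, and `encodedFieldNames` (a stand-in for the iteration in `encodeStructValue`) are hypothetical, and only the `putFilteredFields` name comes from the commit message. The sketch also pools on every call, whereas the real code may only do so when a field is actually omitted.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// field is a hypothetical, trimmed-down description of one struct field.
type field struct {
	name      string
	index     int
	omitEmpty bool
}

// filteredPool recycles the scratch slices that hold the fields left after
// omitempty filtering. Pointers to slices are stored so that Put does not
// allocate a new interface value for the slice header.
var filteredPool = sync.Pool{
	New: func() any {
		s := make([]*field, 0, 16)
		return &s
	},
}

// getFilteredFields returns a pooled slice with the fields to encode for v,
// skipping omitempty fields whose value is the zero value.
func getFilteredFields(all []*field, v reflect.Value) *[]*field {
	p := filteredPool.Get().(*[]*field)
	out := (*p)[:0]
	for _, f := range all {
		if f.omitEmpty && v.Field(f.index).IsZero() {
			continue
		}
		out = append(out, f)
	}
	*p = out
	return p
}

// putFilteredFields hands the slice back to the pool. Field descriptors are
// long-lived metadata here, so stale pointers in the backing array are
// harmless and the slice is only truncated.
func putFilteredFields(p *[]*field) {
	*p = (*p)[:0]
	filteredPool.Put(p)
}

// encodedFieldNames stands in for the struct encoder: it borrows a filtered
// slice, iterates over it, and returns it to the pool afterwards.
func encodedFieldNames(all []*field, v any) []string {
	p := getFilteredFields(all, reflect.ValueOf(v))
	defer putFilteredFields(p)
	names := make([]string, 0, len(*p))
	for _, f := range *p {
		names = append(names, f.name)
	}
	return names
}

type user struct {
	Name  string
	Email string
	Age   int
}

func main() {
	fields := []*field{
		{name: "name", index: 0},
		{name: "email", index: 1, omitEmpty: true},
		{name: "age", index: 2, omitEmpty: true},
	}
	fmt.Println(encodedFieldNames(fields, user{Name: "ada", Age: 36}))
}
```

The pooled slice must not be retained after `putFilteredFields` is called, which is why the sketch copies the names out before returning.
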
xe-nvdk merged commit dbaa42f into v6 on Mar 3, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

perf: inline PeekCode for byte-slice reader
perf: pool OmitEmpty filtered field slices
perf: readCode fast path for byte-slice reader
