Description
Context
While benchmarking the Prometheus shim PoC (which bridges the Prometheus client API to the OTel SDK), I found that `observe()` on classic-only histograms has about 30% lower latency through the OTel SDK than through native Prometheus.
Benchmark numbers (JMH, single thread)
| Path | observe() latency |
|---|---|
| Native Prometheus (classic-only) | 10.5 ns |
| OTel SDK (explicit bucket histogram) | 7.3 ns |
Root cause
Native Prometheus `doObserve()` performs three separate CAS-based atomic updates per call:
- `classicBuckets[i].add(1)` (`LongAdder`)
- `sum.add(value)` (`DoubleAdder`)
- `count.increment()` (`LongAdder`)

On top of that, each call makes a `buffer.append()` CAS attempt and volatile reads of the reset/scale-down state.
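To make the three-adder pattern concrete, here is a minimal sketch. `AdderClassicHistogram` is a hypothetical, simplified class, not the actual Prometheus client_java code: the buffer, reset and scale-down handling are omitted, and the bucket lookup is a plain linear scan over sorted upper bounds ending in `+Inf`.

```java
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, simplified model of the adder-based classic path.
// Buffer, reset and scale-down logic are intentionally left out.
public class AdderClassicHistogram {
    private final double[] upperBounds;       // sorted ascending, last entry is +Inf
    private final LongAdder[] classicBuckets;
    private final DoubleAdder sum = new DoubleAdder();
    private final LongAdder count = new LongAdder();

    public AdderClassicHistogram(double[] upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.classicBuckets = new LongAdder[upperBounds.length];
        for (int i = 0; i < classicBuckets.length; i++) {
            classicBuckets[i] = new LongAdder();
        }
    }

    public void observe(double value) {
        int i = bucketIndex(value);
        classicBuckets[i].add(1);   // atomic update #1 (LongAdder)
        sum.add(value);             // atomic update #2 (DoubleAdder)
        count.increment();          // atomic update #3 (LongAdder)
    }

    // First bucket whose upper bound is >= value; NaN falls into the last bucket.
    private int bucketIndex(double value) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length - 1;
    }

    public long count() { return count.sum(); }
    public double sum() { return sum.sum(); }
    public long bucketCount(int i) { return classicBuckets[i].sum(); }
}
```

Each adder is updated independently, so every `observe()` pays for three atomic read-modify-write operations even when only one thread is running.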
The OTel SDK instead uses a single `synchronized` block with plain `+=`/`++` arithmetic:

```java
synchronized (lock) {
  this.sum += value;
  this.count++;
  this.counts[bucketIndex]++;
  // min/max tracking
}
```

In uncontended (single-thread) benchmarks, acquiring an uncontended monitor is cheap in HotSpot (lightweight locking, no OS-level blocking), and the JIT can optimize the plain arithmetic inside the block freely, so the single lock beats the multi-CAS approach.
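A minimal sketch of the single-lock pattern as a complete class. `SyncClassicHistogram` and its methods are hypothetical, not the OTel SDK's actual class; it assumes sorted upper bounds ending in `+Inf` and performs the bucket lookup (`Arrays.binarySearch`) before entering the lock, since that step reads only immutable state.

```java
import java.util.Arrays;

// Hypothetical classic-only histogram: one monitor guards plain fields,
// with no adders and no native-histogram buffer.
public class SyncClassicHistogram {
    private final double[] upperBounds;            // sorted ascending, last entry is +Inf
    private final Object lock = new Object();
    private final long[] counts;                   // guarded by lock
    private double sum;                            // guarded by lock
    private long count;                            // guarded by lock
    private double min = Double.POSITIVE_INFINITY; // guarded by lock
    private double max = Double.NEGATIVE_INFINITY; // guarded by lock

    public SyncClassicHistogram(double[] upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length];
    }

    public void observe(double value) {
        // Bucket lookup reads only immutable state, so it stays outside the lock.
        int bucketIndex = bucketIndex(value);
        synchronized (lock) {
            sum += value;
            count++;
            counts[bucketIndex]++;
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
    }

    // First bucket whose upper bound is >= value.
    private int bucketIndex(double value) {
        int i = Arrays.binarySearch(upperBounds, value);
        if (i < 0) {
            i = -i - 1; // insertion point: index of the first bound greater than value
        }
        // binarySearch places NaN after +Inf; clamp it into the last bucket.
        return Math.min(i, upperBounds.length - 1);
    }

    public long count() { synchronized (lock) { return count; } }
    public double sum() { synchronized (lock) { return sum; } }
    public long bucketCount(int i) { synchronized (lock) { return counts[i]; } }
    public double max() { synchronized (lock) { return max; } }
}
```

Keeping the lookup outside `synchronized` limits the critical section to a handful of plain field updates.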
Suggestion
For classic-only histograms (where nativeInitialSchema == CLASSIC_HISTOGRAM), consider an alternative doObserve() implementation that uses a synchronized block with plain fields instead of multiple LongAdder/DoubleAdder instances. The buffer mechanism (needed for native histogram scale-down) could also be bypassed in classic-only mode.
This wouldn't affect native or hybrid histograms, which still need the current design.
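One way to keep native and hybrid histograms untouched is to choose the update strategy once, when the histogram is built. The sketch below assumes the schema is fixed at build time; `ObserverSelection`, `BucketObserver`, both implementations, and the value of the `CLASSIC_HISTOGRAM` stand-in are hypothetical and do not reflect the client's real classes or constants.

```java
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

public class ObserverSelection {
    // Hypothetical stand-in for the client's classic-only schema sentinel.
    static final int CLASSIC_HISTOGRAM = Integer.MIN_VALUE;

    interface BucketObserver {
        void observe(double value, int bucketIndex);
        long count();
    }

    // Classic-only path: one monitor, plain fields, no native-histogram buffer.
    static final class LockedObserver implements BucketObserver {
        private final long[] counts;
        private double sum;
        private long count;
        LockedObserver(int buckets) { counts = new long[buckets]; }
        public synchronized void observe(double value, int bucketIndex) {
            sum += value;
            count++;
            counts[bucketIndex]++;
        }
        public synchronized long count() { return count; }
    }

    // Native/hybrid path: keep the striped adders (buffer and scale-down omitted).
    static final class AdderObserver implements BucketObserver {
        private final LongAdder[] counts;
        private final DoubleAdder sum = new DoubleAdder();
        private final LongAdder count = new LongAdder();
        AdderObserver(int buckets) {
            counts = new LongAdder[buckets];
            for (int i = 0; i < buckets; i++) counts[i] = new LongAdder();
        }
        public void observe(double value, int bucketIndex) {
            counts[bucketIndex].add(1);
            sum.add(value);
            count.increment();
        }
        public long count() { return count.sum(); }
    }

    // Decided once per histogram, so observe() itself never checks the schema.
    static BucketObserver create(int nativeInitialSchema, int buckets) {
        return nativeInitialSchema == CLASSIC_HISTOGRAM
                ? new LockedObserver(buckets)
                : new AdderObserver(buckets);
    }
}
```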
Multi-threaded consideration
The LongAdder approach was chosen for multi-threaded scalability (striped cells reduce contention). A synchronized block would serialize threads. However:
- Most real-world `observe()` calls happen on different label-value combinations (different data points), so contention on a single data point is rare.
- Even under contention, the critical section is very short (~5 ns of arithmetic), so lock hold time is minimal.
- A benchmark with 4 threads would clarify the actual tradeoff
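A proper measurement would use JMH with `@Threads(4)`. As a rougher, dependency-free sketch, the harness below has four threads hammer one shared data point, the worst case for the lock. `ContentionSketch`, `AdderPoint` and `LockedPoint` are hypothetical; the reported figure is wall-clock time divided by total operations (inverse aggregate throughput) and includes thread start-up and JIT warm-up, so treat it as indicative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.DoubleConsumer;

// Rough, hypothetical contention check (not JMH): all threads update the SAME point.
public class ContentionSketch {
    static final int THREADS = 4;
    static final int OPS_PER_THREAD = 2_000_000;

    // Adder-based update: three striped atomics per call.
    static final class AdderPoint implements DoubleConsumer {
        final LongAdder bucket = new LongAdder();
        final DoubleAdder sum = new DoubleAdder();
        final LongAdder count = new LongAdder();
        public void accept(double v) { bucket.add(1); sum.add(v); count.increment(); }
        long count() { return count.sum(); }
    }

    // Lock-based update: plain fields under one monitor.
    static final class LockedPoint implements DoubleConsumer {
        private long bucket;
        private double sum;
        private long count;
        public synchronized void accept(double v) { bucket++; sum += v; count++; }
        synchronized long count() { return count; }
    }

    // Starts THREADS workers together; returns wall-clock ns divided by total ops.
    static double run(DoubleConsumer point) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < OPS_PER_THREAD; i++) {
                    point.accept(1.0);
                }
            });
            workers[t].start();
        }
        long t0 = System.nanoTime();
        start.countDown();
        for (Thread w : workers) {
            w.join();
        }
        return (System.nanoTime() - t0) / (double) ((long) THREADS * OPS_PER_THREAD);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.printf("adders: %.1f ns/op%n", run(new AdderPoint()));
        System.out.printf("locked: %.1f ns/op%n", run(new LockedPoint()));
    }
}
```

Spreading the threads across several `LockedPoint` instances instead would approximate the more common case of different label values, where the lock is rarely contended.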
Not a high priority: 10.5 ns is already excellent. Still, this is worth considering if classic histogram performance matters.