perf: Add benchmarks for timeseries query (exemplars) performance#4665

Merged
marcsanmi merged 4 commits into main from marcsanmi/exemplars-benchmarks
Jan 14, 2026

@marcsanmi
Contributor

Adds benchmarks to measure and validate the performance of timeseries queries, particularly focusing on the exemplar collection overhead introduced in #4615.

Performance Results

Part 1: Refactoring Cost (NoExemplars vs weekly/f145)

We refactored profileEntryIterator to use a flexible options pattern. This section measures the cost of that refactoring even when exemplars are disabled.

Comparison Setup:

  • Baseline: weekly/f145 branch (old simple implementation, before exemplar PR)
  • Current: This branch with NoExemplars (new options pattern, exemplars disabled)

Commands Used:

# On weekly/f145 branch: copied the test file and removed the WithExemplars variant so the benchmarks compile and run
go test -bench=BenchmarkTimeSeriesQuery -benchmem -count=10 ./pkg/querybackend/ > old.txt

# On exemplar branch with NoExemplars
git checkout marcsanmi/exemplars-benchmarks
go test -bench=BenchmarkTimeSeriesQuery -benchmem -count=10 ./pkg/querybackend/ > new.txt

# Compare
benchstat old.txt new.txt

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/querybackend
cpu: Apple M3 Pro
                                                     │   old.txt    │               new.txt               │
                                                     │    sec/op    │    sec/op     vs base               │
TimeSeriesQuery/NoExemplars-11                         745.1µ ± 14%   753.3µ ± 12%       ~ (p=0.739 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       614.3µ ±  5%   620.5µ ±  4%       ~ (p=0.481 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      639.6µ ± 21%   639.3µ ± 15%       ~ (p=0.853 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     653.4µ ± 52%   655.6µ ± 12%       ~ (p=0.971 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         629.3µ ± 54%   631.6µ ± 17%       ~ (p=0.631 n=10)
TimeSeriesQuery/WithExemplars-11                                      831.5µ ± 22%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                    831.5µ ±  4%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                   848.3µ ± 91%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                  840.0µ ± 23%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                      836.2µ ± 22%
geomean                                                654.8µ         742.6µ        +0.55%

                                                     │   old.txt    │               new.txt                │
                                                     │     B/op     │     B/op      vs base                │
TimeSeriesQuery/NoExemplars-11                         5.708Mi ± 0%   6.479Mi ± 0%  +13.50% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       5.724Mi ± 0%   6.484Mi ± 0%  +13.26% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      5.718Mi ± 0%   6.485Mi ± 1%  +13.41% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     5.719Mi ± 1%   6.485Mi ± 0%  +13.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         5.730Mi ± 1%   6.484Mi ± 0%  +13.17% (p=0.000 n=10)
TimeSeriesQuery/WithExemplars-11                                      6.878Mi ± 0%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                    6.877Mi ± 0%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                   6.876Mi ± 1%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                  6.867Mi ± 0%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                      6.875Mi ± 0%
geomean                                                5.720Mi        6.676Mi       +13.35%

                                                     │   old.txt   │              new.txt               │
                                                     │  allocs/op  │  allocs/op   vs base               │
TimeSeriesQuery/NoExemplars-11                         11.45k ± 0%   11.29k ± 0%  -1.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       11.45k ± 0%   11.29k ± 0%  -1.39% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      11.45k ± 0%   11.29k ± 0%  -1.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     11.45k ± 0%   11.29k ± 0%  -1.39% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         11.45k ± 0%   11.29k ± 0%  -1.40% (p=0.000 n=10)
TimeSeriesQuery/WithExemplars-11                                     16.67k ± 0%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                   16.67k ± 0%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                  16.67k ± 0%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                 16.67k ± 0%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                     16.67k ± 0%
geomean                                                11.45k        13.72k       -1.39%

Analysis:

  • ✅ Time: No statistically significant regression (p > 0.05, shown as ~)
  • ⚠️ Memory: +13.5% increase (about 0.77 MiB per op, 5.71 MiB → 6.48 MiB)
  • ✅ Allocs: Slight improvement (-1.4%)

Potential memory trade-off explanation:

The refactored profileEntryIterator uses a flexible options pattern with:

  • Dynamic queryColumns slice (grows based on requested features)
  • Dynamic processor slice of closures
  • Column priority sorting logic
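
A minimal sketch of what such an options pattern can look like, assuming hypothetical names (`iteratorConfig`, `withColumn`, `withProcessor`) and made-up column names; it is an illustration only, not the actual profileEntryIterator code. It shows how each requested feature appends to a column slice and a processor slice, and how the columns are then ordered by priority:

```go
package main

import (
	"fmt"
	"sort"
)

// column is a hypothetical description of a column to read, with a priority
// that decides the read order.
type column struct {
	name     string
	priority int
}

// entry is a hypothetical row handed to each processor.
type entry struct {
	fields map[string]string
}

// iteratorConfig collects everything the options request.
type iteratorConfig struct {
	queryColumns []column
	processors   []func(*entry)
}

// option mutates the config; each feature contributes columns and/or processors.
type option func(*iteratorConfig)

// withColumn requests an additional column.
func withColumn(name string, priority int) option {
	return func(c *iteratorConfig) {
		c.queryColumns = append(c.queryColumns, column{name: name, priority: priority})
	}
}

// withProcessor registers a closure to run on every entry.
func withProcessor(p func(*entry)) option {
	return func(c *iteratorConfig) {
		c.processors = append(c.processors, p)
	}
}

// newIteratorConfig applies the options, then sorts columns by priority.
func newIteratorConfig(opts ...option) *iteratorConfig {
	cfg := &iteratorConfig{}
	for _, o := range opts {
		o(cfg)
	}
	sort.SliceStable(cfg.queryColumns, func(i, j int) bool {
		return cfg.queryColumns[i].priority < cfg.queryColumns[j].priority
	})
	return cfg
}

func main() {
	cfg := newIteratorConfig(
		withColumn("ProfileID", 2), // column names here are made up
		withColumn("SeriesIndex", 0),
		withColumn("TimeNanos", 1),
		withProcessor(func(e *entry) { e.fields["seen"] = "true" }),
	)
	for _, c := range cfg.queryColumns {
		fmt.Println(c.priority, c.name)
	}
	fmt.Println("processors:", len(cfg.processors))
}
```

Compared with a fixed set of fields, the slices and closures in a design like this are allocated per iterator, which is consistent with (but does not prove) the extra bytes per op seen above.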

Part 2: Exemplar Feature Overhead (NoExemplars vs WithExemplars)

This section measures the additional cost of enabling exemplars on top of the refactored baseline.

Command Used:

# Run all timeseries benchmarks (base + time range variants)
go test -bench=BenchmarkTimeSeriesQuery -benchmem ./pkg/querybackend/

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/querybackend
cpu: Apple M3 Pro
BenchmarkTimeSeriesQuery/NoExemplars-11             1870            607338 ns/op         6811441 B/op      11289 allocs/op
BenchmarkTimeSeriesQuery/WithExemplars-11                   1441            855327 ns/op         7220978 B/op     16670 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/NoExemplars-11                   1789            657579 ns/op        6808980 B/op       11289 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                 1422            842359 ns/op        7206427 B/op       16669 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11                  1874            728544 ns/op        6800535 B/op       11289 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                1383            823404 ns/op        7201524 B/op       16669 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11                 1940            624539 ns/op        6802911 B/op       11290 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11               1492            930987 ns/op        7199970 B/op       16668 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/NoExemplars-11                     1762            626126 ns/op        6807519 B/op       11290 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                   1461            825313 ns/op        7212226 B/op       16669 allocs/op
PASS
ok      github.com/grafana/pyroscope/pkg/querybackend   14.548s

Analysis:

  • ⚠️ Time: +40.8% overhead on the base benchmark (607µs → 855µs); +13% to +49% across the time-range variants
  • ✅ Memory: +6.0% increase (~410KB on 6.5MB queries)
  • ⚠️ Allocs: +47.7% increase (11,289 → 16,670 allocations)

Overhead explanation:

When exemplars are enabled, additional data must be fetched:

  • Profile IDs: UUID column read + conversion
  • All labels: Complete label set instead of just groupBy subset
  • Additional processing: Profile ID matching and exemplar construction

The allocation increase comes primarily from fetching full label sets and processing profile IDs for each matching profile.

Overhead consistency across time ranges:

Memory overhead is consistent at about +6%; time overhead varies between +13% and +49% in this single run:

  • 1 Minute: +28.1% time, +6% memory
  • 5 Minutes: +13.0% time, +6% memory
  • 15 Minutes: +49.1% time, +6% memory
  • 1 Hour: +31.8% time, +6% memory
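
The percentages above are relative changes in ns/op between each NoExemplars/WithExemplars pair. A minimal sketch of that arithmetic, using hypothetical helpers `nsPerOp` and `overheadPercent` (they are not part of this PR) on two lines copied from the output above:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nsPerOp extracts the benchmark name and ns/op value from one line of
// `go test -bench` output. ok is false if the line has no "ns/op" field.
func nsPerOp(line string) (name string, ns float64, ok bool) {
	fields := strings.Fields(line)
	for i := 1; i < len(fields); i++ {
		if fields[i] == "ns/op" {
			v, err := strconv.ParseFloat(fields[i-1], 64)
			if err != nil {
				return "", 0, false
			}
			return fields[0], v, true
		}
	}
	return "", 0, false
}

// overheadPercent returns the relative change from base to current, in percent.
func overheadPercent(base, current float64) float64 {
	return (current - base) / base * 100
}

func main() {
	lines := []string{
		"BenchmarkTimeSeriesQuery/NoExemplars-11 1870 607338 ns/op 6811441 B/op 11289 allocs/op",
		"BenchmarkTimeSeriesQuery/WithExemplars-11 1441 855327 ns/op 7220978 B/op 16670 allocs/op",
	}
	_, base, _ := nsPerOp(lines[0])
	_, with, _ := nsPerOp(lines[1])
	fmt.Printf("time overhead: %+.1f%%\n", overheadPercent(base, with)) // prints "time overhead: +40.8%"
}
```

For anything beyond a quick check, benchstat with -count=10 (as in Part 1) is the better tool, since a single run gives no confidence interval.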

Summary

Changes to default path (NoExemplars)

  • Time: No regression (p > 0.05 across all benchmarks)
  • ⚠️ Memory: +13.5% (~770KB) due to options pattern refactoring
  • Allocs: -1.4% (slight improvement)

Exemplar overhead

  • ⚠️ Time: +30-40% overhead
  • Memory: +6% (~410KB)
  • ⚠️ Allocs: +48% (fetching additional data)

Key findings

  • ✅ No time regression for users who don't enable exemplars
  • ✅ Memory overhead is minimal when exemplars are enabled (+6%)
  • ✅ Overhead scales linearly across time ranges (no performance cliffs)
  • ⚠️ Allocation overhead (+48%) - exemplars fetch profile IDs and complete label sets (room for improvement)
  • ⚠️ Time overhead (30-40%) (room for improvement)

Note: The exemplars feature will be opt-in. Users who don't request exemplars are unaffected by the time and allocation overhead. The overhead is inherent to fetching additional data (profile IDs and complete label sets), but future optimizations could reduce the impact if needed.

@marcsanmi marcsanmi requested a review from aleks-p as a code owner December 4, 2025 16:18
@marcsanmi marcsanmi requested a review from a team December 4, 2025 16:18
Base automatically changed from marcsanmi/profile-id-selector to main December 12, 2025 10:15
@marcsanmi marcsanmi requested review from a team, bryanhuhta and simonswine as code owners December 12, 2025 10:15
Contributor

@simonswine simonswine left a comment

I was quickly suspicious, as the different time ranges don't produce different outcomes, and rightly so: none of those benchmarks hit points in time where the sample blocks contain data, so they only cover the cost of setting up a query.

I have experimented locally and this actually hits data:

diff --git a/pkg/querybackend/query_time_series_test.go b/pkg/querybackend/query_time_series_test.go
index 6d59509cf..aa40bdc44 100644
--- a/pkg/querybackend/query_time_series_test.go
+++ b/pkg/querybackend/query_time_series_test.go
@@ -158,10 +158,23 @@ func sanitizeMetadata(meta []*metastorev1.BlockMeta) {
 // runTimeSeriesQuery executes a timeseries query with the given parameters.
 func (f *benchmarkFixture) runTimeSeriesQuery(b *testing.B, req *queryv1.InvokeRequest) {
        b.Helper()
-       _, err := f.reader.Invoke(f.ctx, req)
+       resp, err := f.reader.Invoke(f.ctx, req)
        if err != nil {
                b.Fatalf("query failed: %v", err)
        }
+       for _, r := range resp.Reports {
+               if r.ReportType != queryv1.ReportType_REPORT_TIME_SERIES {
+                       continue
+               }
+               for _, s := range r.TimeSeries.TimeSeries {
+                       for _, p := range s.Points {
+                               if p.Value > 0 {
+                                       return
+                               }
+                       }
+               }
+       }
+       panic("no data found")
 }

 // makeTimeSeriesRequest creates a timeseries query request with the given parameters.
@@ -196,9 +209,6 @@ func (f *benchmarkFixture) makeTimeSeriesRequest(
 func BenchmarkTimeSeriesQuery(b *testing.B) {
        fixture := setupBenchmarkFixture(b)

-       now := time.Now()
-       oneHourAgo := now.Add(-1 * time.Hour)
-
        benchmarks := []struct {
                name         string
                exemplarType typesv1.ExemplarType
@@ -210,7 +220,7 @@ func BenchmarkTimeSeriesQuery(b *testing.B) {
        for _, bm := range benchmarks {
                b.Run(bm.name, func(b *testing.B) {
                        req := fixture.makeTimeSeriesRequest(
-                               oneHourAgo, now,
+                               startTime, startTime.Add(time.Hour),
                                "{}",
                                []string{"service_name"},
                                bm.exemplarType,
@@ -234,7 +244,6 @@ func BenchmarkTimeSeriesQuery(b *testing.B) {
 // Expected results: Overhead ratio should remain constant across time ranges.
 func BenchmarkTimeSeriesQuery_TimeRange(b *testing.B) {
        fixture := setupBenchmarkFixture(b)
-       now := time.Now()

        timeRanges := []struct {
                name     string
@@ -259,7 +268,7 @@ func BenchmarkTimeSeriesQuery_TimeRange(b *testing.B) {
                        for _, et := range exemplarTypes {
                                b.Run(et.name, func(b *testing.B) {
                                        req := fixture.makeTimeSeriesRequest(
-                                               now.Add(-tr.duration), now,
+                                               startTime, startTime.Add(tr.duration),
                                                "{}",
                                                []string{"service_name"},
                                                et.typ,
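
The guard added to runTimeSeriesQuery in the diff above boils down to "fail unless at least one point has a positive value". A self-contained sketch of that check, using simplified, hypothetical stand-in types (`point`, `series`) rather than the real response types:

```go
package main

import "fmt"

// point and series are simplified, hypothetical stand-ins for the time series
// types in the query response.
type point struct {
	Timestamp int64
	Value     float64
}

type series struct {
	Points []point
}

// hasData reports whether any point in any series carries a positive value.
func hasData(all []series) bool {
	for _, s := range all {
		for _, p := range s.Points {
			if p.Value > 0 {
				return true
			}
		}
	}
	return false
}

func main() {
	empty := []series{{Points: []point{{Timestamp: 1, Value: 0}}}}
	filled := []series{{Points: []point{{Timestamp: 1, Value: 0}, {Timestamp: 2, Value: 3}}}}
	fmt.Println(hasData(empty), hasData(filled)) // prints "false true"
}
```

Inside a benchmark, a check like this could report through b.Fatal instead of panic so the failure is attributed to the benchmark that triggered it; that is a suggestion, not what the diff does.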

The slowdown is higher, but it could still be in an acceptable range; we will find out more once the queriers run against real data.

In any case, thanks for adding the benchmarks; they will help us optimise once we see the impact.

@marcsanmi
Contributor Author

Performance Results

Now with the fix suggested by @simonswine

Command Used:

# Run all timeseries benchmarks (base + time range variants)
go test -bench=BenchmarkTimeSeriesQuery -benchmem ./pkg/querybackend/

Results:

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/querybackend
cpu: Apple M3 Pro
BenchmarkTimeSeriesQuery/NoExemplars-11  	     394	   3033682 ns/op	11309543 B/op	   44800 allocs/op
BenchmarkTimeSeriesQuery/WithExemplars-11         	     159	   7406923 ns/op	21110159 B/op	  179542 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/NoExemplars-11         	    2103	    584807 ns/op	 6949807 B/op	   11888 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/WithExemplars-11       	    1550	    815741 ns/op	 7457302 B/op	   18729 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11        	     742	   1593618 ns/op	 8974527 B/op	   30157 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11      	     368	   3311989 ns/op	12882274 B/op	   86481 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11       	     406	   2915515 ns/op	11221687 B/op	   44297 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11     	     150	   9117493 ns/op	21095713 B/op	  177679 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/NoExemplars-11           	     373	   4062640 ns/op	11322721 B/op	   44808 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/WithExemplars-11         	     139	   8382624 ns/op	21175462 B/op	  179226 allocs/op
PASS
ok  	github.com/grafana/pyroscope/pkg/querybackend	18.313s

Basic Comparison (1 Hour Query)

Metric        NoExemplars   WithExemplars   Overhead
Time          3.03ms        7.41ms          +144% (2.4x)
Memory        10.79 MiB     20.13 MiB       +87% (1.9x)
Allocations   44,800        179,542         +301% (4.0x)
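
The memory row is derived from the raw B/op values of BenchmarkTimeSeriesQuery in the results above. A small sketch of the conversion, with hypothetical helper names `mib` and `ratio`:

```go
package main

import "fmt"

// mib converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
func mib(bytes float64) float64 {
	return bytes / (1024 * 1024)
}

// ratio returns how many times larger current is than base.
func ratio(base, current float64) float64 {
	return current / base
}

func main() {
	// B/op for NoExemplars and WithExemplars, copied from the run above.
	const noExemplars, withExemplars = 11309543.0, 21110159.0
	fmt.Printf("%.2f MiB -> %.2f MiB (%.1fx)\n",
		mib(noExemplars), mib(withExemplars), ratio(noExemplars, withExemplars))
	// prints "10.79 MiB -> 20.13 MiB (1.9x)"
}
```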

@marcsanmi marcsanmi requested a review from simonswine December 16, 2025 11:13
@marcsanmi marcsanmi merged commit 0a5b09a into main Jan 14, 2026
20 checks passed
@marcsanmi marcsanmi deleted the marcsanmi/exemplars-benchmarks branch January 14, 2026 11:25