perf: Add benchmarks for timeseries query (exemplars) performance by marcsanmi · Pull Request #4665 · grafana/pyroscope

marcsanmi · 2025-12-04T16:18:44Z

Adds benchmarks to measure and validate the performance of timeseries queries, particularly focusing on the exemplar collection overhead introduced in #4615.

Performance Results

Part 1: Refactoring Cost (NoExemplars vs weekly/f145)

We refactored profileEntryIterator to use a flexible options pattern. This section measures the cost of that refactoring even when exemplars are disabled.

Comparison Setup:

Baseline: weekly/f145 branch (old simple implementation, before exemplar PR)
Current: This branch with NoExemplars (new options pattern, exemplars disabled)

Commands Used:

# On weekly/f145 branch - copied the test file and removed the WithExemplars variant so the benchmarks compile and run
go test -bench=BenchmarkTimeSeriesQuery -benchmem -count=10 ./pkg/querybackend/ > old.txt

# On exemplar branch with NoExemplars
git checkout marcsanmi/exemplars-benchmarks
go test -bench=BenchmarkTimeSeriesQuery -benchmem -count=10 ./pkg/querybackend/ > new.txt

# Compare
benchstat old.txt new.txt

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/querybackend
cpu: Apple M3 Pro
                                                     │   old.txt    │               new.txt               │
                                                     │    sec/op    │    sec/op     vs base               │
TimeSeriesQuery/NoExemplars-11                         745.1µ ± 14%   753.3µ ± 12%       ~ (p=0.739 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       614.3µ ±  5%   620.5µ ±  4%       ~ (p=0.481 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      639.6µ ± 21%   639.3µ ± 15%       ~ (p=0.853 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     653.4µ ± 52%   655.6µ ± 12%       ~ (p=0.971 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         629.3µ ± 54%   631.6µ ± 17%       ~ (p=0.631 n=10)
TimeSeriesQuery/WithExemplars-11                                      831.5µ ± 22%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                    831.5µ ±  4%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                   848.3µ ± 91%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                  840.0µ ± 23%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                      836.2µ ± 22%
geomean                                                654.8µ         742.6µ        +0.55%

                                                     │   old.txt    │               new.txt                │
                                                     │     B/op     │     B/op      vs base                │
TimeSeriesQuery/NoExemplars-11                         5.708Mi ± 0%   6.479Mi ± 0%  +13.50% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       5.724Mi ± 0%   6.484Mi ± 0%  +13.26% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      5.718Mi ± 0%   6.485Mi ± 1%  +13.41% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     5.719Mi ± 1%   6.485Mi ± 0%  +13.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         5.730Mi ± 1%   6.484Mi ± 0%  +13.17% (p=0.000 n=10)
TimeSeriesQuery/WithExemplars-11                                      6.878Mi ± 0%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                    6.877Mi ± 0%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                   6.876Mi ± 1%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                  6.867Mi ± 0%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                      6.875Mi ± 0%
geomean                                                5.720Mi        6.676Mi       +13.35%

                                                     │   old.txt   │              new.txt               │
                                                     │  allocs/op  │  allocs/op   vs base               │
TimeSeriesQuery/NoExemplars-11                         11.45k ± 0%   11.29k ± 0%  -1.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       11.45k ± 0%   11.29k ± 0%  -1.39% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      11.45k ± 0%   11.29k ± 0%  -1.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     11.45k ± 0%   11.29k ± 0%  -1.39% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         11.45k ± 0%   11.29k ± 0%  -1.40% (p=0.000 n=10)
TimeSeriesQuery/WithExemplars-11                                     16.67k ± 0%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                   16.67k ± 0%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                  16.67k ± 0%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                 16.67k ± 0%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                     16.67k ± 0%
geomean                                                11.45k        13.72k       -1.39%

Analysis:

✅ Time: No statistically significant regression (p > 0.05, shown as ~)
⚠️ Memory: +13.5% increase (~0.77 MiB on ~5.7 MiB per query)
✅ Allocs: Slight improvement (-1.4%)

Potential memory trade-off explanation:

The refactored profileEntryIterator uses a flexible options pattern with:

Dynamic queryColumns slice (grows based on requested features)
Dynamic processor slice of closures
Column priority sorting logic
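
A minimal sketch of how such an options pattern can be put together, to make the three points above concrete. This is not the actual profileEntryIterator code: the type names, the option constructors and the column names ("SeriesIndex", "ID") are all hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// column is a hypothetical descriptor for a column the iterator must read.
type column struct {
	name     string
	priority int // lower values are read first
}

// entry is a hypothetical stand-in for one row produced by the iterator.
type entry struct {
	fields map[string]string
}

// processor post-processes an entry once its columns have been read.
type processor func(e *entry)

// iteratorConfig collects what the enabled options asked for.
type iteratorConfig struct {
	queryColumns []column
	processors   []processor
}

// option mutates the config; each feature contributes columns and closures.
type option func(*iteratorConfig)

// withSeriesLabels models a column that every query needs in this sketch.
func withSeriesLabels() option {
	return func(c *iteratorConfig) {
		c.queryColumns = append(c.queryColumns, column{name: "SeriesIndex", priority: 0})
	}
}

// withProfileIDs models an optional feature: one more column to read and
// one more closure that runs for every entry.
func withProfileIDs() option {
	return func(c *iteratorConfig) {
		c.queryColumns = append(c.queryColumns, column{name: "ID", priority: 1})
		c.processors = append(c.processors, func(e *entry) {
			e.fields["profile_id"] = "set-by-processor"
		})
	}
}

// newConfig applies the options, then sorts the columns by priority so the
// read order does not depend on the order in which options were passed.
func newConfig(opts ...option) *iteratorConfig {
	c := &iteratorConfig{}
	for _, o := range opts {
		o(c)
	}
	sort.SliceStable(c.queryColumns, func(i, j int) bool {
		return c.queryColumns[i].priority < c.queryColumns[j].priority
	})
	return c
}

func main() {
	base := newConfig(withSeriesLabels())
	full := newConfig(withProfileIDs(), withSeriesLabels())
	fmt.Println("base:", len(base.queryColumns), "columns,", len(base.processors), "processors")
	fmt.Println("full:", len(full.queryColumns), "columns,", len(full.processors), "processors")
}
```

In a design like this, the slices and closures are allocated even when only the default options are used, which is one plausible source of a fixed memory cost on the NoExemplars path.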

Part 2: Exemplar Feature Overhead (NoExemplars vs WithExemplars)

This section measures the additional cost of enabling exemplars on top of the refactored baseline.

Command Used:

# Run all timeseries benchmarks (base + time range variants)
go test -bench=BenchmarkTimeSeriesQuery -benchmem ./pkg/querybackend/

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/querybackend
cpu: Apple M3 Pro
BenchmarkTimeSeriesQuery/NoExemplars-11             1870            607338 ns/op         6811441 B/op      11289 allocs/op
BenchmarkTimeSeriesQuery/WithExemplars-11                   1441            855327 ns/op         7220978 B/op     16670 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/NoExemplars-11                   1789            657579 ns/op        6808980 B/op       11289 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                 1422            842359 ns/op        7206427 B/op       16669 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11                  1874            728544 ns/op        6800535 B/op       11289 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                1383            823404 ns/op        7201524 B/op       16669 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11                 1940            624539 ns/op        6802911 B/op       11290 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11               1492            930987 ns/op        7199970 B/op       16668 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/NoExemplars-11                     1762            626126 ns/op        6807519 B/op       11290 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                   1461            825313 ns/op        7212226 B/op       16669 allocs/op
PASS
ok      github.com/grafana/pyroscope/pkg/querybackend   14.548s

Analysis:

⚠️ Time: +40.8% overhead on the base benchmark (607µs → 855µs); +13% to +49% across the time-range variants
✅ Memory: +6.0% increase (~410KB on 6.5MB queries)
⚠️ Allocs: +47.7% increase (11,289 → 16,670 allocations)

Overhead explanation:

When exemplars are enabled, additional data must be fetched:

Profile IDs: UUID column read + conversion
All labels: Complete label set instead of just groupBy subset
Additional processing: Profile ID matching and exemplar construction

The allocation increase comes primarily from fetching full label sets and processing profile IDs for each matching profile.
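
To illustrate the difference between carrying a groupBy subset and carrying the complete label set, here is a small sketch. The Label type and both helpers are hypothetical and do not reflect the real querybackend types; they only show why the per-profile payload grows when every label has to be kept.

```go
package main

import "fmt"

// Label is a hypothetical name/value pair attached to a profile.
type Label struct {
	Name, Value string
}

// project keeps only the labels whose names appear in groupBy.
func project(labels []Label, groupBy ...string) []Label {
	keep := make(map[string]struct{}, len(groupBy))
	for _, n := range groupBy {
		keep[n] = struct{}{}
	}
	out := make([]Label, 0, len(groupBy))
	for _, l := range labels {
		if _, ok := keep[l.Name]; ok {
			out = append(out, l)
		}
	}
	return out
}

// cloneAll copies the complete label set, as an exemplar would need.
func cloneAll(labels []Label) []Label {
	return append([]Label(nil), labels...)
}

func main() {
	labels := []Label{
		{"service_name", "checkout"},
		{"pod", "checkout-7d9f"},
		{"region", "eu-west-1"},
		{"version", "1.4.2"},
	}
	fmt.Println("groupBy subset:", len(project(labels, "service_name")), "label(s)")
	fmt.Println("full set:      ", len(cloneAll(labels)), "label(s)")
}
```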

Overhead consistency across time ranges:

Memory overhead is consistent at about +6%, while time overhead varies between +13% and +49%:

1 Minute: +28.1% time, +6% memory
5 Minutes: +13.0% time, +6% memory
15 Minutes: +49.1% time, +6% memory
1 Hour: +31.8% time, +6% memory
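
For reference, the time percentages can be recomputed from the ns/op values in the benchmark output above. The snippet below is a small sketch of that arithmetic; the helper name overheadPercent is made up for this example.

```go
package main

import "fmt"

// overheadPercent returns how much larger "with" is than "base",
// expressed as a percentage of "base".
func overheadPercent(base, with float64) float64 {
	return (with - base) / base * 100
}

func main() {
	// ns/op values copied from the Part 2 benchmark output.
	rows := []struct {
		name       string
		base, with float64
	}{
		{"Base", 607338, 855327},
		{"1 Minute", 657579, 842359},
		{"5 Minutes", 728544, 823404},
		{"15 Minutes", 624539, 930987},
		{"1 Hour", 626126, 825313},
	}
	for _, r := range rows {
		fmt.Printf("%-10s %+.1f%%\n", r.name, overheadPercent(r.base, r.with))
	}
}
```

Each figure comes from a single run (no -count), so run-to-run noise is not accounted for.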

Summary

Changes to default path (NoExemplars)

✅ Time: No regression (p > 0.05 across all benchmarks)
⚠️ Memory: +13.5% (~0.77 MiB) due to options pattern refactoring
✅ Allocs: -1.4% (slight improvement)

Exemplar overhead

⚠️ Time: +41% on the base benchmark (+13% to +49% across time ranges)
✅ Memory: +6% (~410KB)
⚠️ Allocs: +48% (fetching additional data)

Key findings

✅ No time regression for users who don't enable exemplars
✅ Memory overhead is minimal when exemplars are enabled (+6%)
✅ Overhead scales linearly across time ranges (no performance cliffs)
⚠️ Allocation overhead (+48%) - exemplars fetch profile IDs and complete label sets (room for improvement)
⚠️ Time overhead (+13% to +49%) (room for improvement)

Note: The exemplars feature will be opt-in. Users who don't request exemplars are unaffected by the time and allocation overhead. The overhead is inherent to fetching additional data (profile IDs and complete label sets), but future optimizations could reduce the impact if needed.

simonswine

I quickly became suspicious because the different time ranges don't produce different outcomes, and rightly so: none of those benchmarks hit points in time where the sample block contains data, so they only cover the cost of setting up a query.

I have experimented locally and this actually hits data:

diff --git a/pkg/querybackend/query_time_series_test.go b/pkg/querybackend/query_time_series_test.go
index 6d59509cf..aa40bdc44 100644
--- a/pkg/querybackend/query_time_series_test.go
+++ b/pkg/querybackend/query_time_series_test.go
@@ -158,10 +158,23 @@ func sanitizeMetadata(meta []*metastorev1.BlockMeta) {
 // runTimeSeriesQuery executes a timeseries query with the given parameters.
 func (f *benchmarkFixture) runTimeSeriesQuery(b *testing.B, req *queryv1.InvokeRequest) {
        b.Helper()
-       _, err := f.reader.Invoke(f.ctx, req)
+       resp, err := f.reader.Invoke(f.ctx, req)
        if err != nil {
                b.Fatalf("query failed: %v", err)
        }
+       for _, r := range resp.Reports {
+               if r.ReportType != queryv1.ReportType_REPORT_TIME_SERIES {
+                       continue
+               }
+               for _, s := range r.TimeSeries.TimeSeries {
+                       for _, p := range s.Points {
+                               if p.Value > 0 {
+                                       return
+                               }
+                       }
+               }
+       }
+       panic("no data found")
 }

 // makeTimeSeriesRequest creates a timeseries query request with the given parameters.
@@ -196,9 +209,6 @@ func (f *benchmarkFixture) makeTimeSeriesRequest(
 func BenchmarkTimeSeriesQuery(b *testing.B) {
        fixture := setupBenchmarkFixture(b)

-       now := time.Now()
-       oneHourAgo := now.Add(-1 * time.Hour)
-
        benchmarks := []struct {
                name         string
                exemplarType typesv1.ExemplarType
@@ -210,7 +220,7 @@ func BenchmarkTimeSeriesQuery(b *testing.B) {
        for _, bm := range benchmarks {
                b.Run(bm.name, func(b *testing.B) {
                        req := fixture.makeTimeSeriesRequest(
-                               oneHourAgo, now,
+                               startTime, startTime.Add(time.Hour),
                                "{}",
                                []string{"service_name"},
                                bm.exemplarType,
@@ -234,7 +244,6 @@ func BenchmarkTimeSeriesQuery(b *testing.B) {
 // Expected results: Overhead ratio should remain constant across time ranges.
 func BenchmarkTimeSeriesQuery_TimeRange(b *testing.B) {
        fixture := setupBenchmarkFixture(b)
-       now := time.Now()

        timeRanges := []struct {
                name     string
@@ -259,7 +268,7 @@ func BenchmarkTimeSeriesQuery_TimeRange(b *testing.B) {
                        for _, et := range exemplarTypes {
                                b.Run(et.name, func(b *testing.B) {
                                        req := fixture.makeTimeSeriesRequest(
-                                               now.Add(-tr.duration), now,
+                                               startTime, startTime.Add(tr.duration),
                                                "{}",
                                                []string{"service_name"},
                                                et.typ,

The slowdown is higher, but it could still be in an acceptable range; we will get a clearer picture from queriers handling real data.

In any case, thanks for adding the benchmarks; they will help us optimise once we see the impact.
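
The data check from the diff above can also be looked at in isolation. The following is a minimal sketch that assumes hypothetical Point and Series types in place of the generated query response types; it returns a boolean instead of panicking, so the caller decides how to fail.

```go
package main

import "fmt"

// Point and Series are hypothetical stand-ins for the response types.
type Point struct {
	Value float64
}

type Series struct {
	Points []Point
}

// hasData reports whether any point in any series carries a positive value.
func hasData(series []Series) bool {
	for _, s := range series {
		for _, p := range s.Points {
			if p.Value > 0 {
				return true
			}
		}
	}
	return false
}

func main() {
	empty := []Series{{Points: []Point{{Value: 0}}}}
	filled := []Series{{Points: []Point{{Value: 0}, {Value: 12}}}}
	fmt.Println("empty has data: ", hasData(empty))
	fmt.Println("filled has data:", hasData(filled))
}
```

Inside a benchmark, a false result could be reported with b.Fatal("no data found"), which attributes the failure to the sub-benchmark that produced it.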

marcsanmi · 2025-12-16T11:08:48Z

Performance Results

Now with the fix suggested by @simonswine

Command Used:

# Run all timeseries benchmarks (base + time range variants)
go test -bench=BenchmarkTimeSeriesQuery -benchmem ./pkg/querybackend/

Results:

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/querybackend
cpu: Apple M3 Pro
BenchmarkTimeSeriesQuery/NoExemplars-11  	     394	   3033682 ns/op	11309543 B/op	   44800 allocs/op
BenchmarkTimeSeriesQuery/WithExemplars-11         	     159	   7406923 ns/op	21110159 B/op	  179542 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/NoExemplars-11         	    2103	    584807 ns/op	 6949807 B/op	   11888 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/WithExemplars-11       	    1550	    815741 ns/op	 7457302 B/op	   18729 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11        	     742	   1593618 ns/op	 8974527 B/op	   30157 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11      	     368	   3311989 ns/op	12882274 B/op	   86481 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11       	     406	   2915515 ns/op	11221687 B/op	   44297 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11     	     150	   9117493 ns/op	21095713 B/op	  177679 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/NoExemplars-11           	     373	   4062640 ns/op	11322721 B/op	   44808 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/WithExemplars-11         	     139	   8382624 ns/op	21175462 B/op	  179226 allocs/op
PASS
ok  	github.com/grafana/pyroscope/pkg/querybackend	18.313s

Basic Comparison (1 Hour Query)

Metric	NoExemplars	WithExemplars	Overhead
Time	3.03ms	7.41ms	+144% (2.4x)
Memory	10.79 MiB	20.13 MiB	+87% (1.9x)
Allocations	44,800	179,542	+301% (4.0x)
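
The ratios in this table can be derived directly from the raw benchmark lines. The sketch below parses two lines copied from the results above and prints the multipliers; parseLine and the result type are hypothetical helpers written for this example, not part of any Go tooling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// result holds the metrics printed by `go test -bench ... -benchmem`.
type result struct {
	nsPerOp, bytesPerOp, allocsPerOp float64
}

// parseLine extracts the metrics from one benchmark output line by taking
// the value that precedes each unit token.
func parseLine(line string) (result, error) {
	var r result
	f := strings.Fields(line)
	for i := 1; i < len(f); i++ {
		var dst *float64
		switch f[i] {
		case "ns/op":
			dst = &r.nsPerOp
		case "B/op":
			dst = &r.bytesPerOp
		case "allocs/op":
			dst = &r.allocsPerOp
		default:
			continue
		}
		v, err := strconv.ParseFloat(f[i-1], 64)
		if err != nil {
			return result{}, fmt.Errorf("parse %q: %w", f[i-1], err)
		}
		*dst = v
	}
	return r, nil
}

func main() {
	lines := []string{
		"BenchmarkTimeSeriesQuery/NoExemplars-11 394 3033682 ns/op 11309543 B/op 44800 allocs/op",
		"BenchmarkTimeSeriesQuery/WithExemplars-11 159 7406923 ns/op 21110159 B/op 179542 allocs/op",
	}
	var rs []result
	for _, l := range lines {
		r, err := parseLine(l)
		if err != nil {
			panic(err)
		}
		rs = append(rs, r)
	}
	no, with := rs[0], rs[1]
	const mib = 1 << 20
	fmt.Printf("time:   x%.1f\n", with.nsPerOp/no.nsPerOp)
	fmt.Printf("memory: x%.1f (%.2f MiB vs %.2f MiB)\n",
		with.bytesPerOp/no.bytesPerOp, no.bytesPerOp/mib, with.bytesPerOp/mib)
	fmt.Printf("allocs: x%.1f\n", with.allocsPerOp/no.allocsPerOp)
}
```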

marcsanmi added 2 commits December 2, 2025 12:14

feat: Add profile_id_selector for individual profile retrieval

6c67b87

perf: Add benchmarks for timeseries query (exemplars) performance

68b2dc3

marcsanmi requested a review from aleks-p as a code owner December 4, 2025 16:18

marcsanmi requested a review from a team December 4, 2025 16:18

Base automatically changed from marcsanmi/profile-id-selector to main December 12, 2025 10:15

marcsanmi requested review from a team, bryanhuhta and simonswine as code owners December 12, 2025 10:15

simonswine reviewed Dec 12, 2025

address review

d5203ae

Address merge conflicts

f23fe38

marcsanmi requested a review from simonswine December 16, 2025 11:13

simonswine approved these changes Jan 13, 2026

marcsanmi merged commit 0a5b09a into main Jan 14, 2026
20 checks passed

marcsanmi deleted the marcsanmi/exemplars-benchmarks branch January 14, 2026 11:25

marcsanmi merged 4 commits into main from marcsanmi/exemplars-benchmarks
