
Conversation

@Haoning-Sun opened this pull request:

This commit adds a new interface to the tracer, which can be used to dynamically update the sampling frequency.
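
For illustration, a minimal sketch of how such an interface might be used from a host agent, assuming the method is exposed on the tracer as UpdateSamplingFrequency. The constructor, config type, and import path below are assumptions based on this discussion, not the exact API of the PR:

```go
package sketch

import (
	"context"

	log "github.com/sirupsen/logrus"

	"go.opentelemetry.io/ebpf-profiler/tracer" // assumed import path
)

func run(ctx context.Context, cfg *tracer.Config) error {
	// Assumed constructor; the real setup may differ.
	trc, err := tracer.NewTracer(ctx, cfg)
	if err != nil {
		return err
	}

	// Later, raise the sampling frequency without recreating the tracer.
	if err := trc.UpdateSamplingFrequency(120); err != nil {
		log.Errorf("failed to update sampling frequency: %v", err)
	}
	return nil
}
```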

Haoning-Sun requested review from a team as code owners on October 24, 2025, 01:18.

The linux-foundation-easycla bot commented on Oct 24, 2025:

CLA Signed

The committers listed above are authorized under a signed CLA.

@florianl (Contributor) left a comment:

Thanks for your contribution.
Do you have a particular use case for this?


// Calculate the new sampling period in nanoseconds
// period = 1 second / frequency
period := uint64(time.Second.Nanoseconds() / int64(newSamplesPerSecond))

Contributor commented:

This conversion is tricky, I think. Can we be sure that the event rate is always in nanoseconds (on both x86 and arm64)?

@Haoning-Sun (Author) commented on Oct 24, 2025:

The period is calculated the same way during perf_event initialization. It is a software event, so perhaps its definition is independent of the hardware architecture:
https://github.com/torvalds/linux/blob/6fab32bb6508abbb8b7b1c5498e44f0c32320ed5/kernel/events/core.c#L11850

Contributor commented:

How did you verify the proposed changes?
When setting the sampling frequency initially to 20 Hz and printing the resulting period on every call in native_tracer_entry(), it shows 50000000, which is expected, as 1000000000 / 20 = 50000000.

But when updating the period some time later to 100 samples per second with the proposed function, I expected a value like 10000000. Instead I got 50000000 (the same value as for 20 Hz), and the number of reported events also did not increase on the user-space side.

Member commented:

It works for me locally (15 cores):

WARN[0004] 374 EV/s                
WARN[0005] 300 EV/s                 
WARN[0006] 375 EV/s                  
WARN[0007] 374 EV/s          
WARN[0008] 300 EV/s              
INFO[0008] Sampling frequency updated to 120 Hz         
WARN[0009] 1845 EV/s               
WARN[0011] 2085 EV/s             
WARN[0012] 1920 EV/s                 
WARN[0013] 1875 EV/s               
WARN[0014] 1800 EV/s               
WARN[0015] 1845 EV/s                 
WARN[0016] 2264 EV/s             
WARN[0017] 2408 EV/s                
WARN[0018] 2006 EV/s          
WARN[0020] 1889 EV/s 

@Haoning-Sun (Author) commented:

I used it in Parca and observed the change in the number of samples.

@Haoning-Sun (Author) commented:

> Thanks for your contribution. Do you have a particular use case for this?

We hope to change the sampling frequency without recreating the tracer.


var updateErrors []error
for i, event := range *events {
if err := event.UpdatePeriod(period); err != nil {
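
For background, a runtime period change on a perf event is usually applied with the PERF_EVENT_IOC_PERIOD ioctl; whether event.UpdatePeriod in this PR maps onto exactly that call is an assumption. A minimal sketch of the mechanism on a raw file descriptor is below. Note that the kernel interprets the passed value as a frequency if the event was opened with attr.freq set, and as a period otherwise, which matters when debugging why an update does not seem to take effect.

```go
package sketch

import (
	"fmt"
	"unsafe"

	"golang.org/x/sys/unix"
)

// setPerfPeriod applies a new sample period to an already-open perf event fd.
// For the CPU-clock software event the period is expressed in nanoseconds.
func setPerfPeriod(fd int, period uint64) error {
	_, _, errno := unix.Syscall(
		unix.SYS_IOCTL,
		uintptr(fd),
		uintptr(unix.PERF_EVENT_IOC_PERIOD),
		uintptr(unsafe.Pointer(&period)),
	)
	if errno != 0 {
		return fmt.Errorf("PERF_EVENT_IOC_PERIOD: %w", errno)
	}
	return nil
}
```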

@christos68k (Member) commented on Oct 24, 2025:

This doesn't change the size of the per-CPU buffers where the events are written, so we can expect to see lots of dropped events with higher sampling frequencies and full CPU utilization.

@Haoning-Sun (Author) commented:

So, considering the overall situation, it is not appropriate to update the events directly like this, right?

@christos68k (Member) commented on Oct 27, 2025:

One way is to introduce a maxSamplingFrequency configuration option and size the per-CPU buffers by it (worst-case). Then in the UpdateSamplingFrequency method you would not allow the frequency to exceed the configured max.

Keep in mind though that the profiler isn't currently designed with very high (e.g. in the thousands) sampling frequencies in mind and there will be unexpected issues when that's the case.
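
A rough sketch of what that check could look like, with assumed field and helper names (maxSamplesPerSecond coming from the new configuration option, applyPeriod standing in for the per-event update loop):

```go
package sketch

import (
	"fmt"
	"time"
)

// Tracer is a minimal stand-in for the real tracer type; fields are illustrative.
type Tracer struct {
	maxSamplesPerSecond int
	applyPeriod         func(period uint64) error
}

// UpdateSamplingFrequency rejects frequencies the per-CPU buffers were not sized for.
func (t *Tracer) UpdateSamplingFrequency(samplesPerSecond int) error {
	if samplesPerSecond <= 0 {
		return fmt.Errorf("sampling frequency must be positive, got %d", samplesPerSecond)
	}
	if samplesPerSecond > t.maxSamplesPerSecond {
		return fmt.Errorf("sampling frequency %d exceeds configured maximum %d",
			samplesPerSecond, t.maxSamplesPerSecond)
	}
	period := uint64(time.Second.Nanoseconds() / int64(samplesPerSecond))
	return t.applyPeriod(period)
}
```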

@Haoning-Sun (Author) commented:

Good idea, I've added the configuration for it.

Haoning-Sun and others added 2 commits October 28, 2025 17:42
Use errors.Join to combine updateErrors

Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>

@florianl (Contributor) commented:

Please also make sure that package reporter also updates its SamplesPerSecond and reports the correct frequency.

@Haoning-Sun (Author) commented:

> Please also make sure that package reporter also updates its SamplesPerSecond and reports the correct frequency.

Added it in controller and reporter.
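
For reference, one way the reporter side could track the live value is with an atomic that the controller writes and the reporting loop reads. This is a sketch with illustrative names, not the PR's actual code:

```go
package sketch

import "sync/atomic"

// samplesPerSecond holds the sampling frequency currently attached to reported profiles.
var samplesPerSecond atomic.Int64

// UpdateSamplingFrequency is the reporter-side hook called by the controller.
func UpdateSamplingFrequency(freq int) {
	samplesPerSecond.Store(int64(freq))
}

// currentFrequency is read by the reporting loop when building the next profile batch.
func currentFrequency() int64 {
	return samplesPerSecond.Load()
}
```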

cli_flags.go Outdated
const (
// Default values for CLI flags
defaultArgSamplesPerSecond = 20
defaultArgMaxSamplesPerSecond = 40

Contributor commented:

We should not set such a low default for the maximum number of samples per second. Overall, I think it would be best not to set a maximum at all; the maximum should be determined by the CPU core frequency and perf, not limited this way.

Suggested change: remove the line `defaultArgMaxSamplesPerSecond = 40`.

On a regular basis I run the profiler at 503 Hz and it is totally fine. It generates, as expected, way more data than at 19 Hz, but it also provides more insight.

Member commented:

We need a maximum, as it will be used to size the maps (at least for now, without restarting the tracer). The default can be zero, which means the internal maximum is set to args.SamplesPerSecond, essentially disallowing runtime sampling-frequency changes (identical behavior to what we have today).

If the user sets a value greater than zero, we need to validate that it is also greater than args.SamplesPerSecond.
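
In validation terms, that rule might look like the following sketch; the flag and field names are assumptions:

```go
package sketch

import "fmt"

// resolveMaxSamplesPerSecond turns the CLI flags into the internal maximum.
// A zero max falls back to samplesPerSecond, i.e. runtime changes stay disabled.
func resolveMaxSamplesPerSecond(samplesPerSecond, maxSamplesPerSecond int) (int, error) {
	if maxSamplesPerSecond == 0 {
		return samplesPerSecond, nil
	}
	if maxSamplesPerSecond <= samplesPerSecond {
		return 0, fmt.Errorf("max-samples-per-second (%d) must be greater than samples-per-second (%d)",
			maxSamplesPerSecond, samplesPerSecond)
	}
	return maxSamplesPerSecond, nil
}
```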

@Haoning-Sun (Author) commented:

Yeah, that makes sense.

@florianl (Contributor) commented on Oct 30, 2025:

Maybe a different approach: what about taking the configured sampling frequency as the upper limit and only allowing the sampling frequency to be reduced?

Allowing only reductions of the sampling frequency also makes sure that all caches are correctly sized for the highest configured frequency.

@christos68k (Member) commented on Oct 30, 2025:

Sizing the caches by maxFrequency, as in this PR, has the same effect: the caches are correctly sized for the highest possible frequency, but in addition we don't default to the more CPU-expensive mode of operation.

Contributor commented:

From a user perspective it might be easier to just allow downsampling as described. It also reduces logic complexity in various places.

Haoning-Sun and others added 3 commits October 30, 2025 09:10
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
}

// UpdateSamplingFrequency updates the sampling frequency used in profile generation.
func (p *Pdata) UpdateSamplingFrequency(samplesPerSecond int) {

Contributor commented:

If updating the sampling frequency gets supported, some kind of "report current frames now" mechanism might be required.
Otherwise we end up in a situation where package reporter already holds some data (with the old frequency), the sampling frequency gets updated, and with the next reporting cycle a mix of data with the old and new sampling frequency is sent. This introduces an unfortunate challenge for UIs and backends if they can't rely on the sampling frequency.

@Haoning-Sun (Author) commented:

A Flush method was added to the reporter, but it doesn't completely solve the problem.

@christos68k (Member) commented:

I don't think Flush is the right solution here. It introduces semantic complexity around who initiates reporting operations. Previously there was a single, clear owner of the reporting operation: the runloop goroutine.

Now, we're introducing another owner who may call reportProfile.

A better way to solve the sampling frequency mismatch is to annotate the samples with the sampling frequency in the kernel. But I don't think we need to expand the scope of this PR to include that, it can be done separately. In many cases, the temporary sampling frequency mismatch resulting from UpdateFrequency will be noise.
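
A rough illustration of the annotation idea (purely hypothetical, not part of this PR): each sample would carry the frequency that was in effect when it was captured, so a reporting interval can mix samples taken at different frequencies without ambiguity.

```go
package sketch

// sampleMeta is a hypothetical per-sample annotation; field names are illustrative.
type sampleMeta struct {
	TimestampNS uint64
	// SamplingFrequencyHz is the sampling frequency active when the sample was taken.
	SamplingFrequencyHz uint32
}
```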

if c.tracer == nil {
return fmt.Errorf("tracer is not initialized")
}

Contributor commented:

Should the controller be responsible for checking the configuration before it is passed on to other parts, like the tracer or the reporter?

@Haoning-Sun (Author) commented:

Added the check condition.
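
For reference, the kind of controller-level guard meant here might look like the following sketch; the types, field names, and wiring are stand-ins, and the actual check in the PR may differ:

```go
package sketch

import (
	"errors"
	"fmt"
)

// Minimal stand-ins for the real tracer and reporter types.
type tracerIface interface{ UpdateSamplingFrequency(int) error }
type reporterIface interface{ UpdateSamplingFrequency(int) }

type Controller struct {
	tracer              tracerIface
	reporter            reporterIface
	maxSamplesPerSecond int
}

// UpdateSamplingFrequency validates the request before fanning it out, so the
// tracer and the reporter never see an out-of-range value.
func (c *Controller) UpdateSamplingFrequency(samplesPerSecond int) error {
	if c.tracer == nil {
		return errors.New("tracer is not initialized")
	}
	if samplesPerSecond <= 0 || samplesPerSecond > c.maxSamplesPerSecond {
		return fmt.Errorf("sampling frequency %d out of range [1, %d]",
			samplesPerSecond, c.maxSamplesPerSecond)
	}
	if err := c.tracer.UpdateSamplingFrequency(samplesPerSecond); err != nil {
		return err
	}
	c.reporter.UpdateSamplingFrequency(samplesPerSecond)
	return nil
}
```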

Haoning-Sun force-pushed the update-sampling-frequency branch from ae81668 to 894e4cf on November 3, 2025, 15:22.
Haoning-Sun force-pushed the update-sampling-frequency branch from 90e8e4f to 7a51bed on November 3, 2025, 15:35.

The github-actions bot commented on Jan 8, 2026:

This PR was marked stale due to lack of activity. It will be closed in 14 days.

The github-actions bot added the Stale label on Jan 8, 2026.