Skip to content

tracer: avoid panic for unknown origins#1046

Merged
fabled merged 11 commits intoopen-telemetry:mainfrom
florianl:tracer-unknown-origin
Jan 19, 2026
Merged

tracer: avoid panic for unknown origins#1046
fabled merged 11 commits intoopen-telemetry:mainfrom
florianl:tracer-unknown-origin

Conversation

@florianl
Copy link
Copy Markdown
Member

@florianl florianl commented Jan 7, 2026

If a trace with an unknown origin is reported, eBPF profiler currently panics:

level=WARN msg="Skip handling trace from unexpected 4 origin"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xb76c39]

goroutine 97 [running]:
go.opentelemetry.io/ebpf-profiler/tracer.(*Tracer).startTraceEventMonitor.func1()
	/agent/tracer/events.go:218 +0x359
created by go.opentelemetry.io/ebpf-profiler/tracer.(*Tracer).startTraceEventMonitor in goroutine 1
	/agent/tracer/events.go:153 +0x248

If a trace with an unknown origin is reported, eBPF profiler currently panics:

```
level=WARN msg="Skip handling trace from unexpected 4 origin"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xb76c39]

goroutine 97 [running]:
go.opentelemetry.io/ebpf-profiler/tracer.(*Tracer).startTraceEventMonitor.func1()
	/agent/tracer/events.go:218 +0x359
created by go.opentelemetry.io/ebpf-profiler/tracer.(*Tracer).startTraceEventMonitor in goroutine 1
	/agent/tracer/events.go:153 +0x248
```

Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
@florianl florianl requested review from a team as code owners January 7, 2026 08:32
Comment thread tracer/events.go Outdated

// Keep track of min KTime seen in this batch processing loop
trace := t.loadBpfTrace(data.RawSample, data.CPU)
if trace == nil {
Copy link
Copy Markdown
Member

@christos68k christos68k Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that loadBpfTrace may panic in multiple spots (a behavior which we want to remove), maybe we can start by having it return an error to the caller in this PR. In a future PR, we could remove the panics by returning additional errors that trigger receiver shutdown.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated loadBpfTrace() to return an error instead of panicing.

In a future PR, we could remove the panics by returning additional errors that trigger receiver shutdown.

This is something for the log.Fatal() cases in other places, I think.

Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Comment thread tracer/tracer.go
frameListOffs := int(unsafe.Offsetof(support.Trace{}.Frame_data))

if len(raw) < frameListOffs {
panic("trace record too small")
Copy link
Copy Markdown
Member

@christos68k christos68k Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep these panics for now until we have a functional equivalent (cleanly shutting down the receiver).

If we remove them here, we're changing the behavior as we're effectively ignoring these errors that were previously deemed serious enough to panic (we're also hiding this behavior from a future refactoring).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing panic() with error returns in loadBpfTrace() is something I would like to avoid.
The panic() here was replaced with a dedicated error. This allows in a future step a graceful shutdown on this dedicated error. As this case should not :Tm: happen in regular setups, as far as I know there is not a single report of such a panic in the past ~5 years, this error is logged as error at the moment.

Copy link
Copy Markdown
Member

@christos68k christos68k Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue is that we're hiding information with this change, the current code encodes certain semantics: that this error is serious enough to panic. So we at least need to keep the "serious enough" part of these semantics intact, otherwise we're forcing ourselves to remember that these conditions should trigger shutdown. See my other suggestions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To mimic the current semantics I added dedicated error handling for these panic() cases with a10309c and stop receiving and handling traces in user space.

The next step, a graceful complete shutdown, then should only be the next step with the handling of log.Fatal() cases in package tracer.

Comment thread tracer/tracer.go
Comment thread tracer/tracer.go
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Comment thread tracer/events.go Outdated
Comment thread tracer/events.go Outdated
Comment thread tracer/tracer.go
Comment on lines +948 to +949
return nil, fmt.Errorf("%d < %d: %w", len(raw), frameListOffs+frameDataLen,
errRecordUnexpectedSize)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about log the error here, and return err... directly to caller to determine what to do? Same for other similar places.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the fmt.Errorf() here, as I think it is important to have these additional information with the error to act on them later on. As control/program flows can change, I didn't want to put a log here and keep the log, that indicates the termination, close to the place, where the termination actual happens.

@florianl florianl requested a review from fabled January 16, 2026 13:23
Comment thread tracer/events.go
florianl and others added 3 commits January 17, 2026 11:30
Co-authored-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
@florianl florianl requested a review from fabled January 17, 2026 10:41
Comment thread tracer/events.go Outdated
Co-authored-by: Timo Teräs <timo.teras@iki.fi>
@florianl florianl requested a review from fabled January 19, 2026 07:06
@fabled fabled merged commit 5afc0e5 into open-telemetry:main Jan 19, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants