Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions processmanager/helpers.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
package processmanager // import "go.opentelemetry.io/ebpf-profiler/processmanager"

import (
"sync"

lru "github.com/elastic/go-freelru"
log "github.com/sirupsen/logrus"

Expand Down Expand Up @@ -49,6 +51,7 @@ var _ FileIDMapper = (*lruFileIDMapper)(nil)
// MapFileIDMapper implements the FileIDMApper using a map (for testing)
type MapFileIDMapper struct {
fileMap map[host.FileID]libpf.FrameMappingFile
mu sync.Mutex
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this additional mutex might introduce a potential deadlock. Introduction the function releaseResources() and using the global processmanager lock should be sufficient, shouldn't it?

As far as I can tell, MapFileIDMapper is used only in the scope of processmanager - except for coredumps. So to prevent potential deadlocks, maybe just use the processmanager lock?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this can deadlock. Let me try understand this

Copy link
Copy Markdown
Contributor Author

@korniltsev-grafanista korniltsev-grafanista Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduction the function releaseResources() and using the global processmanager lock should be sufficient, shouldn't it?

It is not. FileIDMapper.Get is called from the trace handling goroutine under no lock. and FileIDMapper.Set is called from another goroutine, also under no lock.

In practice the synchronization happens inside of type lruFileIDMapper struct { cache *lru.SyncedLRU[host.FileID, libpf.FrameMappingFile]} .

The MapFileIDMapper is only used in tests and with this PR the synchronization also happens inside the implementation of the interface, same way as in LRU implementation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@florianl do you mind explaining a bit further what deadlock do you have in mind? I could not fathom this

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the scenario, where this package runs as part of ebpf-profiler, the deadlock is limited (assuming we catch it in review). But as the FileIDMapper interface and its implementations are public, nothing stops external parties from interacting with it. So making the API more private, could be a first step.

And to make sure, accessing lurFileIDMapper is safe, we could just switch to ShardedLRU instead of SyncedLRU. With the type switch we would also make sure, that changes to the lruFileIDMapper implementation stay safe to use, and sync.Mutex does not need to be called explicitly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this too, I don't see a deadlock here

At the moment there is no deadlock in ebpf-profiler. But there is a limited risk for a deadlock if processmanager evolves and the FileIDMapper lock is hold before the procressmanager lock is fetched. But as stated, this could be hopefully identified in a review.
The risk for a deadlock is different for elements that import and use this package. They can directly interact (via Get and Set) with FileIDMapper. For that reason I suggest to limit the API and lock it down

So making the API more private, could be a first step.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do note that #749 will remove the fileidmapper completely.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also working to remove the ReleaseResources stage.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And #909 removes the ReleaseResources stage.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIce, I will just close this one

}

func NewMapFileIDMapper() *MapFileIDMapper {
Expand All @@ -58,13 +61,17 @@ func NewMapFileIDMapper() *MapFileIDMapper {
}

func (fm *MapFileIDMapper) Get(key host.FileID) (libpf.FrameMappingFile, bool) {
fm.mu.Lock()
defer fm.mu.Unlock()
if value, ok := fm.fileMap[key]; ok {
return value, true
}
return libpf.FrameMappingFile{}, true
}

func (fm *MapFileIDMapper) Set(key host.FileID, value libpf.FrameMappingFile) {
fm.mu.Lock()
defer fm.mu.Unlock()
fm.fileMap[key] = value
}

Expand Down
17 changes: 11 additions & 6 deletions processmanager/manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -418,12 +418,7 @@ func (pm *ProcessManager) HandleTrace(bpfTrace *host.Trace) {
pm.frameCacheHit.Add(cacheHit)
}

// Release resources that were used to symbolize this stack.
for _, instance := range pm.interpreters[pid] {
if err := instance.ReleaseResources(); err != nil {
log.Warnf("Failed to release resources for %d: %v", pid, err)
}
}
pm.releaseResources(pid)

trace.Hash = traceutil.HashTrace(trace)
meta.APMServiceName = pm.maybeNotifyAPMAgent(bpfTrace, trace.Hash, 1)
Expand All @@ -432,3 +427,13 @@ func (pm *ProcessManager) HandleTrace(bpfTrace *host.Trace) {
log.Errorf("Failed to report trace event: %v", err)
}
}

func (pm *ProcessManager) releaseResources(pid libpf.PID) {
pm.mu.RLock()
defer pm.mu.RUnlock()
for _, instance := range pm.interpreters[pid] {
if err := instance.ReleaseResources(); err != nil {
log.Warnf("Failed to release resources for %d: %v", pid, err)
}
}
}
97 changes: 95 additions & 2 deletions processmanager/manager_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ package processmanager

import (
"context"
"debug/elf"
"errors"
"fmt"
"math/rand/v2"
Expand All @@ -26,6 +27,8 @@ import (
"go.opentelemetry.io/ebpf-profiler/process"
pmebpf "go.opentelemetry.io/ebpf-profiler/processmanager/ebpfapi"
"go.opentelemetry.io/ebpf-profiler/remotememory"
"go.opentelemetry.io/ebpf-profiler/reporter"
"go.opentelemetry.io/ebpf-profiler/reporter/samples"
"go.opentelemetry.io/ebpf-profiler/support"
tracertypes "go.opentelemetry.io/ebpf-profiler/tracer/types"
"go.opentelemetry.io/ebpf-profiler/util"
Expand All @@ -36,7 +39,8 @@ import (

// dummyProcess implements pfelf.Process for testing purposes
type dummyProcess struct {
pid libpf.PID
pid libpf.PID
getMappings func() ([]process.Mapping, uint32, error)
}

func (d *dummyProcess) PID() libpf.PID {
Expand All @@ -56,7 +60,10 @@ func (d *dummyProcess) GetExe() (string, error) {
}

func (d *dummyProcess) GetMappings() ([]process.Mapping, uint32, error) {
return nil, 0, errors.New("not implemented")
if d.getMappings == nil {
return nil, 0, errors.New("not implemented")
}
return d.getMappings()
}

func (d *dummyProcess) GetThreads() ([]process.ThreadInfo, error) {
Expand Down Expand Up @@ -173,6 +180,8 @@ type ebpfMapsMockup struct {
deletePidPageMappingCount uint8
// expectedBias value for updatedPidPageToExeIDOffset calls
expectedBias uint64
// updatePidPageMappingInfo allows to completely mock UpdatePidPageMappingInfo
updatePidPageMappingInfo func(pid libpf.PID, prefix lpm.Prefix, fileID uint64, bias uint64) error
}

var _ interpreter.EbpfHandler = &ebpfMapsMockup{}
Expand Down Expand Up @@ -232,6 +241,9 @@ func (mockup *ebpfMapsMockup) DeleteStackDeltaPage(host.FileID, uint64) error {

func (mockup *ebpfMapsMockup) UpdatePidPageMappingInfo(pid libpf.PID, prefix lpm.Prefix,
fileID uint64, bias uint64) error {
if mockup.updatePidPageMappingInfo != nil {
return mockup.updatePidPageMappingInfo(pid, prefix, fileID, bias)
}
if prefix.Key == 0 && fileID == 0 && bias == 0 {
// If all provided values are 0 the hook was called to create
// a dummy entry.
Expand Down Expand Up @@ -515,3 +527,84 @@ func TestProcExit(t *testing.T) {
})
}
}

func TestConcurrentSyncTraceHandling(t *testing.T) {
const exe = "/proc/self/exe"
fid, err := libpf.FileIDFromExecutableFile(exe)
require.NoError(t, err)

nextProcess := func(pid int, exe string) *dummyProcess {
return &dummyProcess{
pid: libpf.PID(pid),
getMappings: func() ([]process.Mapping, uint32, error) {
return []process.Mapping{
{
Vaddr: 0xcafe00000,
Length: 0x3445c0,
Flags: elf.PF_X | elf.PF_R,
FileOffset: 0,
Device: 1,
Inode: 2,
Path: libpf.Intern(exe),
},
}, 0, nil
},
}
}

dummyProvider := &dummyStackDeltaProvider{}
ebpfMockup := &ebpfMapsMockup{}
ebpfMockup.updatePidPageMappingInfo = func(pid libpf.PID, prefix lpm.Prefix, fileID uint64, bias uint64) error {
return nil
}

interpreters, err := tracertypes.Parse("go")
require.NoError(t, err)

ctx, cancel := context.WithCancel(t.Context())
defer cancel()

manager, err := New(ctx,
interpreters,
1*time.Second,
ebpfMockup,
NewMapFileIDMapper(),
reporter.TraceReporterFunc(func(trace *libpf.Trace, meta *samples.TraceEventMeta) error {
return nil
}),
nil,
dummyProvider,
true,
libpf.Set[string]{})
require.NoError(t, err)

manager.metricsAddSlice = func(m []metrics.Metric) {}

ctx, cancel = context.WithTimeout(ctx, 100*time.Millisecond)
defer cancel()
go func() {
pid := 0
for ctx.Err() == nil {
pid++
manager.SynchronizeProcess(nextProcess(pid, exe))
}
}()
go func() {
pid := 0
for ctx.Err() == nil {
pid++
pid &= 0xff
trace := &host.Trace{
Frames: []host.Frame{{
File: host.FileIDFromLibpf(fid),
Type: libpf.NativeFrame,
}},
PID: libpf.PID(pid),
TID: libpf.PID(pid),
}
manager.HandleTrace(trace)
}
}()

<-ctx.Done()
}
6 changes: 6 additions & 0 deletions reporter/iface.go
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,12 @@ type TraceReporter interface {
ReportTraceEvent(trace *libpf.Trace, meta *samples.TraceEventMeta) error
}

type TraceReporterFunc func(trace *libpf.Trace, meta *samples.TraceEventMeta) error

func (f TraceReporterFunc) ReportTraceEvent(trace *libpf.Trace, meta *samples.TraceEventMeta) error {
return f(trace, meta)
}

type ExecutableMetadata struct {
// MappingFile is the reference to mapping file data.
MappingFile libpf.FrameMappingFile
Expand Down