
kallsyms: add update trigger on bpf_ksym_add#1019

Merged
christos68k merged 25 commits into open-telemetry:main from florianl:issue-938
Jan 26, 2026

Conversation

@florianl
Member

Notify Symbolizer for kallsyms about changes to eBPF programs.

Fixes #938


Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
@florianl florianl requested review from a team as code owners December 12, 2025 14:32
Comment thread kallsyms/kallsyms.go Outdated
Contributor

@fabled fabled left a comment


Thanks, this seems like the leanest implementation to me.

I think the reload triggering needs more work.

I'd also avoid the kernel version check (and the related package-moving changes), mostly because Red Hat (and some others) maintain their own kernels, so a pure version check can result in wrong behavior.

Comment thread kallsyms/kallsyms.go Outdated
Comment thread tracer/tracer.go Outdated
Comment thread support/ebpf/kallsyms.ebpf.c
Comment thread tracer/tracer.go
@florianl florianl requested a review from fabled January 16, 2026 13:15
@fabled fabled changed the title kallsysms: add update trigger on bpf_ksym_add kallsyms: add update trigger on bpf_ksym_add Jan 20, 2026
Comment thread support/ebpf/tracemgmt.h Outdated
florianl and others added 4 commits January 21, 2026 08:04
@florianl florianl requested a review from christos68k January 22, 2026 09:01
Comment thread tracer/tracer.go
	}

	if err = t.attachToKallsymsUpdates(); err != nil {
		return err
Member


We should clean up/disable the perfEvents here (as nobody calls Tracer.Close if AttachTracer fails).

Member Author

@florianl florianl Jan 22, 2026


Tracer.Close() is called via ctlr.Shutdown().
So, if AttachTracer() fails then ctlr.Start() fails and it will trigger the cleanup.

Member

@christos68k christos68k Jan 22, 2026


AFAICT from reading this code, this is not executed when the profiler is running as a receiver inside OTel collector.

It's also better to deal with the in-between private state inside the method (as it's the method that created it), rather than propagate this private state outside and depend on an external caller of a different method, since we'd then introduce a time window where invalid private state exists and depends on further action for cleanup.

Member Author


This code path is executed by the OTel collector, as BuildProfilesReceiver() (and xreceiver.Profiles) satisfies the component.Factory interface with its Shutdown(context.Context) error function. Here we implement it.

Member

@christos68k christos68k Jan 22, 2026


When reading the interface docstring, I don't see anything that mentions Shutdown being called when Start fails.

Regardless of that, it's not right to delay cleanup in this case and introduce implicit dependencies on external components: it violates encapsulation and the principle of single responsibility, thus making the code harder to reason about.

It's this method that introduces this in-between state and it should be this method that cleans it up when it fails (this is similar to RAII).

Member Author


Added terminatePerfEvents() as a cleanup routine for failures in d391075

Comment thread tracer/events.go Outdated
@christos68k christos68k merged commit db8925c into open-telemetry:main Jan 26, 2026
32 checks passed
@bobrik
Contributor

bobrik commented Jan 27, 2026

Doing full kallsyms re-parsing might be a bit too heavy-handed here:

ivan@748m17:~$ time cat /proc/kallsyms > /dev/null

real    0m0.878s
user    0m0.004s
sys     0m0.872s

Just running bpftool prog loads a bunch of programs:

ivan@748m17:~$ sudo strace -f -e bpf bpftool prog 2>&1 | grep BPF_PROG_LOAD
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffc88c97fc0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 152) = 3
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffc88c97be0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 148) = 4
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffc88c97df0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 148) = 4
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffc88c97a70, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="libbpf_nametest", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 148) = 4
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=4, insns=0x7ffc88c97b70, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="det_arg_ctx", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=4, func_info_rec_size=8, func_info=0x7ffc88c97b60, func_info_cnt=2, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 148) = 5
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=5, insns=0x7ffc88c97ba0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 148) = 6
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=6, insns=0x7ffc88c97c70, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 148) = 5
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACING, insn_cnt=137, insns=0x56257fd162e0, license="Dual BSD/GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(6, 12, 66), prog_flags=0, prog_name="iter", prog_ifindex=0, expected_attach_type=BPF_TRACE_ITER, prog_btf_fd=4, func_info_rec_size=8, func_info=0x56257fd15840, func_info_cnt=1, line_info_rec_size=16, line_info=0x56257fd22f40, line_info_cnt=44, attach_btf_id=30369, attach_prog_fd=0, fd_array=NULL}, 148) = 5
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffc88c97a70, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 148) = 7

We rely on this for bpf runtime time accounting pretty heavily.

We apparently also have a steady stream of bpf programs loaded in production.

Given that we have all the info in the kprobe arguments (start, end, name), should we maybe use that directly?

struct bpf_ksym {
	unsigned long		 start;
	unsigned long		 end;
	char			 name[KSYM_NAME_LEN];
	struct list_head	 lnode;
	struct latch_tree_node	 tnode;
	bool			 prog;
	u32			 fp_start;
	u32			 fp_end;
};

@fabled
Contributor

fabled commented Jan 27, 2026

Yes, reading the full kallsyms is very CPU-heavy. Tracking and queueing the probes can be problematic too.

I wonder whether kallsyms supports seeking, so we could just read the non-module portion of it from the end directly.

@bobrik
Contributor

bobrik commented Jan 27, 2026

It's interesting that bpf_get_kallsym is the most expensive bit of it too.

I cross-checked it with just perf:

ivan@748m17:~$ sudo perf record -g -- cat /proc/kallsyms > /dev/null
ivan@748m17:~$ sudo perf script | grep 'cat 3828805' | wc -l
3644
ivan@748m17:~$ sudo perf script | grep 'bpf_get_kallsym' | wc -l
3285

I wonder if there's something quadratic going on there and we just need to fix the kernel itself as the first step.

@bobrik
Contributor

bobrik commented Jan 27, 2026

Oh yeah, this is quadratic alright:

https://github.com/torvalds/linux/blob/7d0a66e4bb9081d75c82ec4957c50034cb0ea449/kernel/bpf/core.c#L781-L807


We do have a lot of symbols too:

ivan@748m17:~$ cat /proc/kallsyms | grep -F '[bpf]' | wc -l
10507

@bobrik
Contributor

bobrik commented Jan 30, 2026

I took a stab at fixing it upstream:

Testing in production on v6.12 series (with ~10000 bpf ksyms as well):

  • On AMD EPYC 9684X (Zen4): ~870ms -> ~100ms
  • On Ampere Altra Max M128-30: ~4650ms -> ~70ms

@bobrik
Contributor

bobrik commented Jan 31, 2026

Even with the patch applied, it's still a considerable amount of work to read and parse /proc/kallsyms:


For me about 72% of symbols are static. You can seek over them but that would not prevent any of the hot printf code from executing, so the utility of doing that is kind of limited (you only save a small portion of userspace work).

Having a separate /proc/kallsyms-dynamic is one way to reduce work, but that's not going to get into older kernels and it would still be a lot of work for a small change when a symbol is added.

Ultimately, I think having a ringbuf (or a map) for updates from a kprobe on bpf_ksym_add is probably the best way to reduce work here. We can detect dropped updates and fall back to full parsing if it comes to that.

@fabled
Contributor

fabled commented Feb 2, 2026

Ultimately, I think having a ringbuf (or a map) for updates from a kprobe on bpf_ksym_add is probably the best way to reduce work here. We can detect dropped updates and fall back to full parsing if it comes to that.

Could you open an issue to track this instead of this merged PR? I suspect your use case is quite different from most others, so it's not an immediate issue, but the solution would require quite a bit of work to make it robust and without too much overhead. I think the design and further points should be discussed in a new issue. Thanks.

@bobrik
Contributor

bobrik commented Feb 3, 2026

I made #1151 for this.



Development

Successfully merging this pull request may close these issues.

[bug] Kernel stack traces show stale or incorrect symbols for dynamically loaded BPF programs

5 participants