Mimic Ruby's backtrace logic in ruby interpreter and tracer, add support for ruby CMEs #907

dalehamel · 2025-10-31T19:54:40Z

What

This overhauls the ruby interpreter and unwinder to more closely mimic what ruby's own backtrace function does. In particular, it adds support for collecting and symbolizing Callable Method Entries (CMEs), which contain additional information.

This adds support for new features in symbolizing ruby frames:

Adds class information when possible, in ruby 3.3.0+, fixing [feat] Include class name for ruby method calls #714

It also fixes an existing bug in ruby #770

In addition to this, it changes how frames are symbolized:

- For c func method frames, rather than skip the frame we attempt to symbolize it (supported for all versions of ruby)
  - Presuming we will return to ruby symbolization (not true in ruby < 2.6.0), we save the cfunc and push the frame only after returning to ruby. This way the cfunc will "own" the native frames it calls as we interleave the stack
- For iseq-backed method entry frames, we read the label from the correct location and do our best to add classpath information to it
- For plain iseq-backed frames (which is all that was previously supported), we choose between base_label and label more appropriately
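The save-and-push behaviour for cfunc frames can be modelled roughly as follows. This is only a Go sketch of the ordering, not the BPF code: `rubyFrame`, its fields, and `interleave` are hypothetical names, and the native frames are assumed to be already known per cfunc, which the real unwinder discovers by handing off to the native unwinder.

```go
package main

import "fmt"

// rubyFrame is a hypothetical, simplified view of one Ruby VM control frame.
type rubyFrame struct {
	label   string
	isCfunc bool
	// native holds the native frames that ran beneath this cfunc, leaf
	// first. It is empty for iseq-backed frames.
	native []string
}

// interleave walks Ruby frames leaf first. A cfunc frame is saved instead of
// being pushed right away; the native frames it called are emitted first and
// the saved cfunc is pushed once unwinding returns to Ruby.
func interleave(frames []rubyFrame) []string {
	var out []string
	for _, f := range frames {
		if !f.isCfunc {
			out = append(out, f.label)
			continue
		}
		saved := f.label               // save the cfunc, do not push yet
		out = append(out, f.native...) // native unwinding happens in between
		out = append(out, saved)       // back in Ruby: push the saved cfunc
	}
	return out
}

func main() {
	stack := []rubyFrame{
		{label: "block in <main>"},
		{label: "Array#each", isCfunc: true, native: []string{"rb_ary_each"}},
		{label: "<main>"},
	}
	for _, label := range interleave(stack) {
		fmt.Println(label)
	}
}
```

In this model the cfunc label lands directly after the native frames it called, which is what "owning" those frames means in the resulting leaf-first stack.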

Real world example

This makes the bpf ruby interpreter closer to being usable and helpful with real production workloads.

These examples are taken from a production job server using our internal job system called "Hedwig".

Before:

After:

(Note that these are aggregated by a custom backend, and the [rb]:: prefixes are added by that.)

Notice that the names are much more descriptive, including the class name or extended label information that was previously omitted.

Notice also how Kernel#fork owns the native frames calling rb_f_fork, and likewise for Array#each owning rb_ary_each. Previously these frames were elided.

These frames much more closely resemble what Ruby developers will be familiar with when looking at stackprof or vernier profiles, as well as backtracie. I recently submitted the same thing for rbspy following basically the same approach and algorithm but done in one process, in userspace. In fact, when placed side by side with profiles from stackprof, developers can't tell the difference between the new native profiles:

vs stackprof request profiles they are already familiar with:

But, it is also possible to get more context about what it is doing, since we can see kernel and userspace native frames in the context of the ruby:

Why

The current unwinding approach is not consistent with what ruby's own backtrace function does. This is confusing to users who are familiar with ruby profilers like stackprof and vernier, which use ruby's built-in profiling stack unwinding.

We also are currently missing information that makes the ruby unwinder useful for any real app in production. Knowing the class information makes it easier to see what is actually going on in the application.

The existing symbolization and recording of stack frames loses a lot of data: it doesn't handle callable method entries at all and doesn't navigate to the same instruction sequences and label values, so it can provide surprising results.

How

As much as possible, the code mimics the behaviour of ruby's own rb_profile_frames from ruby's vm_backtrace.c, both for collecting the frames:

Every frame is captured in the same way. The bulk of this is done by mimicking ruby's own check_method_entry to walk from the ep to the correct frame. We have to unroll and optimize this for it to work, since BPF doesn't allow an unbounded "while" loop.
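As a rough illustration of that walk, here is a Go model of a bounded search for the method entry. It is a sketch under assumptions, not the BPF code: the `env` type, its fields, and the `maxEPChecks` value are hypothetical, and the imemo kinds are reduced to the cases that matter for the walk.

```go
package main

import "fmt"

type imemoKind int

const (
	imemoOther imemoKind = iota
	imemoCref
	imemoSvar
	imemoMent
)

// env is a hypothetical in-memory stand-in for one VM environment.
type env struct {
	local   bool      // models the VM_ENV_FLAG_LOCAL bit
	slot    imemoKind // kind of object stored in the me/cref slot
	svarRef imemoKind // kind of object an svar in that slot wraps
	prev    *env      // previous environment, nil for the local one
}

// maxEPChecks is a hypothetical bound; the verifier needs a fixed limit.
const maxEPChecks = 4

// findMethodEntry walks from ep towards the local environment with a fixed
// iteration bound. Only the local environment is allowed to hold an svar
// that wraps the method entry.
func findMethodEntry(ep *env) (*env, bool) {
	for i := 0; i < maxEPChecks && ep != nil; i++ {
		if ep.slot == imemoMent {
			return ep, true
		}
		if ep.local {
			if ep.slot == imemoSvar && ep.svarRef == imemoMent {
				return ep, true
			}
			return nil, false
		}
		ep = ep.prev
	}
	// Bound exhausted without reaching a method entry.
	return nil, false
}

func main() {
	method := &env{local: true, slot: imemoMent}
	block := &env{slot: imemoCref, prev: method}
	_, ok := findMethodEntry(block)
	fmt.Println("found method entry:", ok)
}
```

The fixed bound trades completeness for verifiability: a chain of nested blocks deeper than the bound is reported as not found rather than walked to the end.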

And when symbolizing:

When symbolizing the frames, we mimic rb_profile_frame_full_label as much as possible, considering the class name to get the qualified method name, and using the label, base_label, and method_name labels to compose the most descriptive possible label regardless of whether it is a CME or bare ISEQ.
For c functions, we read the label from the global ID table, imitating ruby's id2str

I also added support for symbolizing frames that are backed by a ruby CME. This adds new code for id2sym, which is used to look up C method identifiers from the global symbol table.

For both types of CMEs, I've added the code to read the class data from ext.classpath on ruby 3.3.0+.

All frames also use the equivalent of rb_profile_frame_full_name, regardless of whether the class information is available, as this can still provide a more contextful label.
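The label composition described in this section can be sketched in Go as below. This follows my reading of ruby's qualified method name and full label logic and is not the code from this PR: `qualifiedMethodName` and `fullLabel` are hypothetical helper names, empty strings stand in for ruby's nil, and the suffix check is a defensive addition.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifiedMethodName joins class path and method name with "#" for
// instance methods and "." for singleton methods. Empty strings stand in
// for ruby's nil.
func qualifiedMethodName(classPath, methodName string, singleton bool) string {
	if methodName == "" || classPath == "" {
		return methodName
	}
	sep := "#"
	if singleton {
		sep = "."
	}
	return classPath + sep + methodName
}

// fullLabel keeps whatever prefix label has in front of baseLabel (for
// example "block in ") and replaces the base with the qualified name.
func fullLabel(label, baseLabel, qualified string) string {
	if qualified == "" || qualified == baseLabel {
		return label
	}
	// Defensive addition: only splice when label really ends in baseLabel.
	if !strings.HasSuffix(label, baseLabel) {
		return label
	}
	prefix := label[:len(label)-len(baseLabel)]
	return prefix + qualified
}

func main() {
	q := qualifiedMethodName("Object", "is_prime", false)
	fmt.Println(fullLabel("block in is_prime", "is_prime", q))
}
```

A bare iseq without a method name falls through to its plain label, which is why frames such as `<main>` stay unchanged.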

Reviewer Notes

I'm aware this is a large changeset, especially after we just landed #170. I considered not including the class path and C symbolizing stuff, but decided it would be a bit silly to submit them separately since there isn't much of a point in collecting CMEs if we don't also use them. Because I'm both updating the BPF side (collection, building up frame buffer in the same way rb_profile_frames does by getting CMEs), and Go side (symbolizing the results), there is a fair bit to change here.

I recommend using github's "side by side" view to review this, the intermixed one won't make much sense otherwise.

I'd recommend starting in the BPF code:

read_ruby_frame https://github.com/dalehamel/opentelemetry-ebpf-profiler/pull/5/files#diff-b61481ee49ef1ed454c5d7a7a13853d92f65f6fd56d2d38e475b45ca97a9ab9eR102
comparing this to the corresponding ruby code in vm_backtrace.c (thread_profile_frames)
Review the rest of the C / bpf, most of which is supporting boilerplate stuff like constants for errors and metrics or shared constants with Go.

Then, moving on to the Go side:

start in Symbolize https://github.com/dalehamel/opentelemetry-ebpf-profiler/pull/5/files#diff-c0a6a39f3d56f7bff3ba5d2a41360932a50b265067b0717cc67916f4ebaf7a62R930
Compare this with the logic starting in rb_profile_frame_full_name to see why these fields are being collected and to see the algorithm I am trying to copy
Then review the rest of the go, which is boilerplate stuff and some tests

My apologies for the large diff, I did hold back additional stuff which I want to submit in subsequent PRs:

Garbage collection state detection
Ruby JIT support
Ongoing work to address getting the EC from TLS

However, this change represents the most important, common base for all the additional work. I tried to keep the changes as small as I could, but adding tests, comments, and boilerplate has pushed the diff larger.

dalehamel · 2025-10-31T20:15:34Z

interpreter/ruby/ruby.go

+	rubyFlSingleton libpf.Address
+
+	// Is it possible to read the classpath
+	hasClassPath bool


@fabled i tried to go for the "feature" rather than "version" based approach, hopefully this is what you had in mind.

dalehamel · 2025-10-31T20:16:09Z

interpreter/ruby/ruby.go

+		return &rubyIseq{}, fmt.Errorf("failed to read iseq location data, %v", err)
+	}
+
+	sourceFileNamePtr := npsr.Ptr(dataBytes, uint(vms.iseq_location_struct.pathobj))


I saw another opportunity to use npsr to save readv calls.

dalehamel · 2025-10-31T20:16:45Z

interpreter/ruby/ruby.go

+		singletonObject := r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.as_singleton_class_attached_object))
+		classpathPtr = r.rm.Ptr(singletonObject + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.classpath))
+
+		// TODO (dalehamel) in future PR handle anonymous classes and modules


I can remove my handle from this but I basically promise to submit this later. As it is, I feel the PR is big enough and this is kind of an esoteric edge case.

I think this is ok.

dalehamel · 2025-10-31T20:18:27Z

support/ebpf/tracemgmt.h

+  record->phpUnwindState.zend_execute_data  = 0;
+  record->rubyUnwindState.stack_ptr         = 0;
+  record->rubyUnwindState.last_stack_frame  = 0;
+  record->rubyUnwindState.cfunc_saved_frame = 0;


NB this has to be reset or coredump tests fail due to state leakage it seems.

Minor: Since these are all structures, I wonder if it'd be nicer to use the zero-initialization syntax, e.g. record->rubyUnwindState = (struct RubyUnwindState) {}; this would avoid needing to update it for every change of the structure.

dalehamel · 2025-10-31T20:19:31Z

tools/coredump/testdata/amd64/ruby-2.7.8p225-loop.json

        "libruby.so.2.7.8+0x216781",
        "libruby.so.2.7.8+0x211485",
        "libruby.so.2.7.8+0x2212d2",
-        "is_prime+0 in /pwd/testsources/ruby/loop.rb:10",


The ordering of the frames changes in a few of these examples because I fixed the bug where we didn't save our state when returning back to the ruby interpreter.

I also push the saved cfunc when we return, so it properly "owns" the native frames it called.

dalehamel · 2025-10-31T20:20:24Z

tools/coredump/testdata/amd64/ruby25.20836.json

-        "<main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:30",
-        "<main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:29",
+        "block (2 levels) in <main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:30",
+        "UNKNOWN CFUNC+0 in <cfunc>:0",


I think these coredumps for older rubies are stripped and missing the symbol table, so we cannot lookup the location of the global symbol table within ruby. This is the fallback dummy frame for that.

dalehamel · 2025-10-31T20:20:59Z

tools/coredump/testdata/amd64/ruby-3.5.0-loop.json

-        "<main>+0 in /app/loop.rb:29",
-        "loop+0 in <internal:kernel>:168",
+        "Range#each+0 in <cfunc>:0",
+        "block in <main>+0 in /app/loop.rb:29",


Notice that a lot of these now have more context because we have conditional logic picking between "label", "base_label", and "method name" from the iseq, to match ruby's own logic for this.

dalehamel · 2025-10-31T20:21:27Z

tools/coredump/testdata/arm64/ruby-2.7.8p225-loop.json

        "libruby.so.2.7.8+0x202f17",
        "libruby.so.2.7.8+0x215a23",
-        "<main>+0 in /pwd/testsources/ruby/loop.rb:29",
+        "each+0 in <cfunc>:0",


This frame was previously elided but is now handled.

dalehamel · 2025-10-31T20:23:56Z

tools/coredump/testdata/amd64/ruby-3.5.0-loop.json

        "libruby.so.3.5.0+0x31e2ea",
        "libruby.so.3.5.0+0x325240",
        "libruby.so.3.5.0+0x333b86",
-        "<main>+0 in /app/loop.rb:29",


This is an example of the repeated frame bug (#770 ) which is now fixed, hence why it doesn't repeat anymore.

dalehamel · 2025-10-31T20:26:29Z

tools/coredump/testdata/amd64/ruby-3.3.9-loop.json

-        "is_prime+0 in /app/loop.rb:10",
-        "sum_of_primes+0 in /app/loop.rb:20",
-        "<main>+0 in /app/loop.rb:30",
+        "Object#is_prime+0 in /app/loop.rb:45481984",


~~I'm not sure the cause, but there seems to be a bug when computing the line numbers for the leaf ruby frame.~~ (see below, found the cause and a solution) Here it is an impossibly large value, and in other spots it is 0. It seems to be a consequence of moving to the CME backed iseqs rather than just the bare one which is directly accessible.

If it's ok, i'd like to just file a follow-up issue to fix that separately.

Looking at this a bit more, I believe the issue is that the line number is also encoded in the plain iseq directly accessible from the CFP:

https://github.com/ruby/ruby/blob/52a17bbe6d778d56cc600f73f107c1992350f877/vm_backtrace.c#L1763-L1767

I'll see if this can be passed from bpf as well to cover this case.

Since this PR is already quite large, and my solution for passing the additional bytes is probably controversial (since I need to use the padding on Frame to send another address), I'm not pushing my solution to this branch just yet. But, I do have a fix and it is available here to get an idea of how it can be solved:

Shopify@9c0f0f6

I'd like to suggest we move forward with this branch as-is, and I submit this fix as a separate PR if/when this branch lands.

Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

> Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

It would be ideal, yeah, but I haven't actually changed the line number calculation function at all. It seems to work correctly.

If anything, it might help to move the guard clause in C (which previously did exist in BPF but has now been removed, since we always want to push the frame) to the top of the go function. I can give that a try, it would be good to have and should be pretty simple?

Ie, this check:

https://github.com/ruby/ruby/blob/52a17bbe6d778d56cc600f73f107c1992350f877/vm_backtrace.c#L1777-L1778

EDIT - this doesn't seem to work, perhaps because we have no way of knowing if the frame is the "top" frame (they are sent to us individually, we don't know where they are in the sequence). I could perhaps flag this by jamming more information into the File bits, but i'd rather just leave this for now since there already is a fix on the table with the correct logic.

I decided to file #931 separately, if that lands then i'll pull the code to fix the line numbers right into this PR.

I fixed all of this in dalehamel#12 which builds on #946 which i rebased on today

ivoanjo

I'm not a contributor on this repo, but I've spent a lot of time looking at Ruby's stack trace unwinding code in backtracie and datadog's ruby profiler: All of the extra work to collect the class names + C function names in this PR looks quite reasonable.

ivoanjo · 2025-11-04T10:53:25Z

support/ebpf/tracemgmt.h

+  record->phpUnwindState.zend_execute_data  = 0;
+  record->rubyUnwindState.stack_ptr         = 0;
+  record->rubyUnwindState.last_stack_frame  = 0;
+  record->rubyUnwindState.cfunc_saved_frame = 0;


Minor: Since these are all structures, I wonder if it'd be nicer to use the zero-initialization syntax, e.g. record->rubyUnwindState = (struct RubyUnwindState) {}; this would avoid needing to update it for every change of the structure.

ivoanjo · 2025-11-04T10:56:25Z

support/ebpf/types.h

  // number of attempts to read Go custom labels
  metricID_UnwindGoLabelsAttempts,

  // number of failures to read Go custom labels


⬆️ Minor: I think you forgot to update the metricID_UnwindRubyErrReadIseqEncoded/metricID_UnwindRubyErrReadIseqSize above as well.

(Although I wonder -- pardon the basic question -- why keep the old metrics around instead of removing them?)

good catch, i'll fix this.

> Although I wonder -- pardon the basic question -- why keep the old metrics around instead of removing them?

I figured that since the metrics (and errors) rely on unique IDs, removing them increases the potential that their IDs get reused, in which case metrics (or errors) could be misidentified. It might make sense to reuse them in the future, but for now leaving them as-is and deprecating them avoids this possibility.
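To make the ID-reuse concern concrete, here is a small Go sketch of sequentially numbered IDs. The constant names and values are hypothetical and only loosely modelled on the metric names mentioned above; the real IDs live in the C enum in types.h.

```go
package main

import "fmt"

type metricID uint32

// Current layout: deprecated entries stay in place so later IDs keep
// their values.
const (
	metricRubyAttempts          metricID = iota // 0
	metricRubyErrReadIseqEncode                 // 1, deprecated but kept
	metricRubyErrReadIseqSize                   // 2, deprecated but kept
	metricRubyErrReadCme                        // 3, hypothetical new metric
)

// Hypothetical layout with the deprecated entries deleted: the new metric
// silently takes over an ID that older consumers still interpret as
// "failed to read encoded iseq".
const (
	prunedRubyAttempts   metricID = iota // 0
	prunedRubyErrReadCme                 // 1, collides with the old meaning
)

func main() {
	fmt.Println("kept layout, new metric ID:  ", metricRubyErrReadCme)
	fmt.Println("pruned layout, new metric ID:", prunedRubyErrReadCme)
}
```

With the deprecated entries kept, a consumer that still maps ID 1 to the old metric never sees data from the new one.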

ivoanjo · 2025-11-04T11:45:09Z

tools/coredump/testdata/amd64/ruby-3.3.9-loop.json

-        "is_prime+0 in /app/loop.rb:10",
-        "sum_of_primes+0 in /app/loop.rb:20",
-        "<main>+0 in /app/loop.rb:30",
+        "Object#is_prime+0 in /app/loop.rb:45481984",


Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

ivoanjo · 2025-11-04T11:51:55Z

tools/coredump/testdata/amd64/ruby-2.7.8p225-loop.json

Definitely not for this PR, and I guess that would require regenerating the core dumps, but while going through these I was left thinking "I wish we had Ruby's exact actual output of the backtrace as a golden reference to compare here".

E.g. without that info it looks to me we can only tell the current stack looks better/matches what seems to be happening in gdb, but it's unclear if there's some weird corner case that's still missing or not.

> Definitely not for this PR, and I guess that would require regenerating the core dumps, but while going through these I was left thinking "I wish we had Ruby's exact actual output of the backtrace as a golden reference to compare here".

Yes that would be good, locally i have versions of the script where I run stackprof to ensure what I'm getting looks sane. It would be quite an effort though, and it is difficult to even get the older rubies to compile (I guess you could use docker with an older OS? Modern toolchains won't build anything 3.0.0 or older)

Also with the coredump tests it is a bit tricky to know that the native frames are correct since of course it doesn't symbolize them.

> Yes that would be good, locally i have versions of the script where I run stackprof to ensure what I'm getting looks sane. It would be quite an effort though, and it is difficult to even get the older rubies to compile (I guess you could use docker with an older OS? Modern toolchains won't build anything 3.0.0 or older)

It may be harder on macOS than on Linux; I just got a new laptop running Ubuntu 24.04 and I was able to get even Ruby 2.2 going without too much mucking (although I did need some).

So I guess if this would be an interesting (separate) contribution, I could look into it...

> Also with the coredump tests it is a bit tricky to know that the native frames are correct since of course it doesn't symbolize them.

Yes, I think ideally we'd have both sets of stacks -- the Ruby ones I'd expect would be a strict subset of the complete symbolized stacks.

ivoanjo · 2025-11-04T11:59:51Z

support/ebpf/ruby_tracer.ebpf.c

+typedef struct rb_control_frame_struct {
+  const void *pc;         // cfp[0]
+  void *sp;               // cfp[1]
+  const void *iseq;       // cfp[2]
+  void *self;             // cfp[3] / block[0]
+  const void *ep;         // cfp[4] / block[1]
+  const void *block_code; // cfp[5] / block[2] -- iseq, ifunc, or forwarded block handler
+  void *jit_return;       // cfp[6] -- return address for JIT code
+} rb_control_frame_t;


Minor: I wonder if it's worth introducing a typedef for VALUE, to keep the closer alignment with the types used on the Ruby VM (rather than replacing them with a void *)

felixge

Thank you so much for this contribution @dalehamel. I don't have the expertise to review this, but I've added an agenda item to the next profiling SIG meeting notes to discuss a plan for getting this reviewed and landed.

dalehamel · 2025-11-06T14:57:23Z

> profiling SIG meeting notes

Hey that is awesome, thanks @felixge . Perhaps I can attend the meeting. FYI I am opening a couple of additional issues today to reference some work that depends on this, and give an overall view to the Ruby interpreter improvements we are using internally that I would like to contribute back :)

EDIT: here it is #941, see also #936 and #937 which build on this PR.

dalehamel · 2025-11-21T20:41:41Z

I've rebased with #946 in dalehamel#12 which I've staged to update so that once that PR lands, I can update this PR to include the line number fixes.

fabled

Thanks for this! And apologies for the delay. First glance done, mostly mechanical. I also need to study the Ruby VM a bit to understand the split between what needs to be in ebpf and what can be done on the Go side. Can you also rebase now that the variable length frame PR is merged?

fabled · 2026-01-05T11:02:31Z

interpreter/ruby/ruby.go

 	// regex to extract a version from a string
 	rubyVersionRegex = regexp.MustCompile(`^(\d)\.(\d)\.(\d)$`)

+	unknownCfunc   = libpf.Intern("UNKNOWN CFUNC")


Typically these have been in the style <unknown> or similar. Is the string suggested here also used by Ruby in its backtraces?

> Is the string suggested here also used by Ruby in its backtraces?

No, ruby should always be able to read this label internally so there is no case that needs to be handled for not having a cfunc id.

We should also be able to reliably determine this, unless:

- We are on a super old version of ruby
- We are on a new version of ruby that the bpf profiler doesn't handle yet
- Ruby was built with non-standard settings and the offsets we use to find the values in the id table are skewed.

This value here is mostly to indicate specifically that we know it was a C function, but couldn't resolve its id.

> Typically these have been in the style <unknown> or similar.

I'll switch it to use this convention

fabled · 2026-01-05T11:05:43Z

interpreter/ruby/ruby.go

+	classFlags := r.rm.Ptr(classAddr)
+	classMask := classFlags & rubyTMask
+
+	classpathPtr = r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.classpath))
+	if classMask == rubyTIClass {
+		//https://github.com/ruby/ruby/blob/b627532/vm_backtrace.c#L1931-L1933
+
+		if klassAddr := r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rbasic_struct.klass)); klassAddr != 0 {


There are several reads based on classAddr. Could these be merged into one read to buffer sized to hold the full data?

I think we can reasonably do this for the flags and klass as they are reliably at the start of the struct and at fixed offsets.

For singleton and classpath though they are much further down in the struct (currently ~100 bytes) so unless we read the whole object i'm not sure that makes sense. How expensive is a large read vs multiple smaller reads?

I decided to do a large read that will encompass rbasic + classext + classpath + size of value, so that we can get the entire object up to and including classpath pointer in the first read.

We'll still need to do additional reads if the object is an iclass or a singleton, as we need to get a different base classaddr to the classpath in those cases.

fabled · 2026-01-05T11:06:31Z

interpreter/ruby/ruby.go

+		singletonObject := r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.as_singleton_class_attached_object))
+		classpathPtr = r.rm.Ptr(singletonObject + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.classpath))
+
+		// TODO (dalehamel) in future PR handle anonymous classes and modules


I think this is ok.

fabled · 2026-01-05T11:07:55Z

interpreter/ruby/ruby.go

+	// RUBY_ID_SCOPE_SHIFT = 4
+	// https://github.com/ruby/ruby/blob/797a4115bbb249c4f5f11e1b4bacba7781c68cee/template/id.h.tmpl#L30
+	rubyIdScopeShift := 4
+
+	// ID_ENTRY_UNIT
+	// https://github.com/ruby/ruby/blob/v3_4_5/symbol.c#L77
+	idEntryUnit := uint64(512)
+
+	// ID_ENTRY_SIZE
+	// https://github.com/ruby/ruby/blob/980e18496e1aafc642b199d24c81ab4a8afb3abb/symbol.c#L93
+	idEntrySize := uint64(2)


Should be const. Potentially in the vmStruct to make these configurable if there's risk that these change.
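For context, a sketch of how these three constants are typically combined when locating an entry in ruby's ID table, based on my reading of symbol.c. The function names are hypothetical, and `lastOpID` (the highest operator token ID) is passed in as a parameter because its value depends on the ruby version.

```go
package main

import "fmt"

const (
	rubyIDScopeShift = 4   // RUBY_ID_SCOPE_SHIFT
	idEntryUnit      = 512 // ID_ENTRY_UNIT
	idEntrySize      = 2   // ID_ENTRY_SIZE
)

// idToSerial converts an ID into its serial number. IDs above lastOpID carry
// scope bits in their low bits; operator IDs are used as the serial directly.
func idToSerial(id, lastOpID uint64) uint64 {
	if id > lastOpID {
		return id >> rubyIDScopeShift
	}
	return id
}

// idTableSlot returns the chunk of the ids array that holds the serial and
// the position of its entry within that chunk.
func idTableSlot(serial uint64) (chunk, pos uint64) {
	return serial / idEntryUnit, (serial % idEntryUnit) * idEntrySize
}

func main() {
	const lastOpID = 170 // hypothetical value, version dependent
	serial := idToSerial(1000<<rubyIDScopeShift|1, lastOpID)
	chunk, pos := idTableSlot(serial)
	fmt.Println(serial, chunk, pos)
}
```

Declaring them as package-level constants, as suggested, also keeps the shift and modulo arithmetic free of runtime conversions.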

fabled · 2026-01-05T11:09:08Z

interpreter/ruby/ruby.go

-	// rb_iseq_constant_body
-	// https://github.com/ruby/ruby/blob/5445e0435260b449decf2ac16f9d09bae3cafe72/vm_core.h#L311
-	iseqBody := libpf.Address(frame.File)
+	lastId := r.rm.Uint32(r.globalSymbolsAddr)


Is the lastId here volatile? Perhaps we could still cache it, and re-read it only if the serial is larger than the cached lastId?

Any time a string is interned, this value can change https://github.com/ruby/ruby/blob/20cda200d3ce092571d0b5d342dadca69636cb0f/symbol.c#L795, so yes I'd say that this is potentially volatile.

> Perhaps we could still cache it, and re-read it only if the serial is larger than the cached lastId?

Yes this is probably ok, since it is only being used for the check. We would need to do the check twice but I guess that is cheaper than unconditionally re-reading the same value.

This is now cached
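The caching scheme agreed on in this thread might look roughly like the Go sketch below. It assumes last_id only ever grows, and `idTable`, `readLastID`, and `serialInRange` are hypothetical names rather than the ones used in ruby.go.

```go
package main

import "fmt"

// idTable caches the last known value of the symbol table's last_id.
type idTable struct {
	readLastID   func() uint32 // hypothetical remote read of last_id
	cachedLastID uint32
}

// serialInRange reports whether serial refers to an existing ID. The remote
// value is re-read only when serial is beyond the cached last_id, so the
// check may run twice but the read happens at most once per call.
func (t *idTable) serialInRange(serial uint32) bool {
	if serial == 0 {
		return false
	}
	if serial <= t.cachedLastID {
		return true
	}
	t.cachedLastID = t.readLastID()
	return serial <= t.cachedLastID
}

func main() {
	reads := 0
	table := &idTable{readLastID: func() uint32 { reads++; return 100 }}
	fmt.Println(table.serialInRange(50), table.serialInRange(60), reads)
}
```

Serials at or below the cached value never touch remote memory, so frequently seen method IDs cost no extra reads.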

fabled · 2026-01-05T11:59:13Z

support/ebpf/ruby_tracer.ebpf.c

          &stack_ptr_current,
          sizeof(stack_ptr_current),
          (void *)(current_ctx_addr + rubyinfo->vm_stack))) {
-      DEBUG_PRINT("ruby: failed to read current stack pointer");


Are these removed to just reduce ebpf byte code size?

Yes because the debug prints add a lot of byte code, and we can communicate the same thing with return codes / metrics.

fabled · 2026-01-05T12:03:56Z

support/ebpf/ruby_tracer.ebpf.c

+    error = read_ruby_frame(record, rubyinfo, stack_ptr, next_unwinder);
+    if (error != ERR_OK)
      return error;


Semi related, we are trying to adjust the unwinders to not abort on non-fatal errors. Would it be reasonable to do it in this PR while at it?
The idea would be to emit error frame (and potentially mark the HLL unwinder as "done") if an error is encountered, and then continue with the native unwinder.

I noticed the PR that added this in the last couple of months and think it mostly makes sense. However, in the case of ruby jit (a subsequent PR), we know that if we have a JIT PC, we explicitly DO want to end unwinding when we are done unwinding ruby, as the native unwinder cannot proceed further. So there is some nuance here we'll need to deal with.

We can probably just emit the error frame as you say though and use a better mechanism to say to stop unwinding further, i'll just need to study what the other interpreters are doing.

I couldn't find an example of other unwinders doing this, but maybe i'm looking for the wrong thing.

Either way, perhaps I could follow up with another PR to make continuing with unwinding more robust once this and the other ruby PRs i have queued up have been reviewed and hopefully merged.

fabled · 2026-01-05T12:05:16Z

support/ebpf/ruby_tracer.ebpf.c

+  if (rubyinfo->version < 0x30100)
+    cf_size -= sizeof(control_frame.jit_return);
+
+read_cfp:


unused label?

Yes, it was used in an earlier version. Replacing the label with a comment.

fabled · 2026-01-05T12:07:03Z

support/ebpf/ruby_tracer.ebpf.c

+// Next iteration of the loop, or error out if we have hit the maximum as we
+// couldn't find the method entry
+next_ep:
+  if (ep_check++ < MAX_EP_CHECKS && (!((u64)vm_env.flags & VM_ENV_FLAG_LOCAL))) {


I would really prefer this loop to be a for with the proper UNROLL annotation. This might end up generating unrolled code at some point (and fail verifier in the old kernels).

In general gotos like this suggest this probably should be split into more functions. While there is no rule to not use goto, I personally think they should not be used in complex way like this where it becomes readability issue.

> I would really prefer this loop to be a for with the proper UNROLL annotation. This might end up generating unrolled code at some point (and fail verifier in the old kernels).

Yeah i can try that, I think the verifier will probably treat it the same. It will bloat the number of instructions for this program, but ultimately it is basically already evaluating that much at load time anyways so shouldn't change the behaviour.

> In general gotos like this suggest this probably should be split into more functions. While there is no rule to not use goto, I personally think they should not be used in complex way like this where it becomes readability issue.

I prefer not to use them too, but I had to resort to them to minimize the number of instructions in the inner loop and satisfy the verifier. The algorithm in ruby itself uses a "while" loop, and the goto seems to help the verifier understand where the execution can be short-circuited. This is definitely the most finicky part of the code and maybe could be refactored as you say, but I'd rather not change the logic in this PR as it would risk breaking things.

Added the unroll and eliminated most of the gotos ("continue" and "break" do basically the same flow control more idiomatically here)

fabled · 2026-01-05T12:09:47Z

support/ebpf/ruby_tracer.ebpf.c

+    frame_addr = me_or_cref;
+
+    if (cfunc) {
+      if (rubyinfo->version < 0x20600) {


Can this be also a feature flag in rubyinfo?

> Can this be also a feature flag in rubyinfo?

I can add it, but note that this is how the code already runs the check on main for this branch, see

opentelemetry-ebpf-profiler/support/ebpf/ruby_tracer.ebpf.c

Lines 145 to 151 in 2e2be5a

if (rubyinfo->version < 0x20600) {
  // With Ruby version 2.6 the scope of our entry symbol ruby_current_execution_context_ptr
  // got extended. We need this extension to jump back unwinding Ruby VM frames if we
  // continue at this point with unwinding native frames.
  // As this is not available for Ruby versions < 2.6 we just skip this indicator frame and
  // continue unwinding Ruby VM frames. Due to this issue, the ordering of Ruby and native
  // frames might not be correct for Ruby versions < 2.6.
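One way to turn such version comparisons into feature flags is sketched below in Go. The thresholds come from this thread (2.6.0 for returning to Ruby unwinding, 3.1.0 for jit_return, 3.3.0 for classpath), while `rubyVersion`, `rubyFeatures`, and `featuresFor` are hypothetical names and the version encoding is an assumption inferred from the 0x20600 and 0x30100 constants.

```go
package main

import "fmt"

// rubyVersion packs a version the way the 0x20600 style constants suggest:
// one byte each for major, minor, and patch.
func rubyVersion(major, minor, patch uint32) uint32 {
	return major<<16 | minor<<8 | patch
}

// rubyFeatures is a hypothetical set of capability flags derived once from
// the version, so the unwinder can test a flag instead of a version number.
type rubyFeatures struct {
	interleaveNative bool // 2.6.0+: can resume Ruby unwinding after native frames
	hasJitReturn     bool // 3.1.0+: control frame carries jit_return
	hasClassPath     bool // 3.3.0+: classpath is readable from the class ext
}

func featuresFor(version uint32) rubyFeatures {
	return rubyFeatures{
		interleaveNative: version >= rubyVersion(2, 6, 0),
		hasJitReturn:     version >= rubyVersion(3, 1, 0),
		hasClassPath:     version >= rubyVersion(3, 3, 0),
	}
}

func main() {
	fmt.Printf("%+v\n", featuresFor(rubyVersion(2, 7, 8)))
	fmt.Printf("%+v\n", featuresFor(rubyVersion(3, 3, 9)))
}
```

Computing the flags once on the Go side would leave the BPF code with a single boolean test per feature.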

dalehamel · 2026-01-07T20:18:45Z

@fabled I believe I've addressed all your first batch of comments with either code or conversation, apologies if I missed anything. I believe this should be ready for another look when you have the time, thanks!

EDIT: woops, github hid some of them. Addressing those now.
EDIT2: Addressed them all again.

florianl · 2026-01-09T09:48:22Z

interpreter/ruby/ruby.go

 	sourceFileName, err := r.getStringCached(sourceFileNamePtr, r.readPathObjRealPath)
 	if err != nil {
-		return err
+		log.Debugf("Failed to get source file name %v", err)


Is it really safe to continue at this point? Later, when reading labels, we return.

Yes, the string read for the source file name is allowed to fail. We fail if ANY of the label reads fails, as this would lead to calculating an incorrect label since the algorithm would treat them as empty, but they are actually an "error".

The source file name isn't as critical as the method label, and we need three components to correctly compute the full label.

florianl · 2026-01-09T09:56:22Z

tools/coredump/testdata/amd64/ruby-2.7.8p225-loop.json

        "libruby.so.2.7.8+0x211485",
        "libruby.so.2.7.8+0x2212d2",
-        "<main>+0 in /pwd/testsources/ruby/loop.rb:29",
+        "<=>+0 in <cfunc>:0",


Do you know what has happened here?

The previous label calculation was overly simplistic. <main> here is actually block (2 levels) in <main>; it was also looking at the wrong iseq so it got the wrong line number.

In ruby <=> is the label for "compare" operations, it is used throughout ruby but here is an example in string.c https://github.com/ruby/ruby/blob/c794a97940a36269cffcb6ad35ef7ff209fe2720/string.c#L12193-L12195

The stack frames also moved around a bit as I noted in the comment above.

dalehamel requested review from a team as code owners October 31, 2025 19:54

dalehamel mentioned this pull request Oct 31, 2025

Cme upstreaming dalehamel/opentelemetry-ebpf-profiler#5

Closed

florianl added the interpreter/ruby label Oct 31, 2025

dalehamel mentioned this pull request Oct 31, 2025

Increase frame buffer to max 1024 frames per trace #908

Closed

dalehamel commented Oct 31, 2025

dalehamel force-pushed the ruby-read-cmes branch from 972cfde to a99ac63 Compare November 4, 2025 01:25

ivoanjo approved these changes Nov 4, 2025

This was referenced Nov 5, 2025

Allow pushing an "extra" value in Frame in padding dalehamel/opentelemetry-ebpf-profiler#11

Draft

Allow pushing an "extra" value in Frame in padding #931

Closed

felixge reviewed Nov 6, 2025

dalehamel force-pushed the ruby-read-cmes branch from e4e1ffb to 5374137 Compare November 6, 2025 15:08

This was referenced Nov 6, 2025

[feat][Ruby] Support detecting GC state and handle it accordingly #936

Open

[feat][Ruby] Support for ruby JIT frames #937

Open

[feat][Ruby][meta] - Overall Ruby interpreter improvements #941

Open

dalehamel mentioned this pull request Nov 17, 2025

Refactor the ebpf frame data to be variable length #943

Merged

dalehamel force-pushed the ruby-read-cmes branch from 5374137 to ed21fd2 Compare November 21, 2025 19:22

fabled reviewed Jan 5, 2026

dalehamel mentioned this pull request Jan 5, 2026

Introduce per-interpreter code ownership #961

Open

dalehamel added 4 commits January 7, 2026 12:55

Fix lineno decoding by passing cfp->iseq

d8e7c9e

Cleanup unused label and variable initialization

2c3e67c

Move id related constants to const

86a65e5

Replace gotos in detecting method entry with unrolled loop

b2a8c4d

dalehamel force-pushed the ruby-read-cmes branch from 1f09c90 to 2c9e061 Compare January 7, 2026 19:36

dalehamel mentioned this pull request Jan 8, 2026

Fix FP+RA handling on aarch64 #1048

Open

dalehamel and others added 11 commits January 8, 2026 15:41

Update dummy value for unknown cfunc to match convention

878aaa2

Cache lastId and reread only if serial is larger

517539e

Store size of iseq location to buffer full value for npsr

8e4e238

Buffer rbasic + classext + classpath + value to reduce reads when getting classpath

d7f545b

Update interpreter/ruby/ruby.go

d2460ff

Co-authored-by: Timo Teräs <timo.teras@iki.fi>

Update interpreter/ruby/ruby.go

50ba817

Co-authored-by: Timo Teräs <timo.teras@iki.fi>

Update interpreter/ruby/ruby.go

6d276ae

Co-authored-by: Timo Teräs <timo.teras@iki.fi>

Id table lookup improvements

7b7bcbf

Reduce log verbosity when failing to read iseq data

a532a72

Fix unknown function label format

e310523

Add source code docs

2d1d786

dalehamel force-pushed the ruby-read-cmes branch from 6d78f87 to 6c20f21 Compare January 8, 2026 20:48

Read size of control frame struct from procinfo

5f9d934

dalehamel force-pushed the ruby-read-cmes branch from 6c20f21 to 14bde4d Compare January 8, 2026 23:53

dalehamel added 3 commits January 8, 2026 19:14

add safety check for reading control frame struct

c042b69

Indicate what control frame fields are used and expected to be stable, and which are unused/unstable

ef9902b

update bpf blobs

3b0f653

dalehamel force-pushed the ruby-read-cmes branch from 14bde4d to 3b0f653 Compare January 9, 2026 00:32

florianl reviewed Jan 9, 2026

dalehamel and others added 3 commits January 9, 2026 06:40

Apply suggestions from code review

5b56d30

Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>

Update coredump labels for unknown cfuncs

2899b5b

Remove obsolete metric ids

7665239

florianl approved these changes Jan 9, 2026

	if (rubyinfo->version < 0x20600) {
	// With Ruby version 2.6 the scope of our entry symbol ruby_current_execution_context_ptr
	// got extended. We need this extension to jump back unwinding Ruby VM frames if we
	// continue at this point with unwinding native frames.
	// As this is not available for Ruby versions < 2.6 we just skip this indicator frame and
	// continue unwinding Ruby VM frames. Due to this issue, the ordering of Ruby and native
	// frames might not be correct for Ruby versions < 2.6.
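
The guard in the excerpt above compares the detected version against 0x20600. A minimal sketch of how such a packed value could be built and compared, assuming one byte per component (major, minor, patch); that layout is inferred from the constant and is not confirmed in this thread, and the helper names are hypothetical:

```ruby
# Assumption: the version is packed as (major << 16) | (minor << 8) | patch.
# Both helpers are hypothetical and exist only for illustration.
def encode_ruby_version(major, minor, patch)
  (major << 16) | (minor << 8) | patch
end

# Mirrors the `version < 0x20600` check: true for Ruby versions before 2.6.0.
def before_ruby_2_6?(version)
  version < encode_ruby_version(2, 6, 0)
end

puts format("0x%05x", encode_ruby_version(2, 6, 0))
puts before_ruby_2_6?(encode_ruby_version(2, 5, 9))
puts before_ruby_2_6?(encode_ruby_version(3, 3, 0))
```

Under this packing, plain integer comparison orders versions correctly as long as each component fits in one byte.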


