Skip to content

Conversation

@dalehamel
Copy link
Contributor

@dalehamel dalehamel commented Oct 31, 2025

What

This overhauls the ruby interpreter and unwinder to more closely mimic what ruby's own backtrace function does. In particular, it adds support for collecting and symbolizing Callable Method Entries (CMEs), which contain additional information.

This adds support for new features in symbolizing ruby frames:

It also fixes an existing bug in ruby #770

In addition to this, it changes how frames are symbolized:

  • For c func method frames, rather than skip the frame we attempt to symbolize it (supported for all versions of ruby)
    • Presuming we will return to ruby symbolization (not true in ruby < 2.6.0), we save the cfunc and push the frame only after returning to ruby. This way the cfunc will "own" the native frames it calls as we interleave the stack
  • For iseq-backed method entry frames, we read the label from the correct location and do our best to add classpath information to it
  • For plain iseq-backed frames (which is all that was previously supported), we choose between base_label and label more appropriately

Real world example

This makes the bpf ruby interpreter closer to being usable and helpful with real production workloads.

These examples are taken from a production job server using our internal job system called "Hedwig".

Before:

Screenshot 2025-10-31 at 11 54 38 AM

After:

Screenshot 2025-10-31 at 11 54 16 AM

(Note: that these are aggregated by a custom backend, and the [rb]:: prefixes are added by that.)

Notice that the names are much more descriptive, including the class name or extended label information that was previously omitted.

Notice also how Kernel#fork owns the native frames calling rb_f_fork, and likewise for Array#each owning rb_ary_each. Previously these frames were elided.

These frames much more closely resemble what Ruby developers will be familiar with when looking at stackprof of vernier profiles, as well as backtracie. I recently submitted the same thing for rbspy following basically the same approach and algorithm but done in one process, in userspace. In fact, when placed side by side with profiles from stackprof, developers can't tell the difference between the new native profiles:

Screenshot 2025-10-31 at 3 19 51 PM

vs stackprof request profiles they are already familiar with:

Screenshot 2025-10-31 at 3 19 08 PM

But, it is also possible to get more context about what it is doing, since we can see kernel and userspace native frames in the context of the ruby:

image

Why

The current unwinding approach is not consistent with what ruby's own backtrace function does. This is confusing to users who are familiar with ruby profilers like stackprof and vernier, which uses's ruby's built-in profiling stack unwinding.

We also are currently missing information that makes the ruby unwinder useful for any real app in production. Knowing the class information makes it easier to see what is actually going on in the application.

The existing symbolization and recording of stack frames loses a lot of data, it doesn't handle callable method entries at all and doesn't navigate to the same instruction sequences and label values, so it can provide surprising results.

How

As much as possible, the code mimics the behaviour of ruby's own rb_profile_frames from ruby's vm_backtrace.c, both for collecting the frames:

  • Every frame is captured in the same way. The bulk of this is done by mimic'ing ruby's own check_method_entry to walk from the ep to the correct frame. We have to unroll and optimize this for it to work, since BPF doesn't allow a "while" loop.

And when symbolizing:

  • When symbolizing the frames, we mimic rb_profile_frame_full_label as much as possible, considering the class name to get the qualified method name, and using the label, base_label, and method_name labels to compose the most descriptive possible label regardless of it is a CME or bare ISEQ.
  • For c functions, we read the label from the global ID table, imitating ruby's id2str

I also added the support for symbolizing frames that are backed by a ruby CME, this adds new code for id2sym which is used to look up C method identifiers from the global symbol table.

For both types of CMEs, I've added the code to read the class data from ext.classpath on ruby 3.3.0+.

All frames also use the equivalent of rb_profile_frame_full_name, regardless of if the class information is available, as this can still provide a more contextful label.

Reviewer Notes

I'm aware this is a large changeset, especially after we just landed #170. I considered not including the class path and C symbolizing stuff, but decided it would be a bit silly to submit them separately since there isn't much of a point in collecting CMEs if we don't also use them. Because I'm both updating the BPF side (collection, building up frame buffer in the same way rb_profile_frames does by getting CMES), and Go side (symbolizing the results), there is a fair bit to change here.

I recommend using github's "side by side" view to review this, the intermixed one won't make much sense otherwise.

I'd recommend starting it in the BPF code

On then moving to the go side:

My apologies for the large diff, I did hold back additional stuff which I want to submit in subsequent PRs:

  • Garbage collection state detection
  • Ruby JIT support
  • Ongoing work to address getting the EC from TLS

However, this change represents the most important, common base for all the additional work. I tried to keep the changes as small as I could, but adding tests, comments, and boiler plate has pushed the diff larger.

rubyFlSingleton libpf.Address

// Is it possible to read the classpath
hasClassPath bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabled i tried to go for the "feature" rather than "version" based approach, hopefully this is what you had in mind.

return &rubyIseq{}, fmt.Errorf("failed to read iseq location data, %v", err)
}

sourceFileNamePtr := npsr.Ptr(dataBytes, uint(vms.iseq_location_struct.pathobj))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw another opportunity to use npsr to save readv calls.

singletonObject := r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.as_singleton_class_attached_object))
classpathPtr = r.rm.Ptr(singletonObject + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.classpath))

// TODO (dalehamel) in future PR handle anonymous classes and modules
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove my handle from this but I basically promise to submit this later, as it is, i feel the PR is big enough and this is kinda an esoteric edge case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok.

record->phpUnwindState.zend_execute_data = 0;
record->rubyUnwindState.stack_ptr = 0;
record->rubyUnwindState.last_stack_frame = 0;
record->rubyUnwindState.cfunc_saved_frame = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB this has to be reset or coredump tests fail due to state leakage it seems.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Since these are all structures, I wonder if it's be nicer to use the zero-initialization syntax, e.g. record->rubyUnwindState = (struct RubyUnwindState) {}; this would avoid needing to update it for every change of the structure.

"libruby.so.2.7.8+0x216781",
"libruby.so.2.7.8+0x211485",
"libruby.so.2.7.8+0x2212d2",
"is_prime+0 in /pwd/testsources/ruby/loop.rb:10",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ordering of the frames changes in a few of these examples because i fixed the bug where didn't save our state when returning back to ruby interpreter.

I also push the saved cfunc when we return, so it properly "owns" the native frames it called.

"<main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:30",
"<main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:29",
"block (2 levels) in <main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:30",
"UNKNOWN CFUNC+0 in <cfunc>:0",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these coredumps for older rubies are stripped and missing the symbol table, so we cannot lookup the location of the global symbol table within ruby. This is the fallback dummy frame for that.

"<main>+0 in /app/loop.rb:29",
"loop+0 in <internal:kernel>:168",
"Range#each+0 in <cfunc>:0",
"block in <main>+0 in /app/loop.rb:29",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that a lot of these now have more context because we have conditional logic picking between "label", "base_label", and "method name" from the iseq, to match ruby's own logic for this.

"libruby.so.2.7.8+0x202f17",
"libruby.so.2.7.8+0x215a23",
"<main>+0 in /pwd/testsources/ruby/loop.rb:29",
"each+0 in <cfunc>:0",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This frame was previously elided but is now handled.

"libruby.so.3.5.0+0x31e2ea",
"libruby.so.3.5.0+0x325240",
"libruby.so.3.5.0+0x333b86",
"<main>+0 in /app/loop.rb:29",
Copy link
Contributor Author

@dalehamel dalehamel Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of the repeated frame bug (#770 ) which is now fixed, hence why it doesn't repeat anymore.

"is_prime+0 in /app/loop.rb:10",
"sum_of_primes+0 in /app/loop.rb:20",
"<main>+0 in /app/loop.rb:30",
"Object#is_prime+0 in /app/loop.rb:45481984",
Copy link
Contributor Author

@dalehamel dalehamel Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the cause, but there seems to be a bug when computing the line numbers for the leaf ruby frame. (see below, found the cause and a solution) Here it is an impossibly large value, and in other spots it is 0. It seems to be a consequence of moving to the CME backed iseqs rather than just the bare one which is directly accessible.

If it's ok, i'd like to just file a follow-up issue to fix that separately.

Copy link
Contributor Author

@dalehamel dalehamel Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this a bit more, I believe the issue is that the line number is also encoded in the plain iseq directly accessible from the CFP:

https://github.com/ruby/ruby/blob/52a17bbe6d778d56cc600f73f107c1992350f877/vm_backtrace.c#L1763-L1767

I'll see if this can be passed from bpf as well to cover this case.

Copy link
Contributor Author

@dalehamel dalehamel Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR is already quite large, and my solution for passing the additional bytes is probably controversial (since I need to use the padding on Frame to send another address), I'm not pushing my solution to this branch just yet. But, I do have a fix and it is available here to get an idea of how it can be solved:

Shopify@9c0f0f6

I'd like to suggest we move forward with this branch as-is, and I submit this fix as a separate PR if/when this branch lands.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

Copy link
Contributor Author

@dalehamel dalehamel Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

It would be idea, yeah, but I haven't actually changed the line number calculation function at all. It seems to work correctly.

If anything, it might help to move the guard clause in C (which previously did exist in BPF but has now been removed, since we always want to push the frame) to the top of the go function. I can give that a try, it would be good to have and should be pretty simple?

Ie, this check:

https://github.com/ruby/ruby/blob/52a17bbe6d778d56cc600f73f107c1992350f877/vm_backtrace.c#L1777-L1778

EDIT - this doesn't seem to work, perhaps because we have no way of knowing if the frame is the "top" frame (they are sent to us individually, we don't know where they are in the sequence). I could perhaps flag this by jamming more information into the File bits, but i'd rather just leave this for now since there already is a fix on the table with the correct logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to file #931 separately, if that lands then i'll pull the code to fix the line numbers right into this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed all of this in dalehamel#12 which builds on #946 which i rebased on today

Copy link

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a contributor on this repo, but I've spent a lot of time looking at Ruby's stack trace unwinding code in backtracie and datadog's ruby profiler: All of the extra work to collect the class names + C function names in this PR looks quite reasonable.

record->phpUnwindState.zend_execute_data = 0;
record->rubyUnwindState.stack_ptr = 0;
record->rubyUnwindState.last_stack_frame = 0;
record->rubyUnwindState.cfunc_saved_frame = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Since these are all structures, I wonder if it's be nicer to use the zero-initialization syntax, e.g. record->rubyUnwindState = (struct RubyUnwindState) {}; this would avoid needing to update it for every change of the structure.

// number of attempts to read Go custom labels
metricID_UnwindGoLabelsAttempts,

// number of failures to read Go custom labels
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⬆️ Minor: I think you forgot to update the metricID_UnwindRubyErrReadIseqEncoded/metricID_UnwindRubyErrReadIseqSize above as well.

(Although I wonder -- pardon the basic question -- why keep the old metrics around instead of removing them?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, i'll fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I wonder -- pardon the basic question -- why keep the old metrics around instead of removing them?

I figured since the metrics (and errors) rely on unique IDs, leaving them there increases the potential that they get reused, and it is possible that metrics (or errors) could be misidentified. It might make sense to reuse them in the future, but for now leaving them as-is and deprecating them avoids this possibility

"is_prime+0 in /app/loop.rb:10",
"sum_of_primes+0 in /app/loop.rb:20",
"<main>+0 in /app/loop.rb:30",
"Object#is_prime+0 in /app/loop.rb:45481984",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not for this PR, and I guess that would require regenerating the core dumps, but I while going through these I was left thinking "I wish we had Ruby's exact actual output of the backtrace as a golden reference to compare here".

E.g. without that info it looks to me we can only tell the current stack looks better/matches what seems to be happening in gdb, but it's unclear if there's some weird corner case that's still missing or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not for this PR, and I guess that would require regenerating the core dumps, but I while going through these I was left thinking "I wish we had Ruby's exact actual output of the backtrace as a golden reference to compare here".

Yes that would be good, locally i have versions of the script where I run stackprof to ensure what I'm getting looks sane. It would be quite an effort though, and it is difficult to even get the older rubies to compile (I guess you could use docker with an older OS? Modern toolchains won't build anything 3.0.0 or older)

Also with the coredump tests it is a bit tricky to know that the native frames are correct since of course it doesn't symbolize them.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that would be good, locally i have versions of the script where I run stackprof to ensure what I'm getting looks sane. It would be quite an effort though, and it is difficult to even get the older rubies to compile (I guess you could use docker with an older OS? Modern toolchains won't build anything 3.0.0 or older)

It may be harder on macOS than on Linux; I just got a new laptop running Ubuntu 24.04 and I was able to get even Ruby 2.2 going without too much mucking (although I did need some).

So I guess if this would be an interesting (separate) contribution, I could look into it...

Also with the coredump tests it is a bit tricky to know that the native frames are correct since of course it doesn't symbolize them.

Yes, I think ideally we'd have both sets of stacks -- the Ruby ones I'd expect would be a strict subset of the complete symbolized stacks.

Comment on lines 45 to 59
typedef struct rb_control_frame_struct {
const void *pc; // cfp[0]
void *sp; // cfp[1]
const void *iseq; // cfp[2]
void *self; // cfp[3] / block[0]
const void *ep; // cfp[4] / block[1]
const void *block_code; // cfp[5] / block[2] -- iseq, ifunc, or forwarded block handler
void *jit_return; // cfp[6] -- return address for JIT code
} rb_control_frame_t;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I wonder if it's worth introducing a typedef for VALUE, to keep the closer alignment with the types used on the Ruby VM (rather than replacing them with a void *)

Copy link
Member

@felixge felixge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this contribution @dalehamel . I don't have the expertise to review this, but I've added an agenda item to the next profiling SIG meeting notes to discuss a planning for getting this reviewed and landed.

@dalehamel
Copy link
Contributor Author

dalehamel commented Nov 6, 2025

profiling SIG meeting notes

Hey that is awesome, thanks @felixge . Perhaps I can attend the meeting. FYI I am opening a couple of additional issues today to reference some work that depends on this, and give an overall view to the Ruby interpreter improvements we are using internally that I would like to contribute back :)

EDIT: here it is #941, see also #936 and #937 which build on this PR.

@dalehamel
Copy link
Contributor Author

I've rebased with #946 in dalehamel#12 which I've staged to update so that once that PR lands, I can update this PR to include the line number fixes.

Copy link
Contributor

@fabled fabled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! And apologies for the delay. First glance done, mostly mechanical. I need to study the Ruby VM also a bit to understand the split what needs to be in ebpf, and what can be done in Go side. Can you also rebase now that the variable length frame PR is merged?

// regex to extract a version from a string
rubyVersionRegex = regexp.MustCompile(`^(\d)\.(\d)\.(\d)$`)

unknownCfunc = libpf.Intern("UNKNOWN CFUNC")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically these have been in the style <unknown> or similar. Is the string suggested here also used by Ruby in its backtraces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the string suggested here also used by Ruby in its backtraces?

No, ruby should always be able to read this label internally so there is no case that needs to be handled for not having a cfunc id.

We should also be able to reliably determine this, unless if:

  • We are on a super old version of ruby
  • We are on a new version of ruby that the bpf profiler doesn't handle yet
  • Ruby was built with non-standard settings and the offsets we use to find the values in the id table are scewed.

This value here is mostly to indicate specifically that we know it was a c function ,but couldn't resolve its id.

Typically these have been in the style or similar.

I'll switch it to use this convention

Comment on lines 734 to 749
classFlags := r.rm.Ptr(classAddr)
classMask := classFlags & rubyTMask

classpathPtr = r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.classpath))
if classMask == rubyTIClass {
//https://github.com/ruby/ruby/blob/b627532/vm_backtrace.c#L1931-L1933

if klassAddr := r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rbasic_struct.klass)); klassAddr != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several reads based on classAddr. Could these be merged into one read to buffer sized to hold the full data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can reasonably do this for the flags and klass as they are reliably at the start of the struct and at fixed offsets.

For singleton and classpath though they are much further down in the struct (currently ~100 bytes) so unless we read the whole object i'm not sure that makes sense. How expensive is a large read vs multiple smaller reads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to do a large read that will encompass rbasic + classext + classpath + size of value, so that we can get the entire object up to and including classpath pointer in the first read.

We'll still need to do additional reads if the object is an iclass or a singleton, as we need to get a different base classaddr to the classpath in those cases.

singletonObject := r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.as_singleton_class_attached_object))
classpathPtr = r.rm.Ptr(singletonObject + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.classpath))

// TODO (dalehamel) in future PR handle anonymous classes and modules
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok.

Comment on lines 781 to 787
// RUBY_ID_SCOPE_SHIFT = 4
// https://github.com/ruby/ruby/blob/797a4115bbb249c4f5f11e1b4bacba7781c68cee/template/id.h.tmpl#L30
rubyIdScopeShift := 4

// ID_ENTRY_UNIT
// https://github.com/ruby/ruby/blob/v3_4_5/symbol.c#L77
idEntryUnit := uint64(512)

// ID_ENTRY_SIZE
// https://github.com/ruby/ruby/blob/980e18496e1aafc642b199d24c81ab4a8afb3abb/symbol.c#L93
idEntrySize := uint64(2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be const. Potentially in the vmStruct to make these configurable if there's risk that these change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// rb_iseq_constant_body
// https://github.com/ruby/ruby/blob/5445e0435260b449decf2ac16f9d09bae3cafe72/vm_core.h#L311
iseqBody := libpf.Address(frame.File)
lastId := r.rm.Uint32(r.globalSymbolsAddr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the lastId here volatile? Perhaps we could still cache it, and re-read it only if the serial is larger then the cached lastId?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any time a string is interned, this value can change https://github.com/ruby/ruby/blob/20cda200d3ce092571d0b5d342dadca69636cb0f/symbol.c#L795, so yes I'd say that this is potentially volatile.

Perhaps we could still cache it, and re-read it only if the serial is larger then the cached lastId?

Yes this is probably ok, since it is only being used for the check. We would need to do the check twice but I guess that is cheaper than unconditionally re-reading the same value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now cached

&stack_ptr_current,
sizeof(stack_ptr_current),
(void *)(current_ctx_addr + rubyinfo->vm_stack))) {
DEBUG_PRINT("ruby: failed to read current stack pointer");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these removed to just reduce ebpf byte code size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes because the debug prints add a lot of byte code, and we can communicate the same thing with return codes / metrics.

Comment on lines +403 to 420
error = read_ruby_frame(record, rubyinfo, stack_ptr, next_unwinder);
if (error != ERR_OK)
return error;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semi related, we are trying to adjust the unwinders to not abort on non-fatal errors. Would it be reasonable to do it in this PR while at it?
The idea would be to emit error frame (and potentially mark the HLL unwinder as "done") if an error is encountered, and then continue with the native unwinder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the PR that added this in the last couple of months and think it mostly makes sense. However, in the case of ruby jit (a subsequent PR), we know that if we have a JIT PC, we explicitly DO want to end unwinding when we are done unwinding ruby, as the native unwinder cannot proceed further. So there is some nuance here we'll need to deal with.

We can probably just emit the error frame as you say though and use a better mechanism to say to stop unwinding further, i'll just need to study what the other interpreters are doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find an example of other unwinders doing this, but maybe i'm looking for the wrong thing.

Either way, perhaps I could follow up with another PR to make continuing with unwinding more robust once this and the other ruby PRs i have queued up have been reviewed and hopefully merged.

if (rubyinfo->version < 0x30100)
cf_size -= sizeof(control_frame.jit_return);

read_cfp:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused label?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it was used an in earlier version. Replacing the label with a comment.

// Next iteration of the loop, or error out if we have hit the maximum as we
// couldn't find the method entry
next_ep:
if (ep_check++ < MAX_EP_CHECKS && (!((u64)vm_env.flags & VM_ENV_FLAG_LOCAL))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really prefer this loop to be a for with the proper UNROLL annotation. This might end up generating unrolled code at some point (and fail verifier in the old kernels).

In general gotos like this suggest this probably should be split into more functions. While there is no rule to not use goto, I personally think they should not be used in complex way like this where it becomes readability issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really prefer this loop to be a for with the proper UNROLL annotation. This might end up generating unrolled code at some point (and fail verifier in the old kernels).

Yeah i can try that, I think the verifier will probably treat it the same. It will bloat the number of instructions for this program, but ultimately it is basically already evaluating that much at load time anyways so shouldn't change the behaviour.

In general gotos like this suggest this probably should be split into more functions. While there is no rule to not use goto, I personally think they should not be used in complex way like this where it becomes readability issue.

I prefer not to use them to, but had to resort to it in order to try to minimize the number of instructions in the inner loop here in order to satisfy the verifier, since the algorithm in ruby itself uses a "while" loop, so the go-to seems to help the verifier understand where the execution can be short circuited. This is definitely the most finnicky part of the code and maybe could be refactored as you say, but I'd rather not change the logic in this PR as it would risk breaking things.

Copy link
Contributor Author

@dalehamel dalehamel Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the unroll and eliminated most of the gotos ("continue" and "break" do basically the same flow control more idiomatically here)

frame_addr = me_or_cref;

if (cfunc) {
if (rubyinfo->version < 0x20600) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be also a feature flag in rubyinfo?

Copy link
Contributor Author

@dalehamel dalehamel Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be also a feature flag in rubyinfo?

I can add it, but note that this is how the code already runs the check on main for this branch, see

if (rubyinfo->version < 0x20600) {
// With Ruby version 2.6 the scope of our entry symbol ruby_current_execution_context_ptr
// got extended. We need this extension to jump back unwinding Ruby VM frames if we
// continue at this point with unwinding native frames.
// As this is not available for Ruby versions < 2.6 we just skip this indicator frame and
// continue unwinding Ruby VM frames. Due to this issue, the ordering of Ruby and native
// frames might not be correct for Ruby versions < 2.6.

@dalehamel
Copy link
Contributor Author

dalehamel commented Jan 7, 2026

@fabled I believe I've addressed all your first batch of comments with either code or conversation, apologies if I missed anything. I believe this should be ready for another look when you have the time, thanks!

EDIT: woops, github hid some of them. Addressing those now.
EDIT2: Addressed them all again.

sourceFileName, err := r.getStringCached(sourceFileNamePtr, r.readPathObjRealPath)
if err != nil {
return err
log.Debugf("Failed to get source file name %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really safe to continue at this point? Later, when reading labels, we return.

Copy link
Contributor Author

@dalehamel dalehamel Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the string read for the source file name is allowed to fail. We fail if ANY of the label reads fails, as this would lead to calculating an incorrect label since the algorithm would treat them as empty, but they are actually an "error".

The source file name isn't as critical as the method label, and we need three components to correctly compute the full label.

"libruby.so.2.7.8+0x211485",
"libruby.so.2.7.8+0x2212d2",
"<main>+0 in /pwd/testsources/ruby/loop.rb:29",
"<=>+0 in <cfunc>:0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what has happened here?

Copy link
Contributor Author

@dalehamel dalehamel Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the previous label calculation was overly simplistic. <main> here is actually block (2 levels) in <main>, it was also looking at the wrong iseq so it got the wrong line number.

In ruby <=> is the label for "compare" operations, it is used throughout ruby but here is an example in string.c https://github.com/ruby/ruby/blob/c794a97940a36269cffcb6ad35ef7ff209fe2720/string.c#L12193-L12195

The stack frames also moved around a bit as I noted in the comment above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants