Description
I have submitted a number of issues and PRs related to improving Ruby profiling, and have additional ones queued up.
I thought it would be helpful to create this meta issue to describe the deficiencies that I'm addressing, and paint a more complete picture of the desired end state.
- The largest single improvement is Mimic Ruby's backtrace logic in ruby interpreter and tracer, add support for ruby CMEs #907, which brings the ruby unwinding and symbolization algorithm more in line with ruby's own approach in `rb_profile_frames`, which other ruby profilers such as stackprof and vernier also use. This addresses [feat] Include class name for ruby method calls #714.
  - This introduces an issue with line numbers. I opened Allow pushing an "extra" value in Frame in padding #931 so that we can push an additional value to fix it (as is done in Fix leaf lineno Shopify/opentelemetry-ebpf-profiler#4) (superseded by Convert ebpf trace `struct frame` to variable length data #940).
  - Building on this, we need to detect and mark GC frames so that we don't erroneously charge whatever is on the ruby stack for the GC that the ruby VM has decided to run: [feat][Ruby] Support detecting GC state and handle it accordingly #936.
  - Building on this, support for ruby JIT frames is necessary for many production use cases, as the ruby JIT provides a substantial speedup but currently breaks the unwinding strategy used by the ruby tracer: [feat][Ruby] Support for ruby JIT frames #937.
- Ruby stores its execution context in thread local storage (TLS), and we need a reliable way to read it. Currently this is only supported if TLS descriptors are used.
  - feature: strategy for reading TLS when not using TLS descriptors #883 tracks this, and Extract DTV info from __tls_get_addr, add to LibcInfo #929 partially addresses it.
  - [feat] Support static bin/ruby builds for ruby interpreter #884 is still needed for static ruby (i.e., no libruby.so) support. Nothing is implemented here yet.
- Ruby stacks are also typically a lot deeper than those of other runtimes, which is why I've opened Increase frame buffer to max 1024 frames per trace #908 to address Max stack depth of 128 is insufficient for a variety of cases #760.
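To make the GC item above concrete, here is a minimal sketch of the attribution problem, not the approach taken in #936. It assumes each sample carries a hypothetical `in_gc` flag alongside the ruby stack; the `GC_FRAME` label and the sample data are made up for illustration.

```python
from collections import Counter

# Hypothetical label for a synthetic GC frame; not taken from the PRs.
GC_FRAME = "(garbage collection)"


def charged_frame(ruby_stack, in_gc):
    """Return the frame whose self time a sample should count toward."""
    if in_gc:
        # The VM decided to run GC; the ruby leaf frame did not ask for it.
        return GC_FRAME
    return ruby_stack[-1] if ruby_stack else "(unknown)"


def self_time(samples):
    """Count samples per charged frame."""
    return Counter(charged_frame(stack, in_gc) for stack, in_gc in samples)


# Made-up samples: (ruby stack from root to leaf, whether GC was running).
samples = [
    (["main", "Order#total"], False),
    (["main", "Order#total"], True),
    (["main", "Order#total"], True),
    (["main", "Cart#items"], False),
]

totals = self_time(samples)
for frame, count in totals.most_common():
    print(frame, count)
```

Without the `in_gc` check, `Order#total` would be charged for three samples even though two of them were spent in GC.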
Demonstration branch
At Shopify we've been using a branch composed of a bunch of these fixes and a few other tweaks. It is available at Shopify#8 and should be more or less suitable for production use (assuming production is x86_64 and ruby is compiled with --enable-shared, for now). We're using it internally and probably have one of the largest and most demanding ruby stacks in the world, so I'd expect that if it works for us, it is sufficient for others.
Here is an example of it working with JIT, detecting GC, and appropriately charging the leaf frames to the appropriate native functions, while also providing frame labels that match what is seen in stackprof or other ruby profilers:
End to end Ruby to native to kernel:
Thanks to #907 we can see the cfunc call that initiated the native calls, and it makes sense (here the native code is servicing a request to look up memory usage, which is ultimately served by the kernel doing smaps_rollup):
Here is the overall ruby process with native profiling unwinding:
It looks identical to the stackprof request profiles for the same thing, albeit with more detail about the native calls:
Zooming in on GC, we can see exactly what is happening during the marking and sweeping stages:
And zooming in on the leaf nodes, we can see where time is being spent in native functions to service cfunc calls:
