@dalehamel
Contributor

What

This adds support to the pfelf package for reading struct field, offset, and size information from DWARF in the ruby ELF sections, if they are available.

If this process fails, it falls back to the current version-specific, hardcoded offsets.

Why

The current approach is brittle for a number of reasons:

  • Whenever a new version of ruby is released, we need to check whether they added any struct members or changed member types, as this can break the offsets we rely on
  • It does not take into account different configure options, which means that going by version number is fundamentally flawed: offset information can be affected by macros specified at compile time. For example, if ruby is built with "jit" support, the total size of various structs may change to add extra members for jit, and offsets may shift as a new field is inserted in the middle of a struct
  • It does not take into account arch-specific offsets: type sizes vary by architecture, so the hardcoded values don't hold true between x86 and arm64

Someone has to go and re-run the offset calculations and update them accordingly, trying to hit a moving target defined by platform, architecture, and config flags. Getting this right would essentially require a huge table of offsets.

This approach will potentially add some slight, but avoidable and temporary, memory overhead when parsing DWARF symbols to get ruby type information. However, it should make detecting and supporting ruby versions much more robust, and reduce the maintenance overhead in the long term.

We will still need to routinely check that the fundamental way Ruby represents stacks, and the context we capture the stack from, remain conceptually correct. However, the busy-work of ensuring we use the right memory offsets as we walk the stack frames should be mostly eliminated, and if we do need to add new field offsets or structs in the future, it should be easy to do so. With the addition of debuginfod, there should be no reason why this information isn't available.

Further, memory overhead has been carefully managed. The current DWARF parsing for Ruby's .debug sections fits within the existing 16MB maximum Data size for existing "large sections", even when decompressing. Better yet, if the .debug sections are not compressed, we introduce no memory overhead at all, as we read from the existing memory-mapped slice.

How

The Go standard library's debug/dwarf package is used to provide access to the DWARF data, but the dwarf.Data constructor is called directly, not via debug/elf's DWARF() call.

We first check the file for DWARF sections and then try to read all needed vmStruct values from the type information. Only if this fails do we fall back to the current hardcoded values.

To do this, I added a new function, TypeData, to pfelf's File, which reads type information from the file for a list of types and returns a list of TypeData structs of equal length. It takes a list of arguments to minimize overhead: we walk the DWARF only once, and only until all the requested types are found, which bounds the overhead at O(n), where n is the number of types requested.

This can return either a struct's type or something simpler, like a typedef'd type.

At minimum the returned type data provides a Size; for struct members it can also provide FieldOffset and FieldSize.
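To make the shape concrete, here is a hypothetical sketch of what a TypeData result could look like; the field and method names here are assumptions for illustration, not the actual pfelf API. The example values are taken from the Ruby 3.4 dump later in this description.

```go
package main

import "fmt"

// Field is a hypothetical struct-member entry (illustrative only).
type Field struct {
	Name   string
	Offset uint32
	Size   uint32
}

// TypeData mirrors the result shape described above: a Size at minimum,
// plus member offsets/sizes for struct types.
type TypeData struct {
	Name   string
	Size   uint32
	Fields []Field
}

// FieldOffset returns the byte offset of a named struct member.
func (t *TypeData) FieldOffset(name string) (uint32, bool) {
	for _, f := range t.Fields {
		if f.Name == name {
			return f.Offset, true
		}
	}
	return 0, false
}

// FieldSize returns the byte size of a named struct member.
func (t *TypeData) FieldSize(name string) (uint32, bool) {
	for _, f := range t.Fields {
		if f.Name == name {
			return f.Size, true
		}
	}
	return 0, false
}

func main() {
	// Values from the Ruby 3.4 rb_control_frame_struct dump below.
	cfp := TypeData{
		Name: "rb_control_frame_struct",
		Size: 56,
		Fields: []Field{
			{"pc", 0, 8}, {"sp", 8, 8}, {"iseq", 16, 8},
			{"self", 24, 8}, {"ep", 32, 8},
		},
	}
	off, _ := cfp.FieldOffset("ep")
	fmt.Printf("%s.ep offset: %d\n", cfp.Name, off)
}
```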

A test case for basic struct parsing is provided, and a test case with a number of more complicated ruby types and structs as of Ruby 3.4 is added. This ensures that the same values currently provided by https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/078ae4d6ded761b513038440bc8525014fa6c016/tools/coredump/testsources/ruby/gdb-dump-offsets.py are used.

Note that the ruby structures needed also vary by ruby version; this is accounted for in the logic populating the vmStruct struct in the Loader.

Memory Overhead

We do not use debug/elf's DWARF() builder, as it is extremely memory inefficient. We load only the .debug_ sections minimally necessary to create the dwarf.Data, and do less processing around them to avoid unnecessary allocations.
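As a sketch of the technique (not the PR's actual code, which reads from pfelf's own mappings), dwarf.New can be fed just the handful of .debug_ sections it needs:

```go
package main

import (
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"os"
)

// loadDWARF builds dwarf.Data from only the minimally necessary .debug_
// sections, rather than calling (*elf.File).DWARF(), which loads and
// processes more than we need.
func loadDWARF(path string) (*dwarf.Data, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	// Return a section's contents, or nil if the binary lacks it.
	sectionData := func(name string) []byte {
		s := f.Section(name)
		if s == nil {
			return nil
		}
		b, err := s.Data()
		if err != nil {
			return nil
		}
		return b
	}

	info := sectionData(".debug_info")
	if info == nil {
		return nil, fmt.Errorf("no .debug_info in %s", path)
	}
	// Signature: New(abbrev, aranges, frame, info, line, pubnames, ranges, str).
	return dwarf.New(
		sectionData(".debug_abbrev"), nil, nil, info,
		sectionData(".debug_line"), nil, nil, sectionData(".debug_str"))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: loaddwarf <elf-file>")
		os.Exit(1)
	}
	if _, err := loadDWARF(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("DWARF loaded")
}
```

Note that in this sketch debug/elf's Section.Data() transparently decompresses compressed sections; the PR handles compression itself so that it can cap memory use.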

However, we do need to handle the possibility that the DWARF sections are zlib compressed.

So there are three scenarios:

  • No DWARF information available, or our DWARF parsing logic is flawed for the available DWARF

    • Fall back to the current offset values; we are no worse off than the status quo before this PR
  • DWARF is available, and it is not compressed

    • This is ideal, and will be the default for binaries built without --enable-shared
    • We should also ideally be able to fetch this from debuginfod - there is no reason to store the DWARF information compressed on the symbol server
    • In this case, we don't allocate anything more, as the current loading of the file in pfelf already loads the DWARF sections into memory; it just isn't using them
  • DWARF is available, but it is compressed

    • This is a bummer, as it incurs additional memory allocations. By default with --enable-shared, ruby's build process compresses the DWARF sections for some reason, unless we specify --with-compress-debug-sections=none as part of the build
    • This can briefly increase the memory image of the agent, until the pfelf file handle is GC'd, and it will happen each time a new ruby process is discovered (ideally, we could cache the vmStruct information by build ID somewhere?)

Basic Benchmarks

Using the following program, I benchmarked loading all needed ruby structs in an Ubuntu Linux VM (colima) on macOS aarch64:

package main

import (
       "go.opentelemetry.io/ebpf-profiler/libpf/pfelf"
       "os"
       "fmt"
)

func main() {
       if len(os.Args) < 3 {
               fmt.Fprintf(os.Stderr, "Usage: %s <elf-file> <--dwarf|--no-dwarf> [type-name]\n", os.Args[0])
               os.Exit(1)
       }

       elfFile := os.Args[1]
       parseDwarfStr := os.Args[2]
       parseDwarf := false
       specificTypes := []string{}

       switch parseDwarfStr {
       case "--dwarf":
               parseDwarf = true
       case "--no-dwarf":
               parseDwarf = false
       default:
               fmt.Fprintf(os.Stderr, "Must specify --dwarf or --no-dwarf, not %s\n", parseDwarfStr)
               os.Exit(1)
       }

       if len(os.Args) > 3 {
               specificTypes = os.Args[3:]
       }

       pf, err := pfelf.Open(elfFile)
       if err != nil {
               fmt.Fprintf(os.Stderr, "Error opening ELF file %s: %v\n", elfFile, err)
               os.Exit(1)
       }
       defer pf.Close()

       if parseDwarf {
               data, err := pf.TypeData(specificTypes)
               if err != nil {
                       fmt.Fprintf(os.Stderr, "Error reading type data: %v\n", err)
                       os.Exit(1)
               }

               for _, s := range data {
                       fmt.Printf("%s\n", s)
               }

               for _, requested := range specificTypes {
                       found := false
                       for _, d := range data {
                               if d.Name == requested {
                                       found = true
                                       break
                               }
                       }
                       if !found {
                               fmt.Printf("WARNING: Couldn't find %s\n", requested)
                       }
               }
               if len(specificTypes) != len(data) {
                       fmt.Printf("Requested %d types, found %d\n", len(specificTypes), len(data))
               }
       }
}

Now, calling with:

$ /usr/bin/time -v go run memory_benchmark.go  /home/dalehamel.linux/.rubies/ruby-3.4.4/lib/libruby.so.3.4 --dwarf  rb_execution_context_struct rb_control_frame_struct rb_iseq_struct rb_iseq_constant_body rb_iseq_location_struct iseq_insn_info_entry RString RArray succ_index_table succ_dict_block VALUE

I can get the struct information:


VALUE  // total size: 8 bytes

struct RString {
  RBasic basic; // offset: 0, size: 16
  long int len; // offset: 16, size: 8
  union {heap struct {ptr *char@0; aux union {capa long int@0; shared VALUE@0}@8}@0; embed struct {ary [1]char@0}@0} as; // offset: 24, size: 16
} // total size: 40 bytes


struct RArray {
  RBasic basic; // offset: 0, size: 16
  union {heap struct {len long int@0; aux union {capa long int@0; shared_root const VALUE@0}@8; ptr *const VALUE@16}@0; ary const [1]const VALUE@0} as; // offset: 16, size: 24
} // total size: 40 bytes


struct rb_control_frame_struct {
  *const VALUE pc; // offset: 0, size: 8
  *VALUE sp; // offset: 8, size: 8
  *const rb_iseq_t iseq; // offset: 16, size: 8
  VALUE self; // offset: 24, size: 8
  *const VALUE ep; // offset: 32, size: 8
  *const void block_code; // offset: 40, size: 8
  *void jit_return; // offset: 48, size: 8
} // total size: 56 bytes


struct rb_iseq_struct {
  VALUE flags; // offset: 0, size: 8
  VALUE wrapper; // offset: 8, size: 8
  *struct rb_iseq_constant_body body; // offset: 16, size: 8
  union {compile_data *struct iseq_compile_data@0; loader struct {obj VALUE@0; index int@8}@0; exec struct {local_hooks *struct rb_hook_list_struct@0; global_trace_events rb_event_flag_t@8}@0} aux; // offset: 24, size: 16
} // total size: 40 bytes


struct rb_iseq_location_struct {
  VALUE pathobj; // offset: 0, size: 8
  VALUE base_label; // offset: 8, size: 8
  VALUE label; // offset: 16, size: 8
  int first_lineno; // offset: 24, size: 4
  int node_id; // offset: 28, size: 4
  rb_code_location_t code_location; // offset: 32, size: 16
} // total size: 48 bytes


struct rb_execution_context_struct {
  *VALUE vm_stack; // offset: 0, size: 8
  size_t vm_stack_size; // offset: 8, size: 8
  *rb_control_frame_t cfp; // offset: 16, size: 8
  *struct rb_vm_tag tag; // offset: 24, size: 8
  rb_atomic_t interrupt_flag; // offset: 32, size: 4
  rb_atomic_t interrupt_mask; // offset: 36, size: 4
  *rb_fiber_t fiber_ptr; // offset: 40, size: 8
  *struct rb_thread_struct thread_ptr; // offset: 48, size: 8
  *struct rb_id_table local_storage; // offset: 56, size: 8
  VALUE local_storage_recursive_hash; // offset: 64, size: 8
  VALUE local_storage_recursive_hash_for_trace; // offset: 72, size: 8
  VALUE storage; // offset: 80, size: 8
  *const VALUE root_lep; // offset: 88, size: 8
  VALUE root_svar; // offset: 96, size: 8
  *struct rb_trace_arg_struct trace_arg; // offset: 104, size: 8
  VALUE errinfo; // offset: 112, size: 8
  VALUE passed_block_handler; // offset: 120, size: 8
  uint8_t raised_flag; // offset: 128, size: 1
  enum method_missing_reason {MISSING_NOENTRY=0; MISSING_PRIVATE=1; MISSING_PROTECTED=2; MISSING_FCALL=4; MISSING_VCALL=8; MISSING_SUPER=16; MISSING_MISSING=32; MISSING_NONE=64} method_missing_reason; // offset: 0, size: 4
  VALUE private_const_reference; // offset: 136, size: 8
  {stack_start *VALUE@0; stack_end *VALUE@8; stack_maxsize size_t@16; regs jmp_buf@24} machine; // offset: 144, size: 336
} // total size: 480 bytes


struct rb_iseq_constant_body {
  enum rb_iseq_type {ISEQ_TYPE_TOP=0; ISEQ_TYPE_METHOD=1; ISEQ_TYPE_BLOCK=2; ISEQ_TYPE_CLASS=3; ISEQ_TYPE_RESCUE=4; ISEQ_TYPE_ENSURE=5; ISEQ_TYPE_EVAL=6; ISEQ_TYPE_MAIN=7; ISEQ_TYPE_PLAIN=8} type; // offset: 0, size: 4
  unsigned int iseq_size; // offset: 4, size: 4
  *VALUE iseq_encoded; // offset: 8, size: 8
  {flags struct {has_lead unsigned int@0 : 1@0; has_opt unsigned int@0 : 1@1; has_rest unsigned int@0 : 1@2; has_post unsigned int@0 : 1@3; has_kw unsigned int@0 : 1@4; has_kwrest unsigned int@0 : 1@5; has_block unsigned int@0 : 1@6; ambiguous_param0 unsigned int@0 : 1@7; accepts_no_kwarg unsigned int@0 : 1@8; ruby2_keywords unsigned int@0 : 1@9; anon_rest unsigned int@0 : 1@10; anon_kwrest unsigned int@0 : 1@11; use_block unsigned int@0 : 1@12; forwardable unsigned int@0 : 1@13}@0; size unsigned int@4; lead_num int@8; opt_num int@12; rest_start int@16; post_start int@20; post_num int@24; block_start int@28; opt_table *const VALUE@32; keyword *const struct rb_iseq_param_keyword@40} param; // offset: 16, size: 48
  rb_iseq_location_t location; // offset: 64, size: 48
  iseq_insn_info insns_info; // offset: 112, size: 32
  *const ID local_table; // offset: 144, size: 8
  *struct iseq_catch_table catch_table; // offset: 152, size: 8
  *const struct rb_iseq_struct parent_iseq; // offset: 160, size: 8
  *struct rb_iseq_struct local_iseq; // offset: 168, size: 8
  *union iseq_inline_storage_entry is_entries; // offset: 176, size: 8
  *struct rb_call_data call_data; // offset: 184, size: 8
  {flip_count rb_snum_t@0; script_lines VALUE@8; coverage VALUE@16; pc2branchindex VALUE@24; original_iseq *VALUE@32} variable; // offset: 192, size: 40
  unsigned int local_table_size; // offset: 232, size: 4
  unsigned int ic_size; // offset: 236, size: 4
  unsigned int ise_size; // offset: 240, size: 4
  unsigned int ivc_size; // offset: 244, size: 4
  unsigned int icvarc_size; // offset: 248, size: 4
  unsigned int ci_size; // offset: 252, size: 4
  unsigned int stack_max; // offset: 256, size: 4
  unsigned int builtin_attrs; // offset: 260, size: 4
  _Bool prism; // offset: 264, size: 1
  union {list *iseq_bits_t@0; single iseq_bits_t@0} mark_bits; // offset: 272, size: 8
  *struct rb_id_table outer_variables; // offset: 280, size: 8
  *const rb_iseq_t mandatory_only_iseq; // offset: 288, size: 8
  rb_jit_func_t jit_entry; // offset: 296, size: 8
  long unsigned int jit_entry_calls; // offset: 304, size: 8
  VALUE rjit_blocks; // offset: 312, size: 8
} // total size: 320 bytes


struct iseq_insn_info_entry {
  int line_no; // offset: 0, size: 4
  int node_id; // offset: 4, size: 4
  rb_event_flag_t events; // offset: 8, size: 4
} // total size: 12 bytes


struct succ_index_table {
  [6]uint64_t imm_part; // offset: 0, size: 48
  [0]struct succ_dict_block succ_part; // offset: 48, size: 0
} // total size: 48 bytes


struct succ_dict_block {
  unsigned int rank; // offset: 0, size: 4
  uint64_t small_block_ranks; // offset: 8, size: 8
  [8]uint64_t bits; // offset: 16, size: 64
} // total size: 80 bytes

Note that we should be able to use pahole /home/dalehamel.linux/.rubies/ruby-3.4.4/lib/libruby.so.3.4 to verify these struct offsets are correct, as well as gdb, in particular this gdb script.

And /usr/bin/time gives some information on the overhead, specifically Maximum resident set size (kbytes): 49192:

        Command being timed: "go run memory_benchmark.go /home/dalehamel.linux/.rubies/ruby-3.4.4/lib/libruby.so.3.4 --dwarf rb_execution_context_struct rb_control_frame_struct rb_iseq_struct rb_iseq_constant_body rb_iseq_location_struct iseq_insn_info_entry RString RArray succ_index_table succ_dict_block VALUE"
        User time (seconds): 0.17
        System time (seconds): 0.08
        Percent of CPU this job got: 108%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.23
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 49192
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 18025
        Voluntary context switches: 1102
        Involuntary context switches: 457
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Now, let's compare different scenarios.

No DWARF parsing

Maximum resident set size (kbytes): 31488
User time (seconds): 0.06

$ /usr/bin/time -v go run memory_benchmark.go  /home/dalehamel.linux/.rubies/ruby-3.4.4/lib/libruby.so.3.4 --no-dwarf  rb_execution_context_struct rb_control_frame_struct rb_iseq_struct rb_iseq_constant_body rb_iseq_location_struct iseq_insn_info_entry RString RArray succ_index_table succ_dict_block VALUE
        Command being timed: "go run memory_benchmark.go /home/dalehamel.linux/.rubies/ruby-3.4.4/lib/libruby.so.3.4 --no-dwarf rb_execution_context_struct rb_control_frame_struct rb_iseq_struct rb_iseq_constant_body rb_iseq_location_struct iseq_insn_info_entry RString RArray succ_index_table succ_dict_block VALUE"
        User time (seconds): 0.06
        System time (seconds): 0.04
        Percent of CPU this job got: 88%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.11
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 31488
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 7697
        Voluntary context switches: 1038
        Involuntary context switches: 304
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

DWARF parsing, no compression

ruby-install ruby 3.4.4 -- --enable-shared --with-compress-debug-sections=none

Or simply, using bin/ruby rather than libruby.so:

ruby-install ruby 3.4.4

Then chruby 3.4.4

We can see that essentially no overhead is added over doing no DWARF processing at all, and there is only a slight bit of CPU overhead:

Maximum resident set size (kbytes): 27328
User time (seconds): 0.11

        Command being timed: "go run memory_benchmark.go /home/dalehamel.linux/.rubies/ruby-3.4.4/lib/libruby.so.3.4 --dwarf rb_execution_context_struct rb_control_frame_struct rb_iseq_struct rb_iseq_constant_body rb_iseq_location_struct iseq_insn_info_entry RString RArray succ_index_table succ_dict_block VALUE"
        User time (seconds): 0.11
        System time (seconds): 0.10
        Percent of CPU this job got: 71%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.31
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 27328
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 557
        Minor (reclaiming a frame) page faults: 9920
        Voluntary context switches: 6518
        Involuntary context switches: 624
        Swaps: 0
        File system inputs: 147288
        File system outputs: 24
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

DWARF parsing, compressed sections

Compile ruby with:

ruby-install ruby 3.4.4 -- --enable-shared

Then chruby 3.4.4

Maximum resident set size (kbytes): 48620
User time (seconds): 0.19

        Command being timed: "go run memory_benchmark.go /home/dalehamel.linux/.rubies/ruby-3.4.4/lib/libruby.so.3.4 --dwarf rb_execution_context_struct rb_control_frame_struct rb_iseq_struct rb_iseq_constant_body rb_iseq_location_struct iseq_insn_info_entry RString RArray succ_index_table succ_dict_block VALUE"
        User time (seconds): 0.19
        System time (seconds): 0.09
        Percent of CPU this job got: 114%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.24
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 48620
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 19777
        Voluntary context switches: 1389
        Involuntary context switches: 377
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

// Currently only supported for little endian 64 bit ELF Files
var chdr64 elf.Chdr64
section := io.NewSectionReader(mapping, int64(sh.Offset), int64(sh.FileSize))
err := binary.Read(section, binary.LittleEndian, &chdr64)

Contributor Author

I'm assuming all the files we'll encounter are ELF64 and little endian. This info is available on the ELF file, but not from the Section struct here, so we technically have access to what these values actually are, just not from here...

if elf.CompressionType(chdr64.Type) != elf.COMPRESS_ZLIB {
return nil, fmt.Errorf("unsupported compression type %d", elf.CompressionType(chdr64.Type))
}
if chdr64.Size > uint64(maxSize) {

Contributor Author

It won't even attempt decompression if it would exceed the maximum section size; this is a safety check to prevent memory bloat.

return nil, err
}
defer zlibReader.Close()
return io.ReadAll(io.LimitReader(zlibReader, int64(maxSize)))

Contributor Author

Further enforce the maxSize limit here, in case the reported decompressed size was somehow a lie.

@fabled
Contributor

fabled commented Aug 19, 2025

We generally have not wanted to do this kind of DWARF based mechanism for this purpose on ebpf profiler runtime. We have some similar code but it is used to update the build-time constants only.

The reasoning is:

  • typically debug information is not included in standard installations, e.g. it is stripped in normal packages, and the debug packages are typically not installed
  • the Go DWARF reader has issues, and trying to use it may result in panics. That is, it's not designed for system-daemon usage, but for use as a non-system user in debugger/compiler/development tools
  • parsing the DWARF may consume large amounts of memory causing unwanted runtime memory usage overhead

In short, it does not solve the typical Linux distribution use case, and may cause stability issues.

Instead we generally prefer to ask upstream to add the needed symbols and metadata. Or, if we need to support versions lacking that, we have written disassembler code to manually introspect the machine code and extract the needed data from it.

As a more general solution, we also have issue #191 to build a better mechanism to store automatically extracted DWARF data, with a framework for interpreter plugin code to look it up from there.

@dalehamel
Contributor Author

dalehamel commented Aug 19, 2025

@fabled thanks for taking a look, I'll try and address your points.

We generally have not wanted to do this kind of DWARF based mechanism for this purpose on ebpf profiler runtime. We have some similar code but it is used to update the build-time constants only.

Yes, if you are referring to the gdb python script, I used that as the design inspiration for this.

I also know that DWARF has a certain reputation for being slow and potentially extremely bloated - in the case of Ruby however I think the handling is manageable and fits within the constraints set by this project.

typically debug information is not included in standard installations, e.g. it is stripped on normal packages, and the debug package are typically not installed

This is true of a lot of distro packages, but ruby is a bit unique in that the ruby community has embraced compiling ruby from source more than relying on linux distributions. ruby-install is a more standard way of installing ruby, and you should usually have access to DWARF info.

the Go DWARF reader has issues, and trying to use it may result in panics. That is, its not designed for system daemon usage, but for using as non-system user on a debugger / compiler / development tools

I haven't seen any yet; the usage here is fairly restricted. I'm sure that if there were a segfault, upstream would be open to fixing it, as regardless of the intended tool use, it should be considered improper code?

Further, a similar argument could be made of debug/elf, which is already in use by this project for the pfelf package?

The only code actually being called here loads the DWARF data into the constructor and seeks with the reader to get type data at the listed Offsets. I would expect this code to be quite robust, or at least to throw an error rather than panic. In the case of a non-panicking error, it would fall back to the current behaviour of hardcoded offsets.

parsing the DWARF may consume large amounts of memory causing unwanted runtime memory usage overhead
In short, it does not solve the typical Linux distribution use case, and may cause stability issues.

As my PR description has indicated, I have been sensitive to this issue and gone to great lengths to either eliminate this overhead (if the DWARF sections are uncompressed, there should not be any memory overhead) or minimize it (if they are compressed, we do not use any more memory than other large ELF section handling).

Instead we generally prefer the approach of either to ask upstream to add the needed symbols and metadata.

This work came out of talking to core Ruby developers, who were very hesitant to make such changes. The argument on their end was that:

  • Exposing core vm structure offsets through any means pushes more load onto the core ruby developers, and this approach to stack walking already relies on "private" implementation details, so no stability can really be gained. Either people making external profilers have to deal with it, or the core ruby devs do
  • They pointed out that they already ship with DWARF by default and don't strip it (though distro package maintainers may do so without providing a dbg symbol package)

Or if we need to support versions lacking that, we have written disassembler code to manually introspect the machine code and extract the needed data from it.

As I have pointed out in my PR description, this is problematic as it does not account for all possible configuration flags, which can shift offsets, does not account for arch-specific type size differences, and must be re-done for new versions. Currently the code does not support Ruby 3.4, and Ruby 3.5 is on the way.

As a more general solution, we also have the issue #191 to build a better mechanism to store automatically extracted DWARF data and have a framework for interpreter plugin code to look it up from there.

If this can be offloaded to the database and looked up by ELF build ID, great: the less that is done on the daemon, the better. In that scenario, perhaps some of the code here could be recycled.

@gnurizen
Contributor

Or if we need to support versions lacking that, we have written disassembler code to manually introspect the machine code and extract the needed data from it.

As I have pointed out in my PR description, this is problematic as it does not account for all possible configuration flags, which can shift offsets, does not account for arch-specific type size differences, and must be re-done for new versions. Currently the code does not support Ruby 3.4, and Ruby 3.5 is on the way.

For what it's worth, the LuaJIT unwinder uses the disassembler approach and it works pretty well. I fail to see why the disassembler approach couldn't account for configuration flags, arch-specific stuff, etc. Also, we've seen the LuaJIT approach work without change as new versions come out (although admittedly LuaJIT is pretty baked and not much changes, but openresty will fiddle with offsets, and our disassembler code has so far worked to pick up those changes).

It can be tricky and a bit of a whack-a-mole game to get the disassembly approach right. I think a better approach would be to have the upstream project expose unwinder information in some clear, well-defined manner that requires neither disassembly nor DWARF debug information parsing. So far the only idea I've come up with, but haven't pursued, is to jam these offsets into the stap notes section (used for usdt/stap probes). It might be a bit of a square block/round hole situation, but avoiding disassembly and having a clearly defined path for the interpreter to pass information to the unwinder would be pretty sweet.

@dalehamel
Contributor Author

For what it's worth the #419 uses the disassembler approach and it works pretty well. I fail to see why the disassembler approach couldn't account for configuration flags, arch-specific stuff etc.

That is great, but I think this is not really the case with Ruby. Even between arm and x86 we see different struct offsets. Other projects like rbperf and rbspy have taken the approach of hardcoding offsets or embedding full structs, but this is similarly problematic: while it does work ok maybe 90% of the time, when you are debugging against ruby HEAD and the maintainer has lagged at tracking upstream changes in the structs, it falls behind.

Ruby is undergoing a lot of development, with ractors, different jits, and optimizations to the instruction sequences all in flux, which seems likely to continue to make header parsing / hardcoded struct offsets a pain in subsequent versions.

So far the only idea I've come up with but haven't pursued is to jam these offsets into the stap notes section (used for usdt/stap probes). Might be a bit of a square block/round hole situation but avoiding disassembly and having a clear defined path for the interpreter to pass information to the unwinder would be pretty sweet.

I had the same idea, and did a rough PoC in #202 (in their own sections, not stap notes). I shared this with ruby maintainers and they rejected it - they are generally opposed to having to expose this internal state, as that would turn private things "public" and now they have to "worry about breaking profiling" any time they touch those structs.

It is also a bit of an abuse to use stap notes for this, as these aren't really USDT probes... they are full-on struct field offsets, which again is what DWARF is for.

The maintainers stated that they have DWARF and that it isn't stripped by default (though package maintainers may do this; as I have said above, it is rare for people to use a distro-provided ruby package in production for a number of reasons anyway).

It can be tricky and a bit of a whack-a-mole game to get the disassembly approach right.

I don't quite get this: we should be able to rely on the struct offset information from DWARF, as it has the actual type information at compile time and doesn't have the problems associated with header parsing.

BTF, used by the kernel for struct offset information, is basically just a minimal version of the DWARF info, generated by pahole and jammed into the binary, iirc (since the kernel doesn't have DWARF, as Linus hates it with vitriol).

I can understand the resistance to parsing DWARF on the agent, but I don't get why doing it server-side and storing the struct field offsets by build-id wouldn't be a viable solution. The debug info should be readily available with debuginfod as an option.

@gnurizen
Contributor

For what it's worth the #419 uses the disassembler approach and it works pretty well. I fail to see why the disassembler approach couldn't account for configuration flags, arch-specific stuff etc.

That is great, but I think this is not really the case with Ruby. Even between arm and x86 we see different struct offsets. Other projects like rbperf and rbspy have taken the approach of hardcoding offsets or embedding full structs, but this is similarly problematic: while it does work ok maybe 90% of the time, when you are debugging against ruby HEAD and the maintainer has lagged at tracking upstream changes in the structs, it falls behind.

Ruby is undergoing a lot of development, with ractors, different jits, and optimization to the instruction sequences all in flux that seem likely to continue to make header parsing / hardcoded struct offsets a pain in subsequent versions.

I don't see how any of this applies to the disassembler approach: parsing the actual machine code of the program can never get the wrong offsets, is immune to arch differences, and is immune to falling behind. If the code changes so substantially that the disassembler solutions fail to work in newer versions, that is definitely a problem.

So far the only idea I've come up with but haven't pursued is to jam these offsets into the stap notes section (used for usdt/stap probes). Might be a bit of a square block/round hole situation but avoiding disassembly and having a clear defined path for the interpreter to pass information to the unwinder would be pretty sweet.

> I had the same idea, and did a rough PoC in #202 (in their own sections, not stap notes). I shared this with ruby maintainers and they rejected it - they are generally opposed to having to expose this internal state, as that would turn private things "public" and now they have to "worry about breaking profiling" any time they touch those structs.

That's a bummer. For what it's worth, the v8 approach relies on some build-time artifacts that are tacitly understood to be best-effort and not well maintained or updated by the v8 core developers; @umanwizard knows more about it than I do.

It can be tricky and a bit of a whack-a-mole game to get the disassembly approach right.

> I don't quite get this - we should be able to rely on the struct offset information from DWARF; it has the actual type information at compile time, and doesn't have the problems associated with header parsing.

I don't know what header parsing means; disassembling means reading the actual machine code to find offsets. This was required for LuaJIT because it is commonly stripped in production envs. The game of whack-a-mole I referred to is that compiler options or code changes can break disassembler analyses, and writing them to be immune to inlining, optimizer levels, etc. can be challenging.

> I can understand the resistance to parsing DWARF on the agent, but I don't get why doing it server side and storing the struct field offsets by build-id wouldn't be a viable solution. The debug info should be readily available with debuginfod as an option.

This may fly with ruby, but I know with luajit a lot of deployments build it custom on the fly with docker build files and throw away the debug info.

Another thing that has helped the LuaJIT unwinder stabilize is that we use the DWARF layout information to test our disassembler analyses, so that if they break the CI breaks, but at runtime it only relies on the disassembler extractions.

Don't know what the right path forward is for Ruby, just offering up some perspective. A client-server build-id approach certainly sounds like it could work.

@dalehamel
Contributor Author

> I don't know what header parsing means, disassembling means reading the actual machine code to find offsets

Yeah, my bad - I crossed my wires on the different approaches here and thought you were referring to the gdb scripts. Literally disassembling the compiled machine code is one I'd forgotten about, but that seems like an even more manual process, no?

> I don't know what header parsing means

FWIW, header parsing is basically using something like LLVM, pointing it at the actual source code headers, and calculating the offsets based on the original struct definitions. It's probably the most error-prone, and I'd argue even worse than the DWARF approach here, but it is kind of a crapshoot option for when you don't have debug info and I guess disassembly isn't an option either. rbspy does something similar, where they basically import the ruby structs to make Rust structs, I guess using some code generation, e.g. https://github.com/rbspy/rbspy/blob/main/ruby-structs/src/ruby_3_4_5.rs

> This may fly with ruby, but I know with luajit a lot of deployments build it custom on the fly with docker build files and throw away the debug info.

FWIW I'm not suggesting we do DWARF parsing for anything but Ruby at this point - just because I'm adding a helper for it to the agent doesn't mean it should necessarily be used willy-nilly. It potentially could be, but that would require some vetting to see what the memory implications are. In the case where the DWARF sections are not compressed, my argument is that there is no memory impact here at all, as we already mmap the whole ELF file, including the DWARF sections.

> A client server buildid approach certainly sounds like it could work.

Yeah, that seems like the best of both worlds, as it offloads the work from the agent to the thing that ostensibly should already have ready access to debug info, and can handle a bit more memory overhead.

However, as far as I am aware, nothing beyond a proposal exists for that at this point?
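To make the build-id keying concrete, here is a minimal sketch of reading the GNU build-id note that would serve as the lookup key. `gnuBuildID` is an illustrative helper, not something that exists in the agent, and it assumes a little-endian target:

```go
// Hypothetical sketch: extract the GNU build-id from an ELF file,
// i.e. the key a server-side service could index per-binary struct
// offsets by. Assumes a little-endian ELF.
package main

import (
	"debug/elf"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"os"
)

func gnuBuildID(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	sec := f.Section(".note.gnu.build-id")
	if sec == nil {
		return "", fmt.Errorf("no .note.gnu.build-id section in %s", path)
	}
	data, err := sec.Data()
	if err != nil {
		return "", err
	}
	// ELF note header: namesz, descsz, type (3 x 4 bytes), then the
	// name ("GNU\0", padded to 4 bytes), then the descriptor (the ID).
	if len(data) < 12 {
		return "", fmt.Errorf("note section too short")
	}
	namesz := binary.LittleEndian.Uint32(data[0:4])
	descsz := binary.LittleEndian.Uint32(data[4:8])
	const ntGNUBuildID = 3
	if binary.LittleEndian.Uint32(data[8:12]) != ntGNUBuildID {
		return "", fmt.Errorf("unexpected note type")
	}
	descOff := 12 + (namesz+3)&^3 // name is padded to 4-byte alignment
	if uint32(len(data)) < descOff+descsz {
		return "", fmt.Errorf("truncated build-id note")
	}
	return hex.EncodeToString(data[descOff : descOff+descsz]), nil
}

func main() {
	path := "/usr/bin/ruby" // illustrative; pass any ELF binary
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	id, err := gnuBuildID(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(id)
}
```

The server side would then be a simple map from this hex string to the offset table extracted from the matching debug info (fetched via debuginfod or otherwise).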

@dalehamel dalehamel closed this Aug 22, 2025
