Obtain Ruby struct offset and type information from DWARF if available #707
```go
// Currently only supported for little endian 64 bit ELF Files
var chdr64 elf.Chdr64
section := io.NewSectionReader(mapping, int64(sh.Offset), int64(sh.FileSize))
err := binary.Read(section, binary.LittleEndian, &chdr64)
```
I'm assuming all the files we'll encounter are ELF64 and little endian. This info is available in the ELF file header, but not from the Section struct here, so we technically have access to what these values actually are, just not from this point in the code...
```go
if elf.CompressionType(chdr64.Type) != elf.COMPRESS_ZLIB {
	return nil, fmt.Errorf("unsupported compression type %d", elf.CompressionType(chdr64.Type))
}
if chdr64.Size > uint64(maxSize) {
```
It won't even attempt decompression if the decompressed size would exceed the maximum section size; this is a safety check to prevent memory bloat.
```go
	return nil, err
}
defer zlibReader.Close()
return io.ReadAll(io.LimitReader(zlibReader, int64(maxSize)))
```
Further enforce the maxSize limit here, in case the reported decompressed size was somehow a lie.
We generally have not wanted to use this kind of DWARF-based mechanism for this purpose in the ebpf profiler runtime. We have some similar code, but it is used only to update build-time constants. The reasoning is:
In short, it does not solve the typical Linux distribution use case, and it may cause stability issues. Instead we generally prefer either to ask upstream to add the needed symbols and metadata, or, if we need to support versions lacking those, to write disassembler code that manually introspects the machine code and extracts the needed data from it. As a more general solution, we also have issue #191 to build a better mechanism to store automatically extracted DWARF data, with a framework for interpreter plugin code to look it up from there.
@fabled thanks for taking a look, I'll try to address your points.
Yes, if you are referring to the gdb Python script, I used that as the design inspiration for this. I also know that DWARF has a certain reputation for being slow and potentially extremely bloated; in the case of Ruby, however, I think the handling is manageable and fits within the constraints set by this project.
This is true of a lot of distro packages, but Ruby is a bit unique in that the Ruby community has embraced compiling Ruby from source more than relying on Linux distributions.
I haven't seen any yet; the usage here is fairly restricted. I'm sure that if there were a segfault, upstream would be open to fixing it, as regardless of the intended tool use it should be considered improper code. Further, a similar argument could be made of other code paths. The only code that is actually being called here is loading the DWARF data into the constructor, and seeking with the reader to get type data at the listed offsets. I would expect this code to be quite robust, or at least to throw an error rather than panic. In the case of a non-panicking error, it would fall back to the current behaviour of hardcoded offsets.
As my PR description has indicated, I have been sensitive to this issue and gone to great lengths to either eliminate this overhead (if the DWARF sections are decompressed, there should not be any memory overhead) or minimize it (if they are compressed, we do not use any more memory than other large ELF section handling).
This work came out of talking to core Ruby developers, who were very hesitant to make such changes. The argument on their end was that:
As I have pointed out in my PR description, this is problematic as it does not account for all possible configuration flags (which can shift offsets), does not account for arch-specific type size differences, and must be redone for new versions. Currently the code does not support Ruby 3.4, and Ruby 3.5 is on the way.
If this can be offloaded to the database and looked up by ELF build ID, great: the less that is done on the daemon, the better. In that scenario, perhaps some of the code here could be recycled.
For what it's worth, the LuaJIT unwinder uses the disassembler approach and it works pretty well. I fail to see why the disassembler approach couldn't account for configuration flags, arch-specific differences, etc. We've also seen the LuaJIT approach work without change as new versions come out (although admittedly LuaJIT is pretty baked and not much changes, openresty will fiddle with offsets, and our disassembler code has so far picked up those changes). It can be tricky and a bit of a whack-a-mole game to get the disassembly approach right. I think a better approach would be to have the upstream project expose unwinder information in some clear, well-defined manner that requires neither disassembly nor DWARF debug information parsing. So far the only idea I've come up with, but haven't pursued, is to jam these offsets into the stap notes section (used for usdt/stap probes). It might be a bit of a square-block/round-hole situation, but avoiding disassembly and having a clearly defined path for the interpreter to pass information to the unwinder would be pretty sweet.
That is great, but I think this is not really the case with Ruby. Even between arm and x86 we see different struct offsets. Other projects like rbperf and rbspy have taken the approach of hardcoding offsets or embedding full structs, but that is similarly problematic: while it works OK maybe 90% of the time, when you are debugging against Ruby HEAD and the maintainer has lagged at tracking upstream changes in the structs, it falls behind. Ruby is undergoing a lot of development, with ractors, different JITs, and optimizations to the instruction sequences all in flux, which seems likely to continue to make header parsing / hardcoded struct offsets a pain in subsequent versions.
I had the same idea, and did a rough PoC in #202 (in their own sections, not stap notes). I shared this with Ruby maintainers and they rejected it: they are generally opposed to exposing this internal state, as that would turn private things "public" and they would then have to "worry about breaking profiling" any time they touch those structs. It is also a bit of an abuse to use stap notes for this, as these aren't really USDT probes; they are full-on struct field offsets, which again is what DWARF is for. The maintainers stated that they have DWARF and it isn't stripped by default (though package maintainers may do this; as I said above, it is rare for people to use a distro-provided Ruby package in production for a number of reasons anyway).
I don't quite get this: we should be able to rely on the struct offset information from DWARF; it has the actual type information at compile time, and doesn't have the problems associated with header parsing. BTF, used by the kernel for struct offset information, is basically just a minimal version of the DWARF info, parsed by pahole and jammed into the binary, IIRC (since the kernel doesn't ship DWARF, as Linus hates it with vitriol). I can understand the resistance to parsing DWARF on the agent, but I don't get why doing it server side and storing the struct field offsets by build ID wouldn't be a viable solution. The debug info should be readily available with debuginfod as an option.
I don't see how any of this applies to the disassembler approach: parsing the actual machine code of the program can never get the wrong offsets, is immune to arch differences, and is immune to falling behind. If the code changes so substantially that the disassembler solutions fail to work in newer versions, that is definitely a problem.
That's a bummer. For what it's worth, the v8 approach relies on some build-time artifacts that are tacitly understood to be best effort and not well maintained or updated by the v8 core developers; @umanwizard knows more about it than I do.
I don't know what header parsing means here; disassembling means reading the actual machine code to find offsets. This was required for LuaJIT because it is commonly stripped in production environments. The game of whack-a-mole I referred to is that compiler options and code changes can break disassembler analysis, and writing the analyses to be immune to inlining, optimizer levels, etc. can be challenging.
This may fly with Ruby, but I know that with LuaJIT a lot of deployments build it custom on the fly with Docker build files and throw away the debug info. Another thing that has helped the LuaJIT unwinder stabilize is that we use the DWARF layout information to test our disassembler analyses, so that if they break, CI breaks, but at runtime it relies only on the disassembler extractions. I don't know what the right path forward is for Ruby, just offering up some perspective. A client/server build-ID approach certainly sounds like it could work.
Yeah, my bad, I crossed my wires on the different approaches here and thought you were referring to the gdb scripts. Literally disassembling the compiled machine code is one I'd forgotten about, but that seems like an even more manual process, no?
FWIW, header parsing is basically using something like LLVM, pointing it at the actual source code headers, and calculating the offsets based on the original struct definitions. It's probably the most error prone, and I'd argue even worse than the DWARF approach here, but it is kind of a crapshoot for when you don't have debug info and, I guess, when disassembly isn't an option. rbspy does something similar, where they basically import the Ruby structs to make Rust structs, I guess using some code generation, e.g. https://github.com/rbspy/rbspy/blob/main/ruby-structs/src/ruby_3_4_5.rs
FWIW, I'm not suggesting we do DWARF parsing for anything but Ruby at this point; just because I'm adding a helper for it to the agent doesn't mean it should necessarily be used willy-nilly. It potentially could be, but that would require some vetting to see what the memory implications are. In the case where the DWARF symbols are not compressed, my argument is that there is no memory impact here, as we already mmap the whole ELF file, including the DWARF sections.
Yeah, that seems like the best of both worlds, as it offloads the work from the agent to the thing that ostensibly should already have ready access to debug info, and can handle a bit more memory overhead. However, as far as I am aware, nothing beyond a proposal exists for that at this point?
What
This adds support to the pfelf package to read struct field, offset, and size information from DWARF in the Ruby ELF sections, if available.
If this process fails, it falls back to the current version-specific, hardcoded offsets.
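That fallback shape can be sketched as follows; all names here (`vmStructs`, `readFromDWARF`, `hardcodedFor`) are illustrative stand-ins, not the PR's actual identifiers:

```go
package main

import (
	"errors"
	"fmt"
)

// vmStructs is a stand-in for the Ruby interpreter's offset table.
type vmStructs struct{ controlFrameSize uint }

// readFromDWARF is a placeholder for the DWARF-based extraction; here it
// always fails, to demonstrate the fallback path.
func readFromDWARF() (vmStructs, error) {
	return vmStructs{}, errors.New("no .debug_info section")
}

// hardcodedFor returns the existing version-specific constants
// (the value is made up for this sketch).
func hardcodedFor(version uint32) vmStructs {
	return vmStructs{controlFrameSize: 0x38}
}

// loadVMStructs prefers DWARF-derived offsets and falls back to the
// hardcoded table on any error, mirroring the behaviour described above.
func loadVMStructs(version uint32) vmStructs {
	if vs, err := readFromDWARF(); err == nil {
		return vs
	}
	return hardcodedFor(version)
}

func main() {
	fmt.Println(loadVMStructs(0x30400).controlFrameSize) // prints 56
}
```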
Why
The current approach is brittle for a number of reasons:
Someone has to go and re-run the offset calculations and update them accordingly to try to hit this moving target across platform, architecture, and config flags. This means we basically need a huge table of offsets to get this right.
This approach will potentially add some slight, but avoidable and temporary, memory overhead when parsing DWARF symbols to get Ruby type information. However, it should make detecting and supporting Ruby versions much more robust, and reduce the maintenance overhead in the long term.
We will still need to routinely check that the fundamental way Ruby represents stacks, and the context we capture them from, remains conceptually correct. However, the busy-work of ensuring we get the right memory offsets as we walk through the stack frames should be mostly eliminated, and if we do need to add new field offsets or structs in the future, it should be easy to do so. With the addition of debuginfod, there should be no reason this information isn't available.
Further, attention to the memory overhead has been carefully managed. The current DWARF parsing for Ruby's `.debug` sections fits within the existing maximum 16MB Data size for existing "large sections", even when decompressing them. And even better, if the `.debug` sections are not compressed, we don't introduce any memory overhead, as we read from the existing memory-mapped slice.

How
The existing Go stdlib `debug/dwarf` package is used to provide access to the DWARF data, but the `dwarf.Data` constructor is called directly, not via `debug/elf`'s `DWARF()` call. We first check the file for DWARF sections and then try to read all needed vmStruct values from the type information. Only if this fails do we fall back to the current hardcoded values.
In order to do this I added a new function, `TypeData`, to `elfFile`, which will read type information from the pfelf File for a list of types. It will return a list of `TypeData` structs of equal size. It takes a list of arguments to minimize overhead: we only walk the DWARF once, until all the requested types are found, which bounds the overhead at O(n), where n is the number of types requested. This can return either a struct's type, or something like a simple `typedef`'d type.

The returned type data can provide `FieldOffset` or `FieldSize` for struct members, as well as a `Size` at minimum.

A test case for basic struct parsing is provided, and a test case with a number of more complicated Ruby types and structs as of Ruby 3.4 is added. This ensures that the same values currently provided by https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/078ae4d6ded761b513038440bc8525014fa6c016/tools/coredump/testsources/ruby/gdb-dump-offsets.py are used.
Note that the Ruby structures needed also vary by Ruby version, so this is accounted for in the logic populating the vmStruct struct in the Loader.
Memory Overhead
We do not use `debug/elf`'s `DWARF()` builder, as it is extremely memory inefficient. We only load the `.debug` sections minimally necessary to create the `dwarf.Data`, and do less processing around them to avoid unnecessary allocations.

However, we do need to deal with the possibility that the DWARF information may be zlib compressed, and handle this.
So there are three scenarios:

1. No DWARF information available, or our DWARF parsing logic is flawed for the available DWARF.
2. DWARF is available, and it is not compressed: without `--enable-shared`, pfelf is already loading the DWARF sections into memory, it just isn't using them.
3. DWARF is available, but it is compressed: with `--enable-shared`, ruby's build process adds DWARF compression for some reason, unless we specify `--with-compress-debug-sections=none` as part of the build process.

Basic Benchmarks
Using the following program, I've benchmarked loading all needed Ruby structs, using an Ubuntu colima Linux VM on macOS aarch64:

Now, calling with:
I can get the struct information:
Note that we should be able to use `pahole /home/dalehamel.linux/.rubies/ruby-3.4.4/lib/libruby.so.3.4` to verify these struct offsets are correct, as well as gdb, in particular this gdb script.

And `/usr/bin/time` gives some information on the overhead, specifically `Maximum resident set size (kbytes): 49240`:

Now, let's compare different scenarios.
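For anyone reproducing these numbers, the peak-RSS lines come from GNU time's verbose output; an illustrative invocation (the `./dwarfbench` binary name and path are placeholders, not from the PR):

```shell
# -v prints "Maximum resident set size (kbytes)" among other stats.
# ./dwarfbench and the libruby path stand in for the benchmark program
# and target library described above.
/usr/bin/time -v ./dwarfbench /home/user/.rubies/ruby-3.4.4/lib/libruby.so.3.4
```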
No DWARF parsing
```
Maximum resident set size (kbytes): 31488
User time (seconds): 0.06
```
DWARF parsing, no compression
Or simply, using `bin/ruby` rather than `libruby.so`:

Then `chruby 3.4.4`.

We can see that essentially no overhead is added over doing no DWARF processing at all, and there is only a slight bit of CPU overhead:

```
Maximum resident set size (kbytes): 27328
User time (seconds): 0.11
```
DWARF parsing, compressed sections
Compile ruby with:
Then `chruby 3.4.4`.

```
Maximum resident set size (kbytes): 48620
User time (seconds): 0.19
```