feat: add gem for auto-instrumentation in otel-operator#1384
feat: add gem for auto-instrumentation in otel-operator#1384xuan-cao-swi wants to merge 99 commits intoopen-telemetry:mainfrom
Conversation
|
We don't need to push to make the gem ready/release, but the script file need to get some review first. |
|
👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the |
|
👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the |
…lemetry-ruby-contrib into auto-instrumentation
kaylareopelle
left a comment
There was a problem hiding this comment.
This is so close! I noticed a few things in my review today.
| module OTelInitializer | ||
| @initialized = false | ||
|
|
||
| OTEL_INSTRUMENTATION_MAP = { |
There was a problem hiding this comment.
Are there any new instrumentations we should add here?
There was a problem hiding this comment.
Updated. I will create a workflow that keep track of this OTEL_INSTRUMENTATION_MAP later.
There was a problem hiding this comment.
I think it's worth to explore what James comment to have separate repo for this gem.
When I look at the github org I see the following repos for auto instrumentation:
https://github.com/open-telemetry/opentelemetry-go-instrumentation
https://github.com/open-telemetry/opentelemetry-java-instrumentation
https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation
https://github.com/open-telemetry/opentelemetry-php-instrumentation
Since it will have dependabot that need to track the latest sdk and instrumentation gem, I don't want to pollute the PR pages.
| require 'version' | ||
|
|
||
| Gem::Specification.new do |spec| | ||
| spec.name = 'auto-instrumentation' |
There was a problem hiding this comment.
What do you think about calling the gem opentelemetry-auto-instrumentation? Without the opentelemetry prefix it feels very broad.
I remember talking about naming a long time ago, but haven't found that conversation.
@thompson-tomo made a similar comment in the changelog about updating the name.
There was a problem hiding this comment.
Updated to opentelemetry-auto-instrumentation. Let me know if you want the folder name also changed to opentelemetry-auto-instrumentation
| opentelemetry-instrumentation-all | ||
| opentelemetry-exporter-otlp | ||
| opentelemetry-helpers-mysql | ||
| opentelemetry-helpers-sql-obfuscation |
There was a problem hiding this comment.
I think we want to replace the sql-obfuscation gem with opentelemetry-helpers-sql-processor
| opentelemetry-helpers-sql-obfuscation | ||
| opentelemetry-resource-detector-azure | ||
| opentelemetry-resource-detector-container | ||
| opentelemetry-resource-detector-aws |
There was a problem hiding this comment.
We also have a GCP resource detector. Does that make sense to include?
There was a problem hiding this comment.
The GCP resource detector depends on the extra google-cloud-env gem, which can introduce version conflicts. I’m leaning toward only copying the OTEL‑related gems into the operator.
That gem mainly pulls GCP metadata, whereas the other resource detectors use more self‑contained methods like fetching metadata from a local URL.
|
|
||
| ## Configuration | ||
|
|
||
| The following environment variables are specific to this gem (not standard OpenTelemetry variables): |
There was a problem hiding this comment.
What do you think about mentioning that all other OTEL environment variables can still be used for configuration with this gem?
|
|
||
| require 'test_helper' | ||
|
|
||
| describe 'AutoInstrumentation' do |
There was a problem hiding this comment.
| describe 'AutoInstrumentation' do | |
| describe 'OpenTelemetry::AutoInstrumentation' do |
| require 'opentelemetry-sdk' | ||
| require 'opentelemetry-instrumentation-all' |
There was a problem hiding this comment.
If we require these and opentelemetry/resource/detector here, could we be getting false positives? (just thinking about open-telemetry/opentelemetry-ruby#1956)
The tests seem to pass even if we remove those requires.
There was a problem hiding this comment.
yeah, these gem are not need to require here (removed).
| warn '[OpenTelemetry] WARNING: Detected OpenTelemetry gems in your Gemfile: ' \ | ||
| "#{gem_names}. When using auto-instrumentation, OpenTelemetry gems are loaded " \ | ||
| 'from the auto-instrumentation gem path, NOT from your bundle. The gem versions ' \ | ||
| 'in your Gemfile/Gemfile.lock are not used and may cause version conflicts or ' \ | ||
| 'unexpected behavior. Please remove these gems from your Gemfile when using ' \ | ||
| 'auto-instrumentation.' |
There was a problem hiding this comment.
Does this mean that people who want to use metrics or logs can't use auto instrumentation? If that's the case, what do you think about adding a note to the README?
There was a problem hiding this comment.
I added the metrics and logs sdk support in the autoinstrumentation (also include the example in rails-example)
| gem 'rake', '13.0.6' | ||
| gem 'bigdecimal', '3.1.3' | ||
| gem 'logger', '1.5.3' |
There was a problem hiding this comment.
Why are these versions locked?
There was a problem hiding this comment.
This was the main problem I encountered with the rails example. I was using Ruby 3.4 and these versions are older than the latest versions. If I commented these versions out and regenerated the Gemfile.lock, everything worked fine.
When I tried to follow the instructions in the example README and installed the versions it asked for specifically, I kept getting the same error.
There was a problem hiding this comment.
This is the issue of version conflict between user gem env and the auto-instrumentation env.
I added the note that this won't be an issue in operator because we only copy otel-related gem to operator.
To make this example run, you may need to manually update the system gem (e.g. gem update rake/gem install rake -v 13.4.1) when you see something like
You have already activated rake 13.4.2, but your Gemfile requires rake 13.4.1. Prepending `bundle exec` to your command may solve this. (Gem::LoadError)
…lemetry-ruby-contrib into auto-instrumentation
|
@xuan-cao-swi can I ask you to resolve conflicts? |
There was a problem hiding this comment.
Pull request overview
Adds a new opentelemetry-auto-instrumentation Ruby gem/package intended to support OpenTelemetry Operator-style zero-code auto-instrumentation (via RUBYOPT), along with examples, tests, and CI wiring.
Changes:
- Introduces the
opentelemetry-auto-instrumentationgem (gemspec, loader/initializer, versioning, docs, rake tasks, rubocop config). - Adds Minitest coverage that runs initialization in a forked subprocess and validates traces/metrics/logs and Bundler warning behavior.
- Updates repo CI to run this new package’s checks and adjusts Docker build dependencies/base image.
Show a summary per file
| File | Description |
|---|---|
| packages/auto-instrumentation/test/test_helper.rb | Adds subprocess test harness for isolated auto-instrumentation initialization. |
| packages/auto-instrumentation/test/opentelemetry-auto-instrumentation_test.rb | Adds Minitest specs for initialization, signal emission, and Bundler Gemfile warning behavior. |
| packages/auto-instrumentation/opentelemetry-auto-instrumentation.gemspec | Defines new gem metadata and runtime dependencies. |
| packages/auto-instrumentation/lib/version.rb | Introduces gem version constant. |
| packages/auto-instrumentation/lib/opentelemetry-auto-instrumentation.rb | Implements Bundler hook + SDK/config initialization + resource detection wiring. |
| packages/auto-instrumentation/example/simple-example/app.rb | Adds minimal script demonstrating HTTP client instrumentation. |
| packages/auto-instrumentation/example/simple-example/Gemfile | Adds Gemfile scaffold for the simple example. |
| packages/auto-instrumentation/example/rails-example/config.ru | Adds rackup entrypoint for Rails example. |
| packages/auto-instrumentation/example/rails-example/app.rb | Adds Rails example application emitting metrics/logs. |
| packages/auto-instrumentation/example/rails-example/Gemfile | Adds dependencies for Rails example. |
| packages/auto-instrumentation/example/README.md | Documents how to run the included examples. |
| packages/auto-instrumentation/Rakefile | Adds test/rubocop/yard tasks for the new package. |
| packages/auto-instrumentation/README.md | Adds package-level documentation and configuration reference. |
| packages/auto-instrumentation/LICENSE | Adds license file for the new package. |
| packages/auto-instrumentation/Gemfile | Adds development/test dependencies for the new package. |
| packages/auto-instrumentation/CHANGELOG.md | Adds changelog scaffold. |
| packages/auto-instrumentation/.rubocop.yml | Adds package-specific RuboCop config/exclusions. |
| Dockerfile | Updates base image and adds yaml-dev dependency. |
| .github/workflows/ci-contrib.yml | Adds CI job and path filters for packages, including this new gem. |
| .cspell.yml | Removes an extraneous entry/line. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
packages/auto-instrumentation/example/rails-example/app.rb:55
- The RuboCop directive comment is malformed (
# rubocop enable:). RuboCop expects# rubocop:enable ..., so this line won't actually re-enable the cop (and the disable above won’t work either).
# rubocop enable:Style/OneClassPerFile
- Files reviewed: 20/20 changed files
- Comments generated: 9
…lemetry-ruby-contrib into auto-instrumentation
|
@xuan-cao-swi there appears to be an issue calculating code coverage https://github.com/open-telemetry/opentelemetry-ruby-contrib/actions/runs/24790433512/job/72546108811#step:4:1 it is getting 0 lines hence classifying it as 100% coverage. |
| def self._otel_detect_resource_from_env | ||
| env = ENV['OTEL_RUBY_RESOURCE_DETECTORS'].to_s | ||
| additional_resource = ::OpenTelemetry::SDK::Resources::Resource.create({}) | ||
|
|
||
| resource_map = { | ||
| 'container' => (defined?(::OpenTelemetry::Resource::Detector::Container) ? ::OpenTelemetry::Resource::Detector::Container : nil), | ||
| 'azure' => (defined?(::OpenTelemetry::Resource::Detector::Azure) ? ::OpenTelemetry::Resource::Detector::Azure : nil), | ||
| 'aws' => (defined?(::OpenTelemetry::Resource::Detector::AWS) ? ::OpenTelemetry::Resource::Detector::AWS : nil) | ||
| } | ||
|
|
||
| env.split(',').each do |detector| | ||
| additional_resource = additional_resource.merge(resource_map[detector].detect) if resource_map[detector] | ||
| end | ||
|
|
||
| additional_resource | ||
| end |
There was a problem hiding this comment.
We should ensure that we are setting the telemetry.distro.* attributes: https://opentelemetry.io/docs/specs/semconv/registry/entities/telemetry/#telemetry-distro
There was a problem hiding this comment.
This is in development, so I will wait for stable
There was a problem hiding this comment.
It is already in progress see open-telemetry/semantic-conventions#3650 & given how many other unstable we have I think it would be better to have it from the start.
There was a problem hiding this comment.
|
|
||
| def self._otel_detect_resource_from_env | ||
| env = ENV['OTEL_RUBY_RESOURCE_DETECTORS'].to_s | ||
| additional_resource = ::OpenTelemetry::SDK::Resources::Resource.create({}) |
There was a problem hiding this comment.
Shouldn't this have been added by the sdk automatically?
There was a problem hiding this comment.
No, resource detector is not set for sdk by default; also this function allow user to have option to choose what resource detector to use.
There was a problem hiding this comment.
Ok, that is a bug then based on https://opentelemetry.io/docs/concepts/resources/#semantic-attributes-with-sdk-provided-default-value and also https://opentelemetry.io/docs/specs/otel/resource/sdk/#sdk-provided-resource-attributes.
I have no issue with the user choosing the additional resource detectors.
There was a problem hiding this comment.
Have raised open-telemetry/opentelemetry-ruby#2109 to address it.
arielvalentin
left a comment
There was a problem hiding this comment.
Note
🤖 This review was generated with AI assistance (GitHub Copilot CLI). Source code references were verified against the linked repositories.
I looked at how New Relic, Datadog, Elastic APM, and Honeycomb handle zero-code / auto-instrumentation to see how this approach compares. Overall, the Bundler prepend pattern is well-established — New Relic uses the same approach and it is the only pattern in the Ruby ecosystem that achieves truly zero-code instrumentation.
I have a few observations below. I do not want to block progress on this PR but I would like you to strongly consider these changes before we merge.
Thread Safety
The @_otel_initialized flag is checked without synchronization. In a multi-threaded server like Puma, two threads could both read false and double-initialize the SDK.
Both Elastic APM and Datadog protect their initialization with a Mutex. Would it make sense to add one here?
Hardcoded Instrumentation Map
The OTEL_INSTRUMENTATION_MAP duplicates what the SDK registry already knows. Datadog's approach iterates over a self-registering registry instead of maintaining a parallel map. Since this code already calls c.use_all when no filter is set, could we query the SDK's instrumentation registry for the selective case too? This would avoid the map drifting out of sync as new instrumentations are added.
Code Coverage
@thompson-tomo noted that SimpleCov is reporting 0 lines / 100% coverage. The subprocess fork testing is excellent for isolation but the coverage merging needs to be fixed before we merge.
✅ What I liked
- The warning about bundled OTel gems is great defensive programming — none of the other APM libraries do this
- The subprocess fork test architecture is really thorough
- Following the proven New Relic bootstrap pattern was a good choice
| module OTelBundlerPatch | ||
| # Nested module to handle OpenTelemetry initialization logic | ||
| module OTelInitializer | ||
| @_otel_initialized = false |
There was a problem hiding this comment.
Both Elastic APM and Datadog use a Mutex for their initialization guards. Would something like this work here?
@_otel_mutex = Mutex.new
@_otel_initialized = false
def self._otel_require_otel
@_otel_mutex.synchronize do
return if @_otel_initialized
@_otel_initialized = true
# ... rest of init
end
end| module OTelInitializer | ||
| @_otel_initialized = false | ||
|
|
||
| OTEL_INSTRUMENTATION_MAP = { |
There was a problem hiding this comment.
This map will need to be updated every time a new instrumentation is added. Datadog avoids this by iterating over a self-registering registry. Since we already call c.use_all when no filter is set, could we query OpenTelemetry::Instrumentation.registry for the selective case too?
|
|
||
| # Load OpenTelemetry components and their dependencies | ||
| # googleapis-common-protos-types and google-protobuf are dependencies for otlp exporters | ||
| loaded_library_file_path = Dir.glob("#{gem_path}/gems/*").select do |file_path| |
There was a problem hiding this comment.
This writes to $stdout which could pollute application output or break pipe-based workflows. Other APM libraries use loggers or stderr — Honeycomb, Elastic APM. Should we use warn or $stderr.puts instead?
| file_path.include?('opentelemetry') || | ||
| file_path.include?('googleapis-common-protos-types') || | ||
| file_path.include?('google-protobuf') | ||
| end |
There was a problem hiding this comment.
A couple of thoughts here:
- The allowlist only covers OTel + protobuf gems. If any OTel gem has transitive dependencies (e.g.,
faradayfor OTLP HTTP export), they won't be on the load path. file_path.include?('opentelemetry')is a substring match that could match unintended paths.
New Relic's approach is simpler because they only manage their own gem. Since we need to manage a whole ecosystem here, would it make sense to add ALL gems from the operator's gem path instead of an allowlist? Or could we use Gem.use_paths to properly register the operator gem path?
| require 'opentelemetry-exporter-otlp-logs' | ||
| require 'opentelemetry-instrumentation-all' | ||
|
|
||
| resource_detectors = ENV['OTEL_RUBY_RESOURCE_DETECTORS'].to_s |
There was a problem hiding this comment.
String#include? is a substring match — 'my-container-app'.include?('container') would also be true. The _otel_detect_resource_from_env method already handles this correctly by splitting on ,. Should we be consistent here?
| resource_detectors = ENV['OTEL_RUBY_RESOURCE_DETECTORS'].to_s | |
| resource_detectors = ENV['OTEL_RUBY_RESOURCE_DETECTORS'].to_s.split(',').map(&:strip) | |
| require 'opentelemetry-resource-detector-container' if resource_detectors.include?('container') | |
| require 'opentelemetry-resource-detector-azure' if resource_detectors.include?('azure') | |
| require 'opentelemetry-resource-detector-aws' if resource_detectors.include?('aws') |
| spec.require_paths = ['lib'] | ||
| spec.required_ruby_version = '>= 3.3' | ||
|
|
||
| spec.add_dependency 'opentelemetry-api', '~> 1.9.0' |
There was a problem hiding this comment.
Do you want these to always use strict versions vs pessimistic versioning to allow for bug fixes in dependencies without requiring a new version of this gem? ~> 1.9.0 only allows patch releases (>= 1.9.0, < 1.10.0). Any minor version bump of the SDK will require a coordinated release of this gem.
Is there going to be a process that bumps these versions when the API and SDK versions are updated?
Description
This PR aims to contribute a script/gem that can load the necessary OpenTelemetry-related gems and initialize the SDK for the opentelemetry-operator. The script/gem also includes a resource detector as part of its features.
The main idea was inspired by new_relic, which prepends the bundler.require function and places the loading script at the end of the loading procedure. For apps or scripts that don't use Bundler or a Gemfile, more details on variations of app loading can be found in the README. Currently, the goal is to ensure that Rails apps can work, with future improvements in mind.
The reason behind creating this gem is to facilitate easy installation and dependency enforcement. Although for the OpenTelemetry-Operator Node.js, it puts the loading script directly in the src folder (and then copy over to binding volume), it still relies on the auto-instrumentation-node package to install all required packages (e.g., API/SDK and instrumentation packages).
The main purpose of the gem is to serve the OpenTelemetry-Operator, which requires more configuration changes (see the configuration section in the README). Users don't need to modify their Ruby app codebase while using the instrumentation from OpenTelemetry-Ruby, but anyone can still use it in any environment.
The gem name, file structure, and implementation need more suggestions to make them more reasonable. I have tested it with rails and sinatra (through rackup), and I currently still in the process of testing against the opentelemetry operator cluster locally. Once the operator testing done, I will mark the pr as ready.
Related to open-telemetry/opentelemetry-operator#3756