Skip to content

feat: add gem for auto-instrumentation in otel-operator#1384

Open
xuan-cao-swi wants to merge 99 commits intoopen-telemetry:mainfrom
xuan-cao-swi:auto-instrumentation
Open

feat: add gem for auto-instrumentation in otel-operator#1384
xuan-cao-swi wants to merge 99 commits intoopen-telemetry:mainfrom
xuan-cao-swi:auto-instrumentation

Conversation

@xuan-cao-swi
Copy link
Copy Markdown
Contributor

@xuan-cao-swi xuan-cao-swi commented Jan 30, 2025

Description

This PR aims to contribute a script/gem that can load the necessary OpenTelemetry-related gems and initialize the SDK for the opentelemetry-operator. The script/gem also includes a resource detector as part of its features.

The main idea was inspired by new_relic, which prepends the bundler.require function and places the loading script at the end of the loading procedure. For apps or scripts that don't use Bundler or a Gemfile, more details on variations of app loading can be found in the README. Currently, the goal is to ensure that Rails apps can work, with future improvements in mind.

The reason behind creating this gem is to facilitate easy installation and dependency enforcement. Although for the OpenTelemetry-Operator Node.js, it puts the loading script directly in the src folder (and then copy over to binding volume), it still relies on the auto-instrumentation-node package to install all required packages (e.g., API/SDK and instrumentation packages).

The main purpose of the gem is to serve the OpenTelemetry-Operator, which requires more configuration changes (see the configuration section in the README). Users don't need to modify their Ruby app codebase while using the instrumentation from OpenTelemetry-Ruby, but anyone can still use it in any environment.

The gem name, file structure, and implementation need more suggestions to make them more reasonable. I have tested it with rails and sinatra (through rackup), and I currently still in the process of testing against the opentelemetry operator cluster locally. Once the operator testing done, I will mark the pr as ready.

Related to open-telemetry/opentelemetry-operator#3756

@xuan-cao-swi
Copy link
Copy Markdown
Contributor Author

We don't need to push to make the gem ready/release, but the script file need to get some review first.

@github-actions
Copy link
Copy Markdown
Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions Bot added the stale Marks an issue/PR stale label Mar 29, 2025
@kaylareopelle kaylareopelle removed the stale Marks an issue/PR stale label Apr 2, 2025
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2025

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions Bot added the stale Marks an issue/PR stale label May 3, 2025
Copy link
Copy Markdown
Contributor

@kaylareopelle kaylareopelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so close! I noticed a few things in my review today.

module OTelInitializer
@initialized = false

OTEL_INSTRUMENTATION_MAP = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any new instrumentations we should add here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I will create a workflow that keep track of this OTEL_INSTRUMENTATION_MAP later.

Copy link
Copy Markdown
Contributor Author

@xuan-cao-swi xuan-cao-swi Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth to explore what James comment to have separate repo for this gem.

When I look at the github org I see the following repos for auto instrumentation:
https://github.com/open-telemetry/opentelemetry-go-instrumentation
https://github.com/open-telemetry/opentelemetry-java-instrumentation
https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation
https://github.com/open-telemetry/opentelemetry-php-instrumentation

Since it will have dependabot that need to track the latest sdk and instrumentation gem, I don't want to pollute the PR pages.

require 'version'

Gem::Specification.new do |spec|
spec.name = 'auto-instrumentation'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about calling the gem opentelemetry-auto-instrumentation? Without the opentelemetry prefix it feels very broad.

I remember talking about naming a long time ago, but haven't found that conversation.

@thompson-tomo made a similar comment in the changelog about updating the name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to opentelemetry-auto-instrumentation. Let me know if you want the folder name also changed to opentelemetry-auto-instrumentation

Comment thread packages/auto-instrumentation/README.md Outdated
opentelemetry-instrumentation-all
opentelemetry-exporter-otlp
opentelemetry-helpers-mysql
opentelemetry-helpers-sql-obfuscation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to replace the sql-obfuscation gem with opentelemetry-helpers-sql-processor

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

opentelemetry-helpers-sql-obfuscation
opentelemetry-resource-detector-azure
opentelemetry-resource-detector-container
opentelemetry-resource-detector-aws
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have a GCP resource detector. Does that make sense to include?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GCP resource detector depends on the extra google-cloud-env gem, which can introduce version conflicts. I’m leaning toward only copying the OTEL‑related gems into the operator.
That gem mainly pulls GCP metadata, whereas the other resource detectors use more self‑contained methods like fetching metadata from a local URL.

Comment thread packages/auto-instrumentation/README.md Outdated

## Configuration

The following environment variables are specific to this gem (not standard OpenTelemetry variables):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about mentioning that all other OTEL environment variables can still be used for configuration with this gem?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated


require 'test_helper'

describe 'AutoInstrumentation' do
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
describe 'AutoInstrumentation' do
describe 'OpenTelemetry::AutoInstrumentation' do

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment on lines +10 to +11
require 'opentelemetry-sdk'
require 'opentelemetry-instrumentation-all'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we require these and opentelemetry/resource/detector here, could we be getting false positives? (just thinking about open-telemetry/opentelemetry-ruby#1956)

The tests seem to pass even if we remove those requires.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, these gem are not need to require here (removed).

Comment on lines +88 to +93
warn '[OpenTelemetry] WARNING: Detected OpenTelemetry gems in your Gemfile: ' \
"#{gem_names}. When using auto-instrumentation, OpenTelemetry gems are loaded " \
'from the auto-instrumentation gem path, NOT from your bundle. The gem versions ' \
'in your Gemfile/Gemfile.lock are not used and may cause version conflicts or ' \
'unexpected behavior. Please remove these gems from your Gemfile when using ' \
'auto-instrumentation.'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that people who want to use metrics or logs can't use auto instrumentation? If that's the case, what do you think about adding a note to the README?

Copy link
Copy Markdown
Contributor Author

@xuan-cao-swi xuan-cao-swi Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the metrics and logs sdk support in the autoinstrumentation (also include the example in rails-example)

Comment on lines +8 to +10
gem 'rake', '13.0.6'
gem 'bigdecimal', '3.1.3'
gem 'logger', '1.5.3'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these versions locked?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the main problem I encountered with the rails example. I was using Ruby 3.4 and these versions are older than the latest versions. If I commented these versions out and regenerated the Gemfile.lock, everything worked fine.

When I tried to follow the instructions in the example README and installed the versions it asked for specifically, I kept getting the same error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the issue of version conflict between user gem env and the auto-instrumentation env.
I added the note that this won't be an issue in operator because we only copy otel-related gem to operator.
To make this example run, you may need to manually update the system gem (e.g. gem update rake/gem install rake -v 13.4.1) when you see something like

You have already activated rake 13.4.2, but your Gemfile requires rake 13.4.1. Prepending `bundle exec` to your command may solve this. (Gem::LoadError)

Comment thread Dockerfile Outdated
@arielvalentin
Copy link
Copy Markdown
Contributor

@xuan-cao-swi can I ask you to resolve conflicts?

@arielvalentin arielvalentin requested review from Copilot and removed request for marcotc April 18, 2026 14:57
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new opentelemetry-auto-instrumentation Ruby gem/package intended to support OpenTelemetry Operator-style zero-code auto-instrumentation (via RUBYOPT), along with examples, tests, and CI wiring.

Changes:

  • Introduces the opentelemetry-auto-instrumentation gem (gemspec, loader/initializer, versioning, docs, rake tasks, rubocop config).
  • Adds Minitest coverage that runs initialization in a forked subprocess and validates traces/metrics/logs and Bundler warning behavior.
  • Updates repo CI to run this new package’s checks and adjusts Docker build dependencies/base image.
Show a summary per file
File Description
packages/auto-instrumentation/test/test_helper.rb Adds subprocess test harness for isolated auto-instrumentation initialization.
packages/auto-instrumentation/test/opentelemetry-auto-instrumentation_test.rb Adds Minitest specs for initialization, signal emission, and Bundler Gemfile warning behavior.
packages/auto-instrumentation/opentelemetry-auto-instrumentation.gemspec Defines new gem metadata and runtime dependencies.
packages/auto-instrumentation/lib/version.rb Introduces gem version constant.
packages/auto-instrumentation/lib/opentelemetry-auto-instrumentation.rb Implements Bundler hook + SDK/config initialization + resource detection wiring.
packages/auto-instrumentation/example/simple-example/app.rb Adds minimal script demonstrating HTTP client instrumentation.
packages/auto-instrumentation/example/simple-example/Gemfile Adds Gemfile scaffold for the simple example.
packages/auto-instrumentation/example/rails-example/config.ru Adds rackup entrypoint for Rails example.
packages/auto-instrumentation/example/rails-example/app.rb Adds Rails example application emitting metrics/logs.
packages/auto-instrumentation/example/rails-example/Gemfile Adds dependencies for Rails example.
packages/auto-instrumentation/example/README.md Documents how to run the included examples.
packages/auto-instrumentation/Rakefile Adds test/rubocop/yard tasks for the new package.
packages/auto-instrumentation/README.md Adds package-level documentation and configuration reference.
packages/auto-instrumentation/LICENSE Adds license file for the new package.
packages/auto-instrumentation/Gemfile Adds development/test dependencies for the new package.
packages/auto-instrumentation/CHANGELOG.md Adds changelog scaffold.
packages/auto-instrumentation/.rubocop.yml Adds package-specific RuboCop config/exclusions.
Dockerfile Updates base image and adds yaml-dev dependency.
.github/workflows/ci-contrib.yml Adds CI job and path filters for packages, including this new gem.
.cspell.yml Removes an extraneous entry/line.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (1)

packages/auto-instrumentation/example/rails-example/app.rb:55

  • The RuboCop directive comment is malformed (# rubocop enable:). RuboCop expects # rubocop:enable ..., so this line won't actually re-enable the cop (and the disable above won’t work either).
# rubocop enable:Style/OneClassPerFile
  • Files reviewed: 20/20 changed files
  • Comments generated: 9

Comment thread packages/auto-instrumentation/example/rails-example/app.rb
Comment thread packages/auto-instrumentation/opentelemetry-auto-instrumentation.gemspec Outdated
Comment thread packages/auto-instrumentation/example/simple-example/app.rb
Comment thread Dockerfile
Comment thread .github/workflows/ci-contrib.yml Outdated
Comment thread packages/auto-instrumentation/lib/opentelemetry-auto-instrumentation.rb Outdated
Comment thread packages/auto-instrumentation/README.md Outdated
Comment thread packages/auto-instrumentation/lib/opentelemetry-auto-instrumentation.rb Outdated
Comment thread packages/auto-instrumentation/opentelemetry-auto-instrumentation.gemspec Outdated
@thompson-tomo
Copy link
Copy Markdown
Contributor

@xuan-cao-swi there appears to be an issue calculating code coverage https://github.com/open-telemetry/opentelemetry-ruby-contrib/actions/runs/24790433512/job/72546108811#step:4:1 it is getting 0 lines hence classifying it as 100% coverage.

Comment on lines +61 to +76
def self._otel_detect_resource_from_env
env = ENV['OTEL_RUBY_RESOURCE_DETECTORS'].to_s
additional_resource = ::OpenTelemetry::SDK::Resources::Resource.create({})

resource_map = {
'container' => (defined?(::OpenTelemetry::Resource::Detector::Container) ? ::OpenTelemetry::Resource::Detector::Container : nil),
'azure' => (defined?(::OpenTelemetry::Resource::Detector::Azure) ? ::OpenTelemetry::Resource::Detector::Azure : nil),
'aws' => (defined?(::OpenTelemetry::Resource::Detector::AWS) ? ::OpenTelemetry::Resource::Detector::AWS : nil)
}

env.split(',').each do |detector|
additional_resource = additional_resource.merge(resource_map[detector].detect) if resource_map[detector]
end

additional_resource
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should ensure that we are setting the telemetry.distro.* attributes: https://opentelemetry.io/docs/specs/semconv/registry/entities/telemetry/#telemetry-distro

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in development, so I will wait for stable

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already in progress see open-telemetry/semantic-conventions#3650 & given how many other unstable we have I think it would be better to have it from the start.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


def self._otel_detect_resource_from_env
env = ENV['OTEL_RUBY_RESOURCE_DETECTORS'].to_s
additional_resource = ::OpenTelemetry::SDK::Resources::Resource.create({})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this have been added by the sdk automatically?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, resource detector is not set for sdk by default; also this function allow user to have option to choose what resource detector to use.

Copy link
Copy Markdown
Contributor

@thompson-tomo thompson-tomo Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have raised open-telemetry/opentelemetry-ruby#2109 to address it.

Copy link
Copy Markdown
Contributor

@arielvalentin arielvalentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 This review was generated with AI assistance (GitHub Copilot CLI). Source code references were verified against the linked repositories.

I looked at how New Relic, Datadog, Elastic APM, and Honeycomb handle zero-code / auto-instrumentation to see how this approach compares. Overall, the Bundler prepend pattern is well-established — New Relic uses the same approach and it is the only pattern in the Ruby ecosystem that achieves truly zero-code instrumentation.

I have a few observations below. I do not want to block progress on this PR but I would like you to strongly consider these changes before we merge.

Thread Safety

The @_otel_initialized flag is checked without synchronization. In a multi-threaded server like Puma, two threads could both read false and double-initialize the SDK.

Both Elastic APM and Datadog protect their initialization with a Mutex. Would it make sense to add one here?

Hardcoded Instrumentation Map

The OTEL_INSTRUMENTATION_MAP duplicates what the SDK registry already knows. Datadog's approach iterates over a self-registering registry instead of maintaining a parallel map. Since this code already calls c.use_all when no filter is set, could we query the SDK's instrumentation registry for the selective case too? This would avoid the map drifting out of sync as new instrumentations are added.

Code Coverage

@thompson-tomo noted that SimpleCov is reporting 0 lines / 100% coverage. The subprocess fork testing is excellent for isolation but the coverage merging needs to be fixed before we merge.

✅ What I liked

  • The warning about bundled OTel gems is great defensive programming — none of the other APM libraries do this
  • The subprocess fork test architecture is really thorough
  • Following the proven New Relic bootstrap pattern was a good choice

module OTelBundlerPatch
# Nested module to handle OpenTelemetry initialization logic
module OTelInitializer
@_otel_initialized = false
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both Elastic APM and Datadog use a Mutex for their initialization guards. Would something like this work here?

@_otel_mutex = Mutex.new
@_otel_initialized = false

def self._otel_require_otel
  @_otel_mutex.synchronize do
    return if @_otel_initialized
    @_otel_initialized = true
    # ... rest of init
  end
end

module OTelInitializer
@_otel_initialized = false

OTEL_INSTRUMENTATION_MAP = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This map will need to be updated every time a new instrumentation is added. Datadog avoids this by iterating over a self-registering registry. Since we already call c.use_all when no filter is set, could we query OpenTelemetry::Instrumentation.registry for the selective case too?


# Load OpenTelemetry components and their dependencies
# googleapis-common-protos-types and google-protobuf are dependencies for otlp exporters
loaded_library_file_path = Dir.glob("#{gem_path}/gems/*").select do |file_path|
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This writes to $stdout which could pollute application output or break pipe-based workflows. Other APM libraries use loggers or stderr — Honeycomb, Elastic APM. Should we use warn or $stderr.puts instead?

file_path.include?('opentelemetry') ||
file_path.include?('googleapis-common-protos-types') ||
file_path.include?('google-protobuf')
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of thoughts here:

  1. The allowlist only covers OTel + protobuf gems. If any OTel gem has transitive dependencies (e.g., faraday for OTLP HTTP export), they won't be on the load path.
  2. file_path.include?('opentelemetry') is a substring match that could match unintended paths.

New Relic's approach is simpler because they only manage their own gem. Since we need to manage a whole ecosystem here, would it make sense to add ALL gems from the operator's gem path instead of an allowlist? Or could we use Gem.use_paths to properly register the operator gem path?

require 'opentelemetry-exporter-otlp-logs'
require 'opentelemetry-instrumentation-all'

resource_detectors = ENV['OTEL_RUBY_RESOURCE_DETECTORS'].to_s
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String#include? is a substring match — 'my-container-app'.include?('container') would also be true. The _otel_detect_resource_from_env method already handles this correctly by splitting on ,. Should we be consistent here?

Suggested change
resource_detectors = ENV['OTEL_RUBY_RESOURCE_DETECTORS'].to_s
resource_detectors = ENV['OTEL_RUBY_RESOURCE_DETECTORS'].to_s.split(',').map(&:strip)
require 'opentelemetry-resource-detector-container' if resource_detectors.include?('container')
require 'opentelemetry-resource-detector-azure' if resource_detectors.include?('azure')
require 'opentelemetry-resource-detector-aws' if resource_detectors.include?('aws')

spec.require_paths = ['lib']
spec.required_ruby_version = '>= 3.3'

spec.add_dependency 'opentelemetry-api', '~> 1.9.0'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want these to always use strict versions vs pessimistic versioning to allow for bug fixes in dependencies without requiring a new version of this gem? ~> 1.9.0 only allows patch releases (>= 1.9.0, < 1.10.0). Any minor version bump of the SDK will require a coordinated release of this gem.

Is there going to be a process that bumps these versions when the API and SDK versions are updated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci keep Ensures stale-bot keeps this issue/PR open

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants