Added optional OpenTelemetry tracing for RPC loader calls #699

Draft
Codingisinmyblud wants to merge 3 commits into metacall:develop from Codingisinmyblud:feature/rpc_tracing_v3
Conversation

@Codingisinmyblud

Description

Added optional OpenTelemetry tracing for the RPC loader to improve observability across distributed call paths.

When built with -DOPTION_RPC_TRACING=ON, the RPC loader wraps both synchronous (function_rpc_interface_invoke) and asynchronous (function_rpc_interface_await) calls in an rpc_trace_scope, a simple C++ RAII abstraction over an OpenTelemetry span. The emitted spans capture the function name, the target URL, the latency, and whether the call succeeded or failed. When tracing is disabled (the default), a no-op implementation is linked instead, so tracing adds no runtime overhead.
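To illustrate the enabled/no-op split described above, here is a minimal sketch. It is not the code from this PR: the class name trace_scope_sketch, the macro RPC_TRACING_ENABLED, and the helper invoke_with_tracing are hypothetical, the real span is replaced by a log line on std::clog, and the switch is done with the preprocessor rather than by linking a different implementation.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <string>

// Hypothetical macro standing in for the CMake option; not the PR's actual define.
#ifndef RPC_TRACING_ENABLED
#define RPC_TRACING_ENABLED 1
#endif

#if RPC_TRACING_ENABLED
// Enabled variant: records the start time and reports when the scope ends.
class trace_scope_sketch
{
public:
	static constexpr bool enabled = true;

	explicit trace_scope_sketch(const char *name) :
		name_(name), start_(std::chrono::steady_clock::now()) {}

	trace_scope_sketch(const trace_scope_sketch &) = delete;
	trace_scope_sketch &operator=(const trace_scope_sketch &) = delete;

	// Mark the scope as failed; reported when the scope is destroyed.
	void set_error() { ok_ = false; }

	~trace_scope_sketch()
	{
		const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
			std::chrono::steady_clock::now() - start_);
		// A real implementation would end an OpenTelemetry span here.
		std::clog << "span " << name_ << " ok=" << ok_
				  << " us=" << elapsed.count() << '\n';
	}

private:
	std::string name_;
	std::chrono::steady_clock::time_point start_;
	bool ok_ = true;
};
#else
// Disabled variant: same interface, empty bodies, no data members.
class trace_scope_sketch
{
public:
	static constexpr bool enabled = false;

	explicit trace_scope_sketch(const char *) noexcept {}
	void set_error() noexcept {}
};
#endif

// Hypothetical call site: the scope covers the whole function body.
int invoke_with_tracing(int a, int b)
{
	trace_scope_sketch scope("sum");
	return a + b;
}
```

Because both variants expose the same interface, call sites do not need their own #if blocks, and an optimizing compiler can typically remove the empty variant entirely.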

Dependencies added:

  • OpenTelemetry C++ SDK (I believe this is fetched automatically via CMake FetchContent when the tracing is enabled)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests/screenshots (if any) that prove my fix is effective or that my feature works.
  • I have tested the tests implicated (if any) by my own code and they pass (make test or ctest -VV -R <test-name>).
  • If my change is significant or breaking, I have passed all tests with ./docker-compose.sh test &> output and attached the output.
  • I have tested my code with OPTION_BUILD_ADDRESS_SANITIZER or ./docker-compose.sh test-address-sanitizer &> output and OPTION_TEST_MEMORYCHECK.
  • I have tested my code with OPTION_BUILD_THREAD_SANITIZER or ./docker-compose.sh test-thread-sanitizer &> output.
  • I have tested with Helgrind in case my code works with threading.
  • I have run make clang-format in order to format my code and my code follows the style guidelines.

If you are unclear about any of the above checks, have a look at our documentation here.

@viferga
Member

viferga commented Mar 18, 2026

Can you explain what you are tracing and how it works? Please send a screenshot of it working in a telemetry environment.

@Codingisinmyblud
Author

Codingisinmyblud commented Mar 18, 2026

Hi, hope you're doing well.
In this PR I added the feature to trace both synchronous and asynchronous calls.

  • For synchronous calls it captures the full lifecycle from serialization through the blocking HTTP POST and then finally to deserialization.
  • For asynchronous calls, it captures the initial dispatch phase, which includes serialization and queueing.

As for how it works, I added a C++ RAII abstraction that manages the OpenTelemetry span lifecycle. It automatically extracts the target URL and method name and attaches them as span attributes (rpc.method, metacall.target, metacall.async). If a network or serialization error occurs, the span is marked with an error status before it is destroyed. When the OPTION_RPC_TRACING CMake flag is disabled, this class uses a strictly no-op backend, so tracing adds no runtime overhead.
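A rough sketch of the attribute and error handling described above, to make the lifecycle concrete. This is not the PR's code and it does not use the OpenTelemetry API: span_record, rpc_span_sketch, traced_call, and the span name "rpc.call" are hypothetical stand-ins, and finished spans are appended to a std::vector instead of being exported. Only the three attribute keys come from the description.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a finished span; not an OpenTelemetry type.
struct span_record
{
	std::string name;
	std::map<std::string, std::string> attributes;
	bool error = false;
	std::string error_message;
};

// RAII wrapper: attributes are set on construction, the record is
// handed to the sink on destruction, including on early returns.
class rpc_span_sketch
{
public:
	rpc_span_sketch(std::vector<span_record> &sink, const std::string &method,
		const std::string &url, bool is_async) :
		sink_(sink)
	{
		record_.name = "rpc.call"; // assumed span name
		record_.attributes["rpc.method"] = method;
		record_.attributes["metacall.target"] = url;
		record_.attributes["metacall.async"] = is_async ? "true" : "false";
	}

	rpc_span_sketch(const rpc_span_sketch &) = delete;
	rpc_span_sketch &operator=(const rpc_span_sketch &) = delete;

	// Record a failure; it becomes visible once the span is finished.
	void set_error(const std::string &message)
	{
		record_.error = true;
		record_.error_message = message;
	}

	~rpc_span_sketch()
	{
		try
		{
			sink_.push_back(std::move(record_));
		}
		catch (...)
		{
			// Drop the span rather than throw from a destructor.
		}
	}

private:
	std::vector<span_record> &sink_;
	span_record record_;
};

// Hypothetical call path: transport_ok simulates the HTTP POST result.
bool traced_call(std::vector<span_record> &sink, const std::string &method,
	const std::string &url, bool transport_ok)
{
	rpc_span_sketch span(sink, method, url, false);

	if (!transport_ok)
	{
		span.set_error("transport failure");
		return false; // the destructor still finishes the span
	}

	return true;
}
```

The point of the RAII shape is that the error path needs no explicit "end span" call: the early return in traced_call still produces a finished record with the error flag set.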

The backend currently uses the built-in console exporter (OStreamSpanExporter), so standard output acts as the telemetry environment for now. I did this to keep the PR simple and to show that the architecture and attribute extraction work as intended. We can easily swap to an OTLPSpanExporter for Zipkin or Jaeger in the future.

I have attached screenshots of the structured JSON synchronous call span and the asynchronous call span (from running the native test suite with ctest -V -R metacall-rpc-test > test_output.txt).

For synchronous calls:

  • image

For asynchronous calls:

  • image

@viferga
Member

viferga commented Mar 21, 2026

@Codingisinmyblud It looks great, but I cannot merge it yet. I think we should discuss the format and the data we will use for tracing in more detail. But it's a good start. For example, I think we should serialize the body of the RPC call, so we know exactly how a function was called.
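One possible shape for the body serialization idea, as a sketch only. The attribute keys metacall.request.body and metacall.request.body_size, the 4096-byte default cap, and both function names are assumptions made for illustration; none of them come from this PR or from an OpenTelemetry convention. The attributes are modeled as a plain std::map rather than a real span.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Cap the recorded body so a large payload cannot bloat the span.
// Note: cutting at a byte offset can split a multi-byte UTF-8 character.
std::string truncate_body(const std::string &body, std::size_t max_bytes)
{
	if (body.size() <= max_bytes)
	{
		return body;
	}

	return body.substr(0, max_bytes) + "...[truncated]";
}

// Attach the serialized request body and its original size.
// Both attribute keys are hypothetical names.
void attach_request_body(std::map<std::string, std::string> &attributes,
	const std::string &body, std::size_t max_bytes = 4096)
{
	attributes["metacall.request.body"] = truncate_body(body, max_bytes);
	attributes["metacall.request.body_size"] = std::to_string(body.size());
}
```

Recording the original size next to the possibly truncated body keeps it clear when the capture is partial. Since call arguments can contain sensitive data, this would probably need its own opt-in switch.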

Also the variable for enabling this should be:

OPTION_BUILD_LOADERS_RPC_TELEMETRY

Maybe, or *_TRACING.

@Codingisinmyblud
Author

Sounds good. I'll get started on those changes for now. Thanks

@viferga
Member

viferga commented Apr 17, 2026

The CI is failing, I cannot merge it if it doesn't compile.

viferga marked this pull request as draft April 17, 2026 07:34