Skip to content

Conversation

@PROFeNoM
Copy link
Contributor

@PROFeNoM PROFeNoM commented Dec 23, 2025

What does this PR do?

Adds multi-tenant API key support for LLM Observability, enabling platforms to route LLMObs telemetry to different Datadog organizations from a single process.

New API:

await llmobs.withRoutingContext({ ddApiKey, ddSite }, async () => {
  // All LLMObs spans here route to the specified Datadog org
})

Motivation

Multi-tenant AI platforms need to send LLM Observability data to their customers' Datadog organizations. This enables:

  • Customer-specific trace visibility
  • Data isolation between tenants
  • Compliance with data residency requirements (different DD sites per customer)

Implementation

  withRoutingContext({ ddApiKey, ddSite }, fn)
      ↓ AsyncLocalStorage
  registerLLMObsSpan()
      ↓ captures routing to span tags
  span.finish()
      ↓
  SpanProcessor.process()
      ↓ extracts routing from tags
  writer.append(event, routing)
      ↓ routes to correct buffer
  setInterval flush()
      ↓ flushes each buffer to its endpoint

Key Changes:

  • withRoutingContext(options, fn) method added to LLMObs SDK (uses AsyncLocalStorage)
  • Routing context captured at span creation time
  • BaseLLMObsWriter refactored to multi-buffer architecture (one buffer per API key)
  • Each buffer flushes independently to its corresponding endpoint
  • Agentless mode only

Testing

  • writers/multi-tenant.spec.js - Tests for buffer routing, endpoint construction, API key isolation, context nesting, and concurrency

@codecov
Copy link

codecov bot commented Dec 23, 2025

Codecov Report

❌ Patch coverage is 47.88732% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.34%. Comparing base (39fb819) to head (9615841).
⚠️ Report is 25 commits behind head on master.

Files with missing lines Patch % Lines
packages/dd-trace/src/llmobs/writers/base.js 52.00% 24 Missing ⚠️
packages/dd-trace/src/llmobs/sdk.js 0.00% 9 Missing ⚠️
packages/dd-trace/src/llmobs/tagger.js 40.00% 3 Missing ⚠️
packages/dd-trace/src/llmobs/noop.js 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7158      +/-   ##
==========================================
- Coverage   84.49%   84.34%   -0.16%     
==========================================
  Files         524      524              
  Lines       22463    22439      -24     
==========================================
- Hits        18981    18927      -54     
- Misses       3482     3512      +30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@PROFeNoM
Copy link
Contributor Author

@codex review

@github-actions
Copy link

github-actions bot commented Dec 23, 2025

Overall package size

Self size: 4.38 MB
Deduped: 5.2 MB
No deduping: 5.2 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | import-in-the-middle | 2.0.0 | 68.46 kB | 797.03 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@chatgpt-codex-connector
Copy link

To use Codex here, create a Codex account and connect to github.

@datadog-datadog-prod-us1

This comment has been minimized.

@pr-commenter
Copy link

pr-commenter bot commented Dec 23, 2025

Benchmarks

Benchmark execution time: 2026-01-08 14:35:27

Comparing candidate commit 9615841 in PR branch alex/MLOB-4999_multi-tenant-routing-context-support with baseline commit 39fb819 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 290 metrics, 30 unstable metrics.

@PROFeNoM PROFeNoM force-pushed the alex/MLOB-4999_multi-tenant-routing-context-support branch from cd8146f to b34e1f5 Compare January 5, 2026 09:16
@PROFeNoM PROFeNoM force-pushed the alex/MLOB-4999_multi-tenant-routing-context-support branch from b34e1f5 to e531090 Compare January 5, 2026 09:36
@PROFeNoM PROFeNoM marked this pull request as ready for review January 5, 2026 09:47
@PROFeNoM PROFeNoM requested a review from a team as a code owner January 5, 2026 09:47
this._eventType = eventType

this._buffer = []
this._buffers = new Map()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use private properties instead of underscores for all changed properties and new ones. Changing old ones is nice to have as separate commit :)

Please also add a comment to our Claude.md file that highlights this. Otherwise new code changes will likely come up by the AI that use the underscore instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Converted to # private syntax where possible (#buffers, #getMaskedRoutingKey, #cleanupEmptyBuffers, #getUrlForRouting)

Kept _getRoutingKey and _getOrCreateBuffer with underscore because LLMObsSpanWriter (subclass) needs to access them.

Also added the convention to AGENTS.md (symlink to claude.md ?) under "Class Properties and Methods".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I did keep things like get _buffer, set _buffer with the underscore because they are backward compatibility shims. You were not thinking about changing them as well, right?

@PROFeNoM PROFeNoM requested a review from a team as a code owner January 6, 2026 09:15
@sabrenner sabrenner self-requested a review January 6, 2026 14:42
Copy link
Collaborator

@sabrenner sabrenner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly small comments but one larger implementation comment i'm curious on your thoughts on. but overall the logic makes sense to me and the sdk api change looks good pending some type definition updates!

@PROFeNoM PROFeNoM marked this pull request as draft January 7, 2026 10:42
…LMObs class, remove legacy routing-context module, and enhance multi-tenant event handling with isolated buffers
if (!options?.ddApiKey) {
throw new Error('ddApiKey is required for routing context')
}
const currentStore = storage.getStore()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if someone nests routing contexts

llmobs.withRoutingContext({ ...tenant1Options }, () => {
  // create span1
  llmobs.withRoutingContext({ ...tenant2Options }, () => {
    // create span2
  })
})

i think as it is now, span1 would be routed to tenant1, and span2 would be routed to tenant2, although I'm not sure if a user would expect that, or for span2 to be sent to both tenant1 and tenant2.

maybe we can/should throw if there is already an active routing context? lmk what you think, but i think it could discourage the above if it's not a situation we want to promote. we can always relax that restraint in the future, but maybe we wanna make sure users get a good understanding & signals from our API.

but lmk if you disagree, i'm also fine to keep it as is, maybe with a warning log or smth but still going through with doing the nested routing context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your understanding is correct. As shown in the nested contexts override outer and restore after test, the inner routing context overrides the outer one, and restores after exit. It can indeed be considered a "silent override".

I personally believe that sending to both is problematic. It's complex, potentially dangerous (more easily to do an accidental data leakage), and harder (from a DEV-X) to reason about imo.

I believe that temporarily switching routing for a subset of operation can be a valid use case, hence why I believe that throwing may not be appropriate.
However, it may indeed be useful to provide visibility and to alert users of a potentially unintended behavior.

Right now, based on your suggestion, I've added a warning message:

Nested routing context detected. Inner context (****-key2) will override outer context (****-key1).  
Spans created in the inner context will only be sent to the inner context.

WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally believe that sending to both is problematic. It's complex, potentially dangerous (more easily to do an accidental data leakage), and harder (from a DEV-X) to reason about imo.

+1

I believe that temporarily switching routing for a subset of operation can be a valid use case, hence why I believe that throwing may not be appropriate.

ok sounds good! i think a warning log can be fine, and if it become more popular of a use case we can demote it to a debug log or something since it might be nice to have some kind of logged signal still in that case. but im good with a warning log 😄

but i think we can remove the redacted api key bits, idt it's needed for this warning log, but lmk what you think. i think just

[LLM Observability] Nested routing context detected. Inner context will override outer context.  
Spans created in the inner context will only be sent to the inner context.

is good

@PROFeNoM PROFeNoM marked this pull request as ready for review January 7, 2026 15:32
Copy link
Collaborator

@sabrenner sabrenner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a question on the test file added!

Comment on lines 467 to 470
const outerKey = currentStore.routingContext.apiKey
const innerKey = options.ddApiKey
const maskedOuter = outerKey ? `****${outerKey.slice(-4)}` : ''
const maskedInner = innerKey ? `****${innerKey.slice(-4)}` : ''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per my response, if you're ok with it, i think we can forgo logging redacted api keys, and just do the log

const { storage } = require('../../../src/llmobs/storage')
let sdkLogger

function withRoutingContext (options, fn) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these tests be done against the actual sdk? i don't think it's right to re-define the function here and use it in the tests below

Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants