-
Notifications
You must be signed in to change notification settings - Fork 360
feat(llmobs): implement multi-tenant routing context support #7158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat(llmobs): implement multi-tenant routing context support #7158
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7158 +/- ##
==========================================
- Coverage 84.49% 84.34% -0.16%
==========================================
Files 524 524
Lines 22463 22439 -24
==========================================
- Hits 18981 18927 -54
- Misses 3482 3512 +30 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@codex review |
Overall package sizeSelf size: 4.38 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | import-in-the-middle | 2.0.0 | 68.46 kB | 797.03 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
|
To use Codex here, create a Codex account and connect to github. |
This comment has been minimized.
This comment has been minimized.
BenchmarksBenchmark execution time: 2026-01-08 14:35:27 Comparing candidate commit 9615841 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 290 metrics, 30 unstable metrics. |
…f no routing context is available.
…hance withRoutingContext documentation; add test for empty ddApiKey
cd8146f to
b34e1f5
Compare
b34e1f5 to
e531090
Compare
| this._eventType = eventType | ||
|
|
||
| this._buffer = [] | ||
| this._buffers = new Map() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use private properties instead of underscores for all changed properties and new ones. Changing old ones is nice to have as separate commit :)
Please also add a comment to our Claude.md file that highlights this. Otherwise new code changes will likely come up by the AI that use the underscore instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Converted to # private syntax where possible (#buffers, #getMaskedRoutingKey, #cleanupEmptyBuffers, #getUrlForRouting)
Kept _getRoutingKey and _getOrCreateBuffer with underscore because LLMObsSpanWriter (subclass) needs to access them.
Also added the convention to AGENTS.md (symlink to claude.md ?) under "Class Properties and Methods".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I did keep things like get _buffer, set _buffer with the underscore because they are backward compatibility shims. You were not thinking about changing them as well, right?
sabrenner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly small comments but one larger implementation comment i'm curious on your thoughts on. but overall the logic makes sense to me and the sdk api change looks good pending some type definition updates!
| if (!options?.ddApiKey) { | ||
| throw new Error('ddApiKey is required for routing context') | ||
| } | ||
| const currentStore = storage.getStore() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if someone nests routing contexts
llmobs.withRoutingContext({ ...tenant1Options }, () => {
// create span1
llmobs.withRoutingContext({ ...tenant2Options }, () => {
// create span2
})
})i think as it is now, span1 would be routed to tenant1, and span2 would be routed to tenant2, although I'm not sure if a user would expect that, or for span2 to be sent to both tenant1 and tenant2.
maybe we can/should throw if there is already an active routing context? lmk what you think, but i think it could discourage the above if it's not a situation we want to promote. we can always relax that restraint in the future, but maybe we wanna make sure users get a good understanding & signals from our API.
but lmk if you disagree, i'm also fine to keep it as is, maybe with a warning log or smth but still going through with doing the nested routing context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your understanding is correct. As shown in the nested contexts override outer and restore after test, the inner routing context overrides the outer one, and restores after exit. It can indeed be considered a "silent override".
I personally believe that sending to both is problematic. It's complex, potentially dangerous (more easily to do an accidental data leakage), and harder (from a DEV-X) to reason about imo.
I believe that temporarily switching routing for a subset of operation can be a valid use case, hence why I believe that throwing may not be appropriate.
However, it may indeed be useful to provide visibility and to alert users of a potentially unintended behavior.
Right now, based on your suggestion, I've added a warning message:
Nested routing context detected. Inner context (****-key2) will override outer context (****-key1).
Spans created in the inner context will only be sent to the inner context.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally believe that sending to both is problematic. It's complex, potentially dangerous (more easily to do an accidental data leakage), and harder (from a DEV-X) to reason about imo.
+1
I believe that temporarily switching routing for a subset of operation can be a valid use case, hence why I believe that throwing may not be appropriate.
ok sounds good! i think a warning log can be fine, and if it become more popular of a use case we can demote it to a debug log or something since it might be nice to have some kind of logged signal still in that case. but im good with a warning log 😄
but i think we can remove the redacted api key bits, idt it's needed for this warning log, but lmk what you think. i think just
[LLM Observability] Nested routing context detected. Inner context will override outer context.
Spans created in the inner context will only be sent to the inner context.
is good
sabrenner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a question on the test file added!
packages/dd-trace/src/llmobs/sdk.js
Outdated
| const outerKey = currentStore.routingContext.apiKey | ||
| const innerKey = options.ddApiKey | ||
| const maskedOuter = outerKey ? `****${outerKey.slice(-4)}` : '' | ||
| const maskedInner = innerKey ? `****${innerKey.slice(-4)}` : '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per my response, if you're ok with it, i think we can forgo logging redacted api keys, and just do the log
| const { storage } = require('../../../src/llmobs/storage') | ||
| let sdkLogger | ||
|
|
||
| function withRoutingContext (options, fn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can these tests be done against the actual sdk? i don't think it's right to re-define the function here and use it in the tests below
… and update tests to reflect changes
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
…ions and context handling
What does this PR do?
Adds multi-tenant API key support for LLM Observability, enabling platforms to route LLMObs telemetry to different Datadog organizations from a single process.
New API:
Motivation
Multi-tenant AI platforms need to send LLM Observability data to their customers' Datadog organizations. This enables:
Implementation
Key Changes:
withRoutingContext(options, fn)method added to LLMObs SDK (usesAsyncLocalStorage)BaseLLMObsWriterrefactored to multi-buffer architecture (one buffer per API key)Testing
writers/multi-tenant.spec.js- Tests for buffer routing, endpoint construction, API key isolation, context nesting, and concurrency