feat(datadog_agent source): accept LLMObs telemetry at /api/v2/llmobs #25636
ronitanilkumar wants to merge 3 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca3999197a
    }

    #[derive(Deserialize)]
    struct LLMObsSpan {
Preserve optional LLMObs span fields
When an SDK emits fields such as span_links, config, or collection_errors, Serde silently ignores them because they are absent from this struct; the reconstruction below also retains only ml_app from the _dd object. These fields are part of current LLMObs span events, so linked-span relationships, experiment configuration, and collection errors are irreversibly lost before downstream transforms can inspect them.
    .flat_map(|item| {
        let tracer_version = item.dd_tracer_version.clone();
        item.spans.into_iter().map(move |span| {
            let mut log = LogEvent::default();
Add standard source metadata to LLMObs logs
Every emitted LLMObs log is created without calling insert_standard_vector_source_metadata, unlike logs::decode_log_body. Consequently these events lack the configured source_type and ingest timestamp in both legacy and Vector log namespaces, breaking pipelines and observability logic that rely on the standard metadata supplied by this source's other log events.
    .collect();

    Ok(events)
After decoding succeeds, this function returns the events without emitting through source.events_received, while the log, metric, and trace decoders all emit CountByteSize through that registered handle. Deployments receiving LLMObs traffic will therefore under-report component_received_events_total and received-event byte metrics, potentially showing zero received events for an LLMObs-only source despite successfully forwarding data.
    if !self.disable_llmobs {
        output.push(SourceOutput::new_maybe_logs(DataType::Log, llmobs_definition).with_port(LLMOBS))
Define an LLMObs-specific output schema
When multiple_outputs and schema.enabled are both enabled, the llmobs port advertises a clone of the ordinary log decoder schema, which can require fields such as message and does not declare LLMObs fields such as span_id, trace_id, or meta. Schema-aware VRL compilation and sink validation will therefore reason about the wrong event shape; this port needs its own definition matching the events produced by decode_llmobs_body.
        body: Bytes,
        api_key: Option<Arc<str>>,
    ) -> Result<Vec<Event>, ErrorMessage> {
        let envelope: Vec<LLMObsEnvelopeItem> = serde_json::from_slice(&body).map_err(|error| {
Accept the SDK's object envelope
Real dd-trace-py payloads, including the referenced 2.17 release, serialize one object shaped like {"_dd.stage":"raw","event_type":"span","spans":[...]}, whereas this deserializes only a top-level JSON array. Posting an SDK-generated payload to the newly registered endpoint therefore fails with HTTP 400 before any span is emitted; the array-shaped fixtures and curl example do not match the SDK wire format.
    warp::post()
        .and(path!("api" / "v2" / "llmobs" / ..))
Register the EVP proxy route used in agent mode
When an LLMObs SDK is configured to send through its Datadog Agent, it posts spans to /evp_proxy/v2/api/v2/llmobs (for example, dd-trace-py defines this as its proxied endpoint), not directly to /api/v2/llmobs. This filter only matches the direct intake path, so standard agent-mode clients pointed at Vector still receive a 404 and the feature only works after a nonstandard endpoint override.
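The path rule this comment asks for can be sketched with plain string matching. This is an illustration only, not the source's warp filter: the helper name `is_llmobs_path` and both constants are hypothetical, and the only facts taken from the comment are the two paths themselves.

```rust
/// Direct intake path that the PR already registers.
const DIRECT_LLMOBS_PATH: &str = "/api/v2/llmobs";
/// Prefix used in agent mode, as named in the review comment.
const EVP_PROXY_PREFIX: &str = "/evp_proxy/v2";

/// Hypothetical helper: returns true when `path` addresses the LLMObs intake,
/// either directly or behind the EVP proxy prefix.
fn is_llmobs_path(path: &str) -> bool {
    // Drop the proxy prefix when present; otherwise keep the path unchanged.
    let path = path.strip_prefix(EVP_PROXY_PREFIX).unwrap_or(path);
    // Mirror the trailing `..` of the filter: accept the exact path or deeper segments,
    // but not a longer sibling such as `/api/v2/llmobsx`.
    match path.strip_prefix(DIRECT_LLMOBS_PATH) {
        Some(rest) => rest.is_empty() || rest.starts_with('/'),
        None => false,
    }
}
```

In the actual source the same rule would more likely be expressed by combining two path filters, so treat this as a statement of the intended matching behavior rather than a drop-in change.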
    if let Some(ml_app) = span
        .dd
        .as_ref()
        .and_then(|dd| dd.get("ml_app"))
        .and_then(|v| v.as_str())
    {
        log.insert("ml_app", ml_app.to_owned());
For dd-trace-py 2.17 payloads, ml_app is encoded in the span's tags array as ml_app:<value> and there is no span-level _dd.ml_app object, so this branch never inserts the advertised top-level ml_app field. After accepting that SDK's envelope, otherwise valid Python events will therefore differ from the documented output and from the test fixture unless ml_app is also recovered from the tags.
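A minimal sketch of the fallback described here, assuming the tags arrive as a list of `key:value` strings. Both helper names are hypothetical, and preferring the `_dd` value over the tag when both exist is an assumption rather than documented SDK behavior.

```rust
/// Hypothetical helper: return the value of the first non-empty `ml_app:<value>` tag.
fn ml_app_from_tags(tags: &[String]) -> Option<&str> {
    tags.iter()
        .filter_map(|tag| tag.strip_prefix("ml_app:"))
        .find(|value| !value.is_empty())
}

/// Hypothetical helper: prefer the span-level `_dd.ml_app` value when present,
/// otherwise recover it from the tags.
fn resolve_ml_app<'a>(dd_ml_app: Option<&'a str>, tags: &'a [String]) -> Option<&'a str> {
    dd_ml_app.or_else(|| ml_app_from_tags(tags))
}
```

With tags `["env:prod", "ml_app:my-llm-app"]` and no `_dd.ml_app`, `resolve_ml_app` yields `my-llm-app`, which matches the top-level field shown in the PR's example output.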
    if let Some(v) = span.start_ns {
        log.insert("start_ns", v);
    }
Use the span start time as the log timestamp
LLMObs spans carry their actual event time in nanoseconds via start_ns, but the emitted log only stores that value as an ordinary integer and never assigns it the log timestamp meaning. Once standard source metadata is added, these events will be timestamped at Vector ingestion time instead, so delayed or buffered spans are written to time-aware log sinks at the wrong time; convert start_ns and use it as the event timestamp.
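The conversion step can be sketched as splitting `start_ns` into whole seconds and leftover nanoseconds, the pair that timestamp constructors typically accept. The helper name `split_start_ns` is hypothetical, and rejecting negative values instead of interpreting them as pre-epoch times is an assumption.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Hypothetical helper: split nanoseconds since the Unix epoch into
/// (whole seconds, remaining nanoseconds). Negative input returns None.
fn split_start_ns(start_ns: i64) -> Option<(i64, u32)> {
    if start_ns < 0 {
        return None;
    }
    // The remainder is always below one billion, so it fits in a u32.
    Some((start_ns / NANOS_PER_SEC, (start_ns % NANOS_PER_SEC) as u32))
}
```

For the `start_ns` value in the PR's example output, 1707763310981223236, this gives 1707763310 seconds and 981223236 nanoseconds. Keeping the original `start_ns` field alongside the derived timestamp would avoid losing precision if the sink's timestamp type is coarser.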
Summary
Closes #25441
The Datadog LLMObs SDK sends span events to /api/v2/llmobs. The datadog_agent source had no handler for this route, so every request returned a 404 error. This PR registers the route, parses the JSON payload, and emits each span as a Log event.

When multiple_outputs is enabled, events are routed to the llmobs output port and can be referenced as <component_id>.llmobs in downstream transforms and sinks. When multiple_outputs is disabled, events flow to the default output alongside logs.

LLMObs events are modeled as Log events rather than Trace events because Vector's Event::Trace variant is coupled to APM protobuf semantics. Using Log gives users full flexibility to route and transform LLMObs spans with existing sinks and VRL.

A disable_llmobs config field (default: false) follows the same opt-out convention as disable_logs, disable_metrics, and disable_traces.

Vector configuration
How did you test this PR?
- ml_app extraction from span._dd.ml_app, API key propagation into event metadata, empty span arrays, and invalid JSON rejection.
- cargo test -p vector --lib sources::datadog_agent. 43 passed, 0 failed.
- cargo vdev check events, make check-clippy, make check-fmt, and make check-generated-docs locally. All were clean.
- /api/v2/llmobs. Confirmed a Log event appeared on the llmobs output with all expected fields present (span_id, trace_id, ml_app, meta, metrics, tags, status).

Example curl:
Output:
{"_dd":{"tracer_version":"2.17.0"},"duration":12345678900,"meta":{"model_name":"gpt-4","span":{"kind":"llm"}},"metrics":{"input_tokens":64},"ml_app":"my-llm-app","name":"my.workflow","span_id":"abc123","start_ns":1707763310981223236,"status":"ok","tags":["env:prod"],"trace_id":"xyz789"}

Change Type
Is this a breaking change?
Does this PR include user facing changes?
changelog.d/25441_llmobs_endpoint.feature.md.

References
datadog_agent source should accept LLMObs telemetry #25441