feat(telemetry): close five OTel GenAI semantic convention emission gaps (#1035)#1036
Conversation
…lpers Add MELLEA_TRACE_CONTENT env-var gate (also recognises the standard OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT) and expose add_span_event() as a safe no-op wrapper. Both exported from mellea.telemetry and mellea.telemetry.tracing. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…ate attrs Five OTel GenAI semconv gaps closed (issue generative-computing#1035): 1. gen_ai.provider.name emitted alongside legacy gen_ai.system (semconv v1.37.0 migration; keep both for dashboard back-compat). 2. gen_ai.conversation.id mapped from existing session_id ContextVar; the existing mellea.session_id attribute is preserved alongside it. 3. llm.prompt_template.template emitted unconditionally from Instruction and GenerativeStub; llm.prompt_template.variables gated behind MELLEA_TRACE_CONTENT (user data). 4. error.type (Stable OTel) set on the error path in the new finalize_backend_span() helper alongside set_span_error(). finalize_backend_span() replaces the three-line record_token_usage + record_response_metadata + end_backend_span pattern in each backend. 5. gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions emitted as structured JSON (spec v1.37.0 schema) when MELLEA_TRACE_CONTENT is enabled. No deprecated per-role events (gen_ai.user.message etc.) are emitted. A gen_ai.client.inference.operation.details span event is added as a marker for log-oriented receivers. Also adds gen_ai.request.temperature/top_p/top_k/frequency_penalty/ presence_penalty from model_options, and cache/reasoning token attrs in record_token_usage(). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…etry Instruction: capture _template_description (raw string before Jinja substitution) and _user_variables (copy) in __init__; expose via prompt_template_metadata() returning (template, variables, version)|None. GenerativeStub: capture f_kwargs on each call; expose via prompt_template_metadata() using the function docstring as the template and f_kwargs as the variables. Neither change affects runtime behaviour — data is retained for duck-typed use by start_generate_span(). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Replace the duplicated record_token_usage + record_response_metadata + end_backend_span pattern in each backend's post_processing() with a single finalize_backend_span() call that also passes the conversation and output text for content capture. Pass model_options into start_generate_span() so request-parameter attributes (temperature, top_p, etc.) are surfaced on the span. The stream error path in core/base.py is also consolidated through finalize_backend_span(error=...). No behaviour change on the success path; error spans now carry error.type and ERROR status instead of silently closing. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
test/telemetry/test_genai_semconv_emission.py — 20 pure-unit tests covering each of the five gaps (no live backend or OTel SDK required): - gen_ai.provider.name + gen_ai.system dual-emission - gen_ai.conversation.id from session_id ContextVar - llm.prompt_template.* from Instruction (always / gated) - error.type + ERROR status via finalize_backend_span - gen_ai.input/output.messages structured JSON (gated) - no deprecated per-role events emitted - finalize_backend_span robustness (None span, broken span) docs/examples/telemetry/otel_genai_semconv_example.py — runnable example for human verification against otelite, demonstrating all five attributes and the error path. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…t capture (gap 5) Remove finalize_backend_span success-path consolidation and all content capture helpers (_emit_content_attributes, _conversation_to_parts). Revert all five backend files to upstream/main — gap 5 requires touching every backend and is better reviewed in isolation. finalize_backend_span is kept as an error-path-only helper (sets error.type + ERROR status, then closes the span) used by the stream error path in ModelOutputThunk.__aiter__. Full implementation including gap 5 is preserved on cs/issue-1035-full. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
|
Flagging this PR against the refreshed tracing epic #444 (rewritten today) so the two efforts stay aligned. Compatible to merge as-is. None of the four gaps landed here conflict with Phase 1 of the refresh — you're closing real semconv gaps regardless of whether emission happens in Two coordination items for after this lands:
Nothing here blocks merging. Flagging proactively so the order of operations is intentional. |
|
Thanks @ajbozarth — hadn't noticed the #1045 angle for gap 5, good to know before I went further down that path. Updated the description with both points. |
jakelorocco
left a comment
There was a problem hiding this comment.
I think logging these attributes in our telemetry traces makes sense. I have a few concerns about how it's being done here.
ajbozarth
left a comment
There was a problem hiding this comment.
A few items from Claude, only one big one.
Drop gap 3 (prompt-template capture) and the model_options/_REQUEST_PARAM_MAP plumbing in response to review feedback from @jakelorocco and @ajbozarth. Jake's objection to gap 3 is correct: stashing template state on Instruction and GenerativeStub is the wrong layer — it only covers two component types, captures pre-substitution values, and puts telemetry concerns inside domain objects. The right implementation is at the formatter render path, which covers all component types. That work belongs after generative-computing#1045 lands. The model_options/_REQUEST_PARAM_MAP block was dead code: no backend call site passes model_options, and even if wired the values would be pre-substitution. Per Nathan's review, the right call is to drop rather than carry forward a no-op. Request-param emission also belongs in the post-generative-computing#1045 plugin layer where the wire-format dict is visible. What remains in this PR: gap 1 — gen_ai.provider.name alongside legacy gen_ai.system gap 2 — gen_ai.conversation.id from session_id ContextVar gap 4 — error.type + ERROR status via finalize_backend_span cache/reasoning token fields in record_token_usage MELLEA_TRACE_CONTENT flag + add_span_event (infrastructure for future gap 5) Also fix OTel_SERVICE_NAME typo in the example (case-sensitive on Linux) and rewrite the example docstring and README entry to be PR-independent. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Remove the `action.prompt_template_metadata = None` assignment left over from the gap-3 prompt-template work that was withdrawn from this PR. The attribute is never read by `start_generate_span` in the trimmed implementation, making the line misleading. Add three unit tests for `add_span_event` (event forwarded to span, None-span no-op, empty-attributes default) patching `_OTEL_AVAILABLE` since the test environment has no OTel SDK installed. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
|
All CI checks are now passing (re-run cleared a Docker image pull flake on the sandbox test — pre-existing flake, unrelated to this PR's changes). The review comments from @ajbozarth and @jakelorocco have been addressed in the commits already on this branch: gaps 3 and model_options withdrawn per your feedback; gap 5 deferred with the infrastructure in place. Happy to clarify anything further. Requesting re-review when you have a moment — thanks! |
- Guard end_backend_span in its own try/except so SDK errors on the streaming error path cannot mask the original backend exception - Wire get_provider_name into all three span-creation functions so internal code uses it (was set but calling get_system_name directly, contradicting the docstring guidance) - Fix span_attrs: dict → dict[str, Any] per project typing conventions - Replace _backend.model_id private-attr mutation in example with start_session(model_id=...) public API - Add qualitative marker to example so it does not run in the fast loop - Add test_finalize_never_raises_if_end_span_raises to cover the now-guarded end_backend_span code path Assisted-by: Claude Code
|
Pushed a follow-up commit (5557734) addressing the self-review findings from the panel:
13 tests passing. Re-requesting review from @ajbozarth, @jakelorocco, @nrfulton. |
ajbozarth
left a comment
There was a problem hiding this comment.
Looks good, a few non-blocking nits from Claude:
…x example error path instrument_generate_from_context was imported but never called by any backend (all backends use start_generate_span); remove function, __all__ entry, stale imports in ollama.py and openai.py, and the corresponding test. Example error path now uses an unreachable base_url (localhost:19999) instead of a bogus model name, which could cause Ollama to attempt a pull rather than fail deterministically. Assisted-by: Claude Code
79769a3
Misc PR
Type of PR
Description
Surfaces data Mellea already holds at span-emission time. No new collection
points, no new span types, no backend changes, no performance-sensitive paths.
gen_ai.provider.namealongside legacygen_ai.systemstart_generate_span,instrument_generate_from_context/rawgen_ai.conversation.idfrom existingsession_idContextVarstart_generate_spanerror.type(Stable OTel) + ERROR status on stream failuresfinalize_backend_span(error-path only) →core/base.pyDeferred: Gap 3 — prompt template attributes. Withdrawn per review (@jakelorocco): wrong layer — belongs at the formatter render path inside
BackendTracingPlugin. Follow-up: epic #444 Phase 2 (depends on #1045).Deferred: model_options / request params. Withdrawn per review (@ajbozarth, @jakelorocco): no backend call site passes
model_optionstoday. Deferred toGENERATION_PRE_CALLplugin hook post-#1045.Deferred: Gap 5 — content capture.
gen_ai.input.messages,gen_ai.output.messages,gen_ai.system_instructionsrequire backend changes; post-#1045 these belong inBackendTracingPlugin. TheMELLEA_TRACE_CONTENTenv-var flag andadd_span_event()helper added here are the infrastructure gap 5 will need.Note:
MELLEA_TRACE_CONTENTwill be renamed toMELLEA_TRACES_CONTENTunder #1046. TheOTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENTalias is unaffected.Spec: OTel GenAI semconv v1.37.0:
gen_ai.systemdeprecated in favour ofgen_ai.provider.name; both emitted for one release cycle.Files changed:
mellea/telemetry/backend_instrumentation.py—get_provider_name();start_generate_span()extended with conversation id;finalize_backend_span()(error-path);record_token_usage()extended with cache/reasoning tokens.mellea/core/base.py— stream error path usesfinalize_backend_span(error=...).mellea/telemetry/tracing.py—is_content_tracing_enabled(),add_span_event(),_TRACE_CONTENT_ENABLEDflag.test/telemetry/test_genai_semconv_emission.py— 9 pure-unit tests (no live backend or OTel SDK required).docs/examples/telemetry/otel_genai_semconv_example.py— runnable against otelite.Testing
Attribution