feat(telemetry): close five OTel GenAI semantic convention emission gaps (#1035) by planetf1 · Pull Request #1036 · generative-computing/mellea

planetf1 · 2026-05-07T13:59:14Z

Misc PR

Type of PR

Bug Fix
New Feature
Documentation
Other

Description

Link to Issue: Fixes #1035 (OTel omissions/gaps: chat content events, gen_ai.conversation.id, prompt templates, error status). This PR covers gaps 1, 2, and 4; gaps 3 and 5 are deferred.

Surfaces data Mellea already holds at span-emission time. No new collection
points, no new span types, no backend changes, no performance-sensitive paths.

| Gap | Attribute(s) | Where |
| --- | --- | --- |
| 1. Provider identity | `gen_ai.provider.name` alongside legacy `gen_ai.system` | `start_generate_span`, `instrument_generate_from_context/raw` |
| 2. Conversation identity | `gen_ai.conversation.id` from existing `session_id` ContextVar | `start_generate_span` |
| 4. Error telemetry | `error.type` (Stable OTel) + ERROR status on stream failures | `finalize_backend_span` (error-path only) → `core/base.py` |

Deferred: Gap 3 — prompt template attributes. Withdrawn per review (@jakelorocco): wrong layer — belongs at the formatter render path inside BackendTracingPlugin. Follow-up: epic #444 Phase 2 (depends on #1045).

Deferred: model_options / request params. Withdrawn per review (@ajbozarth, @jakelorocco): no backend call site passes model_options today. Deferred to GENERATION_PRE_CALL plugin hook post-#1045.

Deferred: Gap 5 — content capture. gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions require backend changes; post-#1045 these belong in BackendTracingPlugin. The MELLEA_TRACE_CONTENT env-var flag and add_span_event() helper added here are the infrastructure gap 5 will need.

Note: MELLEA_TRACE_CONTENT will be renamed to MELLEA_TRACES_CONTENT under #1046. The OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT alias is unaffected.
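A minimal sketch of how an env-var gate like this could behave. The two variable names come from this PR; the function name, the set of accepted truthy strings, and the "either variable enables it" rule are assumptions for illustration, not the code in mellea/telemetry/tracing.py (which, per the file list, also keeps a `_TRACE_CONTENT_ENABLED` flag).

```python
import os
from typing import Mapping, Optional

# Variable names as given in the PR description.
_CONTENT_FLAG_VARS = (
    "MELLEA_TRACE_CONTENT",
    "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT",
)

# Assumption: the PR does not state which values count as "enabled".
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def content_tracing_enabled_sketch(
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True if either variable holds a truthy value (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return any(
        env.get(name, "").strip().lower() in _TRUTHY
        for name in _CONTENT_FLAG_VARS
    )
```

Passing a plain dict instead of reading `os.environ` keeps the check easy to unit-test without mutating process state.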

Spec: OTel GenAI semconv v1.37.0: gen_ai.system deprecated in favour of gen_ai.provider.name; both emitted for one release cycle.
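To illustrate gaps 1 and 2, here is a hedged sketch of building the span attributes. The attribute keys are the ones named in this PR; `session_id_var` and `build_generate_span_attrs` are hypothetical stand-ins, and the real `start_generate_span()` may assemble its attributes differently.

```python
from contextvars import ContextVar
from typing import Any, Optional

# Hypothetical stand-in for Mellea's existing session_id ContextVar.
session_id_var: ContextVar[Optional[str]] = ContextVar("session_id", default=None)


def build_generate_span_attrs(provider_name: str, system_name: str) -> dict[str, Any]:
    """Emit the new and the legacy provider attribute side by side."""
    attrs: dict[str, Any] = {
        "gen_ai.provider.name": provider_name,  # semconv v1.37.0 name
        "gen_ai.system": system_name,           # legacy name, kept for back-compat
    }
    session_id = session_id_var.get()
    if session_id is not None:
        # Map the session id onto the semconv key and keep the Mellea-specific one.
        attrs["gen_ai.conversation.id"] = session_id
        attrs["mellea.session_id"] = session_id
    return attrs
```

Outside a session the conversation keys are simply absent, so dashboards that filter on `gen_ai.conversation.id` only see spans that belong to a session.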

Files changed:

mellea/telemetry/backend_instrumentation.py — get_provider_name(); start_generate_span() extended with conversation id; finalize_backend_span() (error-path); record_token_usage() extended with cache/reasoning tokens.
mellea/core/base.py — stream error path uses finalize_backend_span(error=...).
mellea/telemetry/tracing.py — is_content_tracing_enabled(), add_span_event(), _TRACE_CONTENT_ENABLED flag.
test/telemetry/test_genai_semconv_emission.py — 9 pure-unit tests (no live backend or OTel SDK required).
docs/examples/telemetry/otel_genai_semconv_example.py — runnable against otelite.

Testing

Tests added to the respective file if code was changed
New code has 100% coverage if code was added
Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

AI coding assistants used

…lpers Add MELLEA_TRACE_CONTENT env-var gate (also recognises the standard OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT) and expose add_span_event() as a safe no-op wrapper. Both exported from mellea.telemetry and mellea.telemetry.tracing. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
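A rough sketch of what a "safe no-op wrapper" around span events can look like. `RecordingSpan` is a test double and `add_span_event_sketch` a hypothetical name; the only interface assumed is an `add_event(name, attributes=...)` method, which OpenTelemetry spans provide. The real helper may also check whether OTel is installed.

```python
from typing import Any, Optional


class RecordingSpan:
    """Test double that records events instead of exporting them."""

    def __init__(self) -> None:
        self.events: list[tuple[str, dict[str, Any]]] = []

    def add_event(self, name: str, attributes: Optional[dict[str, Any]] = None) -> None:
        self.events.append((name, dict(attributes or {})))


def add_span_event_sketch(
    span: Any, name: str, attributes: Optional[dict[str, Any]] = None
) -> None:
    """Forward an event to the span, or do nothing when there is no span."""
    if span is None:
        return  # tracing disabled or no active span: silently skip
    span.add_event(name, attributes=attributes if attributes is not None else {})


span = RecordingSpan()
add_span_event_sketch(span, "gen_ai.client.inference.operation.details")
add_span_event_sketch(None, "ignored")  # no-op, does not raise
```

Callers can then emit events unconditionally and leave the "is tracing on?" decision to the wrapper.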

…ate attrs

Five OTel GenAI semconv gaps closed (issue generative-computing#1035):

1. gen_ai.provider.name emitted alongside legacy gen_ai.system (semconv v1.37.0 migration; keep both for dashboard back-compat).
2. gen_ai.conversation.id mapped from existing session_id ContextVar; the existing mellea.session_id attribute is preserved alongside it.
3. llm.prompt_template.template emitted unconditionally from Instruction and GenerativeStub; llm.prompt_template.variables gated behind MELLEA_TRACE_CONTENT (user data).
4. error.type (Stable OTel) set on the error path in the new finalize_backend_span() helper alongside set_span_error(). finalize_backend_span() replaces the three-line record_token_usage + record_response_metadata + end_backend_span pattern in each backend.
5. gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions emitted as structured JSON (spec v1.37.0 schema) when MELLEA_TRACE_CONTENT is enabled. No deprecated per-role events (gen_ai.user.message etc.) are emitted. A gen_ai.client.inference.operation.details span event is added as a marker for log-oriented receivers.

Also adds gen_ai.request.temperature/top_p/top_k/frequency_penalty/presence_penalty from model_options, and cache/reasoning token attrs in record_token_usage().

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

…etry Instruction: capture _template_description (raw string before Jinja substitution) and _user_variables (copy) in __init__; expose via prompt_template_metadata() returning (template, variables, version)|None. GenerativeStub: capture f_kwargs on each call; expose via prompt_template_metadata() using the function docstring as the template and f_kwargs as the variables. Neither change affects runtime behaviour — data is retained for duck-typed use by start_generate_span(). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Replace the duplicated record_token_usage + record_response_metadata + end_backend_span pattern in each backend's post_processing() with a single finalize_backend_span() call that also passes the conversation and output text for content capture. Pass model_options into start_generate_span() so request-parameter attributes (temperature, top_p, etc.) are surfaced on the span. The stream error path in core/base.py is also consolidated through finalize_backend_span(error=...). No behaviour change on the success path; error spans now carry error.type and ERROR status instead of silently closing. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

test/telemetry/test_genai_semconv_emission.py: 20 pure-unit tests covering each of the five gaps (no live backend or OTel SDK required):

- gen_ai.provider.name + gen_ai.system dual-emission
- gen_ai.conversation.id from session_id ContextVar
- llm.prompt_template.* from Instruction (always / gated)
- error.type + ERROR status via finalize_backend_span
- gen_ai.input/output.messages structured JSON (gated)
- no deprecated per-role events emitted
- finalize_backend_span robustness (None span, broken span)

docs/examples/telemetry/otel_genai_semconv_example.py: runnable example for human verification against otelite, demonstrating all five attributes and the error path.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

…t capture (gap 5) Remove finalize_backend_span success-path consolidation and all content capture helpers (_emit_content_attributes, _conversation_to_parts). Revert all five backend files to upstream/main — gap 5 requires touching every backend and is better reviewed in isolation. finalize_backend_span is kept as an error-path-only helper (sets error.type + ERROR status, then closes the span) used by the stream error path in ModelOutputThunk.__aiter__. Full implementation including gap 5 is preserved on cs/issue-1035-full. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

ajbozarth · 2026-05-08T22:34:17Z

Flagging this PR against the refreshed tracing epic #444 (rewritten today) so the two efforts stay aligned.

Compatible to merge as-is. None of the four gaps landed here conflict with Phase 1 of the refresh — you're closing real semconv gaps regardless of whether emission happens in backend_instrumentation.py or in the Phase 1 tracing plugins. The attributes this PR adds (gen_ai.provider.name, gen_ai.conversation.id, llm.prompt_template.*, gen_ai.request.{temperature, top_p, ...}, error.type, cache/reasoning tokens) will be preserved by the Phase 1 plugin migration — see #1045 acceptance criteria, which explicitly require no regression on anything this PR adds.

Two coordination items for after this lands:

1. Gap 5 (content capture) should rebase onto Phase 1, not land before it. Your deferral note says gap 5 needs to touch post_processing() in every backend; that is exactly the coupling Phase 1 eliminates. After #1045 (refactor: move tracing onto plugin/hook pattern), conversation/output data is available to plugins via GenerationPostCallPayload and mot.generation without any backend changes. Keeping cs/issue-1035-full on the shelf until Phase 1 merges saves 5 backend files of churn.
2. MELLEA_TRACE_CONTENT will be renamed under #1046 (refactor: rename tracing env vars to plural and align with OTel semconv), alongside MELLEA_TRACE_APPLICATION / MELLEA_TRACE_BACKEND → plural MELLEA_TRACES_*. Rationale: OTel's signal-specific endpoint env var is already plural (OTEL_EXPORTER_OTLP_TRACES_ENDPOINT), and MELLEA_METRICS_* is plural, so we're aligning with the OTel standard partner var. There will be a one-release deprecation warning on the old MELLEA_TRACE_* names. The OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT alias you added is the OTel standard and stays unchanged.

Nothing here blocks merging. Flagging proactively so the order of operations is intentional.
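For reference, the rename with a one-release deprecation warning described in the second item could be sketched roughly as below. `read_renamed_env` is a hypothetical helper, and preferring the new name when both are set is an assumption; #1046 defines the actual behaviour.

```python
import os
import warnings
from typing import Mapping, Optional


def read_renamed_env(
    new_name: str,
    old_name: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer the new variable; fall back to the old one with a warning."""
    env = os.environ if environ is None else environ
    if new_name in env:
        return env[new_name]
    if old_name in env:
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return env[old_name]
    return None


# Old name still works for one release, but warns.
value = read_renamed_env(
    "MELLEA_TRACES_CONTENT", "MELLEA_TRACE_CONTENT", {"MELLEA_TRACES_CONTENT": "1"}
)
```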

planetf1 · 2026-05-11T13:41:26Z

Thanks @ajbozarth — hadn't noticed the #1045 angle for gap 5, good to know before I went further down that path. Updated the description with both points.

jakelorocco

I think logging these attributes in our telemetry traces makes sense. I have a few concerns about how it's being done here.

ajbozarth

A few items from Claude, only one big one.

@jakelorocco

Drop gap 3 (prompt-template capture) and the model_options/_REQUEST_PARAM_MAP plumbing in response to review feedback from @jakelorocco and @ajbozarth.

Jake's objection to gap 3 is correct: stashing template state on Instruction and GenerativeStub is the wrong layer. It only covers two component types, captures pre-substitution values, and puts telemetry concerns inside domain objects. The right implementation is at the formatter render path, which covers all component types. That work belongs after generative-computing#1045 lands.

The model_options/_REQUEST_PARAM_MAP block was dead code: no backend call site passes model_options, and even if wired the values would be pre-substitution. Per Nathan's review, the right call is to drop rather than carry forward a no-op. Request-param emission also belongs in the post-generative-computing#1045 plugin layer where the wire-format dict is visible.

What remains in this PR:

- gap 1: gen_ai.provider.name alongside legacy gen_ai.system
- gap 2: gen_ai.conversation.id from session_id ContextVar
- gap 4: error.type + ERROR status via finalize_backend_span
- cache/reasoning token fields in record_token_usage
- MELLEA_TRACE_CONTENT flag + add_span_event (infrastructure for future gap 5)

Also fix OTel_SERVICE_NAME typo in the example (case-sensitive on Linux) and rewrite the example docstring and README entry to be PR-independent.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Remove the `action.prompt_template_metadata = None` assignment left over from the gap-3 prompt-template work that was withdrawn from this PR. The attribute is never read by `start_generate_span` in the trimmed implementation, making the line misleading. Add three unit tests for `add_span_event` (event forwarded to span, None-span no-op, empty-attributes default) patching `_OTEL_AVAILABLE` since the test environment has no OTel SDK installed. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

planetf1 · 2026-05-13T11:22:49Z

All CI checks are now passing (re-run cleared a Docker image pull flake on the sandbox test — pre-existing flake, unrelated to this PR's changes).

The review comments from @ajbozarth and @jakelorocco have been addressed in the commits already on this branch: gap 3 and the model_options plumbing are withdrawn per your feedback; gap 5 is deferred with the infrastructure in place. Happy to clarify anything further. Requesting re-review when you have a moment. Thanks!

- Guard end_backend_span in its own try/except so SDK errors on the streaming error path cannot mask the original backend exception
- Wire get_provider_name into all three span-creation functions so internal code uses it (was set but calling get_system_name directly, contradicting the docstring guidance)
- Fix span_attrs: dict → dict[str, Any] per project typing conventions
- Replace _backend.model_id private-attr mutation in example with start_session(model_id=...) public API
- Add qualitative marker to example so it does not run in the fast loop
- Add test_finalize_never_raises_if_end_span_raises to cover the now-guarded end_backend_span code path

Assisted-by: Claude Code

planetf1 · 2026-05-13T11:48:42Z

Pushed a follow-up commit (5557734) addressing the self-review findings from the panel:

finalize_backend_span: end_backend_span is now wrapped in its own try/except so an SDK error on the streaming error path cannot mask the original backend exception
get_provider_name: wired into all three span-creation functions (was computed but not called; gen_ai.provider.name was being set via system_name instead)
Type annotation: span_attrs: dict → dict[str, Any]
Example: replaced m2._backend.model_id = ... private-attr mutation with start_session(model_id=...) public API; added qualitative marker so it does not run in the default fast loop
New test: test_finalize_never_raises_if_end_span_raises covers the now-guarded end_backend_span path
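The guarded error path in the first item can be sketched as follows. This is an illustration, not the code in backend_instrumentation.py: `RecordingSpan` is a test double, `finalize_error_span_sketch` and `stream_with_span` are hypothetical names, and using the exception class name as the `error.type` value is an assumption. With a real OpenTelemetry span the status would be set via `Status(StatusCode.ERROR, ...)` from `opentelemetry.trace`.

```python
from typing import Any, Iterable, Iterator, Optional


class RecordingSpan:
    """Test double; set fail_on_end=True to simulate an SDK error on close."""

    def __init__(self, fail_on_end: bool = False) -> None:
        self.attributes: dict[str, Any] = {}
        self.status: Optional[tuple[str, str]] = None
        self.ended = False
        self._fail_on_end = fail_on_end

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value

    def set_status(self, code: str, description: str = "") -> None:
        self.status = (code, description)

    def end(self) -> None:
        if self._fail_on_end:
            raise RuntimeError("exporter failure")
        self.ended = True


def finalize_error_span_sketch(span: Any, error: BaseException) -> None:
    """Record the error on the span and close it; never raises."""
    if span is None:
        return
    try:
        span.set_attribute("error.type", type(error).__qualname__)
        span.set_status("ERROR", str(error))
    except Exception:
        pass  # telemetry must not interfere with error handling
    try:
        span.end()  # guarded separately so a failing close cannot mask `error`
    except Exception:
        pass


def stream_with_span(chunks: Iterable[str], span: Any) -> Iterator[str]:
    """Yield chunks; on failure, finalize the span and re-raise the original."""
    try:
        for chunk in chunks:
            yield chunk
    except Exception as exc:
        finalize_error_span_sketch(span, exc)
        raise
```

Because both telemetry steps are wrapped, the caller always sees the backend's own exception, even when closing the span fails.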

13 tests passing. Re-requesting review from @ajbozarth, @jakelorocco, @nrfulton.

ajbozarth

Looks good, a few non-blocking nits from Claude:

…x example error path instrument_generate_from_context was imported but never called by any backend (all backends use start_generate_span); remove function, __all__ entry, stale imports in ollama.py and openai.py, and the corresponding test. Example error path now uses an unreachable base_url (localhost:19999) instead of a bogus model name, which could cause Ollama to attempt a pull rather than fail deterministically. Assisted-by: Claude Code

planetf1 added 5 commits May 7, 2026 14:55

github-actions Bot added the enhancement New feature or request label May 7, 2026

planetf1 marked this pull request as ready for review May 11, 2026 13:41

planetf1 requested review from a team, jakelorocco and nrfulton as code owners May 11, 2026 13:41

jakelorocco reviewed May 11, 2026

ajbozarth requested changes May 11, 2026

Comment thread mellea/telemetry/backend_instrumentation.py Outdated

Comment thread docs/examples/telemetry/otel_genai_semconv_example.py Outdated

Comment thread mellea/stdlib/components/genstub.py Outdated

Comment thread mellea/stdlib/components/instruction.py Outdated

This was referenced May 13, 2026

feat(telemetry): emit llm.prompt_template.* from tracing plugin (post-#1045) #1067

Open

feat(telemetry): emit gen_ai.request.* from tracing plugin (post-#1045) #1068

Open

planetf1 requested review from ajbozarth and jakelorocco May 13, 2026 07:49

jakelorocco approved these changes May 13, 2026

ajbozarth approved these changes May 13, 2026

Comment thread mellea/telemetry/backend_instrumentation.py Outdated

Comment thread mellea/telemetry/backend_instrumentation.py

Comment thread docs/examples/telemetry/otel_genai_semconv_example.py Outdated

planetf1 mentioned this pull request May 13, 2026

feat(telemetry): emit gen_ai.input/output.messages content events from tracing plugin (post-#1045) #1073

Open

ajbozarth mentioned this pull request May 13, 2026

test: backend coverage parity audit post-refactor #1054

Open

3 tasks

planetf1 mentioned this pull request May 13, 2026

feat(telemetry): normalise cache and reasoning token fields in GenerationMetadata across providers #1074

Open

planetf1 enabled auto-merge May 13, 2026 17:24

planetf1 added this pull request to the merge queue May 13, 2026

Merged via the queue into generative-computing:main with commit 79769a3 May 13, 2026
8 checks passed

planetf1 deleted the cs/issue-1035 branch May 13, 2026 18:48

3 participants
