LCORE-1591: Observability for Lightspeed Core #1482
asimurka wants to merge 1 commit into lightspeed-core:main
Conversation
Walkthrough

A new design specification document adds OpenTelemetry-based distributed tracing for Lightspeed Core, defining traces/spans, YAML-driven enablement and OTLP export configuration, W3C trace-context propagation controls, instrumentation points (FastAPI auto-instrumentation plus manual spans for LLM, RAG, tools, moderation, streaming), and lifecycle/resilience rules.
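A minimal configuration sketch matching this model; the `observability.otel.*` field names below are taken from the review comments on this PR, so exact keys in the final document may differ:

```yaml
observability:
  otel:
    enabled: true
    export:
      endpoint: http://otel-collector:4317   # OTLP collector endpoint
      protocol: grpc                         # or http/protobuf
      service: lightspeed-core               # service.name resource attribute
    propagation:
      incoming: true    # continue inbound W3C trace context
      outgoing: true    # propagate context on downstream calls
```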
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes. Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/observability-opentelemetry/observability-opentelemetry.md`:
- Around line 193-243: Fix markdown lint failures MD022 and MD031 by ensuring
there's exactly one blank line before and after each heading and that fenced
code blocks are preceded and followed by a blank line; specifically adjust
spacing around the "Configuration model:", "Pros:", "Cons:", "Important
constraints:", and the "Option 2 — Manual SDK initialization with YAML-driven
hybrid configuration" heading blocks in the observability-opentelemetry.md
content (lines shown in the review) so headings have a single blank line
separation and add/remove blank lines around the fenced code block at line ~297
to ensure one blank line before and after the triple-backtick fence.
- Line 107: Add a new "Verification / Test Plan" section in
observability-opentelemetry.md (to satisfy R10) that contains explicit
step-by-step validation checks and expected results: include subsections for
"Startup validation" (checks that tracing initializes, config/env var checks,
sample commands and expected logs), "Inbound continuation" (send an incoming
request and verify trace IDs are continued with expected span relationships),
"Outbound propagation" (make an outbound call and verify propagation headers and
downstream spans), and "Exporter failure behavior" (simulate exporter downtime
and verify retry/drop behavior and observable error logs); for each subsection
list exact commands or curl examples, environment variables to set, acceptance
criteria, and sample evidence (log snippets or trace UI screenshots).
- Line 298: Fix the typos in the documentation text: change "initializaion" to
"initialization" in the section heading "Enrichment of Llama Stack server-mode
client initializaion" and change "primarly" to "primarily" where it appears
(line referenced as 582). Update those two words in the markdown content so the
normative spec uses correct spelling.
- Around line 56-57: Change the Setext-style line so it is treated as a normal
paragraph: replace the standalone line "When tracing is disabled, no spans are
created and no tracing-related processing is performed." so it is followed by a
blank line before the thematic break (the sequence of hyphens) or remove the
hyphen-only line immediately after; ensure the hyphen separator remains a proper
horizontal rule by having an empty line above it so markdownlint MD003 no longer
interprets that sentence as a Setext heading.
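Concretely, the fix is just a blank line between the sentence and the hyphen rule (content abbreviated):

```markdown
When tracing is disabled, no spans are created and no tracing-related processing is performed.

---
```

Without that blank line, the hyphen run sitting directly under the sentence is parsed as a setext heading underline, which is what triggers MD003.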
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 96b8e5e7-b660-45b6-8800-46ea7d3f5849
📒 Files selected for processing (1)
docs/design/observability-opentelemetry/observability-opentelemetry.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: list_outdated_dependencies
- GitHub Check: radon
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: integration_tests (3.12)
- GitHub Check: integration_tests (3.13)
- GitHub Check: build-pr
- GitHub Check: mypy
- GitHub Check: bandit
- GitHub Check: Pyright
- GitHub Check: Pylinter
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🪛 LanguageTool
docs/design/observability-opentelemetry/observability-opentelemetry.md
[grammar] ~298-~298: Ensure spelling is correct
Context: "...hment of Llama Stack server-mode client initializaion:" followed by:

```python
client_kwargs: dict[str, Any] = {
    "base_url": base_url,
    "api_key": api_key,
    "timeout": config.timeout,
}
if distributed_tracing_to_llama_enabled() and base_url is not None:
    client_kwargs["http_client"] = llama_stack_httpx_async_client(
        base_url=base_url,
        timeout=config.timeout,
    )
self._lsc = AsyncLlamaStackClient(**client_kwargs)
```

"Library mode: This wiring applies only..."
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.22.0)
docs/design/observability-opentelemetry/observability-opentelemetry.md
[warning] 56: MD003 (heading-style) — Expected: atx; Actual: setext
[warning] 193, 198, 204, 209, 223, 231, 237, 243: MD022 (blanks-around-headings) — Headings should be surrounded by blank lines (Expected: 1; Actual: 0; Below)
[warning] 297: MD031 (blanks-around-fences) — Fenced code blocks should be surrounded by blank lines
Force-pushed from 67f9919 to e0e6b4f.
Actionable comments posted: 2
♻️ Duplicate comments (2)
docs/design/observability-opentelemetry/observability-opentelemetry.md (2)
194-244: ⚠️ Potential issue | 🟡 Minor — Fix markdownlint blockers (MD022/MD031) in heading/fence spacing.
Line 194/199/205/210/224/232/238/244 need blank lines around headings, and Line 298 needs a blank line after the fenced code block before prose on Line 299.
Minimal formatting fix pattern
````diff
 ### Configuration model:
+
 - All tracing configuration is provided via `OTEL_*` environment variables ...
 ...
-```
-Enrichment of Llama Stack server-mode client initialization:
+```
+
+Enrichment of Llama Stack server-mode client initialization:
````

Also applies to: 298-299
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/observability-opentelemetry/observability-opentelemetry.md` around lines 194 - 244, The markdown has lint errors MD022/MD031 due to missing blank lines around several headings and a missing blank line after a fenced code block; add a single blank line before and after each top-level and subheading such as "Configuration model:", "Pros:", "Cons:", "Important constraints:", "Option 2 — Manual SDK initialization with YAML-driven hybrid configuration" and the other listed headings, and ensure there is an empty line after any fenced code block before the following paragraph (e.g., the fenced block that precedes "Enrichment of Llama Stack server-mode client initialization:"); this will satisfy markdownlint spacing rules without changing heading text or content.
107-108: ⚠️ Potential issue | 🟠 Major — Add an explicit verification/test plan section to satisfy R10.
The requirement on Line 108 says docs must describe verification, but there is no concrete, stepwise validation procedure with expected evidence.
Suggested structure to add
```diff
+## Verification / Test Plan
+
+### 1) Startup validation
+- Configure `observability.otel.enabled=true` with valid `export.endpoint/protocol/service`.
+- Expected: app starts, tracer initializes, no config validation errors.
+
+### 2) Missing mandatory export fields
+- Remove one mandatory field (for example `service`).
+- Expected: startup fails with validation error.
+
+### 3) Inbound continuation
+- Send request with `traceparent`.
+- Expected: server span continues same trace ID.
+
+### 4) Outbound propagation (service mode)
+- Enable `propagation.outgoing`.
+- Expected: downstream call carries W3C trace headers and links in same trace.
+
+### 5) Exporter failure behavior
+- Point endpoint to unreachable collector.
+- Expected: user-facing responses unaffected; tracing errors observable in logs.
+
+### 6) Sensitive data check
+- Inspect span attributes/events.
+- Expected: no raw prompts/retrieved content.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/observability-opentelemetry/observability-opentelemetry.md` around lines 107 - 108, Add a new "Verification / Test Plan" section to the observability-opentelemetry.md doc under the existing "R10 – Documentation" heading that provides a concise, stepwise validation procedure: list required and optional env vars to set (e.g., OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_METRICS_EXPORTER), commands to run the service and otel-collector/test backends, exact actions to generate traces (e.g., sample HTTP requests or workflows calling instrumented functions), how to confirm traces/spans appear in the selected backend (Jaeger/OTLP/console) and sample log/trace evidence to look for, and pass/fail criteria (e.g., span counts, presence of service.name, correct attributes); include minimal example verification commands and expected outputs so reviewers can follow and mark R10 satisfied.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/observability-opentelemetry/observability-opentelemetry.md`:
- Around line 178-180: The document currently gives conflicting normative
guidance: Step 1 endorses Option 3 (hybrid YAML+env) while later text (around
"Line 250") favors env-driven auto-instrumentation (Option 2) causing ambiguity;
pick one normative path and update the doc so Step 1, the summary paragraph (the
sentence mentioning Option 1/2/3), and the later lines referencing env-driven
auto-instrumentation consistently state the chosen approach (e.g., if keeping
Option 3, change the "env-driven auto-instrumentation is generally preferred"
sentence to explain that env-vars are supported but the canonical approach is
configuration-driven via LCORE and requires explicit tracing fields; if choosing
Option 2, change Step 1 and the summary to promote env-driven
auto-instrumentation and note when hybrid YAML+env is acceptable), and ensure
references to "opentelemetry-instrument", "LCORE", "Option 2", and "Option 3"
are updated for consistency.
- Around line 175-176: The doc contains contradictory guidance about missing
mandatory OTEL fields: one place says startup should fail when enabled:true is
missing endpoint/protocol/service name, another suggests the app may instead
“refuse tracing”; pick one normative behavior (e.g., "startup MUST fail" or
"startup MAY continue but disable tracing") and make both occurrences consistent
by replacing the ambiguous phrase with the chosen policy; update the paragraphs
that mention "If `enabled: true`, YAML must contain mandatory endpoint,
protocol, and service name; startup should fail if missing" and the later "or
refuse tracing" sentence so they both state the exact chosen action, and ensure
any precedence wording about YAML vs OTEL_* env vars remains consistent with
that decision.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a79fc6b4-a3ec-477e-b931-22b2a9fabdad
📒 Files selected for processing (1)
docs/design/observability-opentelemetry/observability-opentelemetry.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🪛 markdownlint-cli2 (0.22.0)
docs/design/observability-opentelemetry/observability-opentelemetry.md
[warning] 194, 199, 205, 210, 224, 232, 238, 244: MD022 (blanks-around-headings) — Headings should be surrounded by blank lines (Expected: 1; Actual: 0; Below)
[warning] 298: MD031 (blanks-around-fences) — Fenced code blocks should be surrounded by blank lines
tisnik left a comment:
It looks ok!
Q. for discussion: what happens if we don't include any Llama Stack traces? IMHO not much harm will be done, as the only relevant and visible part is LCORE. In Llama Stack the flow can change in the future, so tracing will be a bit unstable from the point of view of (for example) an engineer who wants to improve performance.
Context: almost all LS calls are pretty low level, so even LCORE traces might have enough granularity.
But I'm open for discussion.
> When both YAML and env vars define the same concern, **YAML mandatory fields take precedence** for endpoint, protocol, and service name; `OTEL_*` variables control optional advanced settings (sampling, headers, TLS files, instrumentor-only options).
>
> **Recommendation:** Option 1 is rejected for maintainability. Option 2 remains viable if LCORE is run with `opentelemetry-instrument`. Option 3 leverages LCORE's configuration-driven design to ensure mandatory fields are always explicit when tracing is enabled.
Option #2 seems promising.
Q: will those options be visible in the config endpoint? If not, how do we help admins identify tracing-related problems?
Q. for the future - how to handle configuration changes in this situation?
No, these options won't be visible in the config endpoint by default. We can possibly implement some "injection" into the v1/config response model if desired. The problem is that there are a lot of configurable options for OTEL (see this link), but it is possible to collect all environment variables with the OTEL_ prefix and inject them into the config.
To the best of my knowledge, OTEL_* environment variables are collected during process startup. As a result, any change to these variables requires the service process to be restarted/redeployed to apply changes.
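If such injection were implemented, collecting the variables is straightforward; a hedged sketch (the helper name and any v1/config wiring are hypothetical, and a real implementation would need redaction, since some variables can carry credentials):

```python
import os

def collect_otel_env() -> dict[str, str]:
    """Gather all OTEL_-prefixed environment variables for diagnostics.

    Note: values such as OTEL_EXPORTER_OTLP_HEADERS may contain secrets,
    so a real config endpoint should redact them before returning.
    """
    return {
        key: value
        for key, value in sorted(os.environ.items())
        if key.startswith("OTEL_")
    }
```

Because these are read once at process startup, the snapshot returned here also documents exactly which settings the running process was started with.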
> **Recommendation:**
> - Continue using the `/metrics` endpoint for all operational metrics.
> - Expand the set of Prometheus metrics as needed to cover additional components (e.g., LLM calls, RAG retrieval, tool execution) as product needs evolve.
yes, it would be very valuable
> **Recommendation:**
> - Continue using the `/metrics` endpoint for all operational metrics.
> - Expand the set of Prometheus metrics as needed to cover additional components (e.g., LLM calls, RAG retrieval, tool execution) as product needs evolve.
> - Ensure metrics align with existing naming conventions and maintain low cardinality to avoid high memory cost in Prometheus servers.
Is it a real problem now, or just a warning/note for the future? Because now (IIRC) the cardinality selectors are just provider+model...
I think it isn't a real problem rn.
I have an impression that the general expectation in distributed systems is that when you have multi-service integration, spans from downstream services should be part of the same trace to preserve end-to-end observability. From what I've seen, Llama Stack already provides fairly rich and detailed tracing, and it would be valuable to leverage that (at least in server mode) rather than losing that level of visibility. Even without LLS spans, LCORE traces would remain complete and consistent on their own, but LLS spans would provide optional deeper insight into request-level behavior (especially latency breakdowns for more complex calls).
Tbh, I don't think so. From what I can see in Jaeger, Llama Stack traces are pretty granular, with a lot of internal spans exposed per request.
I agree, but my point is who will benefit from it. Admins/devops who look for perf problems probably won't be interested in LS internals. More interesting is (IMHO) "is the RAG what caused the slowdown?" or "which MCP call is the most time-expensive?" etc. OTOH we might be interested in tracing for LS providers, that would be good, indeed.
Yeah, I agree, but exactly for this we need LLS traces, right? Otherwise this would be completely hidden under a single LCORE responses request span.
@asimurka just wondering why the spike document is not included in this PR?
Description
Adds the feature architecture document for OpenTelemetry tracing in Lightspeed Core. Feature Specification only; no code changes.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit