Evaluate OpenTelemetry semantic conventions for feature flag observability #275

@Starefossen

Description

Context

Enterprise teams at NAV use Unleash with complex flag configurations (gradual rollouts, A/B testing, user targeting). The Unleash SDK already sends aggregate evaluation metrics (yes/no counts, variant distribution) to the Unleash server, but these are disconnected from the application's observability stack (traces, logs, metrics in Grafana).

Proposal

Evaluate and prototype the OpenTelemetry semantic conventions for feature flags (currently Release Candidate status). The spec defines a standard feature_flag.evaluation event with structured attributes:

| Attribute | Description |
|---|---|
| `feature_flag.key` | Flag name (e.g., `quotes.submit`) |
| `feature_flag.result.variant` | Variant name |
| `feature_flag.result.value` | Evaluated value |
| `feature_flag.result.reason` | `default`, `targeting_match`, `split`, `error`, etc. |
| `feature_flag.provider.name` | `Unleash` |
| `feature_flag.context.id` | Targeting key (user/session ID) |

Why This Matters for Enterprise Teams

  1. Trace correlation — Flag evaluations as span events let you see "this request got variant X and returned a 500." Aggregate Unleash metrics can't do this.
  2. Cross-service impact — A flag change in service A may affect latency in service B. OTel traces already connect the dots; adding flag events makes causation visible.
  3. Grafana-native — Teams already use Tempo and Loki via NAIS. Flag events flow through the existing OTel pipeline with no new infrastructure.
  4. A/B testing — Correlate variant assignment with business metrics (error rates, p95 latency) per variant per user cohort.
  5. Rollout safety — Compare error rates between users hitting the new code path (targeting_match) vs. old (default) during gradual rollouts.

Implementation Approach

Add a thin wrapper around Unleash SDK calls that emits OTel log events following the semantic conventions. This is lightweight and complements (rather than replaces) NAIS auto-instrumentation.

Kotlin (backend):

```kotlin
import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.common.AttributeKey

// Emit a feature_flag.evaluation event via the OTel Logs API
val logger = GlobalOpenTelemetry.get().logsBridge.loggerBuilder("feature-flags").build()
logger.logRecordBuilder()
    .setBody("feature_flag.evaluation")
    .setAttribute(AttributeKey.stringKey("feature_flag.key"), flagName)
    .setAttribute(AttributeKey.stringKey("feature_flag.result.variant"), variant.name)
    .setAttribute(AttributeKey.booleanKey("feature_flag.result.value"), enabled)
    .setAttribute(AttributeKey.stringKey("feature_flag.provider.name"), "Unleash")
    .emit()
```
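
To avoid repeating the emission at every call site, the evaluation and the event can be combined in one wrapper. The sketch below is illustrative, not existing code: the class name `InstrumentedUnleash` is hypothetical, and it assumes the Unleash Java SDK's `isEnabled`/`getVariant` methods.

```kotlin
import io.getunleash.Unleash
import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.common.AttributeKey

// Hypothetical wrapper: evaluates a flag via the Unleash SDK and emits a
// feature_flag.evaluation log event for every call that goes through it.
class InstrumentedUnleash(private val unleash: Unleash) {
    private val logger =
        GlobalOpenTelemetry.get().logsBridge.loggerBuilder("feature-flags").build()

    fun isEnabled(flagName: String): Boolean {
        val enabled = unleash.isEnabled(flagName)
        val variant = unleash.getVariant(flagName)
        logger.logRecordBuilder()
            .setBody("feature_flag.evaluation")
            .setAttribute(AttributeKey.stringKey("feature_flag.key"), flagName)
            .setAttribute(AttributeKey.stringKey("feature_flag.result.variant"), variant.name)
            .setAttribute(AttributeKey.booleanKey("feature_flag.result.value"), enabled)
            .setAttribute(AttributeKey.stringKey("feature_flag.provider.name"), "Unleash")
            .emit()
        return enabled
    }
}
```

Whether this belongs in each service or in a shared library is one of the open questions below.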

TypeScript (frontend API routes):

```typescript
// Emit via the @opentelemetry/api-logs package
// (the Logs API lives there, not in @opentelemetry/api)
import { logs } from '@opentelemetry/api-logs';

const logger = logs.getLogger('feature-flags');
logger.emit({
  body: 'feature_flag.evaluation',
  attributes: {
    'feature_flag.key': flagName,
    'feature_flag.result.variant': variant?.name,
    'feature_flag.result.value': enabled,
    'feature_flag.provider.name': 'Unleash',
  },
});
```

Suggested Grafana Dashboard Panels

| Panel | Query basis |
|---|---|
| Flag evaluation rate by flag | Count of `feature_flag.evaluation` events grouped by `feature_flag.key` |
| Variant distribution over time | Group by `feature_flag.result.variant` |
| Error rate by flag variant | Correlate variant with HTTP 5xx spans |
| Flag evaluation reasons | Distribution of `feature_flag.result.reason` |
| P95 latency per variant | Filter spans by variant, compute duration percentiles |

Open Questions

  • Does NAIS auto-instrumentation already pick up OTel log events, or do we need explicit exporter config?
  • Should this live as a shared library/pattern, or inline per-service?
  • Is the OTel Logs API mature enough in the Java and Node.js SDKs for production use?
  • Should we also evaluate Unleash's Impact Metrics (counters/gauges/histograms sent to Unleash server) as a complementary approach?

Acceptance Criteria

  • Prototype OTel feature_flag.evaluation events in quotes-backend (Kotlin)
  • Prototype OTel feature_flag.evaluation events in quotes-frontend (TypeScript)
  • Verify events appear in Tempo/Loki via NAIS auto-instrumentation pipeline
  • Create example Grafana dashboard panels querying flag evaluation data
  • Document findings and recommendation for enterprise teams
