feat(FeatureFlags): FFE APM feature-flag span enrichment (experimental, gated) by leoromanovsky · Pull Request #3996 · DataDog/dd-trace-php

leoromanovsky · 2026-06-16T22:18:21Z

feat(FeatureFlags): FFE APM feature-flag span enrichment

⚠️ Experimental, opt-in, gated behind DD_EXPERIMENTAL_FLAGGING_PROVIDER_SPAN_ENRICHMENT_ENABLED (off by default).

Summary

Adds Feature Flag Events (FFE) span enrichment to the feature-flag integration. When feature
flags are evaluated, the evaluation metadata is attached to the root APM span so APM customers
can filter traces and errors by active flag variant, and the FFE/Experimentation platform can
correlate spans with experiments. The wire format matches the merged reference implementation
(dd-trace-js#8343) so backend/Trino decode is identical.

How it works

1. A flag is evaluated (via the OpenFeature DataDogProvider or the native DDTrace\FeatureFlags\Client).
2. Each evaluation is accumulated inline against the current root span.
3. At root-span close, the accumulated state is encoded and written as ffe_* tags.

Configuration

Opt-in, off by default:

DD_EXPERIMENTAL_FLAGGING_PROVIDER_SPAN_ENRICHMENT_ENABLED=true

This is distinct from DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED.

Span tags added

| Tag | Description | Format |
| --- | --- | --- |
| `ffe_flags_enc` | All evaluated flag serial IDs | base64 delta-varint |
| `ffe_subjects_enc` | Subject → flags mapping (when `doLog=true`) | JSON `{ sha256(key): encodedIds }` |
| `ffe_runtime_defaults` | Fallback values for flags not in UFC | JSON `{ flagKey: value }` |

Limits: 200 serial IDs, 10 subjects, 20 experiments/subject, 5 runtime defaults, 64 chars/runtime-default value (UTF-8-safe truncation).
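The `ffe_flags_enc` format can be illustrated with a minimal Python sketch. It assumes the first ID is encoded as a delta from zero and that each delta is a ULEB128 varint, which is consistent with the golden value `ZAgUAg==` quoted in this PR. The function names are hypothetical and the 200-ID limit is not modelled.

```python
import base64


def encode_serial_ids(ids):
    """Dedupe, sort, delta-encode, ULEB128-encode, then base64 the result."""
    out = bytearray()
    prev = 0
    for value in sorted(set(ids)):
        delta = value - prev  # assumption: the first delta is taken from 0
        prev = value
        while True:
            byte = delta & 0x7F
            delta >>= 7
            if delta:
                out.append(byte | 0x80)  # more bytes follow for this delta
            else:
                out.append(byte)
                break
    return base64.b64encode(bytes(out)).decode("ascii")


def decode_serial_ids(encoded):
    """Inverse of encode_serial_ids: base64 -> varint deltas -> absolute IDs."""
    ids = []
    current = 0
    delta = 0
    shift = 0
    for byte in base64.b64decode(encoded):
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation bit set: keep reading this delta
        else:
            current += delta
            ids.append(current)
            delta = 0
            shift = 0
    return ids


print(decode_serial_ids("ZAgUAg=="))  # [100, 108, 128, 130]
```

Delta encoding keeps sorted IDs small, so most deltas fit in a single varint byte: the four IDs above occupy four bytes before base64.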

Changes

- Gate + config: add DD_EXPERIMENTAL_FLAGGING_PROVIDER_SPAN_ENRICHMENT_ENABLED (ext/configuration.h), off by default; thread the split serial_id (value plus presence flag) through Rust → C → the PHP mapper (components-rs/ffe.rs, tracer/ffe.c/.h, ResultMapper.php).
- Codec + accumulator: inline span-enrichment accumulation (SpanEnrichmentAccumulator.php) with delta-varint serial IDs + SHA256-hashed subject keys; write ffe_* tags at root-span close (tracer/span.c).
- SpanEnrichmentBinder: binds enrichment to the native DDTrace\FeatureFlags\Client path in addition to the OpenFeature DataDogProvider, so non-OpenFeature consumers are enriched too.
- Tests: accumulator + result-mapper unit tests; .phpt ext tests for native bridge, serial-id passthrough, eval metrics, and remote-config lifecycle.
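The subject mapping described above could be assembled roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: the hex digest form of the SHA256 key, the omission of the tag when no evaluation authorizes logging, and the names `build_subjects_tag` and `encode_ids` are all hypothetical, and the 10-subject and 20-experiments-per-subject limits are left out.

```python
import hashlib
import json


def build_subjects_tag(evaluations, encode_ids):
    """Build an ffe_subjects_enc-style JSON string.

    evaluations: iterable of (targeting_key, serial_id, do_log) tuples.
    encode_ids:  callable turning a set of serial IDs into a string
                 (the delta-varint/base64 codec in the real format).
    """
    per_subject = {}
    for targeting_key, serial_id, do_log in evaluations:
        if not do_log:
            continue  # subjects are only emitted when logging is authorized
        # Assumption: the key is the hex SHA256 digest of the UTF-8 targeting key.
        digest = hashlib.sha256(targeting_key.encode("utf-8")).hexdigest()
        per_subject.setdefault(digest, set()).add(serial_id)
    if not per_subject:
        return None  # assumption: no tag at all when nothing is loggable
    payload = {digest: encode_ids(ids) for digest, ids in per_subject.items()}
    return json.dumps(payload, separators=(",", ":"))
```

Passing the ID encoder in as a parameter keeps the hashing and grouping logic separate from the wire codec, so either can be tested on its own.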

Decisions

- Inline accumulation (not a finally hook): PHP OpenFeature does not pass ResolutionDetails to finally hooks, so enrichment is accumulated inline.
- No idle per-span overhead when the gate is off: the accumulator is absent and spans carry no ffe_* tags.
- Lifecycle: the accumulator is reset on the root-span boundary.
- ffe_* are bare tag names on span meta (not _dd.-prefixed); subject keys are SHA256 hashes emitted only when logging is authorized.
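The root-boundary lifecycle can be sketched in a few lines of Python. The class and method names here are hypothetical; the only behaviour taken from this PR is that accumulated state is cleared whenever the active root span changes and again when the root closes, with the root-id lookup injectable so tests can drive transitions.

```python
class EnrichmentRegistry:
    """Accumulates serial IDs for one root span at a time (illustrative only)."""

    def __init__(self, root_id_resolver):
        # Injectable resolver so tests can simulate root-span transitions.
        self._resolve_root_id = root_id_resolver
        self._active_root_id = None
        self.serial_ids = set()

    def record(self, serial_id):
        root_id = self._resolve_root_id()
        if root_id != self._active_root_id:
            # Boundary transition: a new or dropped root starts from a clean slate.
            self.serial_ids.clear()
            self._active_root_id = root_id
        self.serial_ids.add(serial_id)

    def on_root_close(self):
        """Return what would be written to the root span, then reset."""
        staged = sorted(self.serial_ids)
        self.serial_ids.clear()
        self._active_root_id = None
        return staged
```

Resetting on the transition as well as on close covers a root that is abandoned without ever running its close callback.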

Validation

FFE dogfooding app

Validated live against the ffe-dogfooding app via a trace-intake tee-proxy that captures the raw /v0.4/traces payload and decodes the ffe_* tags. Flag ffe-dogfooding-string-flag (serial 2312):

- Gate ON: the root span (web.request, auto-instrumented web SAPI) carried ffe_flags_enc decoding to serial [2312] plus a SHA256-hashed ffe_subjects_enc → [2312].
- Gate OFF: span flushed with zero ffe_* tags.

Local system-tests run

Ran the frozen system-tests parametric suite (tests/parametric/test_ffe/test_span_enrichment.py, unchanged) against this branch's tracer (dd-library-php-1.21.0, C extension built from source for aarch64-linux-gnu, PHP 8.2 NTS):

TEST_LIBRARY=php ./run.sh PARAMETRIC -k span_enrichment
Library: php@1.21.0
============================= 18 passed in 49.99s ==============================

All 18 cases pass — ffe_flags_enc aggregates serial IDs across evaluations and propagates from child spans to the root (ZAgUAg== → [100,108,128,130]); ffe_subjects_enc carries SHA256-hashed targeting keys gated on doLog; ffe_runtime_defaults is added for not-found flags with 64-char truncation; and all frozen limits are enforced. The SpanEnrichmentBinder change above was required so the native DDTrace\FeatureFlags\Client path (used by the parametric server) is enriched. The system-tests enablement (parametric server.php + manifests/php.yml) is a separate draft PR against DataDog/system-tests.

Full dogfooding matrix + fix (2026-06-17)

Re-validated end-to-end through the real OpenFeature provider path behind the trace-intake
tee-proxy, decoding ffe_* with scripts/decode_ffe_span_tags.py (root span web.request,
service ffe-dogfooding-php8-openfeature, extension built from this branch):

| Scenario | Result |
| --- | --- |
| Gate ON (serial 2312) | `ffe_flags_enc` → `[2312]`; `ffe_subjects_enc` = `{sha256(targeting key): ids}` only when do_log |
| Gate OFF | zero `ffe_*` tags; no binder constructed |
| Aggregation | multiple flags + 2 subjects on one root → `ffe_flags_enc` = `[829,1442,2311,2312]`, nothing overwritten (shared `SpanEnrichmentRegistry`) |
| Unicode + object runtime defaults | `ffe_runtime_defaults` raw UTF-8 (`héllo-wörld-☃-日本語-Ω`, `こんにちは`, `🎉`), valid JSON, values truncated to 64 |
| Codec parity | `ZAgUAg==` → `[100,108,128,130]` |

Fix found by the matrix (commit fix(ffe): emit ffe_* JSON as raw UTF-8 with unescaped slashes):
the unicode scenario showed ffe_runtime_defaults was \uXXXX-escaped (and object values were
truncated mid-escape-sequence, yielding invalid JSON) because json_encode() was called without
flags. Added JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES at the three json_encode sites in
SpanEnrichmentAccumulator.php so the emitted bytes match the frozen Node JSON.stringify
contract (raw UTF-8, bare /). Existing accumulator unit tests json_decode the tags (normalizing
escapes) so they are unaffected; the dogfooding loop is what surfaced the divergence.
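The escaping issue is easy to reproduce outside PHP. The Python sketch below is an analogy rather than the PR's code: `ensure_ascii=False` plays the role of JSON_UNESCAPED_UNICODE, compact separators mimic JSON.stringify output, and slicing a `str` truncates by code point. The helper names are hypothetical, and the limit of 5 runtime defaults is not modelled.

```python
import json

MAX_DEFAULT_CHARS = 64  # per-value limit stated in this PR


def stringify_default(value):
    """Render one runtime default as text, truncated by code point."""
    if isinstance(value, str):
        text = value
    else:
        # Raw UTF-8 output: escaping to \uXXXX first would inflate the string
        # and let the cut land in the middle of an escape sequence.
        text = json.dumps(value, ensure_ascii=False, separators=(",", ":"))
    return text[:MAX_DEFAULT_CHARS]  # str slicing never splits a code point


def encode_runtime_defaults(defaults):
    """Build an ffe_runtime_defaults-style JSON object string."""
    trimmed = {key: stringify_default(value) for key, value in defaults.items()}
    return json.dumps(trimmed, ensure_ascii=False, separators=(",", ":"))


print(encode_runtime_defaults({"greeting": "こんにちは", "path": "a/b"}))
# {"greeting":"こんにちは","path":"a/b"}
```

Because truncation happens on the unescaped text and the outer object is serialized afterwards, the tag stays valid JSON regardless of where the cut falls.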

System-tests re-confirmed against a tarball rebuilt from this branch's source: 18 passed
(TEST_LIBRARY=php ./run.sh PARAMETRIC -k span_enrichment, library php@1.21.0).

…n-enrichment gate

- Add serial_id (i64) + has_serial_id (bool) to the Rust FfeResult struct and populate from assignment.serial_id (unwrap_or(0) + is_some()) in all ctors; regenerate the cbindgen common.h ABI to match.
- Surface serialId as a nullable int on the DDTrace\FfeResult object in the C reader (tracer/functions.c), guarded by has_serial_id so absence stays null (Pattern B: missing != 0); update the stub + arginfo.
- Thread serialId into ResultMapper::exposureData (only when present).
- Add the gate CONFIG(BOOL, DD_EXPERIMENTAL_FLAGGING_PROVIDER_SPAN_ENRICHMENT_ENABLED, "false") to ext/configuration.h (distinct from the provider-enabled gate).
- Update existing FFE phpt EXPECT blocks for the new serialId field.

…oot-close write

- Add DDTrace\FeatureFlags\SpanEnrichmentAccumulator: per-root-span accumulator + ULEB128 delta-varint/base64/SHA256 codec ported verbatim from the frozen Node reference (dd-trace-js#8343). Limits 200/10/20/5/64, dedupe+sort, object defaults via json_encode, UTF-8-safe 64-char truncation; tag shapes ffe_flags_enc (bare base64), ffe_subjects_enc / ffe_runtime_defaults (JSON objects).
- DataDogProvider: accumulate INLINE in resolve() right after recordEvaluationMetric (DG-004, no finally hook); gate-gated lazy accumulator (DG-005 zero-idle); error isolation via try/catch(\Throwable); runtime-default detection via missing variant.
- Native request-scoped staging store in tracer/ffe.c (+ ddtrace_globals.h) flushed into the root span meta on the ddtrace_close_span root branch and cleared on root close / RSHUTDOWN (no cross-request leak); gate-off path does no work.
- Add DDTrace\Internal\set_ffe_span_enrichment_tags() PHP-callable staging fn.
- Tests: SpanEnrichmentAccumulatorTest (7 required L0 cases incl. gate-off control + codec golden round-trip), serial_id_passthrough.phpt (C bridge), ResultMapper serialId threading cases.

…ry (CR-01)

The per-provider SpanEnrichmentAccumulator was only ever added to: clear() had zero production callers and accumulateSpanEnrichment() re-staged the FULL accumulated set on every resolve(). After a root span closed, the next root span re-staged the prior root's serial ids / hashed subjects / runtime defaults (within-request multi-root contamination), and because OpenFeature providers are process-level singletons the accumulator leaked across requests in persistent SAPIs -- a privacy leak of SHA256 subject keys.

Fix: reset the PHP accumulator on the root-span boundary, in lockstep with the native close-span flush (which already clears the native staging slots on the same ddtrace_close_span root branch + RSHUTDOWN):

- Track the active root span id (spl_object_id of DDTrace\root_span()). On any boundary transition, clear the accumulator + native staging store so a dropped/abandoned root (which never runs its onClose) and a new request both start clean.
- Bind a one-shot accumulator clear to the root span's $onClose so the PHP object is reset when the root closes (mirrors the frozen Node reference #onSpanFinish cleanup).
- Lifecycle is injectable (rootIdResolver / rootCloseScheduler) so the pure-PHP L0 suite can drive root transitions without the extension.

Regression tests (fail-before / pass-after): two sequential root spans in one request -> root 2 stages only its own serial ids/subjects/defaults; dropped-root and cross-request reset -> no carryover incl. no leaked hashed subject keys; root close clears the accumulator with no subsequent eval. Plus a Node String(value) runtime-default parity test (null/true/false/scalars/objects).

Native ABI passthrough, codec (ZAgUAg==), limits, gate-off DG-005, and DG-004 inline accumulation are unchanged.

datadog-prod-us1-4 · 2026-06-16T22:19:20Z

Tests

⚠️ Warnings

🚦 14 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | benchmarks-tracer

DataDog/apm-reliability/dd-trace-php | test_extension_ci: [7.1]

DataDog/apm-reliability/dd-trace-php | test_extension_ci: [7.4]

View all 14 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 100.00%
• Overall Coverage: 54.08% (-0.04%)

🔗 Commit SHA: 156726d

pr-commenter · 2026-06-16T23:37:24Z

Benchmarks [ tracer ]

Benchmark execution time: 2026-06-17 08:47:40

Comparing candidate commit 3229c80 in PR branch leo.romanovsky/ffe-apm-span-enrichment with baseline commit a65b400 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 194 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

🟩 = significantly better candidate vs. baseline
🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
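The decision rule in the two diagrams can be written down as a small Python function. This is only a sketch of the rule as described in the comment: it assumes the threshold is applied symmetrically around 0% and that the metric is one where higher is worse, such as execution time. The function name is hypothetical.

```python
def classify_change(ci_lower, ci_upper, threshold):
    """Classify a relative change (in percent) for a higher-is-worse metric.

    The change is significant only when the whole confidence interval lies
    beyond the threshold on one side of zero.
    """
    if ci_lower > threshold:
        return "worse"   # entire CI above +threshold
    if ci_upper < -threshold:
        return "better"  # entire CI below -threshold (assumed symmetric)
    return "same"


# Second diagram: threshold 1%, CI = (1.3%, 3.1%)
print(classify_change(1.3, 3.1, 1.0))   # worse
# First diagram: CI = (-0.6%, 1.2%) straddles zero
print(classify_change(-0.6, 1.2, 1.0))  # same
```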

Long-running CLI servers (parametric test apps) starve the SIGVTALRM-driven remote-config refresh because the process is mostly blocked in IO rather than burning CPU time, so an FFE evaluation issued right after the agent ACKs a pushed UFC config still sees no config and falls back to defaults. Add a dd_trace_internal_fn('await_ffe_config') testing hook that actively pumps remote configs (mirrors await_agent_info) until ddog_ffe_has_config() is true. Enables the FROZEN system-tests span-enrichment parametric suite to load UFC via Remote Config in the long-running PHP parametric server.

Span enrichment was accumulated only inside the OpenFeature DataDogProvider (DG-004 inline path). The native DDTrace\FeatureFlags\Client evaluates flags without going through the provider, so consumers on the native path (the parametric system-tests app, and any non-OpenFeature caller) produced ffe_* tags on the root span for OpenFeature but NOT for the native Client. Extract the per-root-span accumulate/encode/root-boundary lifecycle into a reusable PHP7-compatible SpanEnrichmentBinder and bind it on Client::evaluate(), so both the provider and the native Client stage identical ffe_* tags from the same EvaluationDetails and stay in lockstep with the native close-span write. Honours the FROZEN contract (limits 200/10/20/5/64, delta-varint, SHA256 subjects, runtime-default detection). DG-005: no-op with the gate off.

…ment gate

Register DD_EXPERIMENTAL_FLAGGING_PROVIDER_SPAN_ENRICHMENT_ENABLED in metadata/supported-configurations.json by running tooling/generate-supported-configurations.sh. The config was added to ext/configuration.h but the generated metadata was not regenerated, causing the Configuration Consistency CI check to fail.

assertIsInt() is only available in PHPUnit 7.5+, so the new serialId exposure-data test errored on the PHP 7.0 API unit-test job (older PHPUnit). assertInternalType() is unavailable too (removed in PHPUnit 9, and the matrix runs up to PHPUnit <10). Replace with assertTrue(is_int(...)), which works across the whole 7.0-8.5 matrix. The preceding strict assertSame already enforces the integer type.

…-only

PR review (#3996), two native findings:

- should-fix: DDTrace\root_span() calls dd_ensure_root_span(), which CREATES an autoroot span when none exists. Resolving the root id while merely evaluating a feature flag must not have that side effect. Add a non-creating DDTrace\Internal\peek_root_span_id() that reads DDTRACE_G(active_stack)->root_span directly (no dd_ensure_root_span) and returns its object handle, identical to spl_object_id(\DDTrace\root_span()) but without trace-state creation. Wired into the stub + committed arginfo (phpize build uses the committed header as-is; no CI stub-hash gate).
- should-fix: await_ffe_config sits in the production dd_trace_internal_fn dispatcher and actively pumps Remote Config, blocking up to 5s. Guard it behind a new DD_TEST_HELPERS compile flag (config.m4, defined for the standard CI/test/package builds the system-tests + ffe-dogfooding harnesses run against) so a hardened production build can compile the heavyweight test helper out of the dispatcher entirely.

ZTS-safe (DDTRACE_G accessor); no allocation, no refcount changes.

… all paths

PR review (#3996) blocker + should-fix.

blocker: tracer/ffe.c set_ffe_span_enrichment_tags() REPLACES the three request-global tag slots on every call. Both DataDogProvider and each FeatureFlags\Client/SpanEnrichmentBinder owned a SEPARATE accumulator and staged independently, so two clients, two providers, or a mixed OpenFeature + native-client evaluation under ONE root span would OVERWRITE earlier serial ids / hashed subjects / runtime defaults instead of aggregating them.

Fix: introduce SpanEnrichmentRegistry, a single request-scoped accumulator that ALL PHP evaluation paths feed. The staged tag set is now the union of every evaluation on the active root span, matching the frozen Node contract. No tag/encoding/limit semantics changed.

should-fix (per-binder onClose retention): the lifecycle is centralized in the registry, which binds AT MOST ONE root-close reset per root span (tracked by rootCloseBoundRootId). Many short-lived clients under one long-lived root no longer each retain a closure + accumulator. SpanEnrichmentBinder is now a thin gate-checked adapter; DataDogProvider drops its inline accumulator + lifecycle.

should-fix (gate-off not inert): Client and DataDogProvider now construct NO binder unless DD_EXPERIMENTAL_FLAGGING_PROVIDER_SPAN_ENRICHMENT_ENABLED is on, and evaluate()/resolve() skip the enrichment call entirely when the binder is absent, so there is no per-evaluation config read with the gate off (DG-005).

should-fix (root side effect): the registry resolves the root id via the new non-creating DDTrace\Internal\peek_root_span_id(), falling back to the (creating) DDTrace\root_span() only on older extensions.

…non-creating root

PR review (#3996) regression coverage.

- SpanEnrichmentRegistryTest (PHPUnit, runs without the native ext): two binders (standing in for two clients / a client + a provider) under one simulated root AGGREGATE their serial ids, hashed subjects, and runtime defaults into one staged payload rather than overwriting; CR-01 per-root reset still holds; at most ONE root-close reset is bound across many short-lived binders; the root-close reset clears the shared accumulator.
- ClientTest: gate-off Client allocates no SpanEnrichmentBinder and evaluate() short-circuits enrichment without error.
- SpanEnrichmentAccumulatorTest: rewired the DG-004 inline + CR-01 multi-root harness to drive the shared registry's seams (the lifecycle moved out of the provider); gate-off assertions now check spanEnrichmentBinder is null.
- peek_root_span_id_non_creating.phpt (orchestrator L2, needs built ext): proves peek_root_span_id() returns null without creating a root span (active_span() stays null) and otherwise equals spl_object_id(root_span()).

…arity)

json_encode() without flags escaped non-ASCII to \uXXXX and '/' to '\/', diverging from the frozen Node JSON.stringify contract for ffe_subjects_enc and ffe_runtime_defaults. For object/struct runtime defaults the \uXXXX inflation also pushed the value past the 64-char limit so the truncation cut mid-escape-sequence, yielding invalid JSON inside the tag. Add JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES at all three json_encode sites (toSpanTags subjects + runtime defaults, and stringifyDefault for object/array values) so the emitted bytes match Node exactly.

Verified via the ffe-dogfooding unicode scenario: decoded ffe_runtime_defaults is now raw UTF-8 (héllo-wörld-☃-日本語-Ω / こんにちは / 🎉), valid JSON, codepoint-safe 64-char truncation.

leoromanovsky added 3 commits June 15, 2026 23:30

leoromanovsky added 9 commits June 16, 2026 19:56

Fix PHP span enrichment CI compatibility

156726d

leoromanovsky wants to merge 12 commits into master from leo.romanovsky/ffe-apm-span-enrichment
