✨ feat(profiling): add web worker CPU profiling support #4795
Draft
thomasbertet wants to merge 37 commits into
Implements the worker profiling feature described in PROF-15093. Workers opt in via connectDatadogWorker() in the worker script and datadogRum.addProfilingWorker(worker) on the main thread.
New files:
- workerProfiling.types.ts: dd-start/stop-profiling command types and dd-worker-trace/error response types
- workerProfilingAgent.ts: zero-dependency worker shim; drives new Profiler(), rolls 60s intervals autonomously, flushes on dd-stop-profiling and self.close()
- workerProfilingCoordinator.ts: main-thread registry; stable correlationId per worker instance, dispatches commands, ships traces via FormData transport, handles abrupt worker crashes
- transport/assembleWorkerProfilingPayload.ts: builds profile event with thread:worker / worker.name / thread.correlation_id tags, no view/action/longTask/vital context
- entries/worker.ts: @datadog/browser-rum/worker sub-entry
Modified files:
- datadogProfiler.ts: accepts getWorkerCorrelationIds() callback; embeds correlation IDs in main-thread profile tags so the backend can join main and worker profiles
- transport/assembly.ts: appends thread.correlation_id tags for each registered worker to the main-thread profile event
- profilerApi.ts: creates WorkerProfilingCoordinator on RUM start, starts it after lazy-load, wires getWorkerCorrelationIds
- rumPublicApi.ts: adds addProfilingWorker / removeProfilingWorker to RumPublicApi; adds getWorkerCoordinator() to ProfilerApi
- noopProfilerApi.ts / stubProfilerApi.ts: add stub for the new getWorkerCoordinator method
Self-contained single-page app in test/apps/worker-profiling/ that
demonstrates the experimental web worker CPU profiling feature end-to-end.
The app:
- Inits Datadog RUM with profilingSampleRate: 100
- Spawns a dedicated worker (profilingWorker.ts) that loops over four
CPU-intensive workloads chosen to produce varied, recognisable call
stacks in a flamegraph:
· Sieve of Eratosthenes (prime counting, tight inner loop)
· Recursive Fibonacci fib(30) (exponential branching, deep stacks)
· 80×80 matrix multiplication (numeric throughput)
· 100×100 Mandelbrot set (mixed branches + arithmetic)
- Calls connectDatadogWorker() in the worker to enable the Datadog
profiling shim, and datadogRum.addProfilingWorker(worker) on the
main thread
- Displays live stats (iterations, primes, fib result, elapsed time)
- Provides Stop / Restart worker buttons wired to
removeProfilingWorker / addProfilingWorker
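For illustration, two of the workloads listed above could look roughly like the following. This is a minimal sketch, not the code in profilingWorker.ts; the function names countPrimes and fib are hypothetical.

```typescript
// Sieve of Eratosthenes: counts primes strictly below `limit`.
// The tight inner loop shows up as a single hot frame in a flamegraph.
function countPrimes(limit: number): number {
  if (limit < 2) return 0
  const composite = new Uint8Array(limit)
  let count = 0
  for (let i = 2; i < limit; i++) {
    if (composite[i]) continue
    count++
    // Mark every multiple of i, starting at i * i, as composite.
    for (let j = i * i; j < limit; j += i) composite[j] = 1
  }
  return count
}

// Naive recursive Fibonacci: exponential branching produces deep call stacks.
function fib(n: number): number {
  return n < 2 ? n : fib(n - 1) + fib(n - 2)
}
```

Calling fib(30) from a loop, as the commit describes, keeps the worker busy long enough for the sampling profiler to capture many stacks.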
Webpack config:
- Two independent entry points (main + worker) → main.js + worker.js
- TsconfigPathsPlugin resolves @datadog/* to monorepo sources
- Dev server (port 8081) adds Document-Policy: js-profiling header
to both the document and worker.js responses
Also adds @datadog/browser-rum/worker to tsconfig.base.json paths
so IDEs and TsconfigPathsPlugin can resolve the new worker sub-entry.
Usage:
cd test/apps/worker-profiling && yarn install && yarn dev
# Open http://localhost:8081 in Chromium Canary with:
# --enable-features=DocumentPolicyInDedicatedWorker,ProfilerAPIForDedicatedWorker
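The dev-server part of that setup can be sketched as a plain options object. The DevServerSketch interface below is a hypothetical, trimmed-down stand-in for the real webpack-dev-server configuration type; only the port and the header value come from the commit message.

```typescript
// Hypothetical minimal shape; the real webpack-dev-server options type has many more fields.
interface DevServerSketch {
  port: number
  headers: Record<string, string>
}

const devServer: DevServerSketch = {
  port: 8081,
  // webpack-dev-server adds `headers` to every response it serves,
  // so both the document and worker.js carry the policy.
  headers: { 'Document-Policy': 'js-profiling' },
}
```

Setting the header globally is the simplest way to cover both responses; a production server would more likely scope it to the document and the worker script.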
…k config
The SDK source references three build-time constants injected by the root webpack base via DefinePlugin. Without them the browser throws:
ReferenceError: __BUILD_ENV__SDK_VERSION__ is not defined
Add a DefinePlugin to the test app webpack config with dev-mode values:
- SDK_VERSION → 'dev'
- SDK_SETUP → 'npm'
- WORKER_STRING → '' (deflate worker not needed in the test app)
…ofile display
Replace the /dev-null proxy with a real local proxy server (proxy-server.ts) that captures and displays profiling intake payloads directly in the UI.
proxy-server.ts (new, port 8082):
- POST /proxy: receives SDK intake requests via the standard ?ddforward= protocol; parsed by createIntakeProxyMiddleware (reused from E2E lib) without forwarding anything to Datadog
- GET /events: SSE endpoint; streams a JSON summary of each captured profile to the browser page
src/main.ts:
- Points proxy: to http://localhost:8082/proxy
- Connects to GET /events via EventSource and renders profile cards as they arrive (newest first, with fade-in animation)
- Each card shows: thread (main/worker), worker name, sample count, frame count, duration, session ID, correlation IDs, and a top-8 frames table (frame name, source file, line, hit count)
src/index.html:
- Two-column layout: workload controls + worker stats on the left, live profile stream on the right
- Proxy connection status indicator (green/red)
- Profile cards with thread-coloured labels
webpack.config.js: remove now-unused webpack-dev-server proxy config
Usage: two terminals (`yarn dev` + `yarn proxy`), open localhost:8081
…me-origin
Move the SSE + intake proxy from a separate process (proxy-server.ts, port 8082) into the webpack-dev-server's setupMiddlewares hook so that a single 'yarn dev' is all that's needed.
- webpack.config.js: inline CJS port of intakeProxyMiddleware logic (busboy multipart parse, deflate decode, SSE broadcast) directly in setupMiddlewares; adds GET /events and POST /proxy on port 8081
- main.ts: point proxy and EventSource at same-origin paths (/proxy, /events) instead of http://localhost:8082
- index.html: remove 'run yarn proxy in a second terminal' note
…dev-middleware Empty __BUILD_ENV__WORKER_STRING__ caused RUM to spin up a no-op Blob Worker that never sent 'initialized', timing out after 30s and aborting. Fix: add a second webpack compiler in setupMiddlewares that builds the browser-worker package on the fly and serves it at /datadog-worker.js, then pass workerUrl: '/datadog-worker.js' to datadogRum.init() so the deflate worker loads from source instead of the empty inlined string.
…rker
Revert to separate dev (port 8081) + proxy (port 8082) commands.
proxy-server.ts:
- Builds and serves /datadog-worker.js via webpack-dev-middleware so the deflate worker initialises correctly (fixes the 30s timeout that aborted RUM startup when WORKER_STRING was empty)
- Handles CORS so the page on 8081 can POST/GET to 8082
- Parses profile multipart payloads inline (CJS-free, ESM busboy)
- Broadcasts profile summaries over SSE GET /events
webpack.config.cjs (renamed from .js):
- Back to minimal config: no busboy/SSE, no second compiler
- Renamed to .cjs so Node treats it as CommonJS with package type:module
package.json:
- Add type:module (silences Node ESM detection warning for proxy-server.ts)
- Remove webpack-dev-middleware (only needed by proxy, uses root modules)
- proxy script: node proxy-server.ts
main.ts: workerUrl + proxy + EventSource all point to PROXY_ORIGIN (8082)
…-dev-server Browsers block cross-origin Worker construction, so the deflate worker bundle cannot be fetched directly from port 8082. Add a webpack-dev-server proxy rule that forwards GET /datadog-worker.js to localhost:8082 so the browser sees it as same-origin (port 8081). main.ts: workerUrl: '/datadog-worker.js' (same-origin, no PROXY_ORIGIN prefix)
Use <details>/<summary> so cards are collapsed on arrival showing only the thread label, timestamp and duration. Click to expand the stats, correlation IDs and top frames. Chevron rotates on open.
…very 30s
Demonstrates that worker profiling works for transient workers too.
shortLivedWorker.ts (new):
- Calls connectDatadogWorker() then immediately runs a 5s CPU burst (sort 10k, prime factors, hash-like mixing loop) then calls self.close()
main.ts:
- Spawns a burst-worker-N every 30s (plus one immediately on load)
- Calls addProfilingWorker() on spawn, removeProfilingWorker() on 'done'
- Tracks spawn count in the UI
webpack.config.cjs: add short-lived-worker entry point
index.html: burst-workers card + stat counter
…ating
self.close() kills the message port before dd-worker-trace can be sent,
so the profile was silently dropped.
Fix: remove self.close() from shortLivedWorker — the burst now posts
{ kind: 'done' } and keeps the port open. The main thread calls
removeProfilingWorker() (which sends dd-stop-profiling → agent flushes
profiler.stop() → posts dd-worker-trace), then hard-terminates the
worker after a 5s grace period.
…ker() for short-lived workers
Two new APIs for cleanly profiling short-lived workers:
workerProfilingAgent.ts:
- connectDatadogWorker() now returns { stopAndFlush() } instead of void
- stopAndFlush(): flush current session then self.close(); call this
inside the worker instead of self.close() so the trace is captured
workerProfiling.types.ts:
- New dd-flush-and-close command: main thread asks worker to flush
then self.close() (used by flushAndTerminateProfilingWorker)
workerProfilingCoordinator.ts:
- flushAndTerminateWorker(): sends dd-flush-and-close, keeps registration
alive to receive the dd-worker-trace, hard-terminates after 5s safety net
profilerApi.ts / rumPublicApi.ts:
- flushAndTerminateProfilingWorker(worker) wired through the full stack
- Buffer also handles the new flushAndTerminate pending action
Test app (two short-lived worker variants, alternating every 30s):
- Variant A (odd): shortLivedWorker, calls stopAndFlush() then self.close()
- Variant B (even): shortLivedWorkerMainThreadClose, posts done, main
thread calls flushAndTerminateProfilingWorker()
…short-lived workers
Workers no longer need to signal the main thread when they're done. The SDK layer handles everything:
shortLivedWorker.ts (variant A, self-close):
- Calls stopAndFlush() when burst is complete, and that's it
- No postMessage, no main-thread coordination needed
shortLivedWorkerMainThreadClose.ts (variant B, main-close):
- Just does work indefinitely, connectDatadogWorker() is the only SDK call
- Main thread calls flushAndTerminateProfilingWorker() after 5s timeout
main.ts:
- Variant A: spawn + addProfilingWorker, nothing else (worker self-manages)
- Variant B: spawn + addProfilingWorker + setTimeout flushAndTerminate
- Remove 'done' message listener entirely
…ghted docs
Restructure the page into three columns:
- Left: live demo controls (worker stats, burst counter, status)
- Center: code documentation with syntax-highlighted snippets for all three instrumentation patterns (long-lived, self-close, main-close) plus the HTTP header requirement
- Right: captured profiles stream (unchanged)
Uses highlight.js (CDN, github-dark theme) for TypeScript/JS/HTTP syntax highlighting; no build step changes needed.
Flex children don't shrink below their content size by default, so overflow-y: auto had nothing to scroll — cards were squashed and <details> expand didn't work. Fix: add min-height: 0 to .col, .col-right, and #profile-list so the flex sizing chain allows shrinking. Add flex-shrink: 0 to .profile-card so individual cards keep their natural height and the list scrolls.
…w-to
Column order is now:
1. Live demo controls (260px)
2. Captured profiles (360px, scrollable)
3. How to instrument: code snippets (flex 1fr)
Update borders accordingly (col-right gets right border, col-center is last so no border).
…+ startProfilingWorker
Public API (rumPublicApi.ts):
- addProfilingWorker() → registerProfilingWorker() — now returns () => void
(unregister function that flushes the session on call)
- removeProfilingWorker() — removed
- flushAndTerminateProfilingWorker() — removed (SDK never terminates workers)
Worker-side API (workerProfilingAgent.ts / entries/worker.ts):
- connectDatadogWorker() → startProfilingWorker()
- returned handle: { stopAndFlush() } → { stop() }
stop() only flushes the profile — worker lifecycle is caller's responsibility
- dd-flush-and-close command removed (workerProfiling.types.ts)
Coordinator (workerProfilingCoordinator.ts):
- addWorker() + removeWorker() + flushAndTerminateWorker() → registerWorker()
registerWorker() returns an unregister callback (flushes + tears down)
profilerApi.ts:
- Pending call buffer simplified to a single register action
- Buffered unregister correctly delegates once coordinator is ready
Test app updated throughout:
- main.ts: registerProfilingWorker, unregister pattern, no terminate calls
- profilingWorker.ts / shortLivedWorker.ts / shortLivedWorkerMainThreadClose.ts:
startProfilingWorker(), stop() + self.close() in self-close variant
- index.html: docs snippets updated to match new API
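The "register returns an unregister callback" contract, combined with buffering calls made before the coordinator exists, can be sketched as below. This is a minimal model under stated assumptions, not the code in profilerApi.ts: BufferedRegistry, PendingEntry and setCoordinator are hypothetical names, and the worker type is left generic so the sketch runs outside a browser.

```typescript
type Unregister = () => void

// The only coordinator capability this sketch relies on.
interface Coordinator<W> {
  registerWorker(worker: W): Unregister
}

// One buffered register call (hypothetical shape).
interface PendingEntry<W> {
  worker: W
  cancelled: boolean
  unregister?: Unregister
}

class BufferedRegistry<W> {
  private coordinator: Coordinator<W> | undefined
  private pending: Array<PendingEntry<W>> = []

  register(worker: W): Unregister {
    // Coordinator already available: delegate directly.
    if (this.coordinator) return this.coordinator.registerWorker(worker)
    const entry: PendingEntry<W> = { worker, cancelled: false }
    this.pending.push(entry)
    return () => {
      if (entry.unregister) {
        entry.unregister() // coordinator arrived in the meantime: delegate
      } else {
        entry.cancelled = true // still buffered: drop the pending register
      }
    }
  }

  // Called once the lazily loaded coordinator is ready; replays buffered registers.
  setCoordinator(coordinator: Coordinator<W>): void {
    this.coordinator = coordinator
    for (const entry of this.pending) {
      if (!entry.cancelled) entry.unregister = coordinator.registerWorker(entry.worker)
    }
    this.pending = []
  }
}
```

Keeping a single kind of pending action means an early unregister either cancels the buffered register or forwards to the real callback, so callers never need to know whether the profiler chunk has loaded yet.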
…AndTerminateProfilingWorker reference
Public API:
- registerProfilingWorker() → attachProfilingWorker() (returns detach fn)
Worker-side API:
- startProfilingWorker() → attachProfiler() (returns { flush })
- stop() on handle → flush()
Coordinator (internal):
- registerWorker() / unregisterWorker() → attachWorker() / detachWorker()
Wire protocol:
- dd-start-profiling → dd-profiling-config (honest: delivers config, does not
start profiling — new Profiler() happens inside the worker after receiving it)
- dd-stop-profiling → dd-detach-profiler
profilerApi.ts buffer: unregister → detach throughout
Test app + index.html docs updated to match.
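As a rough sketch of the renamed wire protocol, the messages can be modelled as discriminated unions with a small worker-side handler. The message names come from the commit message; the `type` discriminant, the `sampleIntervalMs`, `trace` and `reason` fields, and the createAgent / ProfilerLike names are assumptions made for illustration. The profiler is injected so the sketch runs without the browser's Profiler class.

```typescript
// Main thread → worker commands (field names are assumptions).
type MainToWorkerMessage =
  | { type: 'dd-profiling-config'; sampleIntervalMs: number }
  | { type: 'dd-detach-profiler' }

// Worker → main thread responses (field names are assumptions).
type WorkerToMainMessage =
  | { type: 'dd-worker-trace'; trace: unknown }
  | { type: 'dd-worker-error'; reason: string }

// Stand-in for the JS Self-Profiling API's Profiler, whose stop() resolves with the trace.
interface ProfilerLike {
  stop(): Promise<unknown>
}

function createAgent(
  createProfiler: (sampleIntervalMs: number) => ProfilerLike,
  post: (message: WorkerToMainMessage) => void
) {
  let profiler: ProfilerLike | undefined

  return async function handle(message: MainToWorkerMessage): Promise<void> {
    switch (message.type) {
      case 'dd-profiling-config':
        // Receiving the config is what starts a profiler inside the worker.
        if (profiler) return // assumption: ignore a config while a session is active
        try {
          profiler = createProfiler(message.sampleIntervalMs)
        } catch {
          post({ type: 'dd-worker-error', reason: 'unexpected-exception' })
        }
        return
      case 'dd-detach-profiler': {
        if (!profiler) return // nothing to flush
        const active = profiler
        profiler = undefined
        post({ type: 'dd-worker-trace', trace: await active.stop() })
        return
      }
    }
  }
}
```

In a real worker, `post` would wrap self.postMessage and `createProfiler` would construct the browser's Profiler; injecting both keeps the protocol logic testable on its own.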
…ts asynchronously
…andle
Symmetric counterpart to attachProfiler() — the name describes what it does
(disconnect from the pipeline) rather than the implementation detail (flush).
DatadogWorkerHandle: { flush } → { detachProfiler }
Test app + docs updated throughout.
Mirror what datadogProfiler does for the main-thread profiler:
- visibilitychange → hidden: flush all workers (dd-detach-profiler) then immediately re-deliver dd-profiling-config so each worker starts a fresh Profiler instance (ready when the tab becomes visible again)
- beforeunload: flush all workers, no restart (page is gone)
Listeners are registered in start() and torn down in stop(), so they only run while a profiling session is active.
…lity change
The previous implementation incorrectly restarted workers immediately after flushing on visibilitychange → hidden (restartAfter: true). The main-thread profiler actually pauses (stops collecting, no new Profiler instance) when the tab is hidden, and only resumes when visible again. Workers now follow the same contract:
- visibilitychange → hidden: pauseAllWorkers() sends dd-detach-profiler and enters the isPaused state. No new Profiler started.
- visibilitychange → visible: resumeAllWorkers() re-delivers dd-profiling-config; workers start a fresh Profiler instance.
- beforeunload: flushAndRestartAllWorkers() does a flush + immediate restart, because beforeunload can fire while the page is still alive (e.g. mailto:).
- detachWorker() / stop(): skip dd-detach-profiler if already paused (worker has no active session to flush).
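The pause/resume contract described above can be modelled as a small state machine. This is a sketch under assumptions, not the actual workerProfilingCoordinator.ts: the class and method names are hypothetical, messages are reduced to their type, and two behaviours are guesses labelled in the comments (deferring config for a worker attached while paused, and doing nothing on beforeunload while paused).

```typescript
type Command = 'dd-profiling-config' | 'dd-detach-profiler'

// The only part of a Worker this sketch needs.
interface WorkerPort {
  postMessage(message: { type: Command }): void
}

class VisibilityAwareCoordinator {
  private workers = new Set<WorkerPort>()
  private isPaused = false

  attachWorker(worker: WorkerPort): () => void {
    this.workers.add(worker)
    // Assumption: a worker attached while paused gets its config on resume.
    if (!this.isPaused) worker.postMessage({ type: 'dd-profiling-config' })
    return () => {
      if (!this.workers.delete(worker)) return
      // While paused the worker has no active session, so there is nothing to flush.
      if (!this.isPaused) worker.postMessage({ type: 'dd-detach-profiler' })
    }
  }

  onVisibilityChange(state: 'hidden' | 'visible'): void {
    if (state === 'hidden' && !this.isPaused) {
      this.isPaused = true
      this.broadcast('dd-detach-profiler') // flush, no restart
    } else if (state === 'visible' && this.isPaused) {
      this.isPaused = false
      this.broadcast('dd-profiling-config') // fresh Profiler in each worker
    }
  }

  onBeforeUnload(): void {
    if (this.isPaused) return // assumption: no active sessions to flush
    this.broadcast('dd-detach-profiler')
    this.broadcast('dd-profiling-config') // the page may survive, so restart
  }

  private broadcast(type: Command): void {
    this.workers.forEach((worker) => worker.postMessage({ type }))
  }
}
```

In a browser, onVisibilityChange would be driven by document.visibilityState inside a visibilitychange listener, and onBeforeUnload by a beforeunload listener.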
All console.log/warn/error calls in workerProfilingAgent.ts, workerProfilingCoordinator.ts, and profilerApi.ts were development-only debug instrumentation that would be visible to customers in production. Removed before shipping.
…ile card
- proxy-server: broadcast full tags array (split from tags_profiler string)
- main.ts: render tags as pills below the correlation ID row
- Worker-specific tags (thread:worker, worker.name:*, thread.correlation_id:*) rendered in amber to distinguish them from generic profiler tags
…ags, not bespoke fields
The proxy was extracting thread, workerName, correlationIds from tags_profiler and broadcasting them as separate fields. The client was consuming them independently. This is extraneous work a customer cannot and would not do.
Now: proxy broadcasts only tags[] (split from tags_profiler) + timing/frame data. Client derives thread, workerName, and correlation IDs purely from tag values, exactly as a real consumer would via the Datadog profiling API.
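Deriving the worker identity from tags alone might look like the sketch below. The tag keys come from the commit messages; the comma delimiter for tags_profiler, the parseProfilerTags function and the WorkerIdentity shape are assumptions made for illustration.

```typescript
interface WorkerIdentity {
  thread: string | undefined
  workerName: string | undefined
  correlationIds: string[]
}

// Assumes tags_profiler is a comma-separated list of key:value tags.
function parseProfilerTags(tagsProfiler: string): WorkerIdentity {
  const identity: WorkerIdentity = { thread: undefined, workerName: undefined, correlationIds: [] }
  for (const tag of tagsProfiler.split(',')) {
    // Split on the first colon only, so values containing ':' survive intact.
    const separator = tag.indexOf(':')
    if (separator === -1) continue
    const key = tag.slice(0, separator).trim()
    const value = tag.slice(separator + 1)
    if (key === 'thread') {
      identity.thread = value
    } else if (key === 'worker.name') {
      identity.workerName = value
    } else if (key === 'thread.correlation_id') {
      // A main-thread profile can carry one correlation ID per registered worker.
      identity.correlationIds.push(value)
    }
  }
  return identity
}
```

A main-thread profile would then be recognisable by the absence of a thread:worker tag, and joined to worker profiles through any shared thread.correlation_id value.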
…azy chunk
workerProfilingCoordinator and assembleWorkerProfilingPayload were statically
imported into profilerApi.ts, shipping ~350 lines of coordinator code in the
main bundle for all sessions — even the vast majority that never profile.
Now both dynamic imports in lazyLoadProfiler share the same webpackChunkName
("datadogProfiler"), so webpack merges them into a single chunk that is only
fetched for sampled profiling sessions. The coordinator is created inside the
.then() callback alongside createRumProfiler, with no change to behaviour or
the public API.
…files
SDK files:
- workerProfilingAgent.ts: file-level eslint-disable for zone-js/Date.now (Zone.js does not run in workers; browser-core cannot be imported here), fix unused vars, remove durationMs, use ProfilerTrace type to drop 'as any'
- workerProfilingCoordinator.ts: merge duplicate @browser-rum-core imports, add eslint-disable for Worker.addEventListener/removeEventListener (Zone.js does not patch Worker), add monitor-until comments, remove empty else block, remove unused durationMs
- profilerApi.ts: fix import order, type→interface for PendingCall, curly
- rumPublicApi.ts: fix @param path for options.name, eslint-disable no-empty-function
- worker.ts: fix JSDoc indentation (use backtick inline code)
- lazyLoadProfiler.ts: update return type for new LazyProfilerModule shape
Test app files:
- proxy-server.ts → proxyServer.ts (unicorn/filename-case)
- package.json: add express/cors/busboy/webpack-dev-middleware deps
- proxyServer.ts: fix JSDoc indentation, curly, floating promises (async route), no-misused-promises, unsafe-call/return with targeted eslint-disables
- main.ts: fix JSDoc indentation, prefer-template, remove unnecessary assertions
- profilingWorker.ts: fix JSDoc indentation, curly
- shortLivedWorker.ts: fix JSDoc indentation, curly, no-bitwise disables, unbound-method (use handle.detachProfiler() instead of destructuring)
- shortLivedWorkerMainThreadClose.ts: fix JSDoc indentation, curly, update stale comment referencing old API name
…r lint
- tsconfig.json: fix @datadog/js-core/time path (missing /entries segment)
- proxyServer.ts: remove async/await from Express route handler (version-independent void return); cast webpackDevMiddleware to express.RequestHandler to avoid no-misused-promises across different @types/express versions
The monorepo node_modules/@datadog/js-core symlink has no built cjs/esm output, so webpack cannot resolve the package.json exports map at build time. Add an explicit alias pointing directly at the TypeScript source file so webpack can bundle it without requiring a prior build step.
…r webpack config Missing /entries segment in path caused 'Cannot resolve @datadog/js-core/time' errors when the proxy server built the deflate worker bundle.
Motivation
Adds experimental support for profiling dedicated web workers using the JS Self-Profiling API inside workers (WICG PR #88). This allows customers to get CPU profiles from their worker threads alongside main-thread profiles, correlated via the same session and a per-worker thread.correlation_id tag.
Currently experimental: requires Chromium Canary with --enable-features=DocumentPolicyInDedicatedWorker,ProfilerAPIForDedicatedWorker and the Document-Policy: js-profiling header served on the worker script.
Closes PROF-15093. (Plan)
Changes
New public API
Main thread: datadogRum.attachProfilingWorker(worker, { name? }) registers a Worker with the profiling pipeline and returns a detach function. Calling detach() flushes the current profiling session and disconnects the worker from the pipeline (worker lifecycle, i.e. terminate() / close(), is always the caller's responsibility).
Worker: a new sub-entry @datadog/browser-rum/worker exports attachProfiler(), which starts the in-worker profiling agent and returns { detachProfiler }. Minimal footprint: no RUM internals inside the worker.
SDK changes
- packages/browser-rum/src/domain/profiling/workerProfilingAgent.ts: in-worker profiling agent (attachProfiler)
- packages/browser-rum/src/domain/profiling/workerProfilingCoordinator.ts: main-thread coordinator (session sampling, config delivery, trace collection, pause/resume on visibility change)
- packages/browser-rum/src/domain/profiling/workerProfiling.types.ts: wire protocol types (dd-profiling-config, dd-detach-profiler, dd-worker-trace, dd-worker-error)
- packages/browser-rum/src/domain/profiling/transport/assembleWorkerProfilingPayload.ts: intake payload assembly for worker profiles (reuses existing transport, same endpoint)
- packages/browser-rum/src/entries/worker.ts: new sub-entry point
- packages/browser-rum/src/boot/profilerApi.ts: coordinator wiring + pre-start call buffering
- packages/browser-rum-core/src/boot/rumPublicApi.ts: attachProfilingWorker on RumPublicApi
Wire protocol
- dd-profiling-config: the worker calls new Profiler() on receipt
- dd-detach-profiler
- dd-worker-trace
- dd-worker-error: not-supported-by-browser / missing-document-policy-header / unexpected-exception
Pause / resume
Mirrors main-thread profiler behaviour exactly:
- visibilitychange → hidden: flush all workers, pause (no restart)
- visibilitychange → visible: re-deliver dd-profiling-config to all workers
- beforeunload: flush + immediate restart (page may survive, e.g. mailto: links)
Test app
test/apps/worker-profiling/: a self-contained demo app (webpack + Express proxy) that:
- … tags_profiler tags, top frames, and timing
Test instructions
Requires Chrome Canary with experimental flags:
Profiles from both the main thread and workers should appear in the UI within ~60s. Worker profile cards display thread:worker, worker.name:, and thread.correlation_id: tags in amber.
Checklist