Problem
We currently cannot answer "where did a request's time go?" from inside the platform. When the cloud control-plane data API feels slow (e.g. a /api/v1/data/sys_environment console request measured at ~360ms of row-count-independent server time on top of network), external black-box probing cannot break it down. The cost is split across auth/session resolution, org-scope beforeFind hooks, the find + count SQL, and serialization — all invisible from outside.
Performance analysis should be a basic platform capability: a "perf-tuning mode" where an operator can see, per request, exactly how the time was spent — without standing up an external tracing backend.
What already exists (reuse, don't reinvent)
MetricsRegistry (counter/histogram/gauge) — packages/observability/src/contracts.ts:40
instrumentRouteHandler already times every HTTP request and assigns X-Request-Id — packages/runtime/src/observability/instrument.ts:52 (emits http_request_duration_ms)
HookMetricsRecorder already times every hook execution — packages/objectql/src/hook-metrics.ts:41
ObservabilityServicePlugin registers the metrics service — packages/runtime/src/observability/observability-service-plugin.ts:71
What's missing (this card)
- No per-request
Server-Timing response header. Metrics only flow to aggregate OTLP; nothing comes back to the browser, so a developer can't open DevTools → Network and see the breakdown for the request they just made.
- The SQL driver does not time individual queries.
packages/plugins/driver-sql/src/sql-driver.ts never wires knex .on('query'|'query-response'), so we have no per-query duration or per-request query count — the single most useful number for diagnosing "N sequential round-trips".
- No request-scoped collector. No
AsyncLocalStorage threads a "current request" through find/count/hook/driver, so per-query timings can't be attributed back to the originating HTTP request.
Proposed design
A request-scoped timing collector, gated, that emits a standard Server-Timing header.
- Collector: an
AsyncLocalStorage<TimingCollector> in @objectstack/runtime (or @objectstack/observability). When inactive, record() is a no-op → zero overhead when the mode is off.
- Entry/exit:
instrumentRouteHandler (instrument.ts:52) opens the scope at request start; at response it serializes the collected spans into a Server-Timing header. The dispatcher result already supports headers — http-dispatcher.ts (HttpDispatcherResult.response.headers).
- SQL spans: wire knex
.on('query') + .on('query-response') in SqlDriver (constructor, sql-driver.ts:523), keyed by __knexQueryUid, recording { sql, durationMs } into the active collector. Surfaces total DB time + query count + slowest query.
- Hook spans: feed the existing
HookMetricsRecorder durations (hook-metrics.ts) into the collector too — splits "hook time" from "DB time".
- Auth/session span: time the identity/session resolution in the dispatcher (
http-dispatcher.ts executionContext resolution) — the prime suspect for the unexplained overhead.
- Header shape (RFC:
Server-Timing):
Server-Timing: auth;dur=42, db;dur=210;desc="6 queries", hooks;dur=18, serialize;dur=7, total;dur=355
Gating
- Off by default. Enabled per-request via a debug header (
X-OS-Debug-Timing: 1) requiring admin/service identity, or globally via an env flag (OS_PERF_TIMING=1) for an environment under investigation.
- Optional richer JSON (per-query SQL list, slowest query) only for admin/service — never leak SQL text to ordinary users.
Open-core boundary
This is a mechanism → lives in the framework (the request path, driver, engine all do). Policy (who may see it, whether it's exposed in the cloud console as a "perf panel") is cloud-side. Mirrors the existing observability split (mechanism in framework, exporter/gating in cloud — see docs/design/observability.md).
Acceptance criteria
Why now
This is the tool that finally breaks down the ~360ms control-plane data-path overhead we've been chasing by external probing (see cloud perf investigation). It's both the diagnostic we need and a permanent platform capability.
Problem
We currently cannot answer "where did a request's time go?" from inside the platform. When the cloud control-plane data API feels slow (e.g. a
/api/v1/data/sys_environmentconsole request measured at ~360ms of row-count-independent server time on top of network), external black-box probing cannot break it down. The cost is split across auth/session resolution, org-scopebeforeFindhooks, thefind+countSQL, and serialization — all invisible from outside.Performance analysis should be a basic platform capability: a "perf-tuning mode" where an operator can see, per request, exactly how the time was spent — without standing up an external tracing backend.
What already exists (reuse, don't reinvent)
MetricsRegistry(counter/histogram/gauge) —packages/observability/src/contracts.ts:40instrumentRouteHandleralready times every HTTP request and assignsX-Request-Id—packages/runtime/src/observability/instrument.ts:52(emitshttp_request_duration_ms)HookMetricsRecorderalready times every hook execution —packages/objectql/src/hook-metrics.ts:41ObservabilityServicePluginregisters the metrics service —packages/runtime/src/observability/observability-service-plugin.ts:71What's missing (this card)
Server-Timingresponse header. Metrics only flow to aggregate OTLP; nothing comes back to the browser, so a developer can't open DevTools → Network and see the breakdown for the request they just made.packages/plugins/driver-sql/src/sql-driver.tsnever wires knex.on('query'|'query-response'), so we have no per-query duration or per-request query count — the single most useful number for diagnosing "N sequential round-trips".AsyncLocalStoragethreads a "current request" throughfind/count/hook/driver, so per-query timings can't be attributed back to the originating HTTP request.Proposed design
A request-scoped timing collector, gated, that emits a standard
Server-Timingheader.AsyncLocalStorage<TimingCollector>in@objectstack/runtime(or@objectstack/observability). When inactive,record()is a no-op → zero overhead when the mode is off.instrumentRouteHandler(instrument.ts:52) opens the scope at request start; at response it serializes the collected spans into aServer-Timingheader. The dispatcher result already supportsheaders—http-dispatcher.ts(HttpDispatcherResult.response.headers)..on('query')+.on('query-response')inSqlDriver(constructor,sql-driver.ts:523), keyed by__knexQueryUid, recording{ sql, durationMs }into the active collector. Surfaces total DB time + query count + slowest query.HookMetricsRecorderdurations (hook-metrics.ts) into the collector too — splits "hook time" from "DB time".http-dispatcher.tsexecutionContext resolution) — the prime suspect for the unexplained overhead.Server-Timing):Gating
X-OS-Debug-Timing: 1) requiring admin/service identity, or globally via an env flag (OS_PERF_TIMING=1) for an environment under investigation.Open-core boundary
This is a mechanism → lives in the framework (the request path, driver, engine all do). Policy (who may see it, whether it's exposed in the cloud console as a "perf panel") is cloud-side. Mirrors the existing observability split (mechanism in framework, exporter/gating in cloud — see
docs/design/observability.md).Acceptance criteria
/api/v1/data/<object>response carries aServer-Timingheader decomposing total into at least: auth, db (with query count), hooks, serialize.record()).Why now
This is the tool that finally breaks down the ~360ms control-plane data-path overhead we've been chasing by external probing (see cloud perf investigation). It's both the diagnostic we need and a permanent platform capability.