feat(cache-proxy): emit standalone OTLP traces #790

Merged
benben merged 1 commit into main from ben/cache-proxy-tracing on Jun 17, 2026

@benben benben commented Jun 17, 2026

What

Adds OpenTelemetry tracing to the cache-proxy so its S3 fetches are visible in Tempo.

When a trace endpoint is configured, the proxy exports spans under service.name=duckgres-cache-proxy:

  • cache.get (cacheable GET) with cache.origin_fetch / cache.peer_fetch children
  • cache.connect (CONNECT tunnels)
  • cache.forward (non-cached methods)
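
The mapping from request type to span name can be sketched as below. This is illustrative only: `spanName` is a hypothetical helper, not code from this PR, and it assumes every GET is cacheable, whereas the real proxy may apply further checks before choosing cache.get.

```go
package main

import (
	"fmt"
	"net/http"
)

// spanName maps an HTTP method to one of the three span names listed above.
// Assumption: every GET counts as cacheable; the actual proxy may be stricter.
func spanName(method string) string {
	switch method {
	case http.MethodGet:
		return "cache.get"
	case http.MethodConnect:
		return "cache.connect"
	default:
		return "cache.forward"
	}
}

func main() {
	for _, m := range []string{http.MethodGet, http.MethodConnect, http.MethodPut} {
		fmt.Printf("%s -> %s\n", m, spanName(m))
	}
}
```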

Why these are standalone traces (not stitched into the query trace)

DuckDB httpfs (C++) builds the S3 requests and sends no traceparent, so the proxy has nothing to extract. True single-trace stitching would require one of:

  • traceparent header injection through the httpfs path, which is only achievable at session granularity (via the secret) rather than per query, so all queries in a session would collapse into one trace; or
  • an out-of-band worker→proxy side-channel, keyed by source IP, relying on the one-session-per-worker / serial-query invariant of the remote backend.

Neither is in scope here. Instead each span carries the cross-reference anchors for manual correlation against a query trace:

  • client.address — worker pod IP (→ org/session via Kubernetes)
  • the S3 object — server.address + url.path + duckgres.s3.range
  • span timestamp
  • duckgres.cache.source (hit/peer/miss), duckgres.cache.hit, duckgres.bytes, http.response.status_code

org_id is intentionally absent — the proxy has no per-request tenant identity.

Notes

  • Tracing init is replicated locally (cmd/cache-proxy/tracing.go) rather than imported from internal/cliboot, which transitively pulls in the DuckDB CGO runtime (via posthog/duckgres/server) that this standalone binary must not link. Same env vars as the main binary (OTEL_EXPORTER_OTLP_TRACES_ENDPOINT / DUCKGRES_TRACE_ENDPOINT / OTEL_EXPORTER_OTLP_TRACES_PATH); no-op when unset.
  • Volume: one DuckLake query issues many ranged GETs, each of which becomes its own root trace. The tracer is initialized with AlwaysSample; a sampler env knob can be added if Tempo ingest volume becomes a concern.
  • Production env wiring lives outside this repo: no manifest here deploys the proxy. The DaemonSet (mw infra repo) must set the trace endpoint for spans to flow.
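
The no-op-when-unset behavior can be sketched as follows. `resolveTraceEndpoint` is a hypothetical helper, and the precedence between the two endpoint variables is an assumption rather than something this PR states. OTEL_EXPORTER_OTLP_TRACES_PATH is left out because its handling is not described here.

```go
package main

import (
	"fmt"
	"os"
)

// resolveTraceEndpoint returns the first non-empty endpoint variable.
// The lookup function is injected so the logic can be tested without
// touching the process environment.
// Assumption: the OTEL_ variable wins over the DUCKGRES_ one.
func resolveTraceEndpoint(getenv func(string) string) (string, bool) {
	for _, key := range []string{
		"OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
		"DUCKGRES_TRACE_ENDPOINT",
	} {
		if v := getenv(key); v != "" {
			return v, true
		}
	}
	return "", false
}

func main() {
	endpoint, ok := resolveTraceEndpoint(os.Getenv)
	if !ok {
		// Nothing configured: skip exporter setup entirely.
		fmt.Println("tracing disabled: no endpoint configured")
		return
	}
	fmt.Println("exporting spans to", endpoint)
}
```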

Testing

  • cmd/cache-proxy/tracing_test.go — in-memory span recorder asserts miss→hit, child-span nesting under cache.get, and cache.forward status.
  • The cache proxy is not deployed in tests/e2e-mw-dev (DUCKGRES_CACHE_ENABLED is off there), so this behavior is covered by the unit test rather than an e2e harness assertion. This is documented in the README.

🤖 Generated with Claude Code

Add OpenTelemetry tracing to the cache-proxy so its S3 fetches show up in
Tempo. Spans are emitted as standalone root traces under
service.name=duckgres-cache-proxy:

  - cache.get (cacheable GET) with cache.origin_fetch / cache.peer_fetch children
  - cache.connect (CONNECT tunnels)
  - cache.forward (non-cached methods)

DuckDB httpfs sends no traceparent, so these are deliberately NOT stitched
into the duckgres query trace — true single-trace stitching would need either
per-request header injection through the C++ httpfs path (session-global only,
not per-query) or an out-of-band worker->proxy side-channel keyed by source IP.
Instead spans carry the cross-reference anchors for manual correlation:
client.address (worker pod IP), the S3 object (server.address + url.path +
range), timestamp, and cache.source (hit/peer/miss).

Tracing init is replicated locally rather than imported from internal/cliboot,
which transitively pulls in the DuckDB CGO runtime (via posthog/duckgres/server)
that this standalone binary must not link. Same env vars as the main binary
(OTEL_EXPORTER_OTLP_TRACES_ENDPOINT / DUCKGRES_TRACE_ENDPOINT / _PATH); no-op
when unset.

The cache proxy is not deployed in tests/e2e-mw-dev (DUCKGRES_CACHE_ENABLED is
off there), so behavior is gated by cmd/cache-proxy/tracing_test.go rather than
an e2e harness assertion.

benben merged commit 66399f4 into main Jun 17, 2026
25 checks passed
@benben benben deleted the ben/cache-proxy-tracing branch June 17, 2026 14:31