Skip to content

feat(observability): enrich core proxy logs with transaction context#68

Draft
arnaugiralt wants to merge 9 commits intomasterfrom
feat/improve-core-logging
Draft

feat(observability): enrich core proxy logs with transaction context#68
arnaugiralt wants to merge 9 commits intomasterfrom
feat/improve-core-logging

Conversation

@arnaugiralt
Copy link
Member

@arnaugiralt arnaugiralt commented Mar 12, 2026

Summary

Improves structured logging across the core proxy to make every log line actionable without needing to cross-reference other sources.

  • Richer access logRequestLoggerMiddleware now emits marketplace_id, product_id, and target_host (host only, path and query stripped) alongside the existing vendor_id. Header name strings are pre-computed once at construction rather than concatenated per request.
  • Consistent operational logs — All warn/error/info log lines in the proxy that carry vendor_id and marketplace_id now also carry product_id, enabling consistent per-product filtering across the full log pipeline.
  • DEBUG logpoints — New structured DEBUG logs for credential injection (fast path and slow path, with credential_path, injected_header_count, plugin_duration_ms) and transaction context parsing.
  • Log-safe URL handling — All log output records only the host portion of vendor URLs. sanitizeURL (which kept the path) was replaced by extractTargetHost throughout. The allow-list rejection log gains target_host which was previously absent despite a comment claiming otherwise.
  • 499 on client disconnectcontext.Canceled from the plugin now writes status 499 (StatusClientClosedRequest) so the access log captures the correct status rather than the default 200.
  • Startup visibility — The configured allow-list (hosts and route count) is logged at startup.

Before / After log comparison

Successful request

Before (3 lines, from plan):

{"level":"WARN","msg":"forwarding to insecure HTTP target - DEVELOPMENT ONLY","trace_id":"...","target_host":"localhost:9099"}
{"level":"INFO","msg":"upstream response","trace_id":"...","status":200,"content_length":246}
{"level":"INFO","msg":"request completed","trace_id":"...","method":"GET","path":"/proxy","status":200,"latency_ms":9,"vendor_id":"VA-microsoft","client_ip":"","remote_addr":"[::1]:63723"}

After (6 lines at DEBUG):

{"level":"DEBUG","msg":"allow list validation passed","trace_id":"aa3aa905...","target_host":"localhost:9099"}
{"level":"DEBUG","msg":"transaction context parsed","trace_id":"aa3aa905...","vendor_id":"VA-microsoft","marketplace_id":"MP-US","product_id":"","target_host":"localhost:9099"}
{"level":"WARN","msg":"forwarding to insecure HTTP target - DEVELOPMENT ONLY","trace_id":"aa3aa905...","target_host":"localhost:9099"}
{"level":"DEBUG","msg":"credentials injected","trace_id":"aa3aa905...","vendor_id":"VA-microsoft","marketplace_id":"MP-US","target_host":"localhost:9099","credential_path":"fast","injected_header_count":1,"plugin_duration_ms":2}
{"level":"INFO","msg":"upstream response","trace_id":"aa3aa905...","status":200,"content_length":246,"vendor_id":"VA-microsoft","marketplace_id":"MP-US","product_id":"","target_host":"localhost:9099"}
{"level":"INFO","msg":"request completed","trace_id":"aa3aa905...","method":"GET","path":"/proxy","status":200,"latency_ms":4,"vendor_id":"VA-microsoft","marketplace_id":"MP-US","product_id":"","target_host":"localhost:9099","client_ip":"","remote_addr":"[::1]:58186"}

Failing request (unknown marketplace)

Before (3 lines, from plan):

{"level":"WARN","msg":"forwarding to insecure HTTP target - DEVELOPMENT ONLY","trace_id":"...","target_host":"localhost:9099"}
{"level":"ERROR","msg":"plugin error","trace_id":"...","error":"resolving TenantID: no mapping rule matched"}
{"level":"INFO","msg":"request completed","trace_id":"...","method":"GET","path":"/proxy","status":500,"latency_ms":0,"vendor_id":"VA-microsoft","client_ip":"","remote_addr":"[::1]:63740"}

After (5 lines at DEBUG):

{"level":"DEBUG","msg":"allow list validation passed","trace_id":"7397e8ba...","target_host":"localhost:9099"}
{"level":"DEBUG","msg":"transaction context parsed","trace_id":"7397e8ba...","vendor_id":"VA-microsoft","marketplace_id":"MP-APAC","product_id":"","target_host":"localhost:9099"}
{"level":"WARN","msg":"forwarding to insecure HTTP target - DEVELOPMENT ONLY","trace_id":"7397e8ba...","target_host":"localhost:9099"}
{"level":"ERROR","msg":"plugin error","trace_id":"7397e8ba...","vendor_id":"VA-microsoft","marketplace_id":"MP-APAC","product_id":"","target_host":"localhost:9099","error":"resolving TenantID: no mapping rule matched"}
{"level":"INFO","msg":"request completed","trace_id":"7397e8ba...","method":"GET","path":"/proxy","status":500,"latency_ms":0,"vendor_id":"VA-microsoft","marketplace_id":"MP-APAC","product_id":"","target_host":"localhost:9099","client_ip":"","remote_addr":"[::1]:58189"}

Every line is now traceable by trace_id, and the error log immediately tells you it was MP-APAC that had no mapping — no need to correlate with the request completed line. At INFO level only the last 2 lines appear, matching the plan's expected output exactly.


Benchmark Comparison: master → feat/improve-core-logging

Apple M1 Pro · Go · n=6 runs · benchstat p<0.05
Only showing benchmarks with meaningful signal — everything else was ~ (no significant change).

internal/proxy — the only package with real changes

Benchmark master ns/op branch ns/op Δ time Δ B/op Δ allocs
FullRequestCycle_FastPath 71.12 µs 73.01 µs +2.67% +1.80% +9.91%
FullRequestCycle_SlowPath 68.98 µs 71.05 µs +3.00% +1.74% +9.14%
FullRequestCycle_NoPlugin 67.34 µs 69.57 µs +3.31% +1.68% +8.33%
RequestLoggingMiddleware 1.334 µs 1.908 µs +43% +72% +36%
MiddlewareStack 1.796 µs 2.402 µs +34% +36% +22%
MiddlewareStack_Parallel 3.206 µs 3.543 µs +10.5% +5.04% +11.4%
PanicRecoveryMiddleware 110.1 ns 111.6 ns +1.32% ~ ~

All other packages — no significant change

Package Result
internal/cache ~ (all ns/op within noise)
internal/context ~ (small apparent improvement, no allocation change)
internal/observability ~
internal/router ~
internal/security ~

Interpretation

The FullRequestCycle numbers (+2.7–3.3%) reflect the new DEBUG logpoints (transaction context parsed, credentials injected) being evaluated per request — even though slog.Debug no-ops when debug level is disabled, the attribute construction still happens. The allocation bump (+8–10%) is the same cause.

The RequestLoggingMiddleware +43% and MiddlewareStack +34% are real and expected: the access log now emits 4 additional slog attributes per request (marketplace_id, product_id, target_host, plus the URL parse). These are the meaningful costs of this branch — ~570 ns and ~329 bytes added to the middleware stack per request, in exchange for a fully contextualized access log.

Raw benchmark files:

arnaugiralt and others added 9 commits March 12, 2026 16:45
- Extract applyFastPathCredentials helper (moves fast path block out of
  injectCredentials, keeping the context enrichment chain intact)
- Modify detectSlowPathInjections to return injected header count alongside
  the modified request (count will be used for DEBUG logging in next commit)
- Add sanitizeURL helper that strips query, fragment, and userinfo to
  prevent token leakage in DEBUG log output
- Add extractHost helper to observability package for parsing target host
  from raw URL strings in request logger

No behavior change. Prep for upcoming log enrichment phases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add per-request DEBUG logpoints in injectCredentials so Distributors can
trace exactly which caching path was taken and how long the plugin ran.

- Change injectCredentials signature to accept targetHost string (reuses
  the already-parsed *url.URL.Host from handleProxy, avoids re-parsing)
- Log "credentials injected" at DEBUG after Fast Path with:
  trace_id, vendor_id, marketplace_id, target_host, credential_path="fast",
  injected_header_count, plugin_duration_ms
- Log "credentials injected" at DEBUG after Slow Path detection with
  credential_path="slow" and detected injected_header_count
- Add integration tests capturing DEBUG output to verify both paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the plugin returns context.Canceled (client disconnected before the
plugin could fetch credentials), the proxy previously returned without
writing a status code. This caused RequestLoggerMiddleware to log the
request as status 200 (the ResponseCapturer default), which is misleading.

Now writes w.WriteHeader(499) — the nginx convention for client-closed
connections — so the logged status accurately reflects what happened.
No body is written since the client is already gone.

Export StatusClientClosedRequest constant so tests can reference it
symbolically rather than hardcoding 499.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add vendor_id, marketplace_id, and target_host to all INFO/WARN/ERROR logs
that were previously missing these fields, making it possible to correlate
log events with specific vendors and targets without cross-referencing the
"request completed" line.

- "upstream response": +vendor_id, +marketplace_id, +target_host
- "plugin error" / "plugin timeout" / "client disconnected": change
  handlePluginError signature to accept txCtx and targetHost, add all
  three fields to each log line
- "proxy error": +vendor_id, +marketplace_id, +target_host
- "plugin ModifyResponse error": +vendor_id, +marketplace_id, +target_host
- "error normalization failed": +vendor_id, +marketplace_id, +target_host

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change RequestLoggerMiddleware signature from vendorHeader string to
headerPrefix string. The middleware now constructs header names internally
using the stable suffix constants, enabling it to extract four context
fields per request instead of one.

New fields added to every "request completed" log line:
  - marketplace_id (from <prefix>-Marketplace-ID)
  - product_id     (from <prefix>-Product-ID)
  - target_host    (host-only, extracted from <prefix>-Target-URL)

The target_host field re-parses the URL in the deferred log closure. This
is the one accepted re-parse in the codebase: middleware ordering prevents
reusing the URL parsed in handleProxy (~200ns, negligible vs network RTT).

Update all call sites (server.go, server_test.go, middleware_bench_test.go,
request_logger_test.go) to pass the prefix string instead of a header name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add "transaction context parsed" DEBUG log in handleProxy after successful
  ParseContext, including vendor_id, marketplace_id, product_id, and a
  sanitized target_url (query string, fragment, userinfo stripped to prevent
  token leakage)
- Add "allow list validation passed" DEBUG log in AllowListMiddleware with
  trace_id and target_host
- Add trace_id to existing "missing target URL header" and "allow list
  validation failed" WARN logs (previously uncorrelatable under concurrency)
- Add "plugin opted out of error normalization" DEBUG log in buildModifyResponse
  (only fires when plugin returns ResponseAction{SkipErrorNormalization: true})
- Add extractHostFromURL local helper to router/middleware.go
- Import observability into router to access TraceIDFromContext (no cycle:
  observability does not import router)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… allow-list

logStartup previously logged timeout durations as nanosecond integers
(e.g. 30000000000 instead of "30s"). All timeout fields now call .String()
which produces human-readable values like "30s", "2m0s", "120ns".

Also adds three timeout fields that were configured but not previously logged:
ConnectTimeout, KeepAliveTimeout, ShutdownTimeout.

New "allow list configured" INFO log at startup shows hosts (sorted for
deterministic output), host_count, and route_count — giving operators
immediate visibility into which targets the proxy will accept.

New "no plugin configured" INFO log if plugin is nil at startup (one-time
event, not per-request noise).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two issues flagged by CI lint run:

1. gocritic/unnamedResult: detectSlowPathInjections returns two values —
   add named return params (modified, injectedCount) for clarity

2. gocognit: buildModifyResponse closure complexity reached 16 (limit 15)
   after the Phase 6 opt-out DEBUG log added a branch. Extract
   applyErrorNormalization helper to bring complexity back within limit.

No behavior change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pre-compute header name strings once at RequestLoggerMiddleware
  construction instead of per-request to avoid redundant allocations
- Add product_id to all operational log lines that had vendor_id and
  marketplace_id for consistent per-product filtering
- Replace sanitizeURL (kept path) with extractTargetHost (host only) in
  the transaction context parsed DEBUG log to prevent path segments from
  leaking sensitive data (e.g. /users/alice@example.com)
- Add target_host to allow-list rejection log and remove misleading
  comment that described intent that wasn't implemented

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@arnaugiralt arnaugiralt requested a review from qarlosh March 12, 2026 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant