feat(observability): enrich core proxy logs with transaction context by arnaugiralt · Pull Request #68 · cloudblue/chaperone

arnaugiralt · 2026-03-12T22:41:21Z

Summary

Improves structured logging across the core proxy to make every log line actionable without needing to cross-reference other sources.

Richer access log — RequestLoggerMiddleware now emits marketplace_id, product_id, and target_host (host only, path and query stripped) alongside the existing vendor_id. Header name strings are pre-computed once at construction rather than concatenated per request.
Consistent operational logs — All warn/error/info log lines in the proxy that carry vendor_id and marketplace_id now also carry product_id, enabling consistent per-product filtering across the full log pipeline.
DEBUG logpoints — New structured DEBUG logs for credential injection (fast path and slow path, with credential_path, injected_header_count, plugin_duration_ms) and transaction context parsing.
Log-safe URL handling — All log output records only the host portion of vendor URLs. sanitizeURL (which kept the path) was replaced by extractTargetHost throughout. The allow-list rejection log gains target_host which was previously absent despite a comment claiming otherwise.
499 on client disconnect — context.Canceled from the plugin now writes status 499 (StatusClientClosedRequest) so the access log captures the correct status rather than the default 200.
Startup visibility — The configured allow-list (hosts and route count) is logged at startup.

Before / After log comparison

Successful request

Before (3 lines, from plan):

{"level":"WARN","msg":"forwarding to insecure HTTP target - DEVELOPMENT ONLY","trace_id":"...","target_host":"localhost:9099"}
{"level":"INFO","msg":"upstream response","trace_id":"...","status":200,"content_length":246}
{"level":"INFO","msg":"request completed","trace_id":"...","method":"GET","path":"/proxy","status":200,"latency_ms":9,"vendor_id":"VA-microsoft","client_ip":"","remote_addr":"[::1]:63723"}

After (6 lines at DEBUG):

{"level":"DEBUG","msg":"allow list validation passed","trace_id":"aa3aa905...","target_host":"localhost:9099"}
{"level":"DEBUG","msg":"transaction context parsed","trace_id":"aa3aa905...","vendor_id":"VA-microsoft","marketplace_id":"MP-US","product_id":"","target_host":"localhost:9099"}
{"level":"WARN","msg":"forwarding to insecure HTTP target - DEVELOPMENT ONLY","trace_id":"aa3aa905...","target_host":"localhost:9099"}
{"level":"DEBUG","msg":"credentials injected","trace_id":"aa3aa905...","vendor_id":"VA-microsoft","marketplace_id":"MP-US","target_host":"localhost:9099","credential_path":"fast","injected_header_count":1,"plugin_duration_ms":2}
{"level":"INFO","msg":"upstream response","trace_id":"aa3aa905...","status":200,"content_length":246,"vendor_id":"VA-microsoft","marketplace_id":"MP-US","product_id":"","target_host":"localhost:9099"}
{"level":"INFO","msg":"request completed","trace_id":"aa3aa905...","method":"GET","path":"/proxy","status":200,"latency_ms":4,"vendor_id":"VA-microsoft","marketplace_id":"MP-US","product_id":"","target_host":"localhost:9099","client_ip":"","remote_addr":"[::1]:58186"}

Failing request (unknown marketplace)

Before (3 lines, from plan):

{"level":"WARN","msg":"forwarding to insecure HTTP target - DEVELOPMENT ONLY","trace_id":"...","target_host":"localhost:9099"}
{"level":"ERROR","msg":"plugin error","trace_id":"...","error":"resolving TenantID: no mapping rule matched"}
{"level":"INFO","msg":"request completed","trace_id":"...","method":"GET","path":"/proxy","status":500,"latency_ms":0,"vendor_id":"VA-microsoft","client_ip":"","remote_addr":"[::1]:63740"}

After (5 lines at DEBUG):

{"level":"DEBUG","msg":"allow list validation passed","trace_id":"7397e8ba...","target_host":"localhost:9099"}
{"level":"DEBUG","msg":"transaction context parsed","trace_id":"7397e8ba...","vendor_id":"VA-microsoft","marketplace_id":"MP-APAC","product_id":"","target_host":"localhost:9099"}
{"level":"WARN","msg":"forwarding to insecure HTTP target - DEVELOPMENT ONLY","trace_id":"7397e8ba...","target_host":"localhost:9099"}
{"level":"ERROR","msg":"plugin error","trace_id":"7397e8ba...","vendor_id":"VA-microsoft","marketplace_id":"MP-APAC","product_id":"","target_host":"localhost:9099","error":"resolving TenantID: no mapping rule matched"}
{"level":"INFO","msg":"request completed","trace_id":"7397e8ba...","method":"GET","path":"/proxy","status":500,"latency_ms":0,"vendor_id":"VA-microsoft","marketplace_id":"MP-APAC","product_id":"","target_host":"localhost:9099","client_ip":"","remote_addr":"[::1]:58189"}

Every line is now traceable by trace_id, and the error log immediately tells you it was MP-APAC that had no mapping — no need to correlate with the request completed line. At INFO level only the last 2 lines appear, matching the plan's expected output exactly.

Benchmark Comparison: master → feat/improve-core-logging

Apple M1 Pro · Go · n=6 runs · benchstat p<0.05
Only showing benchmarks with meaningful signal — everything else was ~ (no significant change).

`internal/proxy` — the only package with real changes

Benchmark	master ns/op	branch ns/op	Δ time	Δ B/op	Δ allocs
FullRequestCycle_FastPath	71.12 µs	73.01 µs	+2.67%	+1.80%	+9.91%
FullRequestCycle_SlowPath	68.98 µs	71.05 µs	+3.00%	+1.74%	+9.14%
FullRequestCycle_NoPlugin	67.34 µs	69.57 µs	+3.31%	+1.68%	+8.33%
RequestLoggingMiddleware	1.334 µs	1.908 µs	+43%	+72%	+36%
MiddlewareStack	1.796 µs	2.402 µs	+34%	+36%	+22%
MiddlewareStack_Parallel	3.206 µs	3.543 µs	+10.5%	+5.04%	+11.4%
PanicRecoveryMiddleware	110.1 ns	111.6 ns	+1.32%	~	~

All other packages — no significant change

Package	Result
`internal/cache`	~ (all ns/op within noise)
`internal/context`	~ (small apparent improvement, no allocation change)
`internal/observability`	~
`internal/router`	~
`internal/security`	~

Interpretation

The FullRequestCycle numbers (+2.7–3.3%) reflect the new DEBUG logpoints (transaction context parsed, credentials injected) being evaluated per request — even though slog.Debug no-ops when debug level is disabled, the attribute construction still happens. The allocation bump (+8–10%) is the same cause.

The RequestLoggingMiddleware +43% and MiddlewareStack +34% are real and expected: the access log now emits 4 additional slog attributes per request (marketplace_id, product_id, target_host, plus the URL parse). These are the meaningful costs of this branch — ~570 ns and ~329 bytes added to the middleware stack per request, in exchange for a fully contextualized access log.

Raw benchmark files:

master: bench-master.txt
feat/improve-core-logging: bench-branch.txt

- Extract applyFastPathCredentials helper (moves fast path block out of injectCredentials, keeping the context enrichment chain intact) - Modify detectSlowPathInjections to return injected header count alongside the modified request (count will be used for DEBUG logging in next commit) - Add sanitizeURL helper that strips query, fragment, and userinfo to prevent token leakage in DEBUG log output - Add extractHost helper to observability package for parsing target host from raw URL strings in request logger No behavior change. Prep for upcoming log enrichment phases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add per-request DEBUG logpoints in injectCredentials so Distributors can trace exactly which caching path was taken and how long the plugin ran. - Change injectCredentials signature to accept targetHost string (reuses the already-parsed *url.URL.Host from handleProxy, avoids re-parsing) - Log "credentials injected" at DEBUG after Fast Path with: trace_id, vendor_id, marketplace_id, target_host, credential_path="fast", injected_header_count, plugin_duration_ms - Log "credentials injected" at DEBUG after Slow Path detection with credential_path="slow" and detected injected_header_count - Add integration tests capturing DEBUG output to verify both paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When the plugin returns context.Canceled (client disconnected before the plugin could fetch credentials), the proxy previously returned without writing a status code. This caused RequestLoggerMiddleware to log the request as status 200 (the ResponseCapturer default), which is misleading. Now writes w.WriteHeader(499) — the nginx convention for client-closed connections — so the logged status accurately reflects what happened. No body is written since the client is already gone. Export StatusClientClosedRequest constant so tests can reference it symbolically rather than hardcoding 499. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add vendor_id, marketplace_id, and target_host to all INFO/WARN/ERROR logs that were previously missing these fields, making it possible to correlate log events with specific vendors and targets without cross-referencing the "request completed" line. - "upstream response": +vendor_id, +marketplace_id, +target_host - "plugin error" / "plugin timeout" / "client disconnected": change handlePluginError signature to accept txCtx and targetHost, add all three fields to each log line - "proxy error": +vendor_id, +marketplace_id, +target_host - "plugin ModifyResponse error": +vendor_id, +marketplace_id, +target_host - "error normalization failed": +vendor_id, +marketplace_id, +target_host Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Change RequestLoggerMiddleware signature from vendorHeader string to headerPrefix string. The middleware now constructs header names internally using the stable suffix constants, enabling it to extract four context fields per request instead of one. New fields added to every "request completed" log line: - marketplace_id (from <prefix>-Marketplace-ID) - product_id (from <prefix>-Product-ID) - target_host (host-only, extracted from <prefix>-Target-URL) The target_host field re-parses the URL in the deferred log closure. This is the one accepted re-parse in the codebase: middleware ordering prevents reusing the URL parsed in handleProxy (~200ns, negligible vs network RTT). Update all call sites (server.go, server_test.go, middleware_bench_test.go, request_logger_test.go) to pass the prefix string instead of a header name. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add "transaction context parsed" DEBUG log in handleProxy after successful ParseContext, including vendor_id, marketplace_id, product_id, and a sanitized target_url (query string, fragment, userinfo stripped to prevent token leakage) - Add "allow list validation passed" DEBUG log in AllowListMiddleware with trace_id and target_host - Add trace_id to existing "missing target URL header" and "allow list validation failed" WARN logs (previously uncorrelatable under concurrency) - Add "plugin opted out of error normalization" DEBUG log in buildModifyResponse (only fires when plugin returns ResponseAction{SkipErrorNormalization: true}) - Add extractHostFromURL local helper to router/middleware.go - Import observability into router to access TraceIDFromContext (no cycle: observability does not import router) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… allow-list logStartup previously logged timeout durations as nanosecond integers (e.g. 30000000000 instead of "30s"). All timeout fields now call .String() which produces human-readable values like "30s", "2m0s", "120ns". Also adds three timeout fields that were configured but not previously logged: ConnectTimeout, KeepAliveTimeout, ShutdownTimeout. New "allow list configured" INFO log at startup shows hosts (sorted for deterministic output), host_count, and route_count — giving operators immediate visibility into which targets the proxy will accept. New "no plugin configured" INFO log if plugin is nil at startup (one-time event, not per-request noise). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two issues flagged by CI lint run: 1. gocritic/unnamedResult: detectSlowPathInjections returns two values — add named return params (modified, injectedCount) for clarity 2. gocognit: buildModifyResponse closure complexity reached 16 (limit 15) after the Phase 6 opt-out DEBUG log added a branch. Extract applyErrorNormalization helper to bring complexity back within limit. No behavior change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Pre-compute header name strings once at RequestLoggerMiddleware construction instead of per-request to avoid redundant allocations - Add product_id to all operational log lines that had vendor_id and marketplace_id for consistent per-product filtering - Replace sanitizeURL (kept path) with extractTargetHost (host only) in the transaction context parsed DEBUG log to prevent path segments from leaking sensitive data (e.g. /users/alice@example.com) - Add target_host to allow-list rejection log and remove misleading comment that described intent that wasn't implemented Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

arnaugiralt and others added 9 commits March 12, 2026 16:45

arnaugiralt requested a review from qarlosh March 12, 2026 23:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(observability): enrich core proxy logs with transaction context#68

feat(observability): enrich core proxy logs with transaction context#68
arnaugiralt wants to merge 9 commits intomasterfrom
feat/improve-core-logging

arnaugiralt commented Mar 12, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

arnaugiralt commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Before / After log comparison

Successful request

Failing request (unknown marketplace)

Benchmark Comparison: master → feat/improve-core-logging

internal/proxy — the only package with real changes

All other packages — no significant change

Interpretation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

arnaugiralt commented Mar 12, 2026 •

edited

Loading

`internal/proxy` — the only package with real changes