Skip to content

feat: add log aggregation and request correlation for Playwright tests#298

Merged
mxschmitt merged 6 commits into
mainfrom
log-improvements
May 29, 2026
Merged

feat: add log aggregation and request correlation for Playwright tests#298
mxschmitt merged 6 commits into
mainfrom
log-improvements

Conversation

@mxschmitt

@mxschmitt mxschmitt commented Jan 15, 2026

Copy link
Copy Markdown
Owner

Summary

  • Implement custom in-memory log-aggregator microservice for collecting and correlating logs across services
  • Add unique request ID and test ID propagation through headers (X-Request-ID, X-Test-ID)
  • Convert all services to structured JSON logging using logrus
  • Automatically attach correlated backend logs to Playwright test reports
  • Enable end-to-end request tracing from frontend through control service, workers, and file service

Technical Details

  • New service: log-aggregator - lightweight Go HTTP service with 60min TTL in-memory storage
  • New package: internal/logagg - shared library for posting logs to aggregator (best-effort, 2s timeout)
  • ID propagation: Request IDs and test IDs flow through all services via HTTP headers and AMQP messages
  • Log correlation: Playwright tests fetch aggregated logs via /logs/{testId} endpoint and attach to test reports

Changes by Service

  • control-service: Request-scoped logging, log buffering, ID generation/propagation
  • worker: JSON logging, ID propagation to execution environment, log collection
  • file-service: Request correlation for file uploads
  • e2e tests: Automatic log fetching and attachment after test execution
  • CI: Configure LOG_AGGREGATOR_URL for GitHub Actions environment

Infrastructure

  • K8s deployment + NodePort service for log-aggregator (port 30092)
  • Environment variables: LOG_AGGREGATOR_URL, LOG_AGGREGATOR_ENABLED
  • All Dockerfiles updated for consistent builder stage naming (AS builder)

🤖 Generated with Claude Code

@mxschmitt mxschmitt force-pushed the log-improvements branch 5 times, most recently from e702c16 to 31aba4e Compare January 15, 2026 17:38
@mxschmitt mxschmitt requested a review from Copilot January 16, 2026 18:40

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a comprehensive logging and correlation system for Playwright tests, enabling backend logs to be aggregated and correlated with test reports through unique identifiers.

Changes:

  • Implemented a new log-aggregator service for centralized log collection and querying
  • Added requestID and testID fields throughout the system for log correlation
  • Enhanced all services (control, file, worker) to capture and forward logs with correlation IDs
  • Updated e2e tests to automatically attach aggregated logs to test reports

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
worker-python/Dockerfile Standardized Dockerfile AS keyword capitalization
worker-javascript/Dockerfile Standardized Dockerfile AS keyword capitalization
worker-java/Dockerfile Standardized Dockerfile AS keyword capitalization
worker-csharp/Dockerfile Standardized Dockerfile AS keyword capitalization
log-aggregator/main.go New in-memory log aggregation service with HTTP API
log-aggregator/Dockerfile Dockerfile for building the log-aggregator service
k8/log-aggregator-service.yaml Kubernetes service configuration for log-aggregator
k8/log-aggregator-deployment.yaml.tpl Kubernetes deployment template for log-aggregator
k8/file-deployment.yaml.tpl Added LOG_AGGREGATOR environment variables
k8/control-deployment.yaml.tpl Added LOG_AGGREGATOR environment variables
internal/workertypes/types.go Added RequestID and TestID fields to worker request/response payloads
internal/worker/worker.go Integrated structured logging with log aggregation
internal/logagg/logagg.go Core log aggregation client library
internal/logagg/hook.go Logrus hook for automatic log forwarding
file-service/main.go Added request-scoped logging with correlation IDs
file-service/Dockerfile Updated to copy full internal directory
e2e/tests/visual.spec.ts Added log attachment functionality
e2e/tests/api.spec.ts Added log attachment with correlation IDs
control-service/workers.go Updated worker to pass correlation IDs
control-service/main.go Enhanced with request-scoped logging and correlation
control-service/Dockerfile Standardized Dockerfile AS keyword capitalization
.github/workflows/nodejs.yml Configured log-aggregator in CI pipeline

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread control-service/main.go
@mxschmitt mxschmitt changed the title feat: implement logging and correlation for Playwright tests feat: add log aggregation and request correlation for Playwright tests Jan 16, 2026
mxschmitt and others added 2 commits May 29, 2026 13:31
Critical fixes:
- Replace deprecated io/ioutil with io.ReadAll in control-service
- Fix race condition in worker logger by using instance logger instead of modifying global logger
- Add error handling for http.ResponseWriter.Write() in log-aggregator

Medium priority improvements:
- Create custom HTTP client with proper timeouts in logagg package to ensure context cancellation is respected
- Configure connection pooling and timeout settings for better reliability

This ensures:
1. Go 1.16+ compatibility (no deprecated APIs)
2. Thread-safe logging in worker processes
3. Proper error reporting when writing log responses fails
4. HTTP requests properly respect context deadlines

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
mxschmitt added 4 commits May 29, 2026 14:00
defer logagg.DeferPost(...) deferred the call to DeferPost itself and
discarded the returned closure, so the end-of-request buffer flush never
ran. Logs already reach the aggregator via the per-entry logrus hook, so
the deferred buffer flush was redundant (and would double-send if fixed
with a trailing call). Remove DeferPost and the now write-only logBuffer
in file-service and worker; control-service keeps its buffer for error
response bodies.
alpine:3.23 now ships squid-7.5-r0; the exact pin squid=7.3-r0 no longer
resolves (apk: breaks world[squid=7.3-r0]), failing the build (squid) CI
job and blocking the e2e test job. Verified the image builds with 7.5-r0.
The video-recording example crawled the live news.ycombinator.com,
clicking specific links in a loop. When HN was slow or rate-limited the
CI runner, a .click() hit the 30s timeout, the snippet threw, and no
video was produced - failing 'Examples > 4: should be able to record a
video' (visual.spec.ts:86, ~50% flip rate on main). The failing link
varied run-to-run, confirming an external-dependency flake rather than a
selector bug.

Navigate the stable playwright.dev docs instead (unique exact-match
links to avoid strict-mode violations). Verified locally: snippet runs
3/3 producing ~100KB .webm files, frontend builds, and the new snippet
is bundled via the ?raw import. Also fix the stale example description.
handleRun set the X-Request-ID/X-Test-ID response headers twice: once
unconditionally near the top and again right before the success
response. Keep the early set (it covers all response paths) and remove
the redundant pre-success duplicate.
@mxschmitt mxschmitt merged commit 4ff7ff1 into main May 29, 2026
10 checks passed
@mxschmitt mxschmitt deleted the log-improvements branch May 29, 2026 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants