
Channels: HTTP proxy with observability for Lightning #4419

Draft
stuartc wants to merge 9 commits into main from channels


stuartc commented Feb 11, 2026

Summary

Channels adds lightweight reverse-proxy functionality with observability to Lightning. Deploy a Channel between two systems to get immediate visibility into requests and responses, with a full audit trail.

  • Source: Authenticates inbound requests (API key / Basic Auth via project credentials)
  • Sink: Forwards to target system with credential injection
  • Observability: Every proxied request is logged with headers, a body preview, a SHA256 hash, and timing
  • Snapshots: Channel config is captured at request time, giving an auditable history

The streaming proxy is powered by Weir, so memory use stays constant regardless of payload size.
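To show why a streaming proxy can log a SHA256 hash and a body preview in constant memory, here is a minimal Python sketch. It is not Weir's or Lightning's Elixir implementation. It assumes the body arrives as an iterable of byte chunks, and `observe_stream` and the 1024-byte `preview_limit` are hypothetical.

```python
import hashlib
from typing import Callable, Iterable, Iterator, Tuple


def observe_stream(
    chunks: Iterable[bytes], preview_limit: int = 1024
) -> Tuple[Iterator[bytes], Callable[[], Tuple[str, bytes, int]]]:
    """Pass chunks through unchanged while hashing them and keeping a bounded preview.

    Memory use is O(preview_limit) however large the body is.
    Call the returned finish() after the iterator has been exhausted.
    """
    hasher = hashlib.sha256()
    preview = bytearray()
    size = 0

    def forward() -> Iterator[bytes]:
        nonlocal size
        for chunk in chunks:
            hasher.update(chunk)              # incremental hash, no buffering
            size += len(chunk)
            room = preview_limit - len(preview)
            if room > 0:
                preview.extend(chunk[:room])  # keep only the first bytes
            yield chunk                       # forward downstream untouched

    def finish() -> Tuple[str, bytes, int]:
        return hasher.hexdigest(), bytes(preview), size

    return forward(), finish


# Usage: stream a body in three chunks, then read what would be logged.
stream, finish = observe_stream([b"a" * 600, b"b" * 600, b"c" * 10])
forwarded = b"".join(stream)
digest, preview, size = finish()
```

Hashing incrementally is what keeps the audit record (hash, size, preview) independent of payload size: no step needs the whole body in memory.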

Spec: #4322 | Go-live target: 26 Feb 2026

Stories

Phase 1 — Foundation

Phase 2 — Core Proxy

Phase 3 — Observability

Phase 4 — UI

Phase 5 — Performance Confidence

Dependency Graph

#4399 Schema ──┬──→ #4401 Proxy ──→ #4403 Source auth
               │       │           #4404 Sink auth (parallel)
               │       │
               ├───────┤──→ #4405 Observer ──→ #4406 Snapshots
               │       │
               ├──→ #4400 Audit trail
               ├──→ #4407 Channel UI
               └───────┴──→ #4408 History page

#4409 Mock sink ──┐
                  ├──→ #4410 K6 load tests
#4401 Proxy ──────┘

github-project-automation bot moved this to New Issues in v2 Feb 11, 2026
* Add channels tables migration (#4399)

Create the four core database tables for the channels feature:
channels, channel_snapshots, channel_requests, and channel_events
with all indexes, foreign keys, and constraints per the data model spec.

* Add channel schema modules (#4399)

Create four Ecto schemas under lib/lightning/channels/:
- Channel: core config with optimistic locking
- ChannelSnapshot: immutable point-in-time copies
- ChannelRequest: proxy request tracking with Ecto.Enum state
- ChannelEvent: detailed request/response event log

* Add Channels context and Project association (#4399)

Create Lightning.Channels context with CRUD operations:
list_channels_for_project/1, get_channel!/1, create_channel/1,
update_channel/2, delete_channel/1 (with :has_history guard).
Wire up has_many :channels on Project schema.

* Add channel factories and context tests (#4399)

Add ExMachina factories for all 4 channel schemas. Create context test
covering CRUD operations, uniqueness constraints, optimistic locking,
and deletion protection for channels with history. Fix lock_version
default to 0 (matching Workflow pattern) and use stale_error_field
for clean changeset errors on version conflicts.
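The optimistic-locking behaviour above (a `lock_version` starting at 0, with conflicting writes rejected) can be sketched with Python's built-in `sqlite3`. This mirrors the idea behind Ecto's `optimistic_lock/3` rather than reproducing it: the update succeeds only if the row still carries the version the caller read. The table layout, `update_channel_name`, and `StaleEntryError` are hypothetical. With Ecto's `stale_error_field` option the conflict surfaces as a changeset error instead of an exception.

```python
import sqlite3


class StaleEntryError(Exception):
    """Raised when the row changed after the caller read it (hypothetical name)."""


def create_channels_table(conn: sqlite3.Connection) -> None:
    # lock_version defaults to 0, matching the migration default described above.
    conn.execute(
        "CREATE TABLE channels ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " lock_version INTEGER NOT NULL DEFAULT 0)"
    )


def update_channel_name(
    conn: sqlite3.Connection, channel_id: int, new_name: str, seen_version: int
) -> int:
    """Apply the update only if lock_version still equals seen_version."""
    cur = conn.execute(
        "UPDATE channels SET name = ?, lock_version = lock_version + 1"
        " WHERE id = ? AND lock_version = ?",
        (new_name, channel_id, seen_version),
    )
    if cur.rowcount == 0:
        # No row matched: it was modified since it was read, or it does not exist.
        raise StaleEntryError(f"channel {channel_id} changed since version {seen_version}")
    return seen_version + 1
```

Two editors who both read version 0 cannot both win: the first update moves the row to version 1, and the second matches no rows.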

* Add @moduledoc to channel schema modules (#4399)

Add module documentation to all four channel schemas to satisfy
credo --strict readability checks.

* Fix lock_version default and delete_channel error handling (#4399)

- Align migration lock_version default to 0, matching Workflow pattern
- Replace rescue ConstraintError with foreign_key_constraint changeset
codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 90.58824% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.31%. Comparing base (c7e5016) to head (d664e62).
⚠️ Report is 7 commits behind head on main.

Files with missing lines                         Patch %   Lines
lib/lightning/channels/handler.ex                85.00%    9 Missing ⚠️
lib/lightning/channels.ex                        80.00%    5 Missing ⚠️
lib/lightning_web/plugs/channel_proxy_plug.ex    95.91%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4419      +/-   ##
==========================================
+ Coverage   89.26%   89.31%   +0.05%     
==========================================
  Files         425      434       +9     
  Lines       20090    20243     +153     
==========================================
+ Hits        17933    18081     +148     
- Misses       2157     2162       +5     


* Add channel proxy plug with endpoint routing (#4401)

Wire Weir as a path dependency and create ChannelProxyPlug that
intercepts /channels/:id/*path requests before Plug.Parsers,
looks up the channel, sets proxy headers, and forwards to the
sink via Weir.proxy/2 with streaming.

* Add sink_url validation, simplify proxy plug, add path traversal tests (#4401)

- Validate sink_url with Validators.validate_url/2 in Channel changeset
  to reject non-http(s) URLs at creation time rather than at proxy time
- Remove manual x-request-id handling from ChannelProxyPlug since Weir
  now injects it into upstream requests automatically
- Add path traversal tests documenting that ".." as channel_id fails
  UUID validation (404) and ".." in subpaths is forwarded as-is
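Here is a rough Python sketch of the routing rule these tests document. The segment after `/channels/` must be a canonical UUID, so `..` is rejected with a 404, while the rest of the path is forwarded untouched, `..` included. `route` is a hypothetical helper, not the plug's actual code.

```python
import uuid
from typing import Optional, Tuple


def route(path: str) -> Optional[Tuple[str, str]]:
    """Split "/channels/<id>/<rest>" into (channel_id, forward_path).

    Returns None (respond 404) when the path is not a channel route or the
    id is not a canonical UUID. The remainder of the path is forwarded as-is.
    """
    segments = path.split("/")
    # Expected shape: ["", "channels", "<id>", *rest]
    if len(segments) < 3 or segments[0] != "" or segments[1] != "channels":
        return None
    channel_id = segments[2]
    try:
        parsed = uuid.UUID(channel_id)
    except ValueError:
        return None  # ".." and other non-UUIDs never reach a channel lookup
    if str(parsed) != channel_id.lower():
        return None  # also reject braced, "urn:uuid:" and unhyphenated forms
    forward_path = "/" + "/".join(segments[3:])
    return channel_id, forward_path
```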

* Use GitHub source for weir dep, with local path override via WEIR_PATH env var

* Add mock sink server for channel proxy testing (#4409)

Standalone Elixir script using Bandit + Plug that simulates configurable
sink behaviour. Supports fixed, delay, timeout, auth, and mixed modes
with CLI-configurable status codes, delays, and body sizes.

* Add load test runner for channel proxy benchmarking (#4409)

Standalone Elixir script that drives HTTP traffic through the channel
proxy to a mock sink. Connects to Lightning via distributed Erlang for
channel setup and memory sampling. Supports happy_path, ramp_up,
large_payload, large_response, mixed_methods, and slow_sink scenarios.
Reports latency percentiles, throughput, error rates, and BEAM memory.
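For reference, one common way to derive these summary numbers is sketched below in Python. It rests on three assumptions: percentiles use the nearest-rank method, latencies are recorded only for successful requests, and throughput counts every request in the run. The load test script may compute them differently. `percentile` and `summarize` are hypothetical.

```python
import math
from typing import Dict, List


def percentile(sorted_ms: List[float], p: int) -> float:
    """Nearest-rank percentile of an ascending list, for integer p in 1..100."""
    if not sorted_ms:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_ms) / 100)
    return sorted_ms[max(rank, 1) - 1]


def summarize(latencies_ms: List[float], errors: int, duration_s: float) -> Dict[str, float]:
    """Latency percentiles, throughput, and error rate for one run.

    Assumes latencies_ms holds successful requests only and errors counts
    failed ones; throughput includes both.
    """
    ordered = sorted(latencies_ms)
    total = len(ordered) + errors
    return {
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "rps": total / duration_s,
        "error_rate": errors / total,
    }
```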

* Add README for channel proxy benchmarking tools (#4409)

Documents mock sink modes, load test scenarios, CLI options, quick start
guide, and how to interpret results (especially memory delta for verifying
streaming proxy behaviour).

* Add run_all.sh runner and move mock sink delay to query params (#4409)

Replace the mock sink's --delay CLI arg and delay mode with a ?delay=N
query parameter, matching the existing ?response_size=N pattern. Also add
?status=N for per-request status overrides. This lets the load test drive
all scenarios against a single mock sink instance without restarts.

Add --delay option to the load test script (defaults to 2000ms for
slow_sink scenario) which appends ?delay=N to the request URL.
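Here is a small Python sketch of the query-parameter convention. The mock sink reads `delay`, `response_size`, and `status` on each request, and the load test appends `?delay=N` to its URLs. The defaults (0 ms, 0 bytes, status 200), the fallback for non-numeric values, and both helper names are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def _int_param(query: dict, name: str, default: int) -> int:
    values = query.get(name)
    if not values:
        return default
    try:
        return int(values[0])
    except ValueError:
        return default  # ignore malformed values rather than failing the request


def mock_sink_options(url: str) -> dict:
    """Per-request overrides read from the query string (defaults are hypothetical)."""
    query = parse_qs(urlsplit(url).query)
    return {
        "delay_ms": _int_param(query, "delay", 0),
        "response_size": _int_param(query, "response_size", 0),
        "status": _int_param(query, "status", 200),
    }


def with_delay(url: str, delay_ms: int) -> str:
    """Append (or overwrite) ?delay=N, keeping any existing parameters."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query["delay"] = [str(delay_ms)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
```

Because every override travels on the request itself, a single sink process can serve all scenarios without restarts.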

Add run_all.sh which runs all 7 scenarios in sequence with preflight
checks, timestamped log/CSV output, and bail-on-first-failure.
…#4410)

Wrap ChannelProxyPlug in three telemetry spans (request, fetch_channel,
upstream) to identify where proxy latency originates. Split the 1044-line
load_test.exs monolith into 7 focused modules under lib/load_test/. Add
Bench.TelemetryCollector (GenServer+ETS) that gets deployed onto the
Lightning BEAM via RPC to collect server-side timing during load tests.

* Add Channels.Handler module and ttfb_ms to ChannelEvent (#4405)

Implements Weir.Handler behaviour for persisting channel proxy requests:
- handle_request_started: sync ChannelRequest creation with header redaction
- handle_response_started: captures TTFB and response headers
- handle_response_finished: async persistence via Task.Supervisor
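Header redaction before persistence could look like the Python sketch below. The set of sensitive header names is a hypothetical example, since the PR does not list which headers are redacted. The sketch returns `[name, value]` lists rather than tuples, which reflects the same JSON-encoding concern addressed by a later commit's `encode_headers` fix.

```python
from typing import Iterable, List, Tuple

# Hypothetical list: the PR does not say which headers Channels redacts.
SENSITIVE_HEADERS = frozenset(
    {"authorization", "proxy-authorization", "cookie", "set-cookie", "x-api-key"}
)


def redact_headers(
    headers: Iterable[Tuple[str, str]], placeholder: str = "[REDACTED]"
) -> List[List[str]]:
    """Mask sensitive header values before persisting a request.

    Names are matched case-insensitively; the output uses [name, value]
    lists so it serialises as a JSON array of arrays.
    """
    return [
        [name, placeholder if name.lower() in SENSITIVE_HEADERS else value]
        for name, value in headers
    ]
```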

Adds ttfb_ms column to channel_events for time-to-first-byte tracking.

* Add get_or_create_current_snapshot to Channels context (#4405)

Upserts a ChannelSnapshot for the channel's current lock_version,
handling concurrent creation races via ON CONFLICT DO NOTHING + re-fetch.
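The race-safe get-or-create pattern can be sketched with Python's `sqlite3`, which supports `ON CONFLICT ... DO NOTHING` from SQLite 3.24 onwards. The re-fetch matters because in PostgreSQL an `INSERT ... ON CONFLICT DO NOTHING ... RETURNING` returns no row when the insert is skipped, so the loser of a race has to read the winner's row. Table and column names here are hypothetical.

```python
import sqlite3
from typing import Tuple


def create_snapshots_table(conn: sqlite3.Connection) -> None:
    # One snapshot per (channel, lock_version); the unique constraint makes the race safe.
    conn.execute(
        "CREATE TABLE channel_snapshots ("
        " id INTEGER PRIMARY KEY,"
        " channel_id TEXT NOT NULL,"
        " lock_version INTEGER NOT NULL,"
        " sink_url TEXT NOT NULL,"
        " UNIQUE (channel_id, lock_version))"
    )


def get_or_create_snapshot(
    conn: sqlite3.Connection, channel_id: str, lock_version: int, sink_url: str
) -> Tuple[int, str, int, str]:
    # If a concurrent request already inserted this version, skip silently.
    conn.execute(
        "INSERT INTO channel_snapshots (channel_id, lock_version, sink_url)"
        " VALUES (?, ?, ?)"
        " ON CONFLICT (channel_id, lock_version) DO NOTHING",
        (channel_id, lock_version, sink_url),
    )
    # Re-fetch: both the winner and the loser of the race read the same row.
    return conn.execute(
        "SELECT id, channel_id, lock_version, sink_url FROM channel_snapshots"
        " WHERE channel_id = ? AND lock_version = ?",
        (channel_id, lock_version),
    ).fetchone()
```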

* Wire Channels.Handler into proxy plug with Task.Supervisor (#4405)

Connect the handler to ChannelProxyPlug so every proxied request creates
a ChannelRequest (sync) and ChannelEvent (async). Add snapshot
get-or-create before proxying, pass handler state with channel/snapshot
context, and add Task.Supervisor for async persistence.

* Add handler, snapshot, and proxy plug tests (#4405)

- Handler unit tests: request creation, rejection, header redaction,
  TTFB capture, async event persistence, state transitions
- Snapshot context tests: create, idempotent return, new on version bump
- Proxy plug integration tests: full flow with DB verification
- Fix encode_headers to convert tuples to lists for Jason encoding

* Replace Process.sleep with PubSub-based async coordination in tests (#4405)

Broadcast {:channel_request_completed, request_id} from persist_completion/2
so tests can assert_receive instead of sleeping. Eliminates race conditions
in CI and lays groundwork for #4408 real-time history.
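The coordination pattern is language-agnostic, and a Python sketch with a thread-safe queue illustrates it. The test subscribes first, triggers the async work, and then blocks on the completion message with a timeout (the role `assert_receive` plays in ExUnit) instead of sleeping for a guessed duration. `Broadcaster` is a hypothetical stand-in for Phoenix.PubSub.

```python
import queue
import threading
from typing import Any, Dict, List


class Broadcaster:
    """Minimal in-process pub/sub, a hypothetical stand-in for Phoenix.PubSub."""

    def __init__(self) -> None:
        self._subscribers: List[queue.Queue] = []
        self._lock = threading.Lock()

    def subscribe(self) -> queue.Queue:
        inbox: queue.Queue = queue.Queue()
        with self._lock:
            self._subscribers.append(inbox)
        return inbox

    def broadcast(self, message: Any) -> None:
        with self._lock:
            targets = list(self._subscribers)
        for inbox in targets:
            inbox.put(message)


def persist_completion(bus: Broadcaster, request_id: str, store: Dict[str, str]) -> None:
    """Async work: persist first, then announce completion."""
    store[request_id] = "completed"
    bus.broadcast(("channel_request_completed", request_id))


# Test-style usage: subscribe, trigger, then wait for the message (no sleep).
bus = Broadcaster()
inbox = bus.subscribe()
store: Dict[str, str] = {}
worker = threading.Thread(target=persist_completion, args=(bus, "req-1", store))
worker.start()
message = inbox.get(timeout=2)  # raises queue.Empty if nothing arrives in time
worker.join()
```

Subscribing before the work starts is what removes the race. A message broadcast before the subscription exists would be lost.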

* Squash ttfb_ms column into main channels migration (#4405)

Since neither migration has shipped, merge the separate
add_ttfb_ms_to_channel_events migration into create_channels_tables
to keep migration history clean.

* Classify proxy errors and document skipped callback contract (#4405)

Replace raw inspect() of Weir error structs with classify_error/1 that
maps known transport errors (nxdomain, econnrefused, etc.) and timeout
tuples to stable string identifiers for persistence. Expand the moduledoc
to document when handle_response_started is skipped and which handler
state fields will be absent.
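Here is a Python sketch of the classification idea. Known transport reasons and timeout tuples map to stable strings, and anything else falls into a single catch-all bucket instead of a raw `inspect()` dump. The `TransportError` shape, the reasons beyond `nxdomain` and `econnrefused`, and the identifier format are all assumptions, not Weir's actual error structs.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TransportError:
    """Stand-in for a transport error struct (hypothetical shape)."""
    reason: str


# Only nxdomain and econnrefused are named in the PR; the rest are illustrative.
KNOWN_TRANSPORT_REASONS = frozenset(
    {"nxdomain", "econnrefused", "econnreset", "ehostunreach", "closed"}
)


def classify_error(error: Any) -> str:
    """Map an upstream failure to a stable identifier suitable for persistence."""
    if isinstance(error, tuple) and error and error[0] == "timeout":
        # ("timeout",) -> "timeout"; ("timeout", "connect") -> "connect_timeout"
        return "timeout" if len(error) == 1 else f"{error[1]}_timeout"
    if isinstance(error, TransportError) and error.reason in KNOWN_TRANSPORT_REASONS:
        return error.reason
    # Anything unrecognised gets one stable bucket.
    return "unknown_error"
```

Stable identifiers keep the persisted values groupable and filterable in the history UI, which free-form inspected structs are not.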

* Fix request_path, tighten factories, and review cleanups (#4405)

- Use forward_path instead of conn.path_info for request_path so
  persisted path reflects the upstream path, not the internal route
- Remove default associations from channel factories so callers must
  provide their own channel/snapshot, preventing cross-entity mismatches
- Simplify delete_channel test to match the actual constraint tested
- Add TODO on broadcast noting it fires even on partial persistence failure

* Update weir

* Report which lint checks failed in CI instead of generic message

* Fix Credo alias style in Channels.Handler
Adds the saturation scenario (ramp through concurrency levels to find
throughput ceiling) and an independent --charts flag that generates
gnuplot PNG charts with timestamped output paths.

Standard scenarios produce a combined throughput + latency chart (dual
y-axis: RPS as filled area, p50/p95/p99 latency as line plots over
1-second buckets). Saturation generates throughput and latency vs
concurrency line charts from CSV data.

--charts auto-creates a timestamped CSV for saturation when --csv is
not specified. The two flags are fully independent.

* Add channel_auth_methods join table and schema (#4403)

Create the channel_auth_methods table with role discriminator and
polymorphic FKs (webhook_auth_method_id, project_credential_id).
Remove source_project_credential_id from channels table.
Add ChannelAuthMethod schema with exclusive FK and role-target
consistency validations. Update Channel schema with filtered
has_many associations for source and sink auth methods.
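The two changeset validations can be expressed as plain checks, sketched here in Python: exactly one of the two FKs must be set, and the role must agree with the FK used. The role-to-FK mapping (source to `webhook_auth_method_id`, sink to `project_credential_id`) is an assumption made for illustration. The PR only states that the two must be consistent.

```python
from typing import Dict, List, Optional

# Assumption: which FK each role may use. The PR only requires consistency.
ROLE_TARGETS = {"source": "webhook_auth_method_id", "sink": "project_credential_id"}
TARGET_FIELDS = ("webhook_auth_method_id", "project_credential_id")


def validate_auth_method(attrs: Dict[str, Optional[str]]) -> List[str]:
    """Return error messages for a ChannelAuthMethod-like record (empty if valid)."""
    errors: List[str] = []
    role = attrs.get("role")
    if role not in ROLE_TARGETS:
        errors.append("role is invalid")
    # Exclusive FKs: exactly one target may be set.
    set_fields = [f for f in TARGET_FIELDS if attrs.get(f) is not None]
    if len(set_fields) != 1:
        errors.append("exactly one of webhook_auth_method_id or project_credential_id must be set")
    elif role in ROLE_TARGETS and set_fields[0] != ROLE_TARGETS[role]:
        # Role-target consistency.
        errors.append(f"{role} auth methods must reference {ROLE_TARGETS[role]}")
    return errors
```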

* Extract shared LightningWeb.Auth module from WebhookAuth plug (#4403)

Move pure auth validation functions (valid_key?, valid_user?,
has_credentials?) into a shared module so both WebhookAuth (triggers)
and ChannelProxyPlug (channels) can reuse them without duplicating
security-sensitive code.

* Add get_channel_with_source_auth/1 and update proxy plug to preload source auth (#4403)

* Add source authentication validation to ChannelProxyPlug (#4403)

Insert authenticate_source/2 between channel fetch and upstream proxying.
Channels with no source auth methods remain publicly accessible (fail-open).
Valid credentials pass through, missing credentials return 401, and wrong
credentials return 404 to hide channel existence.
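The decision order above (fail-open, then 401 for missing credentials, then 404 for wrong ones) can be sketched in Python. The `AuthMethod` shape, the `x-api-key` header name, and lowercase header keys are assumptions, not the actual `LightningWeb.Auth` API. Secrets are compared with `hmac.compare_digest` so the comparison runs in constant time.

```python
import base64
import hmac
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AuthMethod:
    """Hypothetical stand-in for a channel source auth method."""
    kind: str                      # "api" or "basic"
    api_key: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None


def _eq(a: str, b: str) -> bool:
    # Constant-time comparison so timing does not leak how much of a secret matched.
    return hmac.compare_digest(a.encode(), b.encode())


def _matches(method: AuthMethod, headers: Dict[str, str]) -> bool:
    if method.kind == "api" and method.api_key is not None:
        return _eq(headers.get("x-api-key", ""), method.api_key)
    if method.kind == "basic":
        auth = headers.get("authorization", "")
        if not auth.lower().startswith("basic "):
            return False
        try:
            decoded = base64.b64decode(auth[6:], validate=True).decode()
        except ValueError:  # covers binascii.Error and UnicodeDecodeError
            return False
        user, _, pw = decoded.partition(":")
        return _eq(user, method.username or "") and _eq(pw, method.password or "")
    return False


def authenticate_source(methods: List[AuthMethod], headers: Dict[str, str]) -> int:
    """Return the status the plug would act on; 200 means pass through."""
    if not methods:
        return 200  # no source auth configured: fail-open
    if "x-api-key" not in headers and "authorization" not in headers:
        return 401  # credentials missing
    if any(_matches(m, headers) for m in methods):
        return 200
    return 404      # wrong credentials: hide the channel's existence
```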

* Add tests and factory for channel source authentication (#4403)

- Add channel_auth_method factory to ExMachina factories
- Add get_channel_with_source_auth/1 context tests
- Add ChannelAuthMethod changeset validation tests (exclusive FKs,
  role-target consistency, unique constraints)
- Add LightningWeb.Auth unit tests (API key, Basic Auth, credential
  detection)
- Add source auth proxy plug integration tests (auth enforcement,
  401/404 responses, multiple auth methods, mixed types)
- Add unique_constraint declarations to ChannelAuthMethod changeset

* Return JSON error responses from ChannelProxyPlug (#4403)

Replace plain-text send_resp errors with structured JSON responses
(e.g. {"error": "Not Found"}) to match WebhookAuth and other API
error paths in the codebase.

* Consolidate channel_auth_methods into original migration and remove source_project_credential_id (#4403)

Merge the separate channel_auth_methods migration into the base
create_channels_tables migration and remove the now-unused
source_project_credential_id column from channels and channel_snapshots.