From 7e1293d87bfd9bb6d91c4169eeb9a23d750c2858 Mon Sep 17 00:00:00 2001 From: Juhana Ilmoniemi Date: Mon, 11 May 2026 10:28:22 +0300 Subject: [PATCH 1/3] spec: client-IP extraction helper for HTTP requests (#51) --- .../specs/architecture/51-client-ip-helper.md | 199 ++++++++++++++++++ 1 file changed, 199 insertions(+) create mode 100644 docs/specs/architecture/51-client-ip-helper.md diff --git a/docs/specs/architecture/51-client-ip-helper.md b/docs/specs/architecture/51-client-ip-helper.md new file mode 100644 index 0000000..f442172 --- /dev/null +++ b/docs/specs/architecture/51-client-ip-helper.md @@ -0,0 +1,199 @@ +# Spec: client-IP extraction helper (#51) + +## Files to read first + +- `internal/relay/server_endpoint.go:129-136` — existing unexported `remoteHost` helper. The new exported helper supersedes its shape but does NOT replace it in this ticket (no callers move; that is the wiring ticket's job). +- `internal/relay/healthz.go:1-57` — package convention for an exported `http`-touching primitive: short `package relay` doc comment, single function, no init state. +- `internal/relay/registry.go` / `internal/relay/registry_test.go` — confirms tests live in `package relay` (not `relay_test`) so they can reach unexported helpers and constants. Mirror this for the new test file. +- `internal/relay/server_endpoint_test.go:1-16` — established import style (`net/http`, `net/http/httptest`, standard-library-first ordering). The new test file follows the same shape. +- `docs/threat-model.md` § *DoS resistance — connection floods, slow-loris, fork-bomb retry* (l.32-40) — the "Future hardening: per-IP rate limit" sentence this helper feeds. +- `docs/PROJECT-MEMORY.md` § *Patterns established* — loud-failure pattern (relevant context for why the empty-string fallback is "deny by caller," not silent-passthrough). + +## Context + +The relay's threat model (§ DoS resistance) flags per-IP throttling as deferred future hardening; sibling ticket #50 (per-IP rate-limit type) needs a stable client-IP string per request. This ticket ships **only** the extraction primitive so its trust-model semantics — when to read `X-Forwarded-For` (XFF), when to ignore it — can be reviewed and tested in isolation, before any caller wires it in. + +The pre-existing `remoteHost` in `server_endpoint.go:129-136` is a **logging** helper: it strips the port from `r.RemoteAddr` and falls back to `r.RemoteAddr` verbatim on parse error. It is unexported, has no XFF handling, and (importantly) returns a non-empty string in all cases — appropriate for log lines, wrong for a rate-limit key. The new helper is a **security** primitive: it must distinguish "I have a usable source IP" from "I do not," and it must encode the trust decision at the call site (the caller opts in to trusting XFF). + +The two helpers diverge on contract, not on mechanics, which is why we add a new export rather than mutate `remoteHost`. `remoteHost` keeps its logging job; the wiring ticket (#50 or its sibling) will decide whether logging migrates to the new helper, and that is out of scope here. + +## Design + +### Single exported function + +`internal/relay/client_ip.go`: + +```go +package relay + +import ( + "net" + "net/http" + "strings" +) + +// ClientIP extracts the client's source IP from an incoming HTTP request. +// +// When trustForwardedFor is false, ClientIP returns the host portion of +// r.RemoteAddr with the port stripped. Use this in deployments where the +// relay is directly internet-facing: X-Forwarded-For is attacker-controlled +// and must be ignored. +// +// When trustForwardedFor is true, ClientIP returns the left-most entry of +// X-Forwarded-For (the original client per the de-facto convention) with +// surrounding whitespace stripped. If the header is absent or yields no +// non-empty entry, ClientIP falls back to the host portion of r.RemoteAddr. +// Use this only when the relay is fronted by a reverse proxy the operator +// trusts to set XFF correctly. +// +// ClientIP returns the empty string only when no usable source is available +// (RemoteAddr cannot be parsed and either XFF was not consulted or yielded +// nothing). Callers MUST treat the empty string as "deny": the rate-limit +// wiring ticket enforces this. ClientIP itself performs no policy. +// +// The returned string is the raw host portion as it appears on the wire — +// no canonicalisation (no IPv6 lower-casing, no zone-id stripping). Callers +// that need a canonical form must apply net.ParseIP themselves. +func ClientIP(r *http.Request, trustForwardedFor bool) string { + if trustForwardedFor { + if v := r.Header.Get("X-Forwarded-For"); v != "" { + // Left-most entry = original client per de-facto convention. + if comma := strings.IndexByte(v, ','); comma >= 0 { + v = v[:comma] + } + if first := strings.TrimSpace(v); first != "" { + return first + } + } + } + host, _, err := net.SplitHostPort(r.RemoteAddr) + if err != nil { + return "" + } + return host +} +``` + +Key shape decisions: + +- **One function, no struct, no constructor.** The helper is pure — no I/O, no goroutines, no state. The signature already encodes the only policy bit (trust). Wrapping it in a type would be ceremony. +- **`trustForwardedFor` is a bool, not a more elaborate "trusted-proxy-CIDR list."** The AC names a single bool; the threat model's near-term need is direct-vs-fronted (a binary), and a CIDR allowlist is an obvious follow-up the wiring ticket can take. Naming the parameter `trustForwardedFor` (not `trustProxies`) keeps the scope precise. +- **Left-most XFF entry, no header chain walk.** AC explicitly names left-most. The other camp (right-most-trusted) requires an allowlist of trusted hops the relay does not have. With the bool-only contract, left-most is the only well-defined choice; the wiring ticket can introduce a hop-aware variant if/when CIDR trust lists exist. +- **`r.Header.Get` not `r.Header.Values`.** XFF is canonically a single comma-separated header. `Header.Get` returns the first occurrence's full value; if a malicious client sends `X-Forwarded-For: a` plus a second `X-Forwarded-For: b` header, `Header.Get` returns `a`. Either way the helper takes the left-most token of the first header. This matches Go's `net/http.Request.Header` semantics; the test asserts the single-header case explicitly. The multi-header case is **out of scope** for this ticket and called out under § *Open questions*. +- **`strings.IndexByte` over `strings.SplitN`.** Avoids the slice allocation when the header has multiple entries — a non-zero benefit at rate-limit-decision frequency. Same effect. +- **`net.SplitHostPort` for the `RemoteAddr` parse.** Handles `host:port`, `[ipv6]:port`, and surfaces an error on malformed input — exactly the discriminator we need to return `""`. The existing `remoteHost` falls back to `r.RemoteAddr` on parse failure; we deliberately diverge (empty-string → "deny") because the consumer is a rate-limit gate, not a log line. +- **`RemoteAddr` is **always** consulted as the fallback even when `trustForwardedFor` is true.** This matters when the proxy strips XFF or the request bypasses the proxy somehow. The AC names this explicitly. + +### Data flow + +``` +*http.Request ──► ClientIP(r, trust=false) ──► SplitHostPort(r.RemoteAddr) + │ + ok ───────┴─────► host + err ──► "" (caller denies) + +*http.Request ──► ClientIP(r, trust=true) ──► r.Header.Get("X-Forwarded-For") + │ + non-empty ───┴─► IndexByte(',') ──► TrimSpace ──► first + │ │ + │ empty ──► fallback + empty ──────────────────────────► fallback + │ + ▼ + SplitHostPort(r.RemoteAddr) + │ + ok ──► host + err ─► "" +``` + +### Concurrency model + +None. The helper reads two fields of `*http.Request` (`RemoteAddr` and `Header`) and returns a string. Both fields are stable for the request lifetime; the net/http server does not mutate them concurrently with handler code. No locks, no goroutines, no shared state. + +### Error handling + +The helper has one error path: `net.SplitHostPort(r.RemoteAddr)` failed and no XFF entry was used. The chosen surface is **empty-string return**, not a `(string, error)` pair, because: + +- Every caller would handle the error identically (treat as deny). +- The AC names the empty-string fallback explicitly. +- A `(string, error)` pair would push a meaningless `if err != nil` check into every caller — the kind of ceremony [`CODING-STYLE.md`] flags as anti-idiomatic. The helper's docstring promises the deny semantics; the wiring ticket enforces them in code. + +The helper does NOT log on the parse-failure path. Logging is the caller's responsibility; this is a pure function. The wiring ticket may choose to emit a debug log on `""` returns if observed in production, but that is policy, not primitive. + +### Testing strategy + +A new test file `internal/relay/client_ip_test.go` in `package relay` (matches the rest of the package per convention). A single table-driven test exercises the matrix the AC enumerates. The test uses `httptest.NewRequest` to construct requests with controlled `RemoteAddr` and headers — no real listener, no goroutines. + +Required test cases (one row each): + +| Name | `RemoteAddr` | XFF header | `trustForwardedFor` | Want | +|---|---|---|---|---| +| `RemoteAddr_IPv4_WithPort` | `192.0.2.5:54321` | (none) | `false` | `192.0.2.5` | +| `RemoteAddr_IPv6_Bracketed` | `[2001:db8::1]:54321` | (none) | `false` | `2001:db8::1` | +| `RemoteAddr_Loopback_IPv6` | `[::1]:8080` | (none) | `false` | `::1` | +| `RemoteAddr_MalformedNoPort` | `192.0.2.5` | (none) | `false` | `""` | +| `RemoteAddr_Empty` | `` (empty) | (none) | `false` | `""` | +| `XFF_Disabled_HeaderIgnored` | `192.0.2.5:54321` | `203.0.113.7` | `false` | `192.0.2.5` | +| `XFF_Enabled_Single` | `10.0.0.1:443` | `203.0.113.7` | `true` | `203.0.113.7` | +| `XFF_Enabled_MultiEntry` | `10.0.0.1:443` | `203.0.113.7, 192.0.2.10, 198.51.100.4` | `true` | `203.0.113.7` | +| `XFF_Enabled_LeadingWhitespace` | `10.0.0.1:443` | ` 203.0.113.7 , 192.0.2.10` | `true` | `203.0.113.7` | +| `XFF_Enabled_Absent_FallsBackToRemoteAddr` | `192.0.2.5:54321` | (none) | `true` | `192.0.2.5` | +| `XFF_Enabled_Empty_FallsBackToRemoteAddr` | `192.0.2.5:54321` | `` (empty string) | `true` | `192.0.2.5` | +| `XFF_Enabled_WhitespaceOnlyFirstEntry_FallsBackToRemoteAddr` | `192.0.2.5:54321` | ` , 198.51.100.4` | `true` | `192.0.2.5` | +| `XFF_Enabled_RemoteAddrAlsoMalformed_ReturnsEmpty` | `garbage` | `` (empty string) | `true` | `""` | + +Notes: + +- The `XFF_Enabled_WhitespaceOnlyFirstEntry_FallsBackToRemoteAddr` row enforces the "no non-empty entry → fallback" branch explicitly. Without it, an attacker that sends `X-Forwarded-For: ,` could starve the fallback if the implementation only checked the trimmed first byte. The table catches that. +- The `XFF_Enabled_RemoteAddrAlsoMalformed_ReturnsEmpty` row enforces the bottom-of-flow `""` return when both sources fail. This is the "no usable source" AC line. +- IPv4-in-IPv6 forms (e.g. `[::ffff:192.0.2.5]:54321`) are NOT in the table — `net.SplitHostPort` handles them and the helper returns the host portion as-is per the no-canonicalisation contract. Adding a row would test `net.SplitHostPort`, not the helper. Out of scope. +- Tests construct requests via `httptest.NewRequest(http.MethodGet, "/", nil)` then set `r.RemoteAddr` and `r.Header` directly. No `httptest.NewServer`. + +`make test -race`, `make vet`, `make build` must remain clean per repo norm. + +### What this ticket does NOT do + +- **No wiring change to `cmd/pyrycode-relay/main.go`.** Stated in the AC; the helper is unused at end-of-ticket. This is intentional and is `go vet`-clean because the function is exported. +- **No move of the existing `remoteHost` helper.** It stays unexported, stays in `server_endpoint.go`, stays as-is. Migrating logging to the new helper is the wiring ticket's call (it may decide `remoteHost`'s fallback-to-verbatim is correct for logging and leave it alone). +- **No per-IP rate-limit type.** Sibling ticket (#50 or its descendant). This ticket is the primitive only. +- **No trusted-proxy CIDR list.** Future ticket. Today's bool `trustForwardedFor` is a flat all-or-nothing trust signal. +- **No `X-Real-IP` support.** XFF is the AC's named header. `X-Real-IP` is a separate (and more proxy-specific) convention; it can be added in the wiring ticket once a concrete proxy is chosen. +- **No multi-header XFF concatenation.** See § *Open questions*. + +### Open questions + +- **Multi-header `X-Forwarded-For`.** If a request carries two `X-Forwarded-For:` headers, `r.Header.Get` returns only the first. RFC 7239 (the standardised `Forwarded` header) supersedes the de-facto XFF behaviour, and most real proxies emit a single comma-joined `X-Forwarded-For`. The de-facto-correct treatment for two-header XFF is to join them with commas before taking the left-most entry; the relay does NOT do this today and the AC does not name the case. **Resolution:** defer to the wiring ticket. The current behaviour (take left-most of first header) is documented in the docstring and tested in the single-header row. If the wiring ticket discovers a real proxy that emits two-header XFF, the resolution is straightforward (`r.Header.Values("X-Forwarded-For")` + `strings.Join`); the change is two lines and one new test row. + +--- + +## Security review + +**Verdict:** PASS + +**Findings:** + +- [Trust boundaries] No findings — the trust boundary is explicit: a single `bool` parameter on a single exported function. The helper itself performs **no** trust decision; the caller passes the bit it has determined from operator configuration. The docstring names the consequence ("XFF is attacker-controlled and must be ignored" in direct-facing deployments) so a developer reading the call site sees the threat without leaving the helper. The wiring ticket (#50 / sibling) owns the policy of where the bit comes from; this spec deliberately does not pre-commit it (flag vs env vs config-file is a follow-up decision). +- [Tokens, secrets, credentials] N/A by design — the helper handles a single string (IP) that is not a secret. Logging considerations are downstream. +- [File operations] N/A — no I/O. +- [Subprocess / external command execution] N/A — pure function. +- [Cryptographic primitives] N/A — no randomness, no comparison against a secret. +- [Network & I/O] No findings — the helper reads two fields of an already-parsed `*http.Request`. It does not call `Read`, set timeouts, or open sockets. No untrusted bytes are consumed in unbounded form: `r.Header.Get` returns the header value as parsed by `net/http`, which enforces its own header-size cap (`http.Server.MaxHeaderBytes`, default 1 MiB) at upgrade time. The helper performs no length check of its own because the upstream bound is already in place at the only call site that will ever exist (`cmd/pyrycode-relay/main.go` runs a single `http.Server` instance whose `MaxHeaderBytes` is the stdlib default; if the relay ever raises that, this helper's per-call cost is `O(len(XFF))` for the `IndexByte` + `TrimSpace` scan — linear, no allocation beyond the returned substring's header span). +- [Error messages, logs, telemetry] No findings — the helper does not log or wrap errors. The empty-string return is the only side-channel; the caller decides whether to log on `""`. The threat-model § *Log hygiene* "MAY be logged: remote host (IP)" line authorises logging the result. The helper does NOT log the **input** (request headers), which is the part with token-bearing risk. +- [Concurrency] N/A — pure function; the `*http.Request` is owned by the handler goroutine. +- [Threat model alignment] No findings — this primitive directly feeds the threat-model's § *DoS resistance — connection floods* "Future hardening: token-bucket rate limit" line. The split (primitive here, wiring next) is deliberate so the trust-model nuance gets its own review pass. + +**Adversarial walk — what a hostile actor could try:** + +1. **Header injection via XFF.** Adversarial phone sends `X-Forwarded-For: 127.0.0.1` to forge a loopback identity. **Defence:** the caller must pass `trustForwardedFor=false` in any internet-facing deployment. The docstring is explicit; the wiring ticket enforces the default. If the operator misconfigures the bit, every request reports `127.0.0.1` as its source and the rate-limit collapses to a single bucket — a self-DoS, not a privilege escalation. Acceptable as a configuration-error surface; the threat-model entry for log hygiene has the same shape (operator-owned). +2. **Header injection via spoofed XFF when proxy IS trusted.** Phone sets `X-Forwarded-For: victim-ip` before reaching the proxy. **Defence:** trusted proxies overwrite (or append to) XFF; an upstream-set value is the proxy's responsibility to sanitise. The relay-side helper cannot defend against a misconfigured proxy. Documented in the docstring ("only when the relay is fronted by a reverse proxy the operator trusts to set XFF correctly"). The wiring ticket should call this out in operator docs; here we ship the primitive. +3. **Whitespace bombing.** XFF set to a 1 MiB string of spaces. **Defence:** the helper does `IndexByte(',')` first (finds no comma → returns the full string), then `TrimSpace` on the result. `TrimSpace` is O(n) and allocation-free for a fully-whitespace string (returns the empty substring slice). The fallback-to-RemoteAddr branch triggers. `http.Server.MaxHeaderBytes` caps the input at 1 MiB so the linear scan is bounded. No amplification. +4. **Pathological comma.** XFF set to `,` (just a comma). `IndexByte` finds index 0; `v = v[:0]` is the empty string; `TrimSpace` is `""`; fallback to `RemoteAddr`. Behaves identically to the absent-header case. +5. **Embedded null bytes / control chars.** XFF set to `203.0.113.7\x00.attacker.example`. The helper returns the byte sequence up to the first comma (or the whole header if none), then trims ASCII whitespace. **It does not sanitise control bytes.** The threat is downstream: a rate-limit key with embedded nulls is fine (Go strings are byte-safe); a log line including the unsanitised string could confuse a log parser. **Resolution:** the threat-model § *Log hygiene* names log injection as a known concern; the wiring ticket's logging decision must `strconv.Quote` or similar before emit. This helper deliberately does not strip control bytes because (a) doing so would change the semantics of the returned key (two different attackers' nulls collapse to the same bucket) and (b) it would push log-safety policy into a security primitive. Documented under § *Out of scope*. **SHOULD FIX, deferred to the wiring ticket** — the wiring ticket must wrap log calls with quoting if it emits the IP string. Adding a `[clientip] note: caller must quote on log emit` line to the docstring is a free belt-and-suspenders move. +6. **RemoteAddr without a port.** Some net stacks (test fixtures, custom servers) set `r.RemoteAddr` to a bare IP. `net.SplitHostPort("192.0.2.5")` returns an error; the helper returns `""`. The rate-limit caller denies. **Acceptable** because (a) the stdlib `http.Server` always sets `host:port` for real sockets and (b) tests that construct `httptest.NewRequest` must set `RemoteAddr` explicitly to a `host:port` form (the test table enforces this). +7. **RemoteAddr empty.** Same path as (6). Empty-string return; deny. Tested. +8. **IPv6 zone-id.** `r.RemoteAddr = [fe80::1%eth0]:443`. `net.SplitHostPort` returns `fe80::1%eth0`; the helper preserves it. The rate-limit caller uses the full string as the bucket key. **Acceptable**: zone-id is part of the address per RFC 4007 and two different zones legitimately denote different interfaces; collapsing them is the wrong default. The docstring's "no canonicalisation" line names this. + +Per § *Decision*: no MUST FIX. One SHOULD FIX (log-quoting) flagged in finding (5), deferred to the wiring ticket with the resolution path stated. + +**Reviewer:** architect (self-review per `architect/security-review.md`) +**Date:** 2026-05-11 From 636deeaf5623fefaae7d58cf523b245ba27884db Mon Sep 17 00:00:00 2001 From: Juhana Ilmoniemi Date: Mon, 11 May 2026 11:39:52 +0300 Subject: [PATCH 2/3] relay: client-IP extraction helper for HTTP requests (#51) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds ClientIP, a pure helper that returns the client source IP from an *http.Request. With trustForwardedFor=false it returns the host portion of r.RemoteAddr; with trustForwardedFor=true it takes the left-most X-Forwarded-For entry and falls back to RemoteAddr when the header is absent, empty, or yields no non-empty first entry. Returns "" only when no usable source is available — callers treat that as deny. Table-driven tests cover the RemoteAddr (IPv4/IPv6/malformed/empty) and XFF (single, multi-entry, whitespace, disabled, fallback, double- malformed) matrix from the spec. No callers wired in this ticket — wiring is the rate-limit ticket's job. Co-Authored-By: Claude Opus 4.7 --- internal/relay/client_ip.go | 48 ++++++++++++ internal/relay/client_ip_test.go | 122 +++++++++++++++++++++++++++++++ 2 files changed, 170 insertions(+) create mode 100644 internal/relay/client_ip.go create mode 100644 internal/relay/client_ip_test.go diff --git a/internal/relay/client_ip.go b/internal/relay/client_ip.go new file mode 100644 index 0000000..9dd8c7c --- /dev/null +++ b/internal/relay/client_ip.go @@ -0,0 +1,48 @@ +package relay + +import ( + "net" + "net/http" + "strings" +) + +// ClientIP extracts the client's source IP from an incoming HTTP request. +// +// When trustForwardedFor is false, ClientIP returns the host portion of +// r.RemoteAddr with the port stripped. Use this in deployments where the +// relay is directly internet-facing: X-Forwarded-For is attacker-controlled +// and must be ignored. +// +// When trustForwardedFor is true, ClientIP returns the left-most entry of +// X-Forwarded-For (the original client per the de-facto convention) with +// surrounding whitespace stripped. If the header is absent or yields no +// non-empty entry, ClientIP falls back to the host portion of r.RemoteAddr. +// Use this only when the relay is fronted by a reverse proxy the operator +// trusts to set XFF correctly. +// +// ClientIP returns the empty string only when no usable source is available +// (RemoteAddr cannot be parsed and either XFF was not consulted or yielded +// nothing). Callers MUST treat the empty string as "deny": the rate-limit +// wiring ticket enforces this. ClientIP itself performs no policy. +// +// The returned string is the raw host portion as it appears on the wire — +// no canonicalisation (no IPv6 lower-casing, no zone-id stripping). Callers +// that emit the string in log lines must quote it (e.g. via strconv.Quote) +// to avoid log injection from embedded control bytes. +func ClientIP(r *http.Request, trustForwardedFor bool) string { + if trustForwardedFor { + if v := r.Header.Get("X-Forwarded-For"); v != "" { + if comma := strings.IndexByte(v, ','); comma >= 0 { + v = v[:comma] + } + if first := strings.TrimSpace(v); first != "" { + return first + } + } + } + host, _, err := net.SplitHostPort(r.RemoteAddr) + if err != nil { + return "" + } + return host +} diff --git a/internal/relay/client_ip_test.go b/internal/relay/client_ip_test.go new file mode 100644 index 0000000..b122ecb --- /dev/null +++ b/internal/relay/client_ip_test.go @@ -0,0 +1,122 @@ +package relay + +import ( + "net/http" + "net/http/httptest" + "testing" +) + +func TestClientIP(t *testing.T) { + tests := []struct { + name string + remoteAddr string + xffHeader string + setXFF bool + trustForwardedFor bool + want string + }{ + { + name: "RemoteAddr_IPv4_WithPort", + remoteAddr: "192.0.2.5:54321", + want: "192.0.2.5", + }, + { + name: "RemoteAddr_IPv6_Bracketed", + remoteAddr: "[2001:db8::1]:54321", + want: "2001:db8::1", + }, + { + name: "RemoteAddr_Loopback_IPv6", + remoteAddr: "[::1]:8080", + want: "::1", + }, + { + name: "RemoteAddr_MalformedNoPort", + remoteAddr: "192.0.2.5", + want: "", + }, + { + name: "RemoteAddr_Empty", + remoteAddr: "", + want: "", + }, + { + name: "XFF_Disabled_HeaderIgnored", + remoteAddr: "192.0.2.5:54321", + xffHeader: "203.0.113.7", + setXFF: true, + trustForwardedFor: false, + want: "192.0.2.5", + }, + { + name: "XFF_Enabled_Single", + remoteAddr: "10.0.0.1:443", + xffHeader: "203.0.113.7", + setXFF: true, + trustForwardedFor: true, + want: "203.0.113.7", + }, + { + name: "XFF_Enabled_MultiEntry", + remoteAddr: "10.0.0.1:443", + xffHeader: "203.0.113.7, 192.0.2.10, 198.51.100.4", + setXFF: true, + trustForwardedFor: true, + want: "203.0.113.7", + }, + { + name: "XFF_Enabled_LeadingWhitespace", + remoteAddr: "10.0.0.1:443", + xffHeader: " 203.0.113.7 , 192.0.2.10", + setXFF: true, + trustForwardedFor: true, + want: "203.0.113.7", + }, + { + name: "XFF_Enabled_Absent_FallsBackToRemoteAddr", + remoteAddr: "192.0.2.5:54321", + trustForwardedFor: true, + want: "192.0.2.5", + }, + { + name: "XFF_Enabled_Empty_FallsBackToRemoteAddr", + remoteAddr: "192.0.2.5:54321", + xffHeader: "", + setXFF: true, + trustForwardedFor: true, + want: "192.0.2.5", + }, + { + name: "XFF_Enabled_WhitespaceOnlyFirstEntry_FallsBackToRemoteAddr", + remoteAddr: "192.0.2.5:54321", + xffHeader: " , 198.51.100.4", + setXFF: true, + trustForwardedFor: true, + want: "192.0.2.5", + }, + { + name: "XFF_Enabled_RemoteAddrAlsoMalformed_ReturnsEmpty", + remoteAddr: "garbage", + xffHeader: "", + setXFF: true, + trustForwardedFor: true, + want: "", + }, + } + + for _, tc := range tests { + t.Run(tc.name, func(t *testing.T) { + r := httptest.NewRequest(http.MethodGet, "/", nil) + r.RemoteAddr = tc.remoteAddr + if tc.setXFF { + r.Header.Set("X-Forwarded-For", tc.xffHeader) + } + + got := ClientIP(r, tc.trustForwardedFor) + if got != tc.want { + t.Errorf("ClientIP(remoteAddr=%q, xff=%q set=%v, trust=%v) = %q, want %q", + tc.remoteAddr, tc.xffHeader, tc.setXFF, tc.trustForwardedFor, got, tc.want) + } + }) + } +} From 0356789ae40b4b21a773328d9c7c8ff412f35f4f Mon Sep 17 00:00:00 2001 From: Juhana Ilmoniemi Date: Mon, 11 May 2026 11:44:17 +0300 Subject: [PATCH 3/3] =?UTF-8?q?docs:=20client-IP=20extraction=20helper=20?= =?UTF-8?q?=E2=80=94=20codebase=20note,=20patterns,=20lessons=20(#51)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per-ticket codebase note under docs/knowledge/codebase/51.md covering the ClientIP helper, its trust-model surface, and the deferred items the wiring ticket inherits. Two patterns added to PROJECT-MEMORY.md: - Empty-string-as-deny on pure security primitives (no (value, error) ceremony when every caller would handle the error identically). - Two helpers that diverge on contract, not mechanics, get a new export rather than a mutation — preserves both contracts and defers the migration call to the wiring ticket. Three lessons added to docs/lessons.md: - r.Header.Get returns only the first XFF header value when XFF arrives as separate headers; deferred fix uses r.Header.Values + Join. - IndexByte + slice for the allocation-free first-token parse vs SplitN. - net.SplitHostPort as the canonical host:port (and [ipv6]:port) parser. Co-Authored-By: Claude Opus 4.7 --- docs/PROJECT-MEMORY.md | 2 ++ docs/knowledge/codebase/51.md | 34 ++++++++++++++++++++++++++++++++++ docs/lessons.md | 12 ++++++++++++ 3 files changed, 48 insertions(+) create mode 100644 docs/knowledge/codebase/51.md diff --git a/docs/PROJECT-MEMORY.md b/docs/PROJECT-MEMORY.md index 2423761..005f0b4 100644 --- a/docs/PROJECT-MEMORY.md +++ b/docs/PROJECT-MEMORY.md @@ -56,6 +56,8 @@ Stateless WebSocket router between mobile clients and pyry binaries. Internet-ex - **Type-owned background goroutine: `ticker + done chan + WaitGroup`, idempotent + synchronous `Close`.** When a type owns a long-lived bookkeeping goroutine (sweep/eviction loop, not a per-conn read pump), start it from the constructor and stop it via the type's own `Close()`. Shape: `closeOnce.Do(func(){ close(done) })` then `wg.Wait()` **outside** the once — putting `wg.Wait()` inside would let a second concurrent caller return early while shutdown is still in flight. The wiring site registers `defer x.Close()` at the composition root. Adopted in `IPRateLimiter` (#50). Distinct from "Per-conn goroutines exit cleanly via LIFO defers" — that pattern is for goroutines whose lifecycle is owned by an HTTP handler; this one is for goroutines whose lifecycle is owned by the type itself. - **When one timestamp must serve two semantic purposes, split it.** In `IPRateLimiter` (#50), refill math needs an anchor that advances in token-sized chunks (preserving fractional elapsed time for the next call) while eviction needs an anchor that updates on every poll (so an actively-hammered/denied bucket isn't dropped and recycled for a fresh burst). A single `last`-of-anything cannot do both without one purpose corrupting the other. Two fields (`refillLast`, `pollLast`) decouple them at zero structural cost and make a load-bearing security invariant (no free-burst-recycling) structurally true rather than carefully maintained. Generalisable: whenever a timestamp's "should it update here?" question gets conflicting answers from different callers, the answer is two timestamps. - **Injectable wall-clock as an unexported `now func() time.Time` field, defaulting to `time.Now`.** When a type's correctness depends on `time.Now` and tests need determinism without sleeping, give the struct a package-private `now func() time.Time` field set to `time.Now` in the constructor; in-package tests assign a fake (`l.now = fakeClock.Now`) directly. No exported `WithClock` option, no clock interface, no constructor variants. The single eviction goroutine reads `l.now` on every tick; assignment before the first tick is race-free in practice and `-race` tests cover the steady-state. Adopted in `IPRateLimiter` (#50); applies to any future timer-driven primitive whose tests need fake-clock determinism. +- **Empty-string return as the "deny" signal on a pure security primitive — no `(string, error)` ceremony.** When every caller would handle the error identically (treat as deny), an `(value, error)` pair pushes a meaningless `if err != nil` into every call site. Instead: document the empty-string contract in the function's docstring, name the deny semantics explicitly, and let the wiring ticket enforce it in code. The helper performs no policy; the caller's `if ip == "" { deny }` is the policy. Adopted in `ClientIP` (#51). Applies to any pure security primitive whose output is keyed (rate-limit key, bucket id) and whose only failure mode is "no usable input." +- **Two helpers diverge on contract, not on mechanics → add a new export, don't mutate the existing one.** `internal/relay/server_endpoint.go`'s unexported `remoteHost` (logging helper: falls back to `r.RemoteAddr` verbatim on parse error, no XFF awareness) and the new `ClientIP` (security primitive: empty-string-as-deny, opt-in XFF trust) share the same `net.SplitHostPort` mechanics but have structurally different contracts. Mutating `remoteHost` to add XFF and change its empty-string semantics would have broken its log-line callers; adding a new export preserves both contracts and lets the wiring ticket decide whether logging migrates. Pattern: when a new caller needs the same mechanics with a *different* contract (deny semantics, validation strictness, return shape), the answer is a new export, not a parameter on the old one. Adopted in `ClientIP` (#51). ## Conventions diff --git a/docs/knowledge/codebase/51.md b/docs/knowledge/codebase/51.md new file mode 100644 index 0000000..bbb2285 --- /dev/null +++ b/docs/knowledge/codebase/51.md @@ -0,0 +1,34 @@ +# Ticket #51 — client-IP extraction helper for HTTP requests + +Pure helper that produces the source IP for a request, in a form suitable for keying a rate-limit bucket. Standalone primitive: ships the function and its test table only, no callers wired. Feeds the per-IP rate-limiter (#50) once the wiring ticket lands; the split exists so the trust-model nuance (when to read `X-Forwarded-For`, when to ignore it) gets its own review pass before any handler depends on it. + +## Implementation + +- **`internal/relay/client_ip.go`** — exported `ClientIP(r *http.Request, trustForwardedFor bool) string`. Pure function, no I/O, no state. + - `trustForwardedFor=false`: returns `net.SplitHostPort(r.RemoteAddr)` host portion. Empty string on parse error. + - `trustForwardedFor=true`: takes the left-most `X-Forwarded-For` entry (`strings.IndexByte(',')` + `strings.TrimSpace`), falls back to `RemoteAddr` if the header is absent, empty, or yields no non-empty first entry. + - Empty-string return is the only "no usable source" signal — callers MUST treat as deny. No `(string, error)` pair; every caller would handle the error identically. + - No canonicalisation: the returned string is the raw host portion as it appears on the wire (no IPv6 lower-casing, no zone-id stripping). Two different zones (`fe80::1%eth0` vs `fe80::1%eth1`) legitimately denote different interfaces; collapsing them is the wrong default. Docstring names the consequence — log emitters must `strconv.Quote` to avoid log injection from embedded control bytes. +- **`internal/relay/client_ip_test.go`** — 13-row table covering `RemoteAddr` (IPv4 with port, IPv6 bracketed, IPv6 loopback, malformed no-port, empty), XFF-disabled-header-ignored, XFF-enabled (single, multi-entry, leading whitespace), and the four fallback rows (absent header, empty header, whitespace-only first entry, both sources malformed → `""`). Constructed via `httptest.NewRequest` with `r.RemoteAddr` and `r.Header` set directly — no real listener. +- **No changes to `cmd/pyrycode-relay/main.go`** — per the AC, this is the primitive only. The unexported `remoteHost` helper in `server_endpoint.go:129-136` is **unchanged**: it remains the logging helper (falls back to `r.RemoteAddr` verbatim on parse error). The two helpers diverge on contract (logging vs rate-limit key), so the new helper is a new export rather than a mutation of `remoteHost`. Whether to migrate logging is the wiring ticket's call. + +## Trust model + +The bool parameter is the entire policy surface. The helper itself performs no trust decision; the caller passes the bit it has determined from operator configuration. Two named threats: + +- **Direct-facing deployment, `trustForwardedFor=true` by mistake.** Adversarial phone forges `X-Forwarded-For: 127.0.0.1`; every request reports `127.0.0.1`; rate-limit collapses to one bucket. Self-DoS via misconfiguration, not privilege escalation. The wiring ticket is responsible for defaulting the bit to `false`. +- **Trusted-proxy deployment, proxy misconfigured.** Phone sets `X-Forwarded-For: victim-ip` *before* the proxy; the proxy must overwrite or append. The relay-side helper cannot defend against an upstream that doesn't sanitise — that's the proxy's responsibility. Documented in the docstring. + +## Out of scope + +- Wiring into any handler / `cmd/pyrycode-relay/main.go` — the rate-limit wiring ticket's job. +- Trusted-proxy CIDR allowlist (right-most-trusted hop chain). Today's bool is flat all-or-nothing. The hop-aware variant is a future ticket once a real proxy is chosen. +- `X-Real-IP` support. Separate (proxy-specific) convention; deferred to the wiring ticket. +- Multi-header `X-Forwarded-For` (two separate `X-Forwarded-For:` headers on one request). `r.Header.Get` returns only the first; if a real proxy is observed emitting two-header XFF, the fix is `r.Header.Values` + `strings.Join` — two lines, one new test row. Deferred until observed (per *Evidence-Based Fix Selection*). +- Canonicalisation, zone-id stripping, control-byte sanitisation — deliberate non-policy on a security primitive. + +## Cross-links + +- [Threat model § DoS resistance](../../threat-model.md) — the "Future hardening: per-IP rate limit" line this helper enables. +- [Codebase: #50 per-IP rate-limiter](50.md) — the sibling primitive this helper produces keys for. +- [Lesson: `r.Header.Get` returns the first header only](../../lessons.md) — the multi-header XFF gotcha deferred above. diff --git a/docs/lessons.md b/docs/lessons.md index f11218f..e40f6ca 100644 --- a/docs/lessons.md +++ b/docs/lessons.md @@ -102,6 +102,18 @@ After crediting `tokensAdd := int(elapsed / refillEvery)` tokens, the natural-lo The natural shape for "idempotent close that stops one goroutine" is `closeOnce.Do(func(){ close(done); wg.Wait() })`. That works for a single caller, but a second concurrent `Close()` skips the `Do` body entirely and returns immediately — while the goroutine is still running and the first caller is still inside `wg.Wait()`. Caller two now races past `Close` thinking shutdown is complete. Hoist `wg.Wait()` outside the once: `closeOnce.Do(func(){ close(done) }); wg.Wait()`. Now both callers block on the WaitGroup until the goroutine exits; both return synchronously. Matches the "Close is idempotent and synchronous" expectation on `WSConn.Close`. Source: `IPRateLimiter.Close` (#50). +## `r.Header.Get("X-Forwarded-For")` returns only the FIRST header value when XFF arrives as multiple separate headers + +`r.Header` is a `map[string][]string`. `Get` returns `values[0]`. If a request carries `X-Forwarded-For: a` and a second `X-Forwarded-For: b` as distinct headers (legal per RFC 7230 §3.2.2 for list-valued headers), `Get` returns only `a` and the helper takes the left-most token of that. Most real proxies emit a single comma-joined `X-Forwarded-For`, so this is structurally fine *today*; the de-facto-correct fix when a real proxy emits two-header XFF is `r.Header.Values("X-Forwarded-For")` + `strings.Join(..., ",")` before splitting. Documented as deferred in `ClientIP`'s spec under § Open questions; not fixed pre-emptively per *Evidence-Based Fix Selection*. Source: `internal/relay/client_ip.go` (#51). + +## For allocation-free "first comma-separated token" parsing, use `strings.IndexByte` + slice, not `strings.SplitN` + +`strings.SplitN(s, ",", 2)` allocates a `[]string` of length 1 or 2 every call. At rate-limit-decision frequency (one call per WS upgrade), the allocation isn't free. The equivalent `if i := strings.IndexByte(s, ','); i >= 0 { s = s[:i] }` is a single slice-header rewrite — no heap traffic, identical semantics, identical readability. Use `SplitN` when you need *both* halves; use `IndexByte` + slice when you only need the prefix. Source: `ClientIP` (#51). + +## `net.SplitHostPort` is the right tool for "host portion of a `host:port` string" — handles `[ipv6]:port` and errors cleanly on malformed input + +Manual `strings.LastIndex(s, ":")` parsing breaks on bracketed IPv6 literals (`[::1]:443`). `net.SplitHostPort` handles both forms, strips the brackets from IPv6 (returning `::1`, not `[::1]`), and returns a non-nil error on missing colon or empty input — which doubles as the discriminator for "I have a usable source IP" vs "I do not." Prefer it over hand-rolled splitting whenever both IPv4 and IPv6 are in scope (i.e. almost always). Same call sites: `ClientIP` (#51), `EnforceHost` (#9), the unexported `remoteHost` logging helper (`server_endpoint.go`). + ## Map iteration with `delete(m, k)` is spec-safe; `for range` snapshot is not required The Go spec guarantees that deleting the current key during `for k, v := range m { ... delete(m, k) ... }` is safe — the deleted entry is omitted from subsequent iterations, and entries not yet visited may or may not be visited (which is fine when you're filtering by a predicate that any-order eviction satisfies). No need to build a `toDelete` slice first. Source: `IPRateLimiter.sweep` (#50); confirmed by the Go spec §"For statements with `range` clause".