Merged
17 changes: 14 additions & 3 deletions architecture/client.md
@@ -31,8 +31,19 @@ Client(httpx2_client=httpx2.Client(trust_env=False))
AsyncClient(httpx2_client=httpx2.AsyncClient(trust_env=False))
```

-## Bounded error bodies (`max_error_body_bytes`)
## Bounded response bodies (`max_response_body_bytes`)

-Both `Client` and `AsyncClient` accept `max_error_body_bytes: int | None = None`. The default (`None`) is backward-compatible: error bodies are read without a size limit.
Both `Client` and `AsyncClient` accept `max_response_body_bytes: int | None = None`. The default (`None`) is unbounded; a non-`None` value below `1` is rejected with `ValueError` at construction. The cap is **status-agnostic** (a `200` trips it the same as a `500`) and counts **decoded** bytes — the actual in-memory footprint, and the only measure that catches a compression bomb (a 133-byte gzip body decoding to 100 KB).
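
To see why only decoded bytes catch a compression bomb, here is a stdlib-only sketch (not httpware code). It gzips 100,000 zero bytes, then streams the compressed body in small wire chunks through `zlib.decompressobj`, counting decoded output against a cap. The `count_decoded` helper and the 64 KiB cap are illustrative assumptions.

```python
import gzip
import zlib


def count_decoded(compressed: bytes, cap: int, chunk_size: int = 16) -> int:
    """Hypothetical helper: count *decoded* bytes of a gzip body, failing past `cap`."""
    decoder = zlib.decompressobj(31)  # wbits=31: expect a gzip header and trailer
    total = 0
    for start in range(0, len(compressed), chunk_size):
        total += len(decoder.decompress(compressed[start:start + chunk_size]))
        if total > cap:
            raise ValueError(f"decoded body exceeded cap of {cap} bytes")
    total += len(decoder.flush())
    if total > cap:
        raise ValueError(f"decoded body exceeded cap of {cap} bytes")
    return total


body = b"\0" * 100_000
bomb = gzip.compress(body)
CAP = 64 * 1024

# The compressed size (what Content-Length would declare) is far below the cap...
print(len(bomb) < CAP)  # True
# ...but the decoded size is not, so counting decoded bytes trips the cap.
try:
    count_decoded(bomb, CAP)
except ValueError as exc:
    print(exc)  # decoded body exceeded cap of 65536 bytes
```

Passing `max_length` to `decompress` would bound each step further; it is left out here for brevity.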

-When set, `stream()` raises `ResponseTooLargeError` on a 4xx/5xx response whose declared `Content-Length` header exceeds the cap — before the body is read. Responses without a declared `Content-Length` (chunked transfer) are still read unbounded: a hard mid-read cap would require httpx2 private API, which this project forbids.
The cap bounds memory that httpware buffers on your behalf, at two sites:

- **The non-streaming terminal** (`send()` and the per-verb helpers). When a cap is set, the terminal switches from the buffered `httpx2_client.send(request)` to `httpx2_client.send(request, stream=True)` and accumulates decoded bytes through the shared `_read_capped` helper, failing fast with `ResponseTooLargeError` the moment the cap is crossed. When the cap is `None`, the terminal keeps the plain buffered `send()` fast path, with zero streaming overhead.
- **`stream()`'s internal error pre-read**: the 4xx/5xx body that httpware reads so that `exc.response.content` works is routed through the same `_read_capped`. **User-driven `stream()` iteration is never capped**; you chose streaming to own that memory.

The declared `Content-Length` is used only as an *early reject* (if even the compressed size already exceeds the cap, fail before reading a byte); it is never an early accept, so the accumulator always runs — chunked and bomb bodies are caught, not waved through. `ResponseTooLargeError.reason` is `"declared"` or `"streamed"` accordingly. Entirely public httpx2 API — no private access.
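
The declared-versus-streamed split can be sketched as a pure function over an iterator of already-decoded chunks. `read_capped` and `TooLarge` are hypothetical stand-ins, not httpware's `_read_capped` or `ResponseTooLargeError`; the point is that the header can only reject early, never accept.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Literal


class TooLarge(Exception):
    """Hypothetical stand-in for ResponseTooLargeError."""

    def __init__(self, *, limit: int, content_length: int | None,
                 reason: Literal["declared", "streamed"]) -> None:
        super().__init__(f"response body exceeds {limit} bytes ({reason})")
        self.limit = limit
        self.content_length = content_length
        self.reason = reason


def read_capped(chunks: Iterable[bytes], declared: int | None, cap: int) -> bytes:
    # Early reject: the declared (compressed) size alone already exceeds the cap,
    # so fail before reading a single byte.
    if declared is not None and declared > cap:
        raise TooLarge(limit=cap, content_length=declared, reason="declared")
    # Never an early accept: the accumulator runs whatever the header said.
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        if len(buf) > cap:
            raise TooLarge(limit=cap, content_length=declared, reason="streamed")
    return bytes(buf)


# A small declared length does not wave a large decoded body through.
try:
    read_capped([b"x" * 60, b"x" * 60], declared=10, cap=100)
except TooLarge as exc:
    print(exc.reason)  # streamed
```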

**Bodiless responses bypass the cap.** Responses that carry no message body — to a `HEAD` request, or with status `204`/`304` — buffer nothing, so the cap never applies to them even when they declare a large `Content-Length` (`HEAD` legitimately echoes the entity length). These are returned unchanged, preserving their original headers.
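
The bypass only has to gate the early reject, since a bodiless response has nothing to accumulate. A minimal sketch, assuming the request method and status code are available as plain values; `carries_no_body` and `rejects_on_declared_length` are hypothetical names, and they cover only the cases listed above (`HEAD`, `204`, `304`).

```python
from __future__ import annotations


def carries_no_body(method: str, status_code: int) -> bool:
    # HEAD responses and 204/304 statuses carry no message body.
    return method.upper() == "HEAD" or status_code in (204, 304)


def rejects_on_declared_length(method: str, status_code: int,
                               content_length: int | None, cap: int) -> bool:
    """Early-reject decision that honours the bodiless bypass."""
    if carries_no_body(method, status_code):
        return False  # nothing will be buffered, so the declared length is harmless
    return content_length is not None and content_length > cap


# HEAD legitimately echoes the entity length, so a 5 GB Content-Length is fine here.
print(rejects_on_declared_length("HEAD", 200, 5_000_000_000, cap=1_000_000))  # False
print(rejects_on_declared_length("GET", 200, 5_000_000_000, cap=1_000_000))   # True
print(rejects_on_declared_length("GET", 304, 5_000_000_000, cap=1_000_000))   # False
```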

**Rebuilt headers.** The accumulator yields the *decoded* body, so the rebuilt Response drops the wire-encoding headers (`Content-Encoding`, `Transfer-Encoding`, and the now-incorrect compressed `Content-Length`); httpx2 recomputes `Content-Length` from the buffered content. Carrying `Content-Encoding` forward would make httpx2 re-decode already-decoded bytes and raise.
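
A sketch of the header filtering, assuming headers are handled as a list of `(name, value)` pairs so that repeated headers such as `Set-Cookie` survive. `strip_wire_encoding` is a hypothetical name; the filtered list would then be handed to the `httpx2.Response(content=...)` constructor, which recomputes `Content-Length`.

```python
from __future__ import annotations

# Headers that describe the wire encoding; none of them is true of the decoded body.
WIRE_ENCODING_HEADERS = frozenset({"content-encoding", "transfer-encoding", "content-length"})


def strip_wire_encoding(headers: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop wire-encoding headers case-insensitively, keeping order and duplicates."""
    return [(name, value) for name, value in headers
            if name.lower() not in WIRE_ENCODING_HEADERS]


wire_headers = [
    ("Content-Type", "application/json"),
    ("Content-Encoding", "gzip"),
    ("Content-Length", "133"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2"),
]
print(strip_wire_encoding(wire_headers))
# [('Content-Type', 'application/json'), ('Set-Cookie', 'a=1'), ('Set-Cookie', 'b=2')]
```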

**Caveat:** on the capped path the buffered response is rebuilt via the public `httpx2.Response(content=...)` constructor, which does not carry `.elapsed` (httpx2 only sets it on its own buffered `send()`). Clients that set a cap and read `response.elapsed` will find it absent; the `None`-cap fast path preserves it.
2 changes: 1 addition & 1 deletion architecture/errors.md
@@ -18,7 +18,7 @@ The error-mapping table (what `httpx2` exception maps to which `httpware` except

The "no `__init__` override" rule scopes only to `StatusError` subclasses. Non-status `ClientError` subclasses — `DecodeError`, `MissingDecoderError`, `BulkheadFullError`, `RetryBudgetExhaustedError`, `CircuitOpenError`, `ResponseTooLargeError` — deliberately define `__init__` with keyword-only fields.

-`ResponseTooLargeError` is raised from `stream()` when `max_error_body_bytes` is set and a 4xx/5xx response's declared `Content-Length` exceeds the cap. It is a non-status `ClientError`; it does not carry a `StatusError`-style positional `response` and is not in `STATUS_TO_EXCEPTION`.
`ResponseTooLargeError` is raised when `max_response_body_bytes` is set and a response body would exceed the cap — status-agnostic (a `200` can trip it), counting **decoded** bytes. It fires from the non-streaming terminal (`send()`) and from `stream()`'s internal error pre-read; user-driven `stream()` iteration is never capped. The `reason` field discriminates the two trip modes: `"declared"` (the declared `Content-Length` already exceeds the cap, rejected before any byte is read — `content_length` holds it) and `"streamed"` (the decoded body crossed the cap mid-read, the chunked or compression-bomb case, where the true size is unknown by design). It is a non-status `ClientError`; it does not carry a `StatusError`-style positional `response` and is not in `STATUS_TO_EXCEPTION`. Because it is neither a `StatusError`, `NetworkError`, nor `TimeoutError`, it is not retried and does not count toward the circuit breaker.
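
The keyword-only-fields pattern needs a `__reduce__` to stay picklable. The sketch below uses stand-in classes: `ClientError` is a bare placeholder, `_rebuild_too_large` is a hypothetical helper, and the message wording is illustrative. Only the field names (`status_code`, `limit`, `content_length`, `reason`) come from the design.

```python
from __future__ import annotations

import pickle
from typing import Literal


class ClientError(Exception):
    """Placeholder for httpware's ClientError base class."""


class ResponseTooLargeError(ClientError):
    """Sketch only: field names follow the design, wording is illustrative."""

    def __init__(self, *, status_code: int, limit: int, content_length: int | None,
                 reason: Literal["declared", "streamed"]) -> None:
        self.status_code = status_code
        self.limit = limit
        self.content_length = content_length
        self.reason = reason
        if reason == "declared":
            message = (f"HTTP {status_code}: declared Content-Length {content_length} "
                       f"exceeds the {limit}-byte cap")
        else:
            message = f"HTTP {status_code}: decoded body exceeded the {limit}-byte cap"
        super().__init__(message)

    def __reduce__(self):
        # BaseException pickles as type(self)(*self.args), which a keyword-only
        # __init__ rejects, so rebuild from the fields via a module-level helper.
        return (_rebuild_too_large,
                (self.status_code, self.limit, self.content_length, self.reason))


def _rebuild_too_large(status_code, limit, content_length, reason):
    return ResponseTooLargeError(status_code=status_code, limit=limit,
                                 content_length=content_length, reason=reason)


err = ResponseTooLargeError(status_code=200, limit=1024, content_length=None,
                            reason="streamed")
restored = pickle.loads(pickle.dumps(err))
print(restored)  # HTTP 200: decoded body exceeded the 1024-byte cap
```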

## Security: request headers are reachable via `exc.response.request`

231 changes: 231 additions & 0 deletions planning/changes/2026-06-23.03-response-body-cap/design.md
@@ -0,0 +1,231 @@
---
status: shipped
date: 2026-06-23
slug: response-body-cap
summary: Replace error-only max_error_body_bytes with a status-agnostic, decoded-byte max_response_body_bytes cap enforced by a streaming capped-accumulator terminal.
supersedes: null
superseded_by: null
pr: 78
outcome: Shipped via #78 — max_error_body_bytes removed (breaking, pre-1.0) for status-agnostic max_response_body_bytes, enforced at the non-streaming terminal and stream()'s error pre-read via a shared _read_capped accumulator counting decoded bytes (catches compression bombs); Content-Length kept as early-reject only. ResponseTooLargeError gained a declared/streamed reason; >=1 validated. Not retried / not breaker-counted; cap-wins on over-cap retryable 5xx. None-cap keeps the plain send() fast path (.elapsed preserved). 756 tests, 100% coverage. Promoted into architecture/client.md + errors.md; release note 0.15.0.
---

# Design: Status-agnostic response-body cap

## Summary

Replace the shipped `max_error_body_bytes` knob with a status-agnostic
`max_response_body_bytes` cap that actually bounds memory on the non-streaming
path. Today the cap only fires inside `stream()`, only on 4xx/5xx, and only as a
declared-`Content-Length` pre-check — so a non-streaming `send()` buffers the
whole body before httpware ever gets control, and even `stream()` reads chunked
or compression-bombed error bodies unbounded. The new design routes both the
internal terminal and `stream()`'s error pre-read through a single shared
`_read_capped` helper that streams the response, accumulates **decoded** bytes
against the cap, and fails fast with `ResponseTooLargeError` the moment the cap
is crossed. Entirely public httpx2 API — no `httpx2._`. Off by default (`None`).

## Motivation

The 2026-06-14 deep audit flagged (Medium) that `max_error_body_bytes` is not a
real cap: for a non-streaming `send()`, `httpx2.Client.send(request)` buffers the
entire body into memory before httpware reaches the decode seam, so there is no
enforcement point at all on the hot path. The existing guard lives only at
`stream()` entry and only rejects when `Content-Length` is declared.

Two concrete holes:

1. **The success path is unprotected and is the larger surface.** A typed
`send(response_model=X)` against a `200` with a multi-GB body exhausts the
heap. Memory exhaustion has no status code; an error-only cap bolts the
smaller door and leaves the bigger one open.
2. **Compression bombs defeat the `Content-Length` pre-check.** Verified: a
133-byte gzip body decodes to 100,000 bytes (`aiter_bytes()` yields the
*decoded* stream; `Content-Length` reports the *compressed* 133). Real bombs
run ~1000:1. A header pre-check waves these straight through.

Feasibility — the reason this was deferred — is resolved: the audit feared a
true mid-read cap needs httpx2 private API. It does not. `httpx2.{Async,}Client`
expose `send(request, stream=True)`, `Response.aiter_bytes()/iter_bytes()`, and a
public `Response(content=...)` constructor. That is the whole mechanism.

## Non-goals

- **A request-body cap.** This bounds response bodies only.
- **Capping user-driven `stream()` iteration.** When the caller iterates chunks
themselves they own the memory; capping it would defeat `stream()`.
- **A general per-connection limit.** That is httpx2's `limits`; orthogonal.
- **Reporting the true oversized body size.** When the accumulator trips we stop
at the first chunk over the line and do not know (and will not fabricate) the
total.
- **Preserving `.elapsed` on the capped path.** See Risk; an inherent cost of
rebuilding the `Response` via public API.

## Design

### 1. One knob: `max_response_body_bytes` (replaces `max_error_body_bytes`)

Both `Client` and `AsyncClient` take `max_response_body_bytes: int | None = None`.
`None` (default) is unbounded — backward-compatible behavior. The old
`max_error_body_bytes` is **deleted outright** (no deprecation shim — acceptable
pre-1.0). Construction validates `>= 1` and raises
`ValueError("max_response_body_bytes must be >= 1")`, matching the
`failure_threshold` idiom in `circuit_breaker.py`. `0`/negative are rejected;
`None` is the only way to disable.

### 2. The cap counts decoded bytes; `Content-Length` is early-reject only

The accumulator counts what `aiter_bytes()` yields (decoded / decompressed),
because decoded size is the actual memory footprint and is the only thing that
stops a compression bomb. The declared `Content-Length` header (the *compressed*
size) is used **only** as an early reject — if even the compressed size already
exceeds the cap, the decoded body certainly will, so we fail before reading a
byte. It is **never** an early accept: a small/absent `Content-Length` says
nothing about decoded size, so the accumulator always runs regardless.

### 3. Shared capped reader + pure accumulator core

A pure core, trivially property-testable:

```python
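from collections.abc import Iterable


class _CapExceeded(Exception):  # internal signal, never escapes _read_capped
    def __init__(self, *, read: int) -> None:
        super().__init__(read)
        self.read = read


def _accumulate_capped(chunks: Iterable[bytes], cap: int) -> bytes:
    buf = bytearray()
    for chunk in chunks:
        buf += chunk  # grow in place; individual chunks are not retained
        if len(buf) > cap:
            raise _CapExceeded(read=len(buf))
    return bytes(buf)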
```

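The buffer is a `bytearray` grown in place, so incoming chunks are not kept alive
in a transient list awaiting a `b"".join`; a single `bytes()` copy is made at the
end. A sync `_read_capped` and an async `_read_capped_async` wrap this logic (the
async side iterates with `async for`) and add the early reject and the `Response`
rebuild: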

```python
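async def _read_capped_async(
    response: httpx2.Response, cap: int, request: httpx2.Request
) -> httpx2.Response:
    # Early reject only: a declared (compressed) size over the cap guarantees the
    # decoded body is over it too; a small or absent header proves nothing.
    cl = _parse_content_length(response.headers.get("content-length"))
    if cl is not None and cl > cap:
        raise ResponseTooLargeError(status_code=response.status_code, limit=cap,
                                    content_length=cl, reason="declared")
    try:
        # async twin of _accumulate_capped: the same loop, driven by `async for`
        content = await _accumulate_capped_async(response.aiter_bytes(), cap)
    except _CapExceeded:
        raise ResponseTooLargeError(status_code=response.status_code, limit=cap,
                                    content_length=cl, reason="streamed")
    # The shipped version also drops Content-Encoding / Transfer-Encoding /
    # Content-Length here (see architecture/client.md, "Rebuilt headers").
    return httpx2.Response(status_code=response.status_code, headers=response.headers,
                           content=content, request=request,
                           extensions=_safe_extensions(response.extensions),
                           history=response.history)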
```

`_read_capped` takes a *Response*, not a client — so it is agnostic to whether
the response came from the request-based terminal `send(stream=True)` or
`stream()`'s method+url path. It never closes the stream; the caller owns the
lifecycle. `_safe_extensions` copies `http_version`/`reason_phrase` and drops the
now-stale `network_stream` (the buffered Response never uses it).

### 4. Terminal: branch on `cap is None`

`_terminal` keeps the plain fast path when the cap is off, so non-cap users pay
zero streaming overhead and keep `.elapsed`. Only when a cap is set does it
stream and route through `_read_capped`, owning the stream lifecycle:

```python
async def _terminal(self, request):
    async with _httpx2_exception_mapper():
        if self._max_response_body_bytes is None:
            response = await self._httpx2_client.send(request)  # unchanged fast path
        else:
            resp = await self._httpx2_client.send(request, stream=True)
            try:
                response = await _read_capped_async(resp, self._max_response_body_bytes, request)
            finally:
                await resp.aclose()
    _raise_on_status_error(response)
    return response
```
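
The lifecycle contract in the capped branch (the terminal closes the stream whether the read succeeds or the cap trips) can be exercised without a network. This is a self-contained asyncio sketch: `FakeStreamResponse`, `accumulate_capped_async`, and `terminal_capped` are hypothetical stand-ins for the streaming response, the async accumulator, and `_terminal`'s capped branch.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator


class CapExceeded(Exception):
    """Stand-in for the internal _CapExceeded signal."""


class FakeStreamResponse:
    """Minimal stand-in for a streaming response: decoded chunks plus aclose()."""

    def __init__(self, chunks: list[bytes]) -> None:
        self._chunks = chunks
        self.closed = False

    async def aiter_bytes(self) -> AsyncIterator[bytes]:
        for chunk in self._chunks:
            yield chunk

    async def aclose(self) -> None:
        self.closed = True


async def accumulate_capped_async(chunks: AsyncIterator[bytes], cap: int) -> bytes:
    # Same logic as the sync core, driven by `async for`.
    buf = bytearray()
    async for chunk in chunks:
        buf += chunk
        if len(buf) > cap:
            raise CapExceeded
    return bytes(buf)


async def terminal_capped(resp: FakeStreamResponse, cap: int) -> bytes:
    try:
        return await accumulate_capped_async(resp.aiter_bytes(), cap)
    finally:
        await resp.aclose()  # runs on success and when the cap trips


async def main() -> None:
    ok = FakeStreamResponse([b"ab", b"cd"])
    print(await terminal_capped(ok, cap=4), ok.closed)  # b'abcd' True
    big = FakeStreamResponse([b"ab", b"cd", b"e"])
    try:
        await terminal_capped(big, cap=4)
    except CapExceeded:
        print("cap tripped; closed:", big.closed)  # cap tripped; closed: True


asyncio.run(main())
```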

### 5. `stream()`: error pre-read routed through the same helper

`stream()`'s existing 4xx/5xx pre-read (`await response.aread()`, guarded today
by the `Content-Length`-only check) is replaced by `_read_capped` (`_read_capped_async`
on `AsyncClient`). The user-driven
success path is untouched. This is the only place `stream()` itself buffers, so
the cap reaches it there and nowhere else; `exc.response.content` still works,
now bounded, and chunked/bombed error bodies are caught instead of waved through.

### 6. `ResponseTooLargeError` gains an explicit `reason`

Status-agnostic now (`status_code` can be `200`). Two trip modes carry different
information, so the discriminator is explicit rather than inferred:

- `limit: int` — the cap (always known).
- `status_code: int` — always known; distinguishes a 200 trip from a 5xx.
- `content_length: int | None` — the server's *declared* header, nullable,
informational only.
- `reason: typing.Literal["declared", "streamed"]` — `"declared"` = early reject
on `Content-Length`; `"streamed"` = accumulator crossed the cap (the
bomb/chunked case). No `bytes_read`/"actual size" — never measured, never
fabricated.

The error stays a non-status `ClientError` with the existing `__init__` + `__reduce__`
(per the `errors.md` rule), and its message is worded for whichever mode tripped.

### 7. Resilience interaction (falls out of the hierarchy, no special-casing)

Because `ResponseTooLargeError` is a `ClientError` (not
`StatusError`/`NetworkError`/`TimeoutError`):

- **Retry** (`_RETRYABLE_EXCEPTIONS`): not retryable — an over-cap body recurs;
retrying wastes bandwidth.
- **Circuit breaker**: not a counted failure — hits `except BaseException`, slot
released, neither success nor failure recorded. Cannot trip the breaker.
- **Bulkhead**: releases its slot normally.

**Cap-wins / fail-hard:** an otherwise-retryable 5xx whose body exceeds the cap
trips `_read_capped` before status classification, so it surfaces as
`ResponseTooLargeError` (non-retryable) rather than the `StatusError`. Accepted:
the cap is a hard memory-safety limit, retrying would re-fetch the same giant
body, and producing the `StatusError` would require the very buffering we are
refusing. This only arises in a pathological case (a transient error carrying a
multi-GB body), and a user who sets a cap has explicitly refused to buffer it.
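
A stand-in hierarchy makes the "falls out of the hierarchy" argument concrete. The class names mirror the design, but these definitions, the hypothetical `ServerError` subclass for retryable 5xx, the contents of `_RETRYABLE_EXCEPTIONS`, and the `terminal`/`first_error` helpers are assumptions for illustration, not httpware's code.

```python
from __future__ import annotations


class ClientError(Exception):
    """Stand-in base class."""


class StatusError(ClientError):
    pass


class ServerError(StatusError):  # hypothetical: the retryable 5xx family
    pass


class NetworkError(ClientError):
    pass


class TimeoutError(ClientError):  # library-level name; shadows the builtin here
    pass


class ResponseTooLargeError(ClientError):
    pass


# Assumed retry set: ResponseTooLargeError is in none of these families.
_RETRYABLE_EXCEPTIONS = (ServerError, NetworkError, TimeoutError)


def is_retryable(exc: BaseException) -> bool:
    return isinstance(exc, _RETRYABLE_EXCEPTIONS)


def terminal(status_code: int, body_size: int, cap: int) -> None:
    # Cap-wins: the capped read runs before status classification.
    if body_size > cap:
        raise ResponseTooLargeError(f"body over {cap} bytes")
    if status_code >= 500:
        raise ServerError(f"HTTP {status_code}")
    if status_code >= 400:
        raise StatusError(f"HTTP {status_code}")


def first_error(status_code: int, body_size: int, cap: int) -> ClientError | None:
    try:
        terminal(status_code, body_size, cap)
    except ClientError as exc:
        return exc
    return None


exc = first_error(503, body_size=10_000_000, cap=1_000_000)
print(type(exc).__name__, is_retryable(exc))  # ResponseTooLargeError False
exc = first_error(503, body_size=100, cap=1_000_000)
print(type(exc).__name__, is_retryable(exc))  # ServerError True
```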

## Testing

- **Pure core — Hypothesis** (`tests/test_capped_read_props.py`): over arbitrary
chunk partitions of a body × arbitrary cap, `_accumulate_capped` raises iff
`len(body) > cap` and returns `body` byte-for-byte otherwise (chunk-boundary
independence — the one subtle invariant).
- **Integration (`MockTransport`, sync + async parity):** within-cap passes;
exactly-at-cap passes (boundary); declared `Content-Length` over cap →
`reason="declared"`, zero bytes read; chunked / no `Content-Length` over cap →
`reason="streamed"`; gzip bomb (133 → 100 K) → `reason="streamed"`;
empty/204/HEAD pass; `ValueError` on `cap < 1`.
- **Resilience:** retry does not retry a `ResponseTooLargeError`; breaker does
not trip; an over-cap retryable 5xx surfaces as `ResponseTooLargeError`.
- **`stream()`:** error pre-read is bounded (declared + streamed); user-driven
success streaming is never capped.
- `just lint && just test` green; coverage preserved.
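
The chunk-boundary invariant can also be checked exhaustively for short bodies without Hypothesis. A sketch, assuming `_accumulate_capped` behaves as in section 3 (it is redefined here, with a stand-in `_CapExceeded`, so the block runs on its own); `partitions` and `check` are hypothetical helpers that try every split of the body into contiguous chunks against every cap from `1` to `len(body) + 1`.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import combinations


class _CapExceeded(Exception):
    pass


def _accumulate_capped(chunks: Iterable[bytes], cap: int) -> bytes:
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        if len(buf) > cap:
            raise _CapExceeded
    return bytes(buf)


def partitions(body: bytes) -> Iterator[list[bytes]]:
    """Yield every split of a non-empty `body` into contiguous non-empty chunks."""
    n = len(body)
    for k in range(n):  # number of interior cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [body[a:b] for a, b in zip(bounds, bounds[1:])]


def check(body: bytes) -> int:
    cases = 0
    for chunks in partitions(body):
        for cap in range(1, len(body) + 2):
            try:
                result = _accumulate_capped(chunks, cap)
            except _CapExceeded:
                assert len(body) > cap, (chunks, cap)  # raises only when over cap
            else:
                assert len(body) <= cap and result == body, (chunks, cap)
            cases += 1
    return cases


print(check(b"abcde"))  # 96 (16 partitions x 6 caps)
```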

## Risk

- **`.elapsed` dropped on the capped path** (likely × low). Rebuilding the
`Response` via public API loses `.elapsed`, which httpx2 only sets on its own
buffered send. Only affects clients that set a cap *and* read `.elapsed`.
Mitigation: the `cap is None` fast path preserves it for everyone else;
document the caveat in `architecture/client.md`.
- **Breaking removal of `max_error_body_bytes`** (certain × low). A shipped,
exported, documented param disappears. Acceptable pre-1.0; called out in
release notes. No silent behavior change — the name is gone, construction
fails loudly if still passed.
- **Stale `extensions` on the rebuilt Response** (unlikely × low). Mitigated by
`_safe_extensions` dropping `network_stream`.
- **Streaming-path overhead vs `send()`** (certain × low). Only paid when a cap
is set; the fast path is untouched.

## Operations

None — no out-of-repo steps.

## Out of scope

- Deprecation shim for `max_error_body_bytes` (deleted, not aliased).
- Request-body caps, per-connection limits, capping user-driven `stream()`.