diff --git a/architecture/client.md b/architecture/client.md
index 118d3bb..1f38bf6 100644
--- a/architecture/client.md
+++ b/architecture/client.md
@@ -31,8 +31,19 @@ Client(httpx2_client=httpx2.Client(trust_env=False))
 AsyncClient(httpx2_client=httpx2.AsyncClient(trust_env=False))
 ```
 
-## Bounded error bodies (`max_error_body_bytes`)
+## Bounded response bodies (`max_response_body_bytes`)
 
-Both `Client` and `AsyncClient` accept `max_error_body_bytes: int | None = None`. The default (`None`) is backward-compatible: error bodies are read without a size limit.
+Both `Client` and `AsyncClient` accept `max_response_body_bytes: int | None = None`. The default (`None`) is unbounded; a non-`None` value below `1` is rejected with `ValueError` at construction. The cap is **status-agnostic** (a `200` trips it the same as a `500`) and counts **decoded** bytes — the actual in-memory footprint, and the only measure that catches a compression bomb (a 133-byte gzip body decoding to 100 KB).
 
-When set, `stream()` raises `ResponseTooLargeError` on a 4xx/5xx response whose declared `Content-Length` header exceeds the cap — before the body is read. Responses without a declared `Content-Length` (chunked transfer) are still read unbounded: a hard mid-read cap would require httpx2 private API, which this project forbids.
+The cap bounds memory that httpware buffers on your behalf, at two sites:
+
+- **The non-streaming terminal** (`send()` and the per-verb helpers). When a cap is set, the terminal switches from `httpx2.send(request)` to `send(request, stream=True)` and accumulates decoded bytes through the shared `_read_capped` helper, failing fast with `ResponseTooLargeError` the moment the cap is crossed. When the cap is `None`, the terminal keeps the plain buffered `send()` fast path — zero streaming overhead.
+- **`stream()`'s internal error pre-read** — the 4xx/5xx body httpware reads so `exc.response.content` works is routed through the same `_read_capped`. **User-driven `stream()` iteration is never capped** — you chose streaming to own that memory.
+
+The declared `Content-Length` is used only as an *early reject* (if even the compressed size already exceeds the cap, fail before reading a byte); it is never an early accept, so the accumulator always runs — chunked and bomb bodies are caught, not waved through. `ResponseTooLargeError.reason` is `"declared"` or `"streamed"` accordingly. Entirely public httpx2 API — no private access.
+
+**Bodiless responses bypass the cap.** Responses that carry no message body — to a `HEAD` request, or with status `204`/`304` — buffer nothing, so the cap never applies to them even when they declare a large `Content-Length` (`HEAD` legitimately echoes the entity length). These are returned unchanged, preserving their original headers.
+
+**Rebuilt headers.** The accumulator yields the *decoded* body, so the rebuilt Response drops the wire-encoding headers (`Content-Encoding`, `Transfer-Encoding`, and the now-incorrect compressed `Content-Length`); httpx2 recomputes `Content-Length` from the buffered content. Carrying `Content-Encoding` forward would make httpx2 re-decode already-decoded bytes and raise.
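As an illustration of the header rebuild described above, the sketch below filters wire-encoding headers out of a list of name/value pairs. It is not httpware's code: `WIRE_ENCODING_HEADERS` and `strip_wire_headers` are hypothetical names, and plain tuples stand in for httpx2's header type.

```python
from typing import Iterable, List, Tuple

# Headers that describe the on-the-wire encoding, not the decoded body.
WIRE_ENCODING_HEADERS = frozenset(
    {"content-encoding", "transfer-encoding", "content-length"}
)


def strip_wire_headers(headers: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the headers minus those invalidated by decoding (case-insensitive)."""
    return [
        (name, value)
        for name, value in headers
        if name.lower() not in WIRE_ENCODING_HEADERS
    ]


original = [
    ("Content-Type", "application/json"),
    ("Content-Encoding", "gzip"),
    ("Content-Length", "133"),
]
rebuilt = strip_wire_headers(original)
```

Only `Content-Type` survives here; per the paragraph above, a fresh `Content-Length` is then derived from the buffered content by httpx2 rather than copied forward.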
+
+**Caveat:** on the capped path the buffered response is rebuilt via the public `httpx2.Response(content=...)` constructor, which does not carry `.elapsed` (httpx2 only sets it on its own buffered `send()`). Clients that set a cap and read `response.elapsed` will find it absent; the `None`-cap fast path preserves it.
diff --git a/architecture/errors.md b/architecture/errors.md
index c516bb5..152d95b 100644
--- a/architecture/errors.md
+++ b/architecture/errors.md
@@ -18,7 +18,7 @@ The error-mapping table (what `httpx2` exception maps to which `httpware` except
 
 The "no `__init__` override" rule scopes only to `StatusError` subclasses. Non-status `ClientError` subclasses — `DecodeError`, `MissingDecoderError`, `BulkheadFullError`, `RetryBudgetExhaustedError`, `CircuitOpenError`, `ResponseTooLargeError` — deliberately define `__init__` with keyword-only fields.
 
-`ResponseTooLargeError` is raised from `stream()` when `max_error_body_bytes` is set and a 4xx/5xx response's declared `Content-Length` exceeds the cap. It is a non-status `ClientError`; it does not carry a `StatusError`-style positional `response` and is not in `STATUS_TO_EXCEPTION`.
+`ResponseTooLargeError` is raised when `max_response_body_bytes` is set and a response body would exceed the cap. It is status-agnostic (a `200` can trip it) and counts **decoded** bytes. It fires from the non-streaming terminal (`send()`) and from `stream()`'s internal error pre-read; user-driven `stream()` iteration is never capped. The `reason` field discriminates the two trip modes: `"declared"` (the declared `Content-Length` already exceeds the cap and the response is rejected before any byte is read; `content_length` holds the declared value) and `"streamed"` (the decoded body crossed the cap mid-read, which is the chunked or compression-bomb case, where the true size is unknown by design). It is a non-status `ClientError`; it does not carry a `StatusError`-style positional `response` and is not in `STATUS_TO_EXCEPTION`. Because it is not a `StatusError`, `NetworkError`, or `TimeoutError`, it is not retried and does not count toward the circuit breaker.
 
 ## Security: request headers are reachable via `exc.response.request`
 
diff --git a/planning/changes/2026-06-23.03-response-body-cap/design.md b/planning/changes/2026-06-23.03-response-body-cap/design.md
new file mode 100644
index 0000000..5e361d0
--- /dev/null
+++ b/planning/changes/2026-06-23.03-response-body-cap/design.md
@@ -0,0 +1,231 @@
+---
+status: shipped
+date: 2026-06-23
+slug: response-body-cap
+summary: Replace error-only max_error_body_bytes with a status-agnostic, decoded-byte max_response_body_bytes cap enforced by a streaming capped-accumulator terminal.
+supersedes: null
+superseded_by: null
+pr: 78
+outcome: Shipped via #78 — max_error_body_bytes removed (breaking, pre-1.0) for status-agnostic max_response_body_bytes, enforced at the non-streaming terminal and stream()'s error pre-read via a shared _read_capped accumulator counting decoded bytes (catches compression bombs); Content-Length kept as early-reject only. ResponseTooLargeError gained a declared/streamed reason; >=1 validated. Not retried / not breaker-counted; cap-wins on over-cap retryable 5xx. None-cap keeps the plain send() fast path (.elapsed preserved). 756 tests, 100% coverage. Promoted into architecture/client.md + errors.md; release note 0.15.0.
+---
+
+# Design: Status-agnostic response-body cap
+
+## Summary
+
+Replace the shipped `max_error_body_bytes` knob with a status-agnostic
+`max_response_body_bytes` cap that actually bounds memory on the non-streaming
+path. Today the cap only fires inside `stream()`, only on 4xx/5xx, and only as a
+declared-`Content-Length` pre-check — so a non-streaming `send()` buffers the
+whole body before httpware ever gets control, and even `stream()` reads chunked
+or compression-bombed error bodies unbounded. The new design routes both the
+internal terminal and `stream()`'s error pre-read through a single shared
+`_read_capped` helper that streams the response, accumulates **decoded** bytes
+against the cap, and fails fast with `ResponseTooLargeError` the moment the cap
+is crossed. Entirely public httpx2 API — no `httpx2._`. Off by default (`None`).
+
+## Motivation
+
+The 2026-06-14 deep audit flagged (Medium) that `max_error_body_bytes` is not a
+real cap: for a non-streaming `send()`, `httpx2.Client.send(request)` buffers the
+entire body into memory before httpware reaches the decode seam, so there is no
+enforcement point at all on the hot path. The existing guard lives only at
+`stream()` entry and only rejects when `Content-Length` is declared.
+
+Two concrete holes:
+
+1. **The success path is unprotected and is the larger surface.** A typed
+   `send(response_model=X)` against a `200` with a multi-GB body exhausts the
+   heap. Memory exhaustion has no status code; an error-only cap bolts the
+   smaller door and leaves the bigger one open.
+2. **Compression bombs defeat the `Content-Length` pre-check.** Verified: a
+   133-byte gzip body decodes to 100,000 bytes (`aiter_bytes()` yields the
+   *decoded* stream; `Content-Length` reports the *compressed* 133). Real bombs
+   run ~1000:1. A header pre-check waves these straight through.
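The second hole can be reproduced with the standard library alone. This is a minimal sketch, not the project's test: the payload, the `CAP` value, and the variable names are illustrative assumptions, and the exact compressed size depends on the zlib build.

```python
import gzip

CAP = 10_000  # illustrative cap, in bytes

body = b"\x00" * 100_000          # highly compressible payload
wire = gzip.compress(body)        # what travels on the wire

# A Content-Length pre-check only ever sees the compressed size.
passes_header_check = len(wire) <= CAP

# The memory actually consumed is the decoded size.
decoded = gzip.decompress(wire)
fits_in_memory_budget = len(decoded) <= CAP
```

The header check passes while the decoded body is ten times over the cap, which is why the cap has to count decoded bytes.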
+
+Feasibility — the reason this was deferred — is resolved: the audit feared a
+true mid-read cap needs httpx2 private API. It does not. `httpx2.{Async,}Client`
+expose `send(request, stream=True)`, `Response.aiter_bytes()/iter_bytes()`, and a
+public `Response(content=...)` constructor. That is the whole mechanism.
+
+## Non-goals
+
+- **A request-body cap.** This bounds response bodies only.
+- **Capping user-driven `stream()` iteration.** When the caller iterates chunks
+  themselves they own the memory; capping it would defeat `stream()`.
+- **A general per-connection limit.** That is httpx2's `limits`; orthogonal.
+- **Reporting the true oversized body size.** When the accumulator trips we stop
+  at the first chunk over the line and do not know (and will not fabricate) the
+  total.
+- **Preserving `.elapsed` on the capped path.** See Risk; an inherent cost of
+  rebuilding the `Response` via public API.
+
+## Design
+
+### 1. One knob: `max_response_body_bytes` (replaces `max_error_body_bytes`)
+
+Both `Client` and `AsyncClient` take `max_response_body_bytes: int | None = None`.
+`None` (default) is unbounded — backward-compatible behavior. The old
+`max_error_body_bytes` is **deleted outright** (no deprecation shim — acceptable
+pre-1.0). Construction validates `>= 1` and raises
+`ValueError("max_response_body_bytes must be >= 1")`, matching the
+`failure_threshold` idiom in `circuit_breaker.py`. `0`/negative are rejected;
+`None` is the only way to disable.
+
+### 2. The cap counts decoded bytes; `Content-Length` is early-reject only
+
+The accumulator counts what `aiter_bytes()` yields (decoded / decompressed),
+because decoded size is the actual memory footprint and is the only thing that
+stops a compression bomb. The declared `Content-Length` header (the *compressed*
+size) is used **only** as an early reject — if even the compressed size already
+exceeds the cap, the decoded body certainly will, so we fail before reading a
+byte. It is **never** an early accept: a small/absent `Content-Length` says
+nothing about decoded size, so the accumulator always runs regardless.
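The "early reject, never early accept" rule can be sketched as a single predicate. `parse_content_length` here is a hypothetical stand-in for the project's `_parse_content_length` (whose exact parsing rules are not shown in this change), and `early_reject` is an illustrative name.

```python
from typing import Optional


def parse_content_length(raw: Optional[str]) -> Optional[int]:
    """Parse a Content-Length value; return None when absent or malformed."""
    if raw is None:
        return None
    raw = raw.strip()
    if not (raw.isascii() and raw.isdigit()):
        return None
    return int(raw)


def early_reject(raw_content_length: Optional[str], cap: int) -> bool:
    """True only when the declared (compressed) size alone already exceeds the cap.

    False does NOT mean the body is acceptable; it means "keep counting
    decoded bytes", because a small or absent header proves nothing.
    """
    declared = parse_content_length(raw_content_length)
    return declared is not None and declared > cap
```

A `False` result covers three distinct situations (small header, no header, malformed header), and all three still go through the accumulator.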
+
+### 3. Shared capped reader + pure accumulator core
+
+A pure core, trivially property-testable:
+
+```python
+def _accumulate_capped(chunks: Iterable[bytes], cap: int) -> bytes:
+    buf = bytearray()
+    for chunk in chunks:
+        buf += chunk
+        if len(buf) > cap:
+            raise _CapExceeded(read=len(buf))   # internal signal
+    return bytes(buf)
+```
+
+`bytearray` grown in place (no transient list + `b"".join` double allocation),
+one `bytes()` at the end. A sync `_read_capped` and async `_read_capped_async`
+wrap it with the early reject and the `Response` rebuild:
+
+```python
+async def _read_capped_async(response, cap, request) -> httpx2.Response:
+    cl = _parse_content_length(response.headers.get("content-length"))
+    if cl is not None and cl > cap:
+        raise ResponseTooLargeError(status_code=response.status_code, limit=cap,
+                                    content_length=cl, reason="declared")
+    try:
+        content = await _accumulate_capped_async(response.aiter_bytes(), cap)  # hypothetical async twin of _accumulate_capped
+    except _CapExceeded:
+        raise ResponseTooLargeError(status_code=response.status_code, limit=cap,
+                                    content_length=cl, reason="streamed")
+    return httpx2.Response(status_code=response.status_code, headers=response.headers,
+                           content=content, request=request,
+                           extensions=_safe_extensions(response.extensions),
+                           history=response.history)
+```
+
+`_read_capped` takes a *Response*, not a client — so it is agnostic to whether
+the response came from the request-based terminal `send(stream=True)` or
+`stream()`'s method+url path. It never closes the stream; the caller owns
+lifecycle. `_safe_extensions` copies `http_version`/`reason_phrase` and drops the
+now-stale `network_stream` (the buffered Response never uses it).
+
+### 4. Terminal: branch on `cap is None`
+
+`_terminal` keeps the plain fast path when the cap is off, so non-cap users pay
+zero streaming overhead and keep `.elapsed`. Only when a cap is set does it
+stream and route through `_read_capped`, owning the stream lifecycle:
+
+```python
+async def _terminal(self, request):
+    async with _httpx2_exception_mapper():
+        if self._max_response_body_bytes is None:
+            response = await self._httpx2_client.send(request)        # unchanged fast path
+        else:
+            resp = await self._httpx2_client.send(request, stream=True)
+            try:
+                response = await _read_capped_async(resp, self._max_response_body_bytes, request)
+            finally:
+                await resp.aclose()
+    _raise_on_status_error(response)
+    return response
+```
+
+### 5. `stream()`: error pre-read routed through the same helper
+
+`stream()`'s existing 4xx/5xx pre-read (`await response.aread()`, guarded today
+by the `Content-Length`-only check) is replaced by `_read_capped`. The user-driven
+success path is untouched. This is the only place `stream()` itself buffers, so
+the cap reaches it there and nowhere else; `exc.response.content` still works,
+now bounded, and chunked/bombed error bodies are caught instead of waved through.
+
+### 6. `ResponseTooLargeError` gains an explicit `reason`
+
+Status-agnostic now (`status_code` can be `200`). Two trip modes carry different
+information, so the discriminator is explicit rather than inferred:
+
+- `limit: int` — the cap (always known).
+- `status_code: int` — always known; distinguishes a 200 trip from a 5xx.
+- `content_length: int | None` — the server's *declared* header, nullable,
+  informational only.
+- `reason: typing.Literal["declared", "streamed"]` — `"declared"` = early reject
+  on `Content-Length`; `"streamed"` = accumulator crossed the cap (the
+  bomb/chunked case). No `bytes_read`/"actual size" — never measured, never
+  fabricated.
+
+Stays a non-status `ClientError` with the existing `__init__` + `__reduce__`
+(per the `errors.md` rule). The message text is worded separately for each `reason`.
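A minimal sketch of an exception with this shape, assuming a bare `ClientError` base and a module-level rebuild helper. The message wording and the name `_rebuild_response_too_large` are illustrative, not the shipped ones.

```python
from typing import Literal, Optional


class ClientError(Exception):
    """Stand-in for httpware's base client error."""


def _rebuild_response_too_large(status_code, limit, content_length, reason):
    # Keyword-only __init__ cannot be rebuilt from positional args directly,
    # so __reduce__ points at this helper instead.
    return ResponseTooLargeError(
        status_code=status_code, limit=limit,
        content_length=content_length, reason=reason,
    )


class ResponseTooLargeError(ClientError):
    def __init__(self, *, status_code: int, limit: int,
                 content_length: Optional[int],
                 reason: Literal["declared", "streamed"]) -> None:
        self.status_code = status_code
        self.limit = limit
        self.content_length = content_length
        self.reason = reason
        if reason == "declared":
            message = (f"declared Content-Length {content_length} exceeds "
                       f"limit of {limit} bytes (status {status_code})")
        else:
            message = (f"decoded body exceeded limit of {limit} bytes "
                       f"mid-read (status {status_code})")
        super().__init__(message)

    def __reduce__(self):
        return (_rebuild_response_too_large,
                (self.status_code, self.limit, self.content_length, self.reason))


err = ResponseTooLargeError(status_code=200, limit=10,
                            content_length=None, reason="streamed")
rebuild, args = err.__reduce__()
clone = rebuild(*args)
```

`pickle` consults `__reduce__`, so routing it through a helper that re-supplies the keyword arguments is what lets a keyword-only exception survive a round trip.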
+
+### 7. Resilience interaction (falls out of the hierarchy, no special-casing)
+
+Because `ResponseTooLargeError` is a `ClientError` (not
+`StatusError`/`NetworkError`/`TimeoutError`):
+
+- **Retry** (`_RETRYABLE_EXCEPTIONS`): not retryable — an over-cap body recurs;
+  retrying wastes bandwidth.
+- **Circuit breaker**: not a counted failure — hits `except BaseException`, slot
+  released, neither success nor failure recorded. Cannot trip the breaker.
+- **Bulkhead**: releases its slot normally.
+
+**Cap-wins / fail-hard:** an otherwise-retryable 5xx whose body exceeds the cap
+trips `_read_capped` before status classification, so it surfaces as
+`ResponseTooLargeError` (non-retryable) rather than the `StatusError`. Accepted:
+the cap is a hard memory-safety limit, retrying would re-fetch the same giant
+body, and producing the `StatusError` would require the very buffering we are
+refusing. The case is pathological (a transient error carrying a multi-GB body), and
+a user who sets a cap is explicitly refusing to buffer it.
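The "falls out of the hierarchy" claim for retry can be seen in a toy model. The class hierarchy, the `RETRYABLE` tuple, and `call_with_retry` below are simplified assumptions standing in for httpware's `_RETRYABLE_EXCEPTIONS` and retry middleware, not a copy of them.

```python
class ClientError(Exception):
    """Stand-in base class."""


class NetworkError(ClientError):
    """Stand-in for a retryable transport failure."""


class ResponseTooLargeError(ClientError):
    """Stand-in for the cap error; deliberately not listed as retryable."""


RETRYABLE = (NetworkError,)


def call_with_retry(fn, attempts: int = 3):
    """Call fn, retrying only exceptions listed in RETRYABLE."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RETRYABLE:
            if attempt == attempts:
                raise  # retries exhausted


calls = {"too_large": 0, "network": 0}


def over_cap():
    calls["too_large"] += 1
    raise ResponseTooLargeError("over cap")


def flaky():
    calls["network"] += 1
    raise NetworkError("connection reset")


for fn in (over_cap, flaky):
    try:
        call_with_retry(fn)
    except ClientError:
        pass
```

The over-cap call is attempted once and propagates immediately, while the network failure uses all three attempts; no branch mentions `ResponseTooLargeError` by name.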
+
+## Testing
+
+- **Pure core — Hypothesis** (`tests/test_capped_read_props.py`): over arbitrary
+  chunk partitions of a body × arbitrary cap, `_accumulate_capped` raises iff
+  `len(body) > cap` and returns `body` byte-for-byte otherwise (chunk-boundary
+  independence — the one subtle invariant).
+- **Integration (`MockTransport`, sync + async parity):** within-cap passes;
+  exactly-at-cap passes (boundary); declared `Content-Length` over cap →
+  `reason="declared"`, zero bytes read; chunked / no `Content-Length` over cap →
+  `reason="streamed"`; gzip bomb (133 → 100 K) → `reason="streamed"`;
+  empty/204/HEAD pass; `ValueError` on `cap < 1`.
+- **Resilience:** retry does not retry a `ResponseTooLargeError`; breaker does
+  not trip; an over-cap retryable 5xx surfaces as `ResponseTooLargeError`.
+- **`stream()`:** error pre-read is bounded (declared + streamed); user-driven
+  success streaming is never capped.
+- `just lint && just test` green; coverage preserved.
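The chunk-boundary invariant from the first bullet can be sketched without Hypothesis, using seeded random partitions from the standard library. This is an approximation of the property test, not the project's suite: `accumulate_capped` mirrors the core shown in the Design section, and `CapExceeded`, `random_partition`, and `check_boundary_independence` are illustrative names.

```python
import random
from typing import Iterable, List, Tuple


class CapExceeded(Exception):
    """Internal signal: the running total crossed the cap."""


def accumulate_capped(chunks: Iterable[bytes], cap: int) -> bytes:
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        if len(buf) > cap:
            raise CapExceeded(len(buf))
    return bytes(buf)


def random_partition(body: bytes, rng: random.Random) -> List[bytes]:
    """Split body at up to five random cut points (empty chunks allowed)."""
    cuts = sorted(rng.randrange(len(body) + 1) for _ in range(rng.randrange(6)))
    bounds = [0, *cuts, len(body)]
    return [body[a:b] for a, b in zip(bounds, bounds[1:])]


def check_boundary_independence(trials: int = 500, seed: int = 0) -> Tuple[int, int]:
    """Return (passed, tripped); assert the outcome depends only on len(body) vs cap."""
    rng = random.Random(seed)
    passed = tripped = 0
    for _ in range(trials):
        body = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        cap = rng.randrange(1, 64)
        try:
            result = accumulate_capped(random_partition(body, rng), cap)
        except CapExceeded:
            assert len(body) > cap
            tripped += 1
        else:
            assert len(body) <= cap and result == body
            passed += 1
    return passed, tripped


passed, tripped = check_boundary_independence()
```

Hypothesis adds shrinking and far better input coverage; the seeded loop only shows the shape of the assertion.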
+
+## Risk
+
+- **`.elapsed` dropped on the capped path** (likely × low). Rebuilding the
+  `Response` via public API loses `.elapsed`, which httpx2 only sets on its own
+  buffered send. Only affects clients that set a cap *and* read `.elapsed`.
+  Mitigation: the `cap is None` fast path preserves it for everyone else;
+  document the caveat in `architecture/client.md`.
+- **Breaking removal of `max_error_body_bytes`** (certain × low). A shipped,
+  exported, documented param disappears. Acceptable pre-1.0; called out in
+  release notes. No silent behavior change — the name is gone, construction
+  fails loudly if still passed.
+- **Stale `extensions` on the rebuilt Response** (unlikely × low). Mitigated by
+  `_safe_extensions` dropping `network_stream`.
+- **Streaming-path overhead vs `send()`** (certain × low). Only paid when a cap
+  is set; the fast path is untouched.
+
+## Operations
+
+None — no out-of-repo steps.
+
+## Out of scope
+
+- Deprecation shim for `max_error_body_bytes` (deleted, not aliased).
+- Request-body caps, per-connection limits, capping user-driven `stream()`.
diff --git a/planning/changes/2026-06-23.03-response-body-cap/plan.md b/planning/changes/2026-06-23.03-response-body-cap/plan.md
new file mode 100644
index 0000000..060315d
--- /dev/null
+++ b/planning/changes/2026-06-23.03-response-body-cap/plan.md
@@ -0,0 +1,298 @@
+---
+status: shipped
+date: 2026-06-23
+slug: response-body-cap
+spec: response-body-cap
+pr: 78
+---
+
+# response-body-cap — implementation plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use
+> superpowers:subagent-driven-development (recommended) or
+> superpowers:executing-plans to implement this plan task-by-task. Steps
+> use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Replace error-only `max_error_body_bytes` with a status-agnostic,
+decoded-byte `max_response_body_bytes` cap enforced by a shared streaming
+capped-accumulator on both the terminal and `stream()`'s error pre-read.
+
+**Spec:** [`design.md`](./design.md)
+
+**Branch:** `feat/response-body-cap`
+
+**Commit strategy:** Per-task commits. TDD: each behavioral task writes the
+failing test first, then the implementation.
+
+---
+
+### Task 1: `ResponseTooLargeError` gains `reason`
+
+**Files:**
+- Modify: `src/httpware/errors.py`
+- Modify: `tests/test_errors.py` (or the suite that covers `ResponseTooLargeError`)
+
+Make the error status-agnostic and give it an explicit trip-mode discriminator.
+No client wiring yet.
+
+- [ ] **Step 1: Write failing tests**
+
+  Assert `ResponseTooLargeError(status_code=200, limit=10, content_length=None,
+  reason="streamed")` constructs, exposes all four fields, and round-trips through
+  `pickle` (exercises `__reduce__`). Add a `reason="declared"` case. Assert the
+  message text differs sensibly per `reason`. Run: `just test tests/test_errors.py`
+  — red.
+
+- [ ] **Step 2: Add the field**
+
+  Add `reason: typing.Literal["declared", "streamed"]` to the class body and
+  `__init__` (keyword-only), thread it into the message and `__reduce__` /
+  `_reconstruct_response_too_large`. Keep it a non-status `ClientError`. Run:
+  `just test tests/test_errors.py` — green.
+
+- [ ] **Step 3: Commit**
+
+  ```bash
+  git add src/httpware/errors.py tests/test_errors.py
+  git commit -m "feat: add reason discriminator to ResponseTooLargeError
+
+  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
+  ```
+
+---
+
+### Task 2: Pure `_accumulate_capped` core + Hypothesis property test
+
+**Files:**
+- Modify: `src/httpware/client.py`
+- Create: `tests/test_capped_read_props.py`
+
+The one subtle invariant — chunk-boundary independence — isolated behind a pure
+function before any I/O wiring.
+
+- [ ] **Step 1: Write the property test (red)**
+
+  In `tests/test_capped_read_props.py`, use Hypothesis to draw a body (`bytes`)
+  and a partition into chunks, plus a `cap >= 1`. Assert: `_accumulate_capped`
+  returns `body` byte-for-byte when `len(body) <= cap`, and raises `_CapExceeded`
+  when `len(body) > cap` — independent of how the body is split. Annotate test
+  args. Run: `just test tests/test_capped_read_props.py` — red (symbols absent).
+
+- [ ] **Step 2: Implement the core**
+
+  Add module-level `class _CapExceeded(Exception)` (carries `read: int`) and
+  `def _accumulate_capped(chunks: Iterable[bytes], cap: int) -> bytes` using a
+  `bytearray` grown in place, raising `_CapExceeded(read=len(buf))` the moment
+  `len(buf) > cap`. Run: `just test tests/test_capped_read_props.py` — green.
+
+- [ ] **Step 3: Commit**
+
+  ```bash
+  git add src/httpware/client.py tests/test_capped_read_props.py
+  git commit -m "feat: add pure _accumulate_capped core with property test
+
+  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
+  ```
+
+---
+
+### Task 3: `_read_capped` sync/async wrappers + `_safe_extensions`
+
+**Files:**
+- Modify: `src/httpware/client.py`
+- Modify: `tests/test_client.py` (or a focused `tests/test_capped_read.py`)
+
+Wrap the core with the `Content-Length` early reject and the `Response` rebuild.
+Helpers take a `Response`, not a client; they never close the stream.
+
+- [ ] **Step 1: Write failing unit tests**
+
+  Build streaming responses via `MockTransport` + `httpx2.{Async,}Client` and call
+  `_read_capped` / `_read_capped_async` directly (or through a thin harness):
+  within-cap returns a buffered `Response` with byte-identical `.content`;
+  declared `Content-Length > cap` raises `reason="declared"` having read zero;
+  chunked over-cap raises `reason="streamed"`; gzip bomb (133 → 100 K) raises
+  `reason="streamed"`; rebuilt `Response.extensions` has no `network_stream` but
+  keeps `http_version`. Run — red.
+
+- [ ] **Step 2: Implement**
+
+  Add `_safe_extensions(ext)` (copy, preserve `http_version`/`reason_phrase`, drop
+  `network_stream`), then `_read_capped` (sync, `iter_bytes`) and
+  `_read_capped_async` (async, `aiter_bytes`). Each: parse `Content-Length` via
+  `_parse_content_length`, early-reject → `ResponseTooLargeError(reason="declared")`;
+  feed the byte iterator to `_accumulate_capped`, `except _CapExceeded` →
+  `ResponseTooLargeError(reason="streamed")`; else rebuild
+  `httpx2.Response(status_code=…, headers=…, content=…, request=…,
+  extensions=_safe_extensions(…), history=…)`. Run — green.
+
+- [ ] **Step 3: Commit**
+
+  ```bash
+  git add src/httpware/client.py tests/
+  git commit -m "feat: add shared _read_capped streaming accumulator
+
+  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
+  ```
+
+---
+
+### Task 4: Rename param, validate, branch the terminal (both clients)
+
+**Files:**
+- Modify: `src/httpware/client.py`
+- Modify: `tests/test_client.py`
+
+Swap `max_error_body_bytes` → `max_response_body_bytes` on `AsyncClient` and
+`Client`; delete the old name entirely; wire the terminal.
+
+- [ ] **Step 1: Write failing tests**
+
+  For both clients: `ValueError` when `max_response_body_bytes < 1` (test `0` and
+  `-1`); a non-streaming `send()` against an over-cap body raises
+  `ResponseTooLargeError` (declared and streamed); within-cap `send()` returns
+  normally with intact `.content`; `max_response_body_bytes=None` leaves behavior
+  unchanged. Run — red.
+
+- [ ] **Step 2: Implement**
+
+  Rename the ctor param + `self._max_*` attr on both clients; add the `>= 1`
+  validation raising `ValueError("max_response_body_bytes must be >= 1")`. In
+  `_terminal` / sync terminal: branch on `is None` — keep plain `send(request)`
+  fast path; else `send(request, stream=True)` inside `try/finally: aclose()`,
+  routed through `_read_capped[_async]`. Keep `_raise_on_status_error` after.
+  Run — green.
+
+- [ ] **Step 3: Commit**
+
+  ```bash
+  git add src/httpware/client.py tests/test_client.py
+  git commit -m "feat!: replace max_error_body_bytes with max_response_body_bytes
+
+  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
+  ```
+
+---
+
+### Task 5: Route `stream()` error pre-read through `_read_capped`
+
+**Files:**
+- Modify: `src/httpware/client.py`
+- Modify: `tests/test_client.py` (streaming cases)
+
+Replace the `Content-Length`-only block + `await response.aread()` in both
+`stream()` methods with the shared helper; leave user-driven streaming uncapped.
+
+- [ ] **Step 1: Write failing tests**
+
+  In `stream()`: an over-cap 4xx/5xx error body raises `ResponseTooLargeError`
+  (declared and streamed, incl. a chunked/no-`Content-Length` case); a within-cap
+  error still raises the `StatusError` with `exc.response.content` populated; a
+  user iterating a large **2xx** body is never capped. Sync + async. Run — red.
+
+- [ ] **Step 2: Implement**
+
+  In each `stream()` error branch (`400 <= status < 600`): replace the guard +
+  `aread()` with `capped = _read_capped[_async](response, cap, response.request)`
+  then `_raise_on_status_error(capped)`. Only when `cap is not None`; otherwise
+  keep the existing unbounded `aread()`. Do not touch the success `yield`. Run —
+  green.
+
+- [ ] **Step 3: Commit**
+
+  ```bash
+  git add src/httpware/client.py tests/test_client.py
+  git commit -m "feat: bound stream() error pre-read via _read_capped
+
+  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
+  ```
+
+---
+
+### Task 6: Resilience-interaction tests
+
+**Files:**
+- Modify: `tests/` (retry + circuit-breaker suites)
+
+Lock the fall-out behavior so a future refactor can't silently make
+`ResponseTooLargeError` retryable or breaker-counting.
+
+- [ ] **Step 1: Write tests (expect green)**
+
+  With a retry middleware wrapping an over-cap response: assert exactly one
+  terminal attempt and `ResponseTooLargeError` propagates (not retried). With a
+  circuit breaker: assert a cap trip records neither success nor failure and never
+  opens the breaker. Assert an over-cap **retryable 5xx** surfaces as
+  `ResponseTooLargeError`, not the `StatusError` (cap-wins). Run — green (no prod
+  code change expected; if red, the hierarchy assumption broke — stop and
+  reconcile with the spec).
+
+- [ ] **Step 2: Commit**
+
+  ```bash
+  git add tests/
+  git commit -m "test: lock ResponseTooLargeError resilience semantics
+
+  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
+  ```
+
+---
+
+### Task 7: Docs, deferred cleanup, index, release notes
+
+**Files:**
+- Modify: `architecture/client.md`, `architecture/errors.md`
+- Modify: `planning/deferred.md`
+- Modify: `planning/changes/README.md` (generated — via `just index`)
+- Create: `planning/releases/<next-version>.md` (if a release is cut)
+- Modify: `design.md`/`plan.md` frontmatter (`status: shipped`, `pr`, `outcome`)
+
+Promote conclusions into the living architecture docs and retire the deferred
+item.
+
+- [ ] **Step 1: Architecture docs**
+
+  Rewrite `architecture/client.md` "Bounded error bodies" → "Bounded response
+  bodies": status-agnostic, decoded-byte, bomb-aware, `Content-Length`
+  early-reject-only, `stream()` interaction, `cap is None` fast path, and the
+  `.elapsed` caveat. Update the `ResponseTooLargeError` entry in
+  `architecture/errors.md` (new `reason`, status-agnostic semantics).
+
+- [ ] **Step 2: Retire the deferred item**
+
+  Remove the "Non-streaming hard response-body cap" bullet from
+  `planning/deferred.md`.
+
+- [ ] **Step 3: Regenerate the index**
+
+  ```bash
+  just index
+  ```
+
+- [ ] **Step 4: Set ship frontmatter + commit**
+
+  Set `status: shipped` + `pr` + `outcome` on both bundle files. Add release
+  notes if a version is cut (note the breaking `max_error_body_bytes` removal).
+
+  ```bash
+  git add architecture/ planning/
+  git commit -m "docs: promote response-body cap into architecture; retire deferred item
+
+  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
+  ```
+
+---
+
+### Task 8: Full verification
+
+- [ ] **Step 1: Lint + full suite**
+
+  ```bash
+  just lint && just test
+  ```
+
+  Confirm green and coverage preserved. Grep guard:
+  `grep -rE 'httpx2\._' src/httpware/` returns nothing;
+  `grep -rn 'max_error_body_bytes' src/ architecture/` returns nothing.
+
+- [ ] **Step 2: Open the PR** per `finishing-a-development-branch`.
diff --git a/planning/deferred.md b/planning/deferred.md
index 608d205..4624f45 100644
--- a/planning/deferred.md
+++ b/planning/deferred.md
@@ -16,7 +16,3 @@ As of 0.7.0, all planned epics (3, 4, 5, 6) are closed — see the [change Index
 
   - **Count-based window variant** (`window_type="count"`) — time-based + `minimum_calls` already covers the fixed-sample-size rationale, and count-based adds a real staleness downside for HTTP health detection (a low-traffic "last N calls" window can reflect outcomes from minutes ago). Polly v8 *removed* count-based; Hystrix and Envoy are time-based. For a spiky low-volume backend, a longer `window_seconds` + `minimum_calls` is the better tool. Revisit only on concrete Resilience4j-parity demand.
   - **Slow-call-rate dimension** — Resilience4j-only, and redundant with `AsyncTimeout`.
-
-### Documentation
-
-- **Non-streaming hard response-body cap** (2026-06-14 deep audit, Medium) — for a non-streaming `send()`, httpx2 buffers the whole body before httpware reaches the decode seam, so a true cap needs a streaming-with-capped-accumulator rework of the Seam-A terminal. The current `max_error_body_bytes` guard only applies at `stream()` entry and only when `Content-Length` is declared. Revisit trigger: the Seam-A terminal is next reworked, or a concrete large-response abuse is reported. (`src/httpware/client.py`)
diff --git a/planning/releases/0.15.0.md b/planning/releases/0.15.0.md
new file mode 100644
index 0000000..7146fcf
--- /dev/null
+++ b/planning/releases/0.15.0.md
@@ -0,0 +1,62 @@
+# httpware 0.15.0 — status-agnostic response-body cap (`max_response_body_bytes`)
+
+**Minor release. Contains one breaking change** (pre-1.0): the opt-in
+`max_error_body_bytes` parameter is replaced by `max_response_body_bytes`.
+
+This release turns the error-only body guard into a real, status-agnostic memory
+cap that is actually enforced on the non-streaming `send()` path and against
+compression bombs.
+
+## Breaking change
+
+`max_error_body_bytes` is **removed** and replaced by `max_response_body_bytes`
+on both `Client` and `AsyncClient`. There is no compatibility alias — passing the
+old keyword raises `TypeError`.
+
+```python
+# before
+client = AsyncClient(max_error_body_bytes=1_000_000)
+# after
+client = AsyncClient(max_response_body_bytes=1_000_000)
+```
+
+`None` (the default) remains unbounded. A non-`None` value below `1` is now
+rejected with `ValueError` at construction.
+
+## What changed and why
+
+The old `max_error_body_bytes` only fired inside `stream()`, only on 4xx/5xx, and
+only as a declared-`Content-Length` pre-check. For a non-streaming `send()`,
+httpx2 buffered the whole body before httpware got control, so the hot path had
+no cap at all — and a small compressed body could decode to something enormous
+(a 133-byte gzip body decodes to 100 KB; real bombs run ~1000:1) and slip past a
+header check entirely.
+
+`max_response_body_bytes`:
+
+- Is **status-agnostic** — a `200` is capped the same as a `500`. Memory
+  exhaustion has no status code, and the success path is the larger surface.
+- Counts **decoded** bytes (the in-memory footprint), so compression bombs are
+  caught.
+- Is enforced at the non-streaming terminal (`send()` and the per-verb helpers)
+  via a streaming capped-accumulator, and on `stream()`'s internal error
+  pre-read. **User-driven `stream()` iteration is never capped.**
+- Fails fast with `ResponseTooLargeError`, which now carries a `reason` field:
+  `"declared"` (declared `Content-Length` over the cap, rejected before a byte is
+  read) or `"streamed"` (the decoded body crossed the cap mid-read).
+
+The declared `Content-Length` is kept only as an early reject (never an early
+accept), so chunked and bomb bodies are always run through the accumulator.
+
+## Semantics
+
+- `ResponseTooLargeError` is a non-status `ClientError`: it is **not retried** and
+  does **not** count toward the circuit breaker.
+- An otherwise-retryable 5xx whose body exceeds the cap surfaces as
+  `ResponseTooLargeError` (cap-wins / fail-hard), not the status error — retrying
+  would only re-fetch the oversized body.
+- On the capped path the buffered response is rebuilt via the public
+  `httpx2.Response(content=...)` constructor and therefore has no `.elapsed`. The
+  default (`None`-cap) fast path keeps plain `send()` and preserves `.elapsed`.
+
+Everything goes through httpx2's public API; no private attributes are accessed.
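
Reviewer sketch (not part of the patch): the two trip modes described in the release note can be reproduced with the standard library alone. `TooLarge`, `decoded_chunks`, and `read_capped` are hypothetical stand-ins for `ResponseTooLargeError`, `iter_bytes()`, and `_read_capped`; they are assumptions for illustration, not the httpware implementation.

```python
import gzip
import zlib
from collections.abc import Iterator
from typing import Optional


class TooLarge(Exception):
    """Hypothetical stand-in for ResponseTooLargeError."""

    def __init__(self, reason: str) -> None:
        self.reason = reason
        super().__init__(reason)


def decoded_chunks(raw: bytes, wire_chunk: int = 16) -> Iterator[bytes]:
    """Gunzip `raw` incrementally, feeding small wire chunks to the decoder."""
    decoder = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
    for start in range(0, len(raw), wire_chunk):
        out = decoder.decompress(raw[start : start + wire_chunk])
        if out:
            yield out
    tail = decoder.flush()
    if tail:
        yield tail


def read_capped(raw: bytes, declared_length: Optional[int], cap: int) -> bytes:
    """Early-reject on the declared length, then count decoded bytes against `cap`."""
    if declared_length is not None and declared_length > cap:
        raise TooLarge("declared")  # rejected before anything is decoded
    buf = bytearray()
    for chunk in decoded_chunks(raw):
        buf += chunk
        if len(buf) > cap:
            raise TooLarge("streamed")  # decoded footprint crossed the cap mid-read
    return bytes(buf)


# A small compressed body whose declared (wire) length passes a 1000-byte cap.
bomb = gzip.compress(b"A" * 100_000)
```

The declared length only ever rejects early; a body that passes the header check is still counted in decoded bytes, which is why the bomb above trips as `"streamed"`.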
diff --git a/src/httpware/client.py b/src/httpware/client.py
index a3e086b..a6b52bf 100644
--- a/src/httpware/client.py
+++ b/src/httpware/client.py
@@ -2,7 +2,7 @@
 
 import contextlib
 import typing
-from collections.abc import AsyncIterator, Iterator, Sequence
+from collections.abc import AsyncIterator, Iterator, Mapping, Sequence
 from http import HTTPStatus
 
 import httpx2
@@ -32,6 +32,15 @@
 )
 
 
+_MAX_RESPONSE_BODY_BYTES_INVALID = "max_response_body_bytes must be >= 1"
+
+
+def _validate_max_response_body_bytes(cap: int | None) -> None:
+    """Reject a non-None cap below 1. None means unbounded (the default)."""
+    if cap is not None and cap < 1:
+        raise ValueError(_MAX_RESPONSE_BODY_BYTES_INVALID)
+
+
 def _parse_content_length(raw: str | None) -> int | None:
     """Return a non-negative int Content-Length, or None for missing/garbage. Never raises."""
     if raw is None:
@@ -43,6 +52,123 @@ def _parse_content_length(raw: str | None) -> int | None:
     return value if value >= 0 else None
 
 
+class _CapExceeded(Exception):  # noqa: N818 — internal control-flow signal, not a user-facing error
+    """Internal signal: decoded bytes crossed the cap mid-read. Carries bytes read so far."""
+
+    def __init__(self, *, read: int) -> None:
+        self.read = read
+        super().__init__(f"decoded body exceeded cap after {read} bytes")
+
+
+def _accumulate_capped(chunks: typing.Iterable[bytes], cap: int) -> bytes:
+    """Concatenate `chunks`, raising `_CapExceeded` the moment the running total exceeds `cap`.
+
+    Counts decoded bytes (the in-memory footprint). Grown in a single bytearray
+    so there is no transient list-plus-join double allocation.
+    """
+    buf = bytearray()
+    for chunk in chunks:
+        buf += chunk
+        if len(buf) > cap:
+            raise _CapExceeded(read=len(buf))
+    return bytes(buf)
+
+
+def _safe_extensions(extensions: Mapping[str, typing.Any]) -> dict[str, typing.Any]:
+    """Copy response extensions, dropping the now-stale `network_stream`.
+
+    The rebuilt buffered Response never touches its network stream, so keeping a
+    consumed or closed one would only leave a stale handle in the copy.
+    `http_version`/`reason_phrase` and any other keys are preserved.
+    """
+    return {key: value for key, value in extensions.items() if key != "network_stream"}
+
+
+# Headers describing the wire encoding of the body. The accumulator yields the
+# DECODED body, so these no longer apply; httpx2 recomputes content-length from
+# the buffered content. Carrying content-encoding forward makes httpx2 try to
+# re-decode already-decoded bytes and raise.
+_WIRE_BODY_HEADERS = ("content-encoding", "content-length", "transfer-encoding")
+_BODILESS_STATUS = frozenset({HTTPStatus.NO_CONTENT, HTTPStatus.NOT_MODIFIED})  # 204, 304
+
+
+def _buffered_headers(headers: httpx2.Headers) -> httpx2.Headers:
+    """Copy `headers`, stripping wire-encoding headers stale after decoding+buffering."""
+    out = httpx2.Headers(headers)
+    for name in _WIRE_BODY_HEADERS:
+        if name in out:
+            del out[name]
+    return out
+
+
+def _response_has_body(method: str, status_code: int) -> bool:
+    """Whether a response carries a message body (RFC 9110 §6.4.1).
+
+    HEAD responses and 204/304 never have a body regardless of a declared
+    Content-Length, so they must never trip the cap.
+    """
+    return method.upper() != "HEAD" and status_code not in _BODILESS_STATUS
+
+
+def _read_capped(response: httpx2.Response, cap: int, request: httpx2.Request) -> httpx2.Response:
+    """Buffer a streaming sync `response` under `cap` decoded bytes; return a buffered Response.
+
+    Raises `ResponseTooLargeError` (reason="declared") if the declared
+    Content-Length already exceeds `cap` — before any byte is read — and
+    (reason="streamed") if the decoded body crosses `cap` mid-read. Does not
+    close `response`; the caller owns the stream lifecycle.
+    """
+    if not _response_has_body(request.method, response.status_code):
+        response.read()  # empty body; preserve the original response (and its headers)
+        return response
+    content_length = _parse_content_length(response.headers.get("content-length"))
+    if content_length is not None and content_length > cap:
+        raise ResponseTooLargeError(
+            status_code=response.status_code, limit=cap, content_length=content_length, reason="declared"
+        )
+    try:
+        content = _accumulate_capped(response.iter_bytes(), cap)
+    except _CapExceeded:
+        raise ResponseTooLargeError(
+            status_code=response.status_code, limit=cap, content_length=content_length, reason="streamed"
+        ) from None
+    return httpx2.Response(
+        status_code=response.status_code,
+        headers=_buffered_headers(response.headers),
+        content=content,
+        request=request,
+        extensions=_safe_extensions(response.extensions),
+        history=response.history,
+    )
+
+
+async def _read_capped_async(response: httpx2.Response, cap: int, request: httpx2.Request) -> httpx2.Response:
+    """Async mirror of `_read_capped` (counts decoded bytes from `aiter_bytes`)."""
+    if not _response_has_body(request.method, response.status_code):
+        await response.aread()  # empty body; preserve the original response (and its headers)
+        return response
+    content_length = _parse_content_length(response.headers.get("content-length"))
+    if content_length is not None and content_length > cap:
+        raise ResponseTooLargeError(
+            status_code=response.status_code, limit=cap, content_length=content_length, reason="declared"
+        )
+    buf = bytearray()
+    async for chunk in response.aiter_bytes():
+        buf += chunk
+        if len(buf) > cap:
+            raise ResponseTooLargeError(
+                status_code=response.status_code, limit=cap, content_length=content_length, reason="streamed"
+            )
+    return httpx2.Response(
+        status_code=response.status_code,
+        headers=_buffered_headers(response.headers),
+        content=bytes(buf),
+        request=request,
+        extensions=_safe_extensions(response.extensions),
+        history=response.history,
+    )
+
+
 def _build_default_decoders() -> tuple[ResponseDecoder, ...]:
     """Construct the default decoder tuple based on installed extras.
 
@@ -94,7 +220,7 @@ class AsyncClient:
     _decoders: tuple[ResponseDecoder, ...]
     _user_middleware: tuple[AsyncMiddleware, ...]
     _dispatch: AsyncNext
-    _max_error_body_bytes: int | None
+    _max_response_body_bytes: int | None
 
     def __init__(  # noqa: PLR0913 — wide constructor is the cost of a single-call API
         self,
@@ -109,8 +235,9 @@ def __init__(  # noqa: PLR0913 — wide constructor is the cost of a single-call
         httpx2_client: httpx2.AsyncClient | None = None,
         decoders: Sequence[ResponseDecoder] | None = None,
         middleware: Sequence[AsyncMiddleware] = (),
-        max_error_body_bytes: int | None = None,
+        max_response_body_bytes: int | None = None,
     ) -> None:
+        _validate_max_response_body_bytes(max_response_body_bytes)
         if httpx2_client is not None:
             forwarded = {
                 "base_url": base_url,
@@ -148,12 +275,20 @@ def __init__(  # noqa: PLR0913 — wide constructor is the cost of a single-call
         self._decoder_resolver = _DecoderResolver(self._decoders)
         self._user_middleware = tuple(middleware)
         self._dispatch = compose_async(self._user_middleware, self._terminal)
-        self._max_error_body_bytes = max_error_body_bytes
+        self._max_response_body_bytes = max_response_body_bytes
 
     async def _terminal(self, request: httpx2.Request) -> httpx2.Response:
+        cap = self._max_response_body_bytes
         try:
             async with _httpx2_exception_mapper():
-                response = await self._httpx2_client.send(request)
+                if cap is None:
+                    response = await self._httpx2_client.send(request)
+                else:
+                    streaming = await self._httpx2_client.send(request, stream=True)
+                    try:
+                        response = await _read_capped_async(streaming, cap, request)
+                    finally:
+                        await streaming.aclose()
         except RuntimeError as exc:
             if self._httpx2_client.is_closed:
                 raise TransportError(str(exc)) from exc
@@ -1015,16 +1150,13 @@ async def stream(  # noqa: PLR0913, C901 — mirrors httpx2 per-method signature
 
         async with _httpx2_exception_mapper(), self._httpx2_client.stream(method, url, **kwargs) as response:
             if HTTPStatus.BAD_REQUEST <= response.status_code < 600:  # noqa: PLR2004 — 600 is the synthetic upper bound for 5xx
-                if self._max_error_body_bytes is not None:
-                    content_length = _parse_content_length(response.headers.get("content-length"))
-                    if content_length is not None and content_length > self._max_error_body_bytes:
-                        raise ResponseTooLargeError(
-                            status_code=response.status_code,
-                            limit=self._max_error_body_bytes,
-                            content_length=content_length,
-                        )
-                await response.aread()  # pre-read body so exc.response.content works
-                _raise_on_status_error(response)
+                cap = self._max_response_body_bytes
+                if cap is None:
+                    await response.aread()  # pre-read body so exc.response.content works
+                    _raise_on_status_error(response)
+                else:
+                    # Bound the error pre-read; raises ResponseTooLargeError when over cap.
+                    _raise_on_status_error(await _read_capped_async(response, cap, response.request))
             yield response
 
     async def __aenter__(self) -> typing.Self:
@@ -1060,7 +1192,7 @@ class Client:
     _decoders: tuple[ResponseDecoder, ...]
     _user_middleware: tuple[Middleware, ...]
     _dispatch: Next
-    _max_error_body_bytes: int | None
+    _max_response_body_bytes: int | None
 
     def __init__(  # noqa: PLR0913 — wide constructor is the cost of a single-call API
         self,
@@ -1075,8 +1207,9 @@ def __init__(  # noqa: PLR0913 — wide constructor is the cost of a single-call
         httpx2_client: httpx2.Client | None = None,
         decoders: Sequence[ResponseDecoder] | None = None,
         middleware: Sequence[Middleware] = (),
-        max_error_body_bytes: int | None = None,
+        max_response_body_bytes: int | None = None,
     ) -> None:
+        _validate_max_response_body_bytes(max_response_body_bytes)
         if httpx2_client is not None:
             forwarded = {
                 "base_url": base_url,
@@ -1114,12 +1247,20 @@ def __init__(  # noqa: PLR0913 — wide constructor is the cost of a single-call
         self._decoder_resolver = _DecoderResolver(self._decoders)
         self._user_middleware = tuple(middleware)
         self._dispatch = compose(self._user_middleware, self._terminal)
-        self._max_error_body_bytes = max_error_body_bytes
+        self._max_response_body_bytes = max_response_body_bytes
 
     def _terminal(self, request: httpx2.Request) -> httpx2.Response:
+        cap = self._max_response_body_bytes
         try:
             with _httpx2_exception_mapper_sync():
-                response = self._httpx2_client.send(request)
+                if cap is None:
+                    response = self._httpx2_client.send(request)
+                else:
+                    streaming = self._httpx2_client.send(request, stream=True)
+                    try:
+                        response = _read_capped(streaming, cap, request)
+                    finally:
+                        streaming.close()
         except RuntimeError as exc:
             if self._httpx2_client.is_closed:
                 raise TransportError(str(exc)) from exc
@@ -2003,14 +2144,11 @@ def stream(  # noqa: PLR0913, C901 — mirrors httpx2 per-method signatures; kwa
 
         with _httpx2_exception_mapper_sync(), self._httpx2_client.stream(method, url, **kwargs) as response:
             if HTTPStatus.BAD_REQUEST <= response.status_code < 600:  # noqa: PLR2004 — 600 is the synthetic upper bound for 5xx
-                if self._max_error_body_bytes is not None:
-                    content_length = _parse_content_length(response.headers.get("content-length"))
-                    if content_length is not None and content_length > self._max_error_body_bytes:
-                        raise ResponseTooLargeError(
-                            status_code=response.status_code,
-                            limit=self._max_error_body_bytes,
-                            content_length=content_length,
-                        )
-                response.read()  # pre-read body so exc.response.content works
-                _raise_on_status_error(response)
+                cap = self._max_response_body_bytes
+                if cap is None:
+                    response.read()  # pre-read body so exc.response.content works
+                    _raise_on_status_error(response)
+                else:
+                    # Bound the error pre-read; raises ResponseTooLargeError when over cap.
+                    _raise_on_status_error(_read_capped(response, cap, response.request))
             yield response
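
Reviewer sketch (not part of the patch): the comment above `_WIRE_BODY_HEADERS` says that carrying `content-encoding` forward makes the rebuilt response try to decode bytes that are already decoded. The failure mode can be shown with plain `gzip`, assuming a dict-based `buffered_headers` helper as a hypothetical stand-in for `_buffered_headers`. The explicit `content-length` here stands in for the recomputation the patch leaves to httpx2.

```python
import gzip

WIRE_BODY_HEADERS = ("content-encoding", "content-length", "transfer-encoding")


def buffered_headers(headers: dict[str, str], body: bytes) -> dict[str, str]:
    """Drop wire-level headers and restate content-length for the decoded body."""
    kept = {k: v for k, v in headers.items() if k.lower() not in WIRE_BODY_HEADERS}
    kept["content-length"] = str(len(body))
    return kept


decoded = b"A" * 500  # what the accumulator hands back: already decoded
wire_headers = {"Content-Encoding": "gzip", "Content-Length": "29", "Content-Type": "text/plain"}

# Decoding a second time fails: the decoded bytes do not start with the gzip magic.
try:
    gzip.decompress(decoded)
    redecode_failed = False
except gzip.BadGzipFile:
    redecode_failed = True

clean = buffered_headers(wire_headers, decoded)
```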
diff --git a/src/httpware/errors.py b/src/httpware/errors.py
index d14f255..a539aee 100644
--- a/src/httpware/errors.py
+++ b/src/httpware/errors.py
@@ -15,7 +15,7 @@
 
 import builtins
 from collections.abc import Mapping
-from typing import Any
+from typing import Any, Literal
 
 import httpx2
 
@@ -320,33 +320,51 @@ def _reconstruct_response_too_large(
     status_code: int,
     limit: int,
     content_length: int | None,
+    reason: 'Literal["declared", "streamed"]',
 ) -> "ResponseTooLargeError":
-    return cls(status_code=status_code, limit=limit, content_length=content_length)
+    return cls(status_code=status_code, limit=limit, content_length=content_length, reason=reason)
 
 
 class ResponseTooLargeError(ClientError):
-    """Raised when an error response body exceeds the client's max_error_body_bytes cap.
-
-    Fires from stream() on a 4xx/5xx whose declared Content-Length exceeds the
-    configured cap, BEFORE the body is read — so the oversized body is never
-    buffered. Only raised when max_error_body_bytes is set (opt-in).
+    """Raised when a response body exceeds the client's max_response_body_bytes cap.
+
+    Status-agnostic: fires on any non-streaming send() and on stream()'s internal
+    error pre-read, counting DECODED bytes. Only raised when
+    max_response_body_bytes is set (opt-in). `reason` discriminates the two trip
+    modes:
+
+    - "declared": the response's declared Content-Length already exceeds the cap,
+      so the body is rejected BEFORE a byte is read (`content_length` holds it).
+    - "streamed": the decoded body crossed the cap mid-read (the chunked or
+      compression-bomb case); `content_length` is the declared value (None if
+      absent), which was within the cap. The true decoded size is unknown by design.
     """
 
     status_code: int
     limit: int
     content_length: int | None
+    reason: Literal["declared", "streamed"]
 
-    def __init__(self, *, status_code: int, limit: int, content_length: int | None) -> None:
+    def __init__(
+        self,
+        *,
+        status_code: int,
+        limit: int,
+        content_length: int | None,
+        reason: Literal["declared", "streamed"],
+    ) -> None:
         self.status_code = status_code
         self.limit = limit
         self.content_length = content_length
-        super().__init__(
-            f"error response body too large: status={status_code} "
-            f"content_length={content_length} exceeds max_error_body_bytes={limit}"
-        )
+        self.reason = reason
+        if reason == "declared":
+            detail = f"declared content_length={content_length} exceeds max_response_body_bytes={limit}"
+        else:
+            detail = f"decoded body exceeded max_response_body_bytes={limit}"
+        super().__init__(f"response body too large: status={status_code} {detail}")
 
     def __reduce__(self) -> tuple[Any, ...]:
         return (
             _reconstruct_response_too_large,
-            (type(self), self.status_code, self.limit, self.content_length),
+            (type(self), self.status_code, self.limit, self.content_length, self.reason),
         )
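
Reviewer sketch (not part of the patch): `reason` has to be threaded through `__reduce__` because the default `BaseException` reduction replays `self.args` positionally, which a keyword-only `__init__` rejects with `TypeError`. `BodyTooLarge` and `_rebuild` below are hypothetical stand-ins for `ResponseTooLargeError` and `_reconstruct_response_too_large`. `copy.copy` is used because it consults `__reduce__` the same way `pickle` does.

```python
import copy
from typing import Any, Optional


def _rebuild(
    cls: type, status_code: int, limit: int, content_length: Optional[int], reason: str
) -> "BodyTooLarge":
    """Module-level reconstructor: re-invokes the keyword-only constructor."""
    return cls(status_code=status_code, limit=limit, content_length=content_length, reason=reason)


class BodyTooLarge(Exception):
    """Hypothetical stand-in for ResponseTooLargeError."""

    def __init__(self, *, status_code: int, limit: int, content_length: Optional[int], reason: str) -> None:
        self.status_code = status_code
        self.limit = limit
        self.content_length = content_length
        self.reason = reason
        super().__init__(f"response body too large: status={status_code} reason={reason}")

    def __reduce__(self) -> tuple[Any, ...]:
        # Every constructor argument must appear here, or the copy cannot be rebuilt.
        return (_rebuild, (type(self), self.status_code, self.limit, self.content_length, self.reason))


original = BodyTooLarge(status_code=200, limit=1000, content_length=None, reason="streamed")
clone = copy.copy(original)  # goes through __reduce__, then _rebuild
```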
diff --git a/tests/test_capped_read.py b/tests/test_capped_read.py
new file mode 100644
index 0000000..66cec47
--- /dev/null
+++ b/tests/test_capped_read.py
@@ -0,0 +1,241 @@
+"""Unit tests for the shared _read_capped wrappers (sync + async).
+
+Drive real streaming responses through MockTransport, then hand the streaming
+Response to _read_capped / _read_capped_async directly — exercising the
+Content-Length early reject, the decoded-byte accumulator, the rebuilt Response,
+and extension sanitisation, independent of client wiring.
+"""
+
+import gzip
+from collections.abc import AsyncIterator
+
+import httpx2
+import pytest
+
+from httpware.client import _read_capped, _read_capped_async
+from httpware.errors import ResponseTooLargeError
+
+
+def _sync_stream(handler: object, method: str = "GET") -> tuple[httpx2.Client, httpx2.Response]:
+    client = httpx2.Client(transport=httpx2.MockTransport(handler))  # ty: ignore[invalid-argument-type]
+    request = client.build_request(method, "https://example.test/x")
+    return client, client.send(request, stream=True)
+
+
+async def _async_stream(handler: object, method: str = "GET") -> tuple[httpx2.AsyncClient, httpx2.Response]:
+    client = httpx2.AsyncClient(transport=httpx2.MockTransport(handler))  # ty: ignore[invalid-argument-type]
+    request = client.build_request(method, "https://example.test/x")
+    return client, await client.send(request, stream=True)
+
+
+# ---- sync ----
+
+
+def test_read_capped_returns_buffered_response_within_cap() -> None:
+    body = b"hello world"
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client, resp = _sync_stream(handler)
+    try:
+        out = _read_capped(resp, 1000, resp.request)
+        assert out.content == body
+        assert out.status_code == 200  # noqa: PLR2004 — mirrors handler
+        assert "network_stream" not in out.extensions
+    finally:
+        resp.close()
+        client.close()
+
+
+def test_read_capped_declared_content_length_over_cap() -> None:
+    body = b"x" * 200
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(500, content=body)
+
+    client, resp = _sync_stream(handler)
+    try:
+        with pytest.raises(ResponseTooLargeError) as caught:
+            _read_capped(resp, 10, resp.request)
+        assert caught.value.reason == "declared"
+        assert caught.value.content_length == 200  # noqa: PLR2004 — len(body)
+        assert caught.value.limit == 10  # noqa: PLR2004 — cap above
+    finally:
+        resp.close()
+        client.close()
+
+
+def test_read_capped_streamed_over_cap_chunked_no_content_length() -> None:
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=(c for c in (b"a" * 50, b"b" * 50)))
+
+    client, resp = _sync_stream(handler)
+    try:
+        with pytest.raises(ResponseTooLargeError) as caught:
+            _read_capped(resp, 10, resp.request)
+        assert caught.value.reason == "streamed"
+        assert caught.value.content_length is None
+    finally:
+        resp.close()
+        client.close()
+
+
+def test_read_capped_within_cap_gzip_returns_decoded_content() -> None:
+    # Regression: rebuilt Response must not re-decompress already-decoded content.
+    raw = gzip.compress(b"A" * 500)
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, headers={"content-encoding": "gzip"}, content=raw)
+
+    client, resp = _sync_stream(handler)
+    try:
+        out = _read_capped(resp, 1_000_000, resp.request)
+        assert out.content == b"A" * 500  # decoded, not re-gzipped/crashed
+        assert "content-encoding" not in out.headers  # stale wire header dropped
+        assert out.headers["content-length"] == "500"  # recomputed from decoded content
+    finally:
+        resp.close()
+        client.close()
+
+
+def test_read_capped_head_with_large_declared_length_not_rejected() -> None:
+    # Regression: a bodiless HEAD response buffers nothing and must not trip the cap.
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, headers={"content-length": "50000000"})
+
+    client, resp = _sync_stream(handler, method="HEAD")
+    try:
+        out = _read_capped(resp, 1000, resp.request)
+        assert out.content == b""
+        assert out.headers["content-length"] == "50000000"  # entity length preserved for HEAD
+    finally:
+        resp.close()
+        client.close()
+
+
+def test_read_capped_gzip_bomb_trips_on_decoded_bytes() -> None:
+    raw = gzip.compress(b"A" * 100_000)
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, headers={"content-encoding": "gzip"}, content=raw)
+
+    client, resp = _sync_stream(handler)
+    try:
+        with pytest.raises(ResponseTooLargeError) as caught:
+            _read_capped(resp, 1000, resp.request)
+        assert caught.value.reason == "streamed"  # compressed CL (small) passed; decoded tripped
+    finally:
+        resp.close()
+        client.close()
+
+
+def test_read_capped_exact_cap_passes() -> None:
+    body = b"x" * 10
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client, resp = _sync_stream(handler)
+    try:
+        assert _read_capped(resp, 10, resp.request).content == body
+    finally:
+        resp.close()
+        client.close()
+
+
+def test_read_capped_empty_body_passes() -> None:
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(204)
+
+    client, resp = _sync_stream(handler)
+    try:
+        assert _read_capped(resp, 1, resp.request).content == b""
+    finally:
+        resp.close()
+        client.close()
+
+
+# ---- async ----
+
+
+async def test_read_capped_async_returns_buffered_response_within_cap() -> None:
+    body = b"hello world"
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client, resp = await _async_stream(handler)
+    try:
+        out = await _read_capped_async(resp, 1000, resp.request)
+        assert out.content == body
+        assert "network_stream" not in out.extensions
+    finally:
+        await resp.aclose()
+        await client.aclose()
+
+
+async def test_read_capped_async_within_cap_gzip_returns_decoded_content() -> None:
+    raw = gzip.compress(b"A" * 500)
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, headers={"content-encoding": "gzip"}, content=raw)
+
+    client, resp = await _async_stream(handler)
+    try:
+        out = await _read_capped_async(resp, 1_000_000, resp.request)
+        assert out.content == b"A" * 500
+        assert "content-encoding" not in out.headers
+        assert out.headers["content-length"] == "500"
+    finally:
+        await resp.aclose()
+        await client.aclose()
+
+
+async def test_read_capped_async_head_with_large_declared_length_not_rejected() -> None:
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, headers={"content-length": "50000000"})
+
+    client, resp = await _async_stream(handler, method="HEAD")
+    try:
+        out = await _read_capped_async(resp, 1000, resp.request)
+        assert out.content == b""
+        assert out.headers["content-length"] == "50000000"
+    finally:
+        await resp.aclose()
+        await client.aclose()
+
+
+async def test_read_capped_async_declared_over_cap() -> None:
+    body = b"x" * 200
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(500, content=body)
+
+    client, resp = await _async_stream(handler)
+    try:
+        with pytest.raises(ResponseTooLargeError) as caught:
+            await _read_capped_async(resp, 10, resp.request)
+        assert caught.value.reason == "declared"
+        assert caught.value.content_length == 200  # noqa: PLR2004 — len(body)
+    finally:
+        await resp.aclose()
+        await client.aclose()
+
+
+async def test_read_capped_async_streamed_over_cap() -> None:
+    async def body() -> AsyncIterator[bytes]:
+        yield b"a" * 50
+        yield b"b" * 50
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body())
+
+    client, resp = await _async_stream(handler)
+    try:
+        with pytest.raises(ResponseTooLargeError) as caught:
+            await _read_capped_async(resp, 70, resp.request)  # trips on the second 50-byte chunk
+        assert caught.value.reason == "streamed"
+    finally:
+        await resp.aclose()
+        await client.aclose()
diff --git a/tests/test_capped_read_props.py b/tests/test_capped_read_props.py
new file mode 100644
index 0000000..f08d50a
--- /dev/null
+++ b/tests/test_capped_read_props.py
@@ -0,0 +1,55 @@
+"""Hypothesis property tests for the pure _accumulate_capped core.
+
+The one subtle invariant of the response-body cap is chunk-boundary
+independence: the accumulator must behave identically no matter how the decoded
+body is split into chunks. It must raise _CapExceeded iff the total decoded
+length exceeds the cap, and otherwise return the body byte-for-byte.
+"""
+
+import pytest
+from hypothesis import given
+from hypothesis import strategies as st
+
+from httpware.client import _accumulate_capped, _CapExceeded
+
+
+def _partition(body: bytes, sizes: list[int]) -> list[bytes]:
+    """Split `body` into chunks following `sizes` (remainder becomes a final chunk)."""
+    chunks: list[bytes] = []
+    pos = 0
+    for size in sizes:
+        if pos >= len(body):
+            break
+        chunks.append(body[pos : pos + size])
+        pos += size
+    if pos < len(body):
+        chunks.append(body[pos:])
+    return chunks
+
+
+@given(
+    body=st.binary(max_size=2048),
+    sizes=st.lists(st.integers(min_value=1, max_value=64), max_size=64),
+    cap=st.integers(min_value=1, max_value=4096),
+)
+def test_accumulate_capped_chunk_boundary_independence(body: bytes, sizes: list[int], cap: int) -> None:
+    chunks = _partition(body, sizes)
+    if len(body) > cap:
+        with pytest.raises(_CapExceeded) as caught:
+            _accumulate_capped(chunks, cap)
+        assert caught.value.read > cap
+    else:
+        assert _accumulate_capped(chunks, cap) == body
+
+
+@given(body=st.binary(min_size=2, max_size=512))
+def test_accumulate_capped_trips_at_one_below_length(body: bytes) -> None:
+    cap = len(body) - 1
+    with pytest.raises(_CapExceeded):
+        _accumulate_capped([body], cap)
+
+
+@given(body=st.binary(max_size=512))
+def test_accumulate_capped_passes_at_exact_length(body: bytes) -> None:
+    cap = max(1, len(body))
+    assert _accumulate_capped([body], cap) == body
diff --git a/tests/test_client_body_cap.py b/tests/test_client_body_cap.py
new file mode 100644
index 0000000..fe9d086
--- /dev/null
+++ b/tests/test_client_body_cap.py
@@ -0,0 +1,206 @@
+"""max_response_body_bytes — non-streaming send() cap + construction validation.
+
+Covers both clients: the terminal buffers under the cap and fails fast with
+ResponseTooLargeError when a response body (any status) exceeds it. stream()
+coverage lives in tests/test_client_stream*.py.
+"""
+
+import gzip
+from collections.abc import AsyncIterator
+
+import httpx2
+import pytest
+
+from httpware import AsyncClient, Client
+from httpware.errors import ResponseTooLargeError
+
+
+def _sync(handler: object, cap: int | None) -> Client:
+    return Client(
+        httpx2_client=httpx2.Client(transport=httpx2.MockTransport(handler)),  # ty: ignore[invalid-argument-type]
+        max_response_body_bytes=cap,
+    )
+
+
+def _async(handler: object, cap: int | None) -> AsyncClient:
+    return AsyncClient(
+        httpx2_client=httpx2.AsyncClient(transport=httpx2.MockTransport(handler)),  # ty: ignore[invalid-argument-type]
+        max_response_body_bytes=cap,
+    )
+
+
+# ---- construction validation ----
+
+
+@pytest.mark.parametrize("bad", [0, -1])
+def test_async_rejects_cap_below_one(bad: int) -> None:
+    with pytest.raises(ValueError, match="max_response_body_bytes must be >= 1"):
+        AsyncClient(max_response_body_bytes=bad)
+
+
+@pytest.mark.parametrize("bad", [0, -1])
+def test_sync_rejects_cap_below_one(bad: int) -> None:
+    with pytest.raises(ValueError, match="max_response_body_bytes must be >= 1"):
+        Client(max_response_body_bytes=bad)
+
+
+# ---- sync send() ----
+
+
+def test_sync_send_within_cap_returns_response() -> None:
+    body = b"hello world"
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client = _sync(handler, 1000)
+    request = client.build_request("GET", "https://example.test/x")
+    assert client.send(request).content == body
+    client.close()
+
+
+def test_sync_send_over_cap_declared_on_success() -> None:
+    body = b"x" * 200
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client = _sync(handler, 10)
+    request = client.build_request("GET", "https://example.test/x")
+    with pytest.raises(ResponseTooLargeError) as caught:
+        client.send(request)
+    assert caught.value.reason == "declared"
+    assert caught.value.status_code == 200  # noqa: PLR2004 — status-agnostic: a 200 trips
+    client.close()
+
+
+def test_sync_send_over_cap_streamed_gzip_bomb() -> None:
+    raw = gzip.compress(b"A" * 100_000)
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, headers={"content-encoding": "gzip"}, content=raw)
+
+    client = _sync(handler, 1000)
+    request = client.build_request("GET", "https://example.test/x")
+    with pytest.raises(ResponseTooLargeError) as caught:
+        client.send(request)
+    assert caught.value.reason == "streamed"
+    client.close()
+
+
+def test_sync_send_within_cap_gzip_returns_decoded() -> None:
+    raw = gzip.compress(b"A" * 500)
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, headers={"content-encoding": "gzip"}, content=raw)
+
+    client = _sync(handler, 1_000_000)
+    request = client.build_request("GET", "https://example.test/x")
+    assert client.send(request).content == b"A" * 500  # decoded exactly once, not decompressed twice
+    client.close()
+
+
+def test_sync_head_large_declared_length_not_rejected() -> None:
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, headers={"content-length": "50000000"})
+
+    client = _sync(handler, 1000)
+    request = client.build_request("HEAD", "https://example.test/x")
+    response = client.send(request)
+    assert response.content == b""
+    assert response.headers["content-length"] == "50000000"
+    client.close()
+
+
+def test_sync_send_none_cap_unbounded() -> None:
+    body = b"x" * 10_000
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client = _sync(handler, None)
+    request = client.build_request("GET", "https://example.test/x")
+    assert client.send(request).content == body
+    client.close()
+
+
+# ---- async send() ----
+
+
+async def test_async_send_within_cap_returns_response() -> None:
+    body = b"hello world"
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client = _async(handler, 1000)
+    request = client.build_request("GET", "https://example.test/x")
+    assert (await client.send(request)).content == body
+    await client.aclose()
+
+
+async def test_async_send_over_cap_declared() -> None:
+    body = b"x" * 200
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client = _async(handler, 10)
+    request = client.build_request("GET", "https://example.test/x")
+    with pytest.raises(ResponseTooLargeError) as caught:
+        await client.send(request)
+    assert caught.value.reason == "declared"
+    await client.aclose()
+
+
+async def test_async_send_over_cap_streamed_chunked() -> None:
+    async def body() -> AsyncIterator[bytes]:
+        yield b"a" * 50
+        yield b"b" * 50
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body())
+
+    client = _async(handler, 70)
+    request = client.build_request("GET", "https://example.test/x")
+    with pytest.raises(ResponseTooLargeError) as caught:
+        await client.send(request)
+    assert caught.value.reason == "streamed"
+    assert caught.value.content_length is None
+    await client.aclose()
+
+
+async def test_async_send_within_cap_gzip_returns_decoded() -> None:
+    raw = gzip.compress(b"A" * 500)
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, headers={"content-encoding": "gzip"}, content=raw)
+
+    client = _async(handler, 1_000_000)
+    request = client.build_request("GET", "https://example.test/x")
+    assert (await client.send(request)).content == b"A" * 500
+    await client.aclose()
+
+
+async def test_async_head_large_declared_length_not_rejected() -> None:
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, headers={"content-length": "50000000"})
+
+    client = _async(handler, 1000)
+    request = client.build_request("HEAD", "https://example.test/x")
+    response = await client.send(request)
+    assert response.content == b""
+    assert response.headers["content-length"] == "50000000"
+    await client.aclose()
+
+
+async def test_async_send_none_cap_unbounded() -> None:
+    body = b"x" * 10_000
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client = _async(handler, None)
+    request = client.build_request("GET", "https://example.test/x")
+    assert (await client.send(request)).content == body
+    await client.aclose()
diff --git a/tests/test_client_stream.py b/tests/test_client_stream.py
index 3847eb6..b8f689c 100644
--- a/tests/test_client_stream.py
+++ b/tests/test_client_stream.py
@@ -347,12 +347,12 @@ def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
         return httpx2.Response(500, content=body)
 
     client = AsyncClient(
-        httpx2_client=httpx2.AsyncClient(transport=httpx2.MockTransport(handler)), max_error_body_bytes=10
+        httpx2_client=httpx2.AsyncClient(transport=httpx2.MockTransport(handler)), max_response_body_bytes=10
     )
     with pytest.raises(ResponseTooLargeError) as caught:
         async with client.stream("GET", "https://example.test/x"):
             pytest.fail("unreachable")
-    assert caught.value.limit == 10  # noqa: PLR2004 — mirrors max_error_body_bytes above
+    assert caught.value.limit == 10  # noqa: PLR2004 — mirrors max_response_body_bytes above
     assert caught.value.content_length == 200  # noqa: PLR2004 — len(body) above
     await client.aclose()
 
@@ -364,7 +364,7 @@ def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
         return httpx2.Response(404, content=body)
 
     client = AsyncClient(
-        httpx2_client=httpx2.AsyncClient(transport=httpx2.MockTransport(handler)), max_error_body_bytes=1000
+        httpx2_client=httpx2.AsyncClient(transport=httpx2.MockTransport(handler)), max_response_body_bytes=1000
     )
     with pytest.raises(NotFoundError) as caught:
         async with client.stream("GET", "https://example.test/x"):
@@ -387,6 +387,58 @@ def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
     await client.aclose()
 
 
+async def test_stream_error_pre_read_streamed_over_cap() -> None:
+    async def body() -> typing.AsyncIterator[bytes]:
+        yield b"a" * 50
+        yield b"b" * 50
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(500, content=body())  # chunked: no Content-Length
+
+    client = AsyncClient(
+        httpx2_client=httpx2.AsyncClient(transport=httpx2.MockTransport(handler)), max_response_body_bytes=70
+    )
+    with pytest.raises(ResponseTooLargeError) as caught:
+        async with client.stream("GET", "https://example.test/x"):
+            pytest.fail("unreachable")
+    assert caught.value.reason == "streamed"
+    assert caught.value.content_length is None
+    await client.aclose()
+
+
+async def test_stream_error_pre_read_within_cap_gzip_decoded() -> None:
+    import gzip  # noqa: PLC0415 — local to this regression test
+
+    raw = gzip.compress(b"boom" * 50)
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(500, headers={"content-encoding": "gzip"}, content=raw)
+
+    client = AsyncClient(
+        httpx2_client=httpx2.AsyncClient(transport=httpx2.MockTransport(handler)), max_response_body_bytes=1_000_000
+    )
+    with pytest.raises(InternalServerError) as caught:
+        async with client.stream("GET", "https://example.test/x"):
+            pytest.fail("unreachable")
+    assert caught.value.response.content == b"boom" * 50  # decoded, not re-decompressed
+    await client.aclose()
+
+
+async def test_stream_user_driven_success_body_not_capped() -> None:
+    body = b"x" * 100_000
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client = AsyncClient(
+        httpx2_client=httpx2.AsyncClient(transport=httpx2.MockTransport(handler)), max_response_body_bytes=10
+    )
+    async with client.stream("GET", "https://example.test/x") as response:
+        chunks = [chunk async for chunk in response.aiter_bytes()]
+    assert b"".join(chunks) == body  # user-driven streaming is never capped
+    await client.aclose()
+
+
 @pytest.mark.parametrize(
     ("raw", "expected"),
     [(None, None), ("123", 123), ("abc", None), ("-5", None), ("0", 0)],
diff --git a/tests/test_client_stream_sync.py b/tests/test_client_stream_sync.py
index 53c62c2..25df725 100644
--- a/tests/test_client_stream_sync.py
+++ b/tests/test_client_stream_sync.py
@@ -314,10 +314,10 @@ def test_stream_raises_response_too_large_when_over_cap_sync() -> None:
     def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
         return httpx2.Response(500, content=body)
 
-    client = Client(httpx2_client=httpx2.Client(transport=httpx2.MockTransport(handler)), max_error_body_bytes=10)
+    client = Client(httpx2_client=httpx2.Client(transport=httpx2.MockTransport(handler)), max_response_body_bytes=10)
     with pytest.raises(ResponseTooLargeError) as caught, client.stream("GET", "https://example.test/x"):
         pytest.fail("unreachable")
-    assert caught.value.limit == 10  # noqa: PLR2004 — mirrors max_error_body_bytes above
+    assert caught.value.limit == 10  # noqa: PLR2004 — mirrors max_response_body_bytes above
     assert caught.value.content_length == 200  # noqa: PLR2004 — len(body) above
     client.close()
 
@@ -328,7 +328,7 @@ def test_stream_reads_error_body_when_under_cap_sync() -> None:
     def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
         return httpx2.Response(404, content=body)
 
-    client = Client(httpx2_client=httpx2.Client(transport=httpx2.MockTransport(handler)), max_error_body_bytes=1000)
+    client = Client(httpx2_client=httpx2.Client(transport=httpx2.MockTransport(handler)), max_response_body_bytes=1000)
     with pytest.raises(NotFoundError) as caught, client.stream("GET", "https://example.test/x"):
         pytest.fail("unreachable")
     assert caught.value.response.content == body
@@ -346,3 +346,45 @@ def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
         pytest.fail("unreachable")
     assert caught.value.response.content == body
     client.close()
+
+
+def test_stream_error_pre_read_streamed_over_cap_sync() -> None:
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(500, content=(c for c in (b"a" * 50, b"b" * 50)))  # chunked: no Content-Length
+
+    client = Client(httpx2_client=httpx2.Client(transport=httpx2.MockTransport(handler)), max_response_body_bytes=70)
+    with pytest.raises(ResponseTooLargeError) as caught, client.stream("GET", "https://example.test/x"):
+        pytest.fail("unreachable")
+    assert caught.value.reason == "streamed"
+    assert caught.value.content_length is None
+    client.close()
+
+
+def test_stream_error_pre_read_within_cap_gzip_decoded_sync() -> None:
+    import gzip  # noqa: PLC0415 — local to this regression test
+
+    raw = gzip.compress(b"boom" * 50)
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(500, headers={"content-encoding": "gzip"}, content=raw)
+
+    client = Client(
+        httpx2_client=httpx2.Client(transport=httpx2.MockTransport(handler)), max_response_body_bytes=1_000_000
+    )
+    with pytest.raises(InternalServerError) as caught, client.stream("GET", "https://example.test/x"):
+        pytest.fail("unreachable")
+    assert caught.value.response.content == b"boom" * 50
+    client.close()
+
+
+def test_stream_user_driven_success_body_not_capped_sync() -> None:
+    body = b"x" * 100_000
+
+    def handler(request: httpx2.Request) -> httpx2.Response:  # noqa: ARG001
+        return httpx2.Response(200, content=body)
+
+    client = Client(httpx2_client=httpx2.Client(transport=httpx2.MockTransport(handler)), max_response_body_bytes=10)
+    with client.stream("GET", "https://example.test/x") as response:
+        chunks = list(response.iter_bytes())
+    assert b"".join(chunks) == body  # user-driven streaming is never capped
+    client.close()
diff --git a/tests/test_errors.py b/tests/test_errors.py
index 2a087ae..fbb9fe3 100644
--- a/tests/test_errors.py
+++ b/tests/test_errors.py
@@ -433,18 +433,36 @@ def test_status_error_message_masks_query_secret() -> None:
 
 
 def test_response_too_large_error_fields_and_message() -> None:
-    exc = ResponseTooLargeError(status_code=500, limit=1024, content_length=2048)
+    exc = ResponseTooLargeError(status_code=500, limit=1024, content_length=2048, reason="declared")
     assert exc.status_code == 500  # noqa: PLR2004 — literal mirrors construction above
     assert exc.limit == 1024  # noqa: PLR2004 — literal mirrors construction above
     assert exc.content_length == 2048  # noqa: PLR2004 — literal mirrors construction above
+    assert exc.reason == "declared"
     assert "1024" in str(exc)
     assert "2048" in str(exc)
 
 
+def test_response_too_large_error_status_agnostic_streamed() -> None:
+    exc = ResponseTooLargeError(status_code=200, limit=10, content_length=None, reason="streamed")
+    assert exc.status_code == 200  # noqa: PLR2004 — literal mirrors construction above
+    assert exc.content_length is None
+    assert exc.reason == "streamed"
+    assert "10" in str(exc)
+
+
+def test_response_too_large_error_message_differs_by_reason() -> None:
+    declared = ResponseTooLargeError(status_code=500, limit=10, content_length=2048, reason="declared")
+    streamed = ResponseTooLargeError(status_code=500, limit=10, content_length=None, reason="streamed")
+    assert str(declared) != str(streamed)
+    assert "2048" in str(declared)
+    assert "2048" not in str(streamed)
+
+
 def test_response_too_large_error_pickle_round_trip() -> None:
-    exc = ResponseTooLargeError(status_code=503, limit=10, content_length=None)
+    exc = ResponseTooLargeError(status_code=503, limit=10, content_length=None, reason="streamed")
     restored = pickle.loads(pickle.dumps(exc))  # noqa: S301 — round-tripping our own exception
     assert isinstance(restored, ResponseTooLargeError)
     assert restored.status_code == 503  # noqa: PLR2004 — literal mirrors construction above
     assert restored.limit == 10  # noqa: PLR2004 — literal mirrors construction above
     assert restored.content_length is None
+    assert restored.reason == "streamed"
diff --git a/tests/test_resilience_body_cap.py b/tests/test_resilience_body_cap.py
new file mode 100644
index 0000000..eadaaee
--- /dev/null
+++ b/tests/test_resilience_body_cap.py
@@ -0,0 +1,70 @@
+"""Resilience interaction with max_response_body_bytes.
+
+ResponseTooLargeError is a non-status ClientError, so it must fall outside the
+retry/circuit-breaker failure classifications. These tests lock that behavior so
+a future refactor can't silently make a cap trip retryable or breaker-counting.
+"""
+
+import httpx2
+import pytest
+
+from httpware import AsyncClient, CircuitState, ResponseTooLargeError
+from httpware.middleware.resilience.circuit_breaker import AsyncCircuitBreaker
+from httpware.middleware.resilience.retry import AsyncRetry
+
+
+class _CountingHandler:
+    """Mock transport that counts calls and always returns the same response."""
+
+    def __init__(self, status: int, body: bytes) -> None:
+        self.status = status
+        self.body = body
+        self.calls = 0
+
+    def __call__(self, request: httpx2.Request) -> httpx2.Response:
+        self.calls += 1
+        return httpx2.Response(self.status, content=self.body, request=request)
+
+
+def _client(handler: _CountingHandler, *, middleware: list[object], cap: int) -> AsyncClient:
+    return AsyncClient(
+        httpx2_client=httpx2.AsyncClient(transport=httpx2.MockTransport(handler)),
+        middleware=middleware,  # ty: ignore[invalid-argument-type]
+        max_response_body_bytes=cap,
+    )
+
+
+async def test_response_too_large_is_not_retried() -> None:
+    handler = _CountingHandler(200, b"x" * 200)
+    client = _client(handler, middleware=[AsyncRetry()], cap=10)
+    request = client.build_request("GET", "https://example.test/x")
+    with pytest.raises(ResponseTooLargeError):
+        await client.send(request)
+    assert handler.calls == 1  # not retried — a single terminal attempt
+    await client.aclose()
+
+
+async def test_over_cap_retryable_5xx_surfaces_as_too_large_not_retried() -> None:
+    # 503 is normally retryable, but the cap trips first: the cap error wins and is raised without a retry.
+    handler = _CountingHandler(503, b"x" * 200)
+    client = _client(handler, middleware=[AsyncRetry()], cap=10)
+    request = client.build_request("GET", "https://example.test/x")
+    with pytest.raises(ResponseTooLargeError) as caught:
+        await client.send(request)
+    assert caught.value.status_code == 503  # noqa: PLR2004 — the retryable status, surfaced not retried
+    assert handler.calls == 1
+    await client.aclose()
+
+
+async def test_response_too_large_does_not_trip_circuit_breaker() -> None:
+    # failure_threshold=1: one real failure would open the circuit; a cap trip must not.
+    handler = _CountingHandler(500, b"x" * 200)
+    breaker = AsyncCircuitBreaker(failure_threshold=1)
+    client = _client(handler, middleware=[breaker], cap=10)
+    request = client.build_request("GET", "https://example.test/x")
+    for _ in range(3):
+        with pytest.raises(ResponseTooLargeError):
+            await client.send(request)
+    assert breaker.state is CircuitState.CLOSED  # neither success nor failure recorded
+    assert handler.calls == 3  # noqa: PLR2004 — breaker never opened, every call reached the transport
+    await client.aclose()
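Reviewer note: the resilience tests above rely on the cap error being a non-status error, so a retry layer that classifies only status errors never sees it as retryable. Below is a minimal stdlib-only sketch of that classification, not the `AsyncRetry` implementation. The classes `ClientError`, `StatusError`, `TooLarge`, the function `send_with_retry`, and the `RETRYABLE_STATUSES` set are all hypothetical.

```python
from collections.abc import Callable


class ClientError(Exception):
    """Hypothetical base class for client-side errors."""


class StatusError(ClientError):
    """Hypothetical error raised for a 4xx/5xx response."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class TooLarge(ClientError):
    """Hypothetical cap error: it carries a status code but is not a StatusError."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"body too large (HTTP {status_code})")
        self.status_code = status_code


# Assumed set for illustration; the source only confirms that 503 is retryable.
RETRYABLE_STATUSES = frozenset({502, 503, 504})


def send_with_retry(send: Callable[[], bytes], *, attempts: int = 3) -> bytes:
    """Call send() up to `attempts` times, retrying only retryable StatusErrors."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except StatusError as exc:
            # Give up on non-retryable statuses or when attempts are exhausted.
            if exc.status_code not in RETRYABLE_STATUSES or attempt == attempts:
                raise
        # Any other exception, including TooLarge, propagates on the first attempt.
    raise AssertionError("unreachable")
```

Because `TooLarge` does not inherit from `StatusError` in this sketch, a cap trip on a 503 response reaches the caller after a single call, which mirrors the `handler.calls == 1` assertions above.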