docs(ai-chat): slim-wire HITL continuations + field-level merge contract

ericallam · ericallam · commit 12d40ac738d9 · 2026-05-23T10:02:00.000+01:00
diff --git a/docs/ai-chat/changelog.mdx b/docs/ai-chat/changelog.mdx
@@ -4,6 +4,35 @@ sidebarTitle: "Changelog"
 description: "Pre-release updates for AI chat agents."
 ---
 
+<Update label="May 23, 2026" description="4.5.0-rc.2" tags={["SDK", "Webapp", "Bug fix"]}>
+
+## HITL continuations — slim wire by default + field-level merge
+
+`chat.addToolOutput(...)` and `chat.addToolApproveResponse(...)` continuations on reasoning-heavy agent loops used to fail in one of two ways. Either the wire body crossed the `/in/append` cap (encrypted reasoning blobs plus tool input routinely exceeded 512 KiB), or, in apps that slimmed the wire as a workaround, the next LLM step received a tool call with no `arguments` (the per-turn merge replaced the hydrated message wholesale instead of overlaying only the new tool-state advance). Both failure modes are fixed.
+
+The built-in transports (`TriggerChatTransport.sendMessages`, `AgentChat.sendRaw`) now slim the assistant message themselves on `submit-message` turns whose assistant carries resolved or approval-responded tool parts. On the wire, the message is `{ id, role: "assistant", parts: [<resolved tool part only>] }`, and each tool part keeps its `type` and `toolCallId` plus `state` and, depending on the new state, `output`, `errorText`, or `approval`. Everything else (reasoning blobs, prior text, tool `input`, provider metadata) is reconstructed server-side from `hydrateMessages` or the durable snapshot. Continuation payloads typically drop from 600 KiB–1 MiB to about 1 KiB.
+
+The per-turn merge now overlays only the tool-part state advances (`output-available` / `output-error` / `approval-responded` / `output-denied`) from the wire copy onto the matching hydrated entry. Hydrated `input`, text, reasoning, and provider metadata stay put. The agent still accepts a fuller `UIMessage` on the wire (the merge only reads the resolved fields), so custom transports that ship more don't break — they just waste bytes.
+
+### `hydrateMessages` upsert-by-id
+
+If your `hydrateMessages` hook persists the incoming message, **upsert by id** instead of pushing unconditionally: append the message only when its id is new, and leave an existing entry alone so the runtime can overlay the resolution onto it. HITL continuations ship the existing assistant's id with a slim payload, so a blind `stored.push(newMsg)` duplicates the row in the chain you return; the merge then updates only the first match, and the slim duplicate reaches `toModelMessages` with no `input`. The examples in [lifecycle hooks](/ai-chat/lifecycle-hooks#hydratemessages), [Database persistence](/ai-chat/patterns/database-persistence#alternative-hydratemessages), and [Persistence and replay](/ai-chat/patterns/persistence-and-replay) have all been updated.
+
+### `onValidateMessages` slim wire caveat
+
+The slim wire is what arrives in `onValidateMessages` on HITL turns. `validateUIMessages` from `ai` rejects the slim shape (the AI SDK schema requires `input` on resolved tool parts), so filter to user messages first (or skip validation entirely on those turns). See the updated example in [lifecycle hooks](/ai-chat/lifecycle-hooks#onvalidatemessages).
+
+### `/in/append` 413 + precise cap
+
+Two related fixes to the append path:
+
+- The 413 response now carries CORS headers, so browser fetches can read the status instead of failing with an opaque `TypeError: Failed to fetch`. App-side retry-on-disconnect loops no longer spin forever on a permanently rejected payload.
+- The per-record cap is now computed precisely against S2's actual ceiling instead of the conservative 512 KiB floor. Legitimate tool outputs of roughly 600–900 KiB (search results, file content) now succeed; pathological all-quote content, which doubles in size under JSON escaping, is still rejected cleanly with a clear error.
+
+See the updated [413 row in the client protocol](/ai-chat/client-protocol#step-3-send-messages-stops-and-actions).
+
+</Update>
+
 <Update label="May 21, 2026" description="4.5.0-rc.1" tags={["SDK", "Bug fix"]}>
 
 ## v4.5.0-rc.1 — two bug fixes
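The slim-wire reduction described in the changelog entry above can be pictured with a minimal TypeScript sketch. It assumes simplified stand-in types rather than the real `ai` `UIMessage` types, and `slimForContinuation` is a hypothetical name: the built-in transports do this internally, and this is not their code. It also assumes every resolved tool part on the message is kept, and that `output-denied` travels with its `approval` object.

```typescript
// Simplified stand-ins for AI SDK message parts. These are NOT the `ai`
// package's UIMessage types; they model only the fields this sketch reads.
type ToolState =
  | "input-streaming"
  | "input-available"
  | "approval-requested"
  | "approval-responded"
  | "output-available"
  | "output-error"
  | "output-denied";

interface Approval {
  id: string;
  approved: boolean;
  reason?: string;
}

interface ToolPart {
  type: `tool-${string}`;
  toolCallId: string;
  state: ToolState;
  input?: unknown;
  output?: unknown;
  errorText?: string;
  approval?: Approval;
  providerMetadata?: Record<string, unknown>;
}

interface TextLikePart {
  type: "text" | "reasoning";
  text: string;
}

type Part = ToolPart | TextLikePart;

interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  parts: Part[];
}

// The four tool-state advances the changelog treats as "resolved".
const RESOLVED_STATES = new Set<ToolState>([
  "output-available",
  "output-error",
  "approval-responded",
  "output-denied",
]);

function isToolPart(part: Part): part is ToolPart {
  return part.type.startsWith("tool-");
}

// Hypothetical helper: keep only resolved tool parts, and on each one keep
// only the identifying fields plus the field that goes with the new state.
function slimForContinuation(message: ChatMessage): ChatMessage {
  const parts: ToolPart[] = [];
  for (const part of message.parts) {
    if (!isToolPart(part) || !RESOLVED_STATES.has(part.state)) continue;
    const slim: ToolPart = {
      type: part.type,
      toolCallId: part.toolCallId,
      state: part.state,
    };
    if (part.state === "output-available") slim.output = part.output;
    else if (part.state === "output-error") slim.errorText = part.errorText;
    else slim.approval = part.approval; // approval-responded / output-denied
    parts.push(slim);
  }
  return { id: message.id, role: "assistant", parts };
}
```

The design choice is to drop everything the agent can rebuild from `hydrateMessages` or the snapshot, so only the fields that actually changed travel over `/in/append`.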
diff --git a/docs/ai-chat/client-protocol.mdx b/docs/ai-chat/client-protocol.mdx
@@ -692,7 +692,7 @@ The body is a JSON-serialized [`ChatInputChunk`](#chatinputchunk) — a tagged u
 | `401` | Missing or invalid `Authorization` header. |
 | `403` | Token doesn't carry `write:sessions:{externalId}`. |
 | `409` | The session is closed — `{ "ok": false, "error": "Cannot append to a closed session" }`. |
-| `413` | Body exceeds 512 KiB. A normal `kind: "message"` payload is a few KB; if you hit this you're shipping more than one message per record. |
+| `413` | Body exceeds 1 MiB **or** the wrapped record would exceed S2's ~1 MiB per-record metered ceiling. A normal `kind: "message"` payload is a few KB; if you hit this you're shipping more than one message per record or pushing a single tool output that's itself oversized. Carries CORS headers so browser fetches can read the status. |
 | `500` | Transient backend failure on the durable stream. Safe to retry — appends are idempotent on `(externalId, X-Part-Id)` if you set the optional `X-Part-Id` request header (the built-in clients set it from a UUID). |
 
 <Warning>
@@ -851,7 +851,7 @@ The agent trims trailing assistant messages from its accumulator and re-streams
 
 ### Tool approval responses
 
-When a tool requires approval (`needsApproval: true`), the agent streams the tool call with an `approval-requested` state and completes the turn. After the user approves or denies, send the **updated assistant message** (with `approval-responded` tool parts) back as a `kind: "message"` chunk — singular, not the full chain:
+When a tool requires approval (`needsApproval: true`), the agent streams the tool call with an `approval-requested` state and completes the turn. After the user approves or denies, send the **updated assistant message** back as a `kind: "message"` chunk — singular, not the full chain. The minimum shape the agent reads is just the resolved tool parts:
 
 ```json
 {
@@ -861,12 +861,10 @@ When a tool requires approval (`needsApproval: true`), the agent streams the too
       "id": "asst-msg-1",
       "role": "assistant",
       "parts": [
-        { "type": "text", "text": "I'll send that email for you." },
         {
           "type": "tool-sendEmail",
           "toolCallId": "call-1",
           "state": "approval-responded",
-          "input": { "to": "user@example.com", "subject": "Hello" },
           "approval": { "id": "approval-1", "approved": true }
         }
       ]
@@ -878,7 +876,11 @@ When a tool requires approval (`needsApproval: true`), the agent streams the too
 }
 ```
 
-The agent matches the incoming message by `id` against the rebuilt accumulator. If a match is found, it **replaces** the existing message instead of appending.
+The agent matches the incoming message by `id` against the rebuilt accumulator (or hydrated chain) and **overlays the tool-state advance** onto the matching entry — `state` plus `output` / `errorText` / `approval`, depending on the new state. Hydrated `input`, text, reasoning, and provider metadata stay put. This is what makes the slim shape above sufficient: the agent rebuilds everything else from the snapshot or from your `hydrateMessages` hook.
+
+The same shape applies to HITL `addToolOutput` answers — substitute `state: "output-available"` and `output: <result>` for the approval pair above. Single-tool HITL `addToolOutput` continuation payloads are typically ~1 KiB on the wire.
+
+The built-in transports (`TriggerChatTransport`, `AgentChat`) ship the slim shape by default on `submit-message` continuations. Custom transports can ship a fuller `UIMessage` — the agent still only reads the resolved tool-part fields — but the slim shape is the most efficient and avoids brushing the per-record cap on reasoning-heavy turns.
 
 <Note>
   The message `id` must match the one the agent assigned during streaming. `TriggerChatTransport` keeps IDs in sync automatically. Custom transports should use the `messageId` from the stream's `start` chunk.
@@ -938,7 +940,7 @@ To bridge that gap, the head-start route handler ships **full UIMessage history*
 
 Two reasons this exception is safe:
 
-1. **The route handler runs against the customer's own HTTP endpoint**, not `/realtime/v1/sessions/{id}/in/append`. The 512 KiB body cap on the realtime route doesn't apply.
+1. **The route handler runs against the customer's own HTTP endpoint**, not `/realtime/v1/sessions/{id}/in/append`. The per-record cap on the realtime route doesn't apply.
 2. **`headStartMessages` is only honored on `trigger: "handover-prepare"`**. The runtime ignores the field on every other trigger — the one-message-per-record rule still holds for normal turns.
 
 After turn 1 completes, the snapshot is written and turn 2+ run as a normal single-message-per-record chat.
@@ -1067,7 +1069,7 @@ No. `seq_num` is monotonic across the entire session — turn 1 might emit seq 0
 </Expandable>
 
 <Expandable title="What's the maximum size of a single `.in/append` body?">
-512 KiB. A typical `kind: "message"` is a few KB. If you're brushing the cap you're shipping more than one message per record, which the protocol forbids. The headStart path (`trigger: "handover-prepare"`) sends through the customer's own HTTP route handler, not `.in/append`, so the cap doesn't apply there.
+The HTTP body is capped at 1 MiB as a DoS guard, but the binding limit is at the storage layer: each `.in/append` becomes a single S2 record, metered as `8 + body_bytes_after_JSON_wrap` and capped at 1 MiB. In practice the raw HTTP body can be up to roughly 1023 KiB for content with low JSON-escape overhead (ASCII, base64) but only about 512 KiB for content that escapes heavily (all quotes or backslashes, which double in size when escaped). A typical `kind: "message"` is a few KiB. If you're brushing the cap, either a single tool output is itself oversized (see [Large payloads](/ai-chat/patterns/large-payloads)) or you're shipping more than one message per record, which the protocol forbids. The 413 response carries CORS headers so browser fetches can read the status. The headStart path (`trigger: "handover-prepare"`) sends through the customer's own HTTP route handler, not `.in/append`, so the cap doesn't apply there.
 </Expandable>
 
 ## See also
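The `/in/append` status table in this file implies a simple retry policy for clients that POST to the endpoint themselves. This minimal TypeScript sketch assumes a hypothetical injected `send(partId)` function that performs one POST with the given `X-Part-Id` header and resolves to the HTTP status code; `classifyAppendStatus` and `appendWithRetry` are illustrative names, not SDK exports. Only `500` is documented as transient. Treating every 5xx the same way is this sketch's assumption, and a real client would add backoff between attempts.

```typescript
type AppendOutcome = "ok" | "retry" | "fatal";

// Maps an /in/append status code to what a client should do next.
function classifyAppendStatus(status: number): AppendOutcome {
  if (status >= 200 && status < 300) return "ok";
  // 500 is documented as transient; other 5xx are assumed to be too.
  if (status >= 500) return "retry";
  // 401 / 403 / 409 / 413 fail identically on every attempt, so retrying
  // them only spins. Other 4xx codes are treated the same way here.
  return "fatal";
}

interface AppendResult {
  ok: boolean;
  status: number;
  attempts: number;
}

// `send` is hypothetical: one POST with the given X-Part-Id header,
// resolving to the HTTP status code. No backoff, to keep the sketch short.
async function appendWithRetry(
  send: (partId: string) => Promise<number>,
  partId: string,
  maxAttempts = 3,
): Promise<AppendResult> {
  let status = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Reusing one X-Part-Id across attempts keeps retries idempotent.
    status = await send(partId);
    const outcome = classifyAppendStatus(status);
    if (outcome !== "retry") {
      return { ok: outcome === "ok", status, attempts: attempt };
    }
  }
  return { ok: false, status, attempts: maxAttempts };
}
```

Because the 413 now carries CORS headers, a browser client sees the real status and this kind of loop stops after one attempt instead of retrying an opaque network error.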
diff --git a/docs/ai-chat/lifecycle-hooks.mdx b/docs/ai-chat/lifecycle-hooks.mdx
@@ -242,14 +242,22 @@ import { validateUIMessages } from "ai";
 export const myChat = chat.agent({
   id: "my-chat",
   onValidateMessages: async ({ messages }) => {
-    return validateUIMessages({ messages, tools: chatTools });
+    const userMessages = messages.filter((m) => m.role === "user");
+    if (userMessages.length > 0) {
+      await validateUIMessages({ messages: userMessages, tools: chatTools });
+    }
+    return messages;
   },
   run: async ({ messages, signal }) => {
     return streamText({ model: anthropic("claude-sonnet-4-5"), messages, tools: chatTools, abortSignal: signal });
   },
 });
 ```
 
+<Warning>
+  On HITL continuations (`addToolOutput` / `addToolApproveResponse`) the assistant entry in `messages` is **slim** — `state` + `output` / `errorText` / `approval` only, no `input` or other parts. `validateUIMessages` against the AI SDK schema rejects that shape (the schema requires `input` on resolved tool parts), so filter to user messages first (or skip validation entirely on those turns). The example above does the filter.
+</Warning>
+
 <Note>
   `onValidateMessages` fires **before** `onTurnStart` and message accumulation. If you need to validate messages loaded from a database, do the loading in `onChatStart` or `onPreload` and let `onValidateMessages` validate the full incoming set each turn.
 </Note>
@@ -278,14 +286,24 @@ export const myChat = chat.agent({
     const record = await db.chat.findUnique({ where: { id: chatId } });
     const stored = record?.messages ?? [];
 
-    // Append the new user message and persist
+    // Upsert the incoming message by id. On HITL continuations
+    // (`addToolOutput` / `addToolApproveResponse`) the incoming wire
+    // shares the id of an existing assistant in `stored` — `push`ing
+    // unconditionally would duplicate the row. The runtime merges the
+    // resolution onto the existing entry; new ids (typically a fresh
+    // user message) get appended.
     if (trigger === "submit-message" && incomingMessages.length > 0) {
       const newMsg = incomingMessages[incomingMessages.length - 1]!;
-      stored.push(newMsg);
-      await db.chat.update({
-        where: { id: chatId },
-        data: { messages: stored },
-      });
+      const existingIdx = newMsg.id
+        ? stored.findIndex((m) => m.id === newMsg.id)
+        : -1;
+      if (existingIdx === -1) {
+        stored.push(newMsg);
+        await db.chat.update({
+          where: { id: chatId },
+          data: { messages: stored },
+        });
+      }
     }
 
     return stored;
@@ -298,7 +316,7 @@ export const myChat = chat.agent({
 
 **Lifecycle position:** `onValidateMessages` → **`hydrateMessages`** → `onChatStart` (chat's first message only) → `onTurnStart` → `run()`
 
-After the hook returns, any incoming wire message whose ID matches a hydrated message is auto-merged. This makes [tool approvals](/ai-chat/frontend#tool-approvals) work transparently with hydration.
+After the hook returns, the runtime overlays the wire's tool-state advances (`output-available` / `output-error` / `approval-responded` / `output-denied`) onto matching hydrated entries by id. Everything else on the hydrated entry (text, reasoning, tool `input`, providerMetadata) stays put. This makes [tool approvals](/ai-chat/frontend#tool-approvals) and HITL `addToolOutput` continuations work transparently: the client ships a slim resolution on the wire, and the agent merges the new state onto your DB-backed copy.
 
 <Note>
   `hydrateMessages` also fires for [action](/ai-chat/actions) turns (`trigger: "action"`) with empty `incomingMessages`. This lets the action handler work with the latest DB state.
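The overlay rule in the `hydrateMessages` section of this file can be shown with a minimal TypeScript sketch. It assumes simplified part types rather than the real `ai` types, and `overlayToolStateAdvances` is a hypothetical name, not the runtime's implementation. It only shows which fields move from the wire copy (state plus its matching field) and which stay on the hydrated copy (text, reasoning, `input`, provider metadata). How the runtime treats wire messages whose id matches nothing is not modeled.

```typescript
// Simplified stand-ins, not the `ai` package's UIMessage types.
type ToolState =
  | "input-available"
  | "approval-requested"
  | "approval-responded"
  | "output-available"
  | "output-error"
  | "output-denied";

interface ToolPart {
  type: `tool-${string}`;
  toolCallId: string;
  state: ToolState;
  input?: unknown;
  output?: unknown;
  errorText?: string;
  approval?: { id: string; approved: boolean; reason?: string };
  providerMetadata?: Record<string, unknown>;
}

type Part = ToolPart | { type: "text" | "reasoning"; text: string };

interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  parts: Part[];
}

const ADVANCES = new Set<ToolState>([
  "output-available",
  "output-error",
  "approval-responded",
  "output-denied",
]);

function isToolPart(part: Part): part is ToolPart {
  return part.type.startsWith("tool-");
}

function overlayToolStateAdvances(hydrated: ChatMessage[], wire: ChatMessage): ChatMessage[] {
  // First match by id wins, mirroring "the merge updates the first match".
  const idx = hydrated.findIndex((m) => m.id === wire.id);
  if (idx === -1) return hydrated; // nothing to overlay in this sketch
  const target = hydrated[idx]!;
  const parts = target.parts.map((part): Part => {
    if (!isToolPart(part)) return part; // text and reasoning stay put
    const callId = part.toolCallId;
    const advance = wire.parts.find(
      (w): w is ToolPart => isToolPart(w) && w.toolCallId === callId && ADVANCES.has(w.state),
    );
    if (!advance) return part;
    // The spread keeps hydrated input and providerMetadata; only the state and
    // the field that belongs to the new state come from the wire copy.
    const next: ToolPart = { ...part, state: advance.state };
    if (advance.state === "output-available") next.output = advance.output;
    else if (advance.state === "output-error") next.errorText = advance.errorText;
    else next.approval = advance.approval;
    return next;
  });
  const merged = hydrated.slice();
  merged[idx] = { ...target, parts };
  return merged;
}
```

Returning a new array rather than mutating `hydrated` is a choice of this sketch; it keeps the DB-backed copy you returned from the hook unchanged.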
diff --git a/docs/ai-chat/patterns/database-persistence.mdx b/docs/ai-chat/patterns/database-persistence.mdx
@@ -184,9 +184,20 @@ export const myChat = chat.agent({
     const record = await db.chat.findUnique({ where: { id: chatId } });
     const stored = record?.messages ?? [];
 
+    // Upsert by id. HITL continuations (addToolOutput /
+    // addToolApproveResponse) ship the existing assistant's id with a
+    // slim payload — push-without-check duplicates the row, the
+    // runtime merges only the first match, and the duplicate slim copy
+    // hits `toModelMessages` with no `input`.
     if (trigger === "submit-message" && incomingMessages.length > 0) {
-      stored.push(incomingMessages[incomingMessages.length - 1]!);
-      await db.chat.update({ where: { id: chatId }, data: { messages: stored } });
+      const newMsg = incomingMessages[incomingMessages.length - 1]!;
+      const existingIdx = newMsg.id
+        ? stored.findIndex((m) => m.id === newMsg.id)
+        : -1;
+      if (existingIdx === -1) {
+        stored.push(newMsg);
+        await db.chat.update({ where: { id: chatId }, data: { messages: stored } });
+      }
     }
 
     return stored;
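The same insert-only-when-new check now appears in three `hydrateMessages` examples. This sketch factors it into a helper, assuming only that messages carry an optional string `id`; `appendIfNewId` is a hypothetical name, and the `db.chat` calls in the usage comment are the same illustrative calls the examples already use. Unlike the inline versions, it returns a new array instead of mutating `stored`, and it reports whether a write is needed so an existing id (a HITL continuation) skips the DB round trip.

```typescript
interface HasOptionalId {
  id?: string;
}

interface AppendIfNewResult<M> {
  messages: M[]; // chain to return from hydrateMessages
  changed: boolean; // true when `incoming` was appended and should be persisted
}

// Hypothetical helper for the "upsert by id" rule: an id already in the chain
// (a HITL continuation) leaves the stored, full copy untouched so the runtime
// can overlay the resolution; a new id, or a message with no id, is appended.
function appendIfNewId<M extends HasOptionalId>(
  stored: M[],
  incoming: M,
): AppendIfNewResult<M> {
  const exists = !!incoming.id && stored.some((m) => m.id === incoming.id);
  if (exists) return { messages: stored, changed: false };
  return { messages: [...stored, incoming], changed: true };
}

// Usage inside hydrateMessages (db.chat as in the examples):
//   const { messages, changed } = appendIfNewId(stored, newMsg);
//   if (changed) {
//     await db.chat.update({ where: { id: chatId }, data: { messages } });
//   }
//   return messages;
```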
diff --git a/docs/ai-chat/patterns/persistence-and-replay.mdx b/docs/ai-chat/patterns/persistence-and-replay.mdx
@@ -139,9 +139,18 @@ export const myChat = chat.agent({
   hydrateMessages: async ({ chatId, trigger, incomingMessages }) => {
     const stored = (await db.chat.findUnique({ where: { id: chatId } }))?.messages ?? [];
 
+    // Upsert by id — HITL continuations ship the existing assistant's
+    // id with a slim payload; the runtime overlays the new state.
+    // See lifecycle-hooks for the full pattern + rationale.
     if (trigger === "submit-message" && incomingMessages.length > 0) {
-      stored.push(incomingMessages[0]!);
-      await db.chat.update({ where: { id: chatId }, data: { messages: stored } });
+      const newMsg = incomingMessages[0]!;
+      const existingIdx = newMsg.id
+        ? stored.findIndex((m) => m.id === newMsg.id)
+        : -1;
+      if (existingIdx === -1) {
+        stored.push(newMsg);
+        await db.chat.update({ where: { id: chatId }, data: { messages: stored } });
+      }
     }
 
     return stored;
diff --git a/docs/ai-chat/patterns/trusted-edge-signals.mdx b/docs/ai-chat/patterns/trusted-edge-signals.mdx
@@ -115,7 +115,7 @@ The body is a JSON-serialized `ChatInputChunk`. The proxy parses it, checks `kin
 }
 ```
 
-Both bodies stay well under the [512 KiB cap on `/in/append`](/ai-chat/client-protocol#step-3-send-messages-stops-and-actions) — a typical trust object is ~200 bytes.
+Both bodies stay well under the [per-record cap on `/in/append`](/ai-chat/client-protocol#step-3-send-messages-stops-and-actions) — a typical trust object is ~200 bytes.
 
 Other paths — `.out` SSE, `/api/v1/auth/jwt/claims`, anything else — pass through the proxy untouched. The SSE stream in particular must not be buffered; preserve the response body as-is.
 
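A proxy like this one, or a client, could pre-check a body against the limits the client-protocol FAQ describes before forwarding it. This is a rough TypeScript sketch under a loudly stated assumption: "JSON wrap" is taken to mean the raw body is embedded as a JSON string value inside a storage envelope whose fixed overhead is unknown here, so it is a parameter defaulting to 0. `estimateMeteredBytes` and `fitsAppendCaps` are hypothetical names, and the server's 413 remains the source of truth.

```typescript
const HTTP_BODY_CAP = 1024 * 1024; // 1 MiB DoS guard on the HTTP body
const S2_RECORD_CAP = 1024 * 1024; // 1 MiB per-record metered ceiling

const utf8 = new TextEncoder();

// Metered size = 8 + bytes of the body after JSON-string escaping.
// JSON.stringify adds the surrounding quotes and escapes `"` and `\`, which is
// why quote-heavy content roughly doubles. `envelopeOverhead` is an assumption.
function estimateMeteredBytes(body: string, envelopeOverhead = 0): number {
  return 8 + utf8.encode(JSON.stringify(body)).length + envelopeOverhead;
}

function fitsAppendCaps(body: string, envelopeOverhead = 0): boolean {
  const rawBytes = utf8.encode(body).length;
  return (
    rawBytes <= HTTP_BODY_CAP &&
    estimateMeteredBytes(body, envelopeOverhead) <= S2_RECORD_CAP
  );
}
```

For a ~200-byte trust object the estimate is nowhere near either cap; the check only matters for large tool outputs.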
