[Bug] Provider API error 400: assistant message with 'tool_calls' missing corresponding tool response (tool_call_id Read:158) #705

## Bug Description

Kimi Code repeatedly fails with a provider API error that makes the session unusable. The error claims an assistant message containing `tool_calls` is not followed by the required tool response messages.

## Error Message

```
400 an assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: Read:158
```

## Environment

- **Kimi Code version**: 0.14.0
- **Install source**: homebrew
- **OS**: darwin arm64
- **Node.js version**: 26.3.0
- **Wire protocol version**: 1.4
- **Model**: kimi-code/kimi-for-coding
- **Shell**: /bin/zsh
- **Terminal**: zed 1.6.3+stable.306.601ecb3ee5c16940191818ee7f244837abf6983c
- **Session ID**: session_c44fe4ac-ae8b-4e56-a54a-4d37be7a1c47
- **Session title**: based on @.omo what nexts?
- **Workspace**: /Users/irfandi/.local/share/opencode/worktree/91fb77f4c6c843e75d4b0755a138a24dd7e2e7fe/clever-cabin

## Reproduction / Observed behavior

1. Resume an existing session.
2. The assistant emits a tool call with id `Read:158`.
3. Subsequent LLM requests fail with HTTP 400 because the message history is missing the corresponding tool response for `Read:158`.
4. The failure persists across retries and turns, blocking the session completely.

Log excerpt from `logs/kimi-code.log`:

```
2026-06-13T02:33:12.563Z INFO  llm request  turnStep=9.18
2026-06-13T02:33:13.599Z WARN  llm request failed  turnStep=9.18 attempt=1/3 model=kimi-code/kimi-for-coding errorName=APIStatusError errorMessage="400 an assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: Read:158" statusCode=400
2026-06-13T02:33:13.600Z ERROR turn failed  turnId=9
...
2026-06-13T02:38:05.687Z WARN  llm request failed  turnStep=1.1 attempt=1/3 model=kimi-code/kimi-for-coding errorName=APIStatusError errorMessage="..." statusCode=400
2026-06-13T02:38:28.037Z ERROR compaction failed
  APIStatusError: 400 ... Read:158
```

The error also occurs during compaction, suggesting that the corrupted conversation state is being persisted.

## Expected behavior

- The client should either:
  - Ensure every `tool_calls` assistant message is followed by the matching tool response before sending the next request, or
  - Detect and repair the inconsistent turn state rather than repeatedly sending the malformed history to the provider.
- A session should be able to resume without hitting a permanent 400 loop.
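The pre-send check described above could look roughly like the sketch below. It is not Kimi Code's actual code: the `Message` shape is a trimmed-down OpenAI-compatible chat format, `findUnansweredToolCalls` is a hypothetical helper name, and it assumes the provider requires tool responses to follow the assistant message immediately, which is how the 400 error text reads.

```typescript
// Minimal OpenAI-compatible message shapes (assumption: only these fields matter here).
type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type ToolMessage = { role: "tool"; tool_call_id: string; content: string };
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | ToolMessage;

// Hypothetical helper: returns the ids of tool calls that are not answered by
// the run of tool messages directly following the assistant message.
function findUnansweredToolCalls(history: Message[]): string[] {
  const unanswered: string[] = [];
  for (let i = 0; i < history.length; i++) {
    const msg = history[i];
    if (msg.role !== "assistant") continue;
    const calls = msg.tool_calls ?? [];
    if (calls.length === 0) continue;

    // Collect the ids answered by the contiguous block of tool messages.
    const answered = new Set<string>();
    for (let j = i + 1; j < history.length; j++) {
      const next = history[j];
      if (next.role !== "tool") break; // any other role ends the response block
      answered.add(next.tool_call_id);
    }
    for (const call of calls) {
      if (!answered.has(call.id)) unanswered.push(call.id);
    }
  }
  return unanswered;
}
```

A client could call this right before each request and refuse to send (or repair first) whenever the returned list is non-empty, instead of retrying the same malformed history.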

## Related issues and PRs

This appears to be a **known class of bug** in `MoonshotAI/kimi-code` and the broader Kimi/Moonshot ecosystem. Below are the most relevant existing reports and attempted fixes.

### Directly related issues in `MoonshotAI/kimi-code`

| Issue | Status | Relationship |
|---|---|---|
| [#269](https://github.com/MoonshotAI/kimi-code/issues/269) | open | Same error after force-interrupt during tool execution; resume hits 400. Root cause diagnosed as dirty `pendingToolResultIds` + `project()` not filtering incomplete tool sequences. |
| [#660](https://github.com/MoonshotAI/kimi-code/issues/660) | open | "Impossible to resume crashed sessions" — OOM/force-kill during tool execution leaves session unresumable with `tool_call_ids did not have response messages`. |
| [#701](https://github.com/MoonshotAI/kimi-code/issues/701) | open | Same 400 on session resume with open tool calls; includes a [proposed fix commit](https://github.com/thecannabisapp/kimi-code/commit/0291e891f0d304add982598e6bcba992dd27042e) on an external fork. |
| [#520](https://github.com/MoonshotAI/kimi-code/issues/520) | open | Related flow: thinking model returns reasoning-only completion after a tool call (`APIEmptyResponseError`). |

### Related PRs in `MoonshotAI/kimi-code`

| PR | Status | Relationship |
|---|---|---|
| [#664](https://github.com/MoonshotAI/kimi-code/pull/664) | open | Direct fix for #660 — adds `trimTrailingOpenToolExchange()` in `project()` and `cleanupOrphanedToolCalls()` in `ContextMemory` after resume. |
| [#273](https://github.com/MoonshotAI/kimi-code/pull/273) | closed | Earlier fix attempt for #269 — synthesizes missing `tool.result` messages at replay time. |
| [#553](https://github.com/MoonshotAI/kimi-code/pull/553) | open | UX-side mitigation: auto-undo interrupted prompts when a turn produces no output. |

### Related reports in other Moonshot / Kimi projects

| Project | Issue | Notes |
|---|---|---|
| `MoonshotAI/kimi-cli` | [#1977](https://github.com/MoonshotAI/kimi-cli/issues/1977) | Exact same error (`Shell:58` tool_call_id missing). |
| `MoonshotAI/kimi-cli` | [#1299](https://github.com/MoonshotAI/kimi-cli/issues/1299) | Same error on `kimi --continue`; corrupted session history (disk full). Workaround: start a new session without `--continue`. |
| `MoonshotAI/kimi-cli` | [#1171](https://github.com/MoonshotAI/kimi-cli/issues/1171) | Malformed `tool_calls[].function.arguments` JSON poisons session history permanently. |
| `MoonshotAI/kimi-cli` | [#2165](https://github.com/MoonshotAI/kimi-cli/issues/2165) | Corrupted `context.jsonl` makes session unrecoverable. |

### Cross-project reports (same OpenAI-compatible error)

| Project | Issue | Notes |
|---|---|---|
| `openai/codex` | [#8479](https://github.com/openai/codex/issues/8479) | Parallel `tool_calls` with lost responses during session interruption. |
| `pydantic/pydantic-ai` | [#562](https://github.com/pydantic/pydantic-ai/issues/562), [#2360](https://github.com/pydantic/pydantic-ai/issues/2360) | Workaround: `parallel_tool_calls=False`. |
| `google/adk-python` | [#153](https://github.com/google/adk-python/issues/153), [#187](https://github.com/google/adk-python/issues/187) | Multi-agent orchestration loses `tool_call_id`s. |
| `microsoft/semantic-kernel` | [#9443](https://github.com/microsoft/semantic-kernel/issues/9443) (Java), [#7626](https://github.com/microsoft/semantic-kernel/issues/7626) (.NET) | Framework injects user messages between tool responses. |

## Root cause analysis (from local debug zip)

I exported the debug zip and inspected `logs/kimi-code.log`, `agents/main/wire.jsonl`, `state.json`, and `manifest.json`.

### Timeline of the crash

| Time (UTC) | Event |
|---|---|
| 2026-06-12 11:39 | Session created. |
| ~11:40–02:24 | 8 long turns with many tool calls and 6 compactions. |
| 02:30:45 | Turn 9 begins. User prompt: "Resume the active goal." |
| 02:33:12 | **Turn 9, step 17**: model decides to `Read` `config.ts`. |
| 02:33:12 | **Critical race**: a `turn.steer` event (background task completion notification) is injected mid-step, creating a new turn 10 while turn 9 is still in-flight. |
| 02:33:12 | Turn 9, step 17's Read tool call completes, but the tool result appears to be split across the turn boundary created by the steer. |
| 02:33:13 | **400 error** on turn 9 step 18: `Read:158` tool_call_id has no matching tool response. |
| 02:33:30 | Turn 10 step 2 also fails with the same 400. |
| 02:38:05 | After session restart, the very first turn fails with the same 400. |
| 02:38:28 | Even **manual compaction fails** with the same 400, confirming the corrupted context is persisted. |

### Compaction history

Six compaction events occurred before the crash; the last one completed successfully at 22:43:12. A manual compaction attempt at 02:38:27 was cancelled because the compaction itself hit the 400 error.

### Hypothesis

The `Read:158` tool result was **orphaned by a `turn.steer` event that fired while a Read tool call was in-flight**. The steer injected new user messages and spawned a new turn 10 on top of the still-running turn 9. When the context was reconstructed for turn 9's next API call, the assistant message containing the `Read:158` tool_call was preserved, but its matching tool result was attributed to the wrong turn/context slot or dropped across the turn boundary. This is a concurrency bug in context reconstruction when background-task notifications interrupt an active step.
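To make the hypothesis concrete, here is a toy model of how a steer that opens a new turn mid-step could orphan a tool result. Everything in it is an assumption for illustration: `ToyContextLog`, `steer`, and `projectTurn` are hypothetical names and do not correspond to Kimi Code's real classes or to the actual `wire.jsonl` event format.

```typescript
type Role = "user" | "assistant" | "tool";

interface LoggedMessage {
  turnId: number;
  role: Role;
  text: string;
  toolCallId: string | null;
}

// Toy log: every record is tagged with whichever turn is current when it is appended.
class ToyContextLog {
  private records: LoggedMessage[] = [];
  private currentTurn: number;

  constructor(startTurn: number) {
    this.currentTurn = startTurn;
  }

  append(role: Role, text: string, toolCallId: string | null = null): void {
    this.records.push({ turnId: this.currentTurn, role, text, toolCallId });
  }

  // Assumed behavior: a steer opens a new turn immediately, even mid-step.
  steer(text: string): void {
    this.currentTurn += 1;
    this.append("user", text);
  }

  // Rebuild the history visible to a turn: records tagged with that turn or earlier.
  projectTurn(turnId: number): LoggedMessage[] {
    return this.records.filter((r) => r.turnId <= turnId);
  }
}

const log = new ToyContextLog(9);
log.append("user", "Resume the active goal.");
log.append("assistant", "call Read(config.ts)", "Read:158"); // tool call is now in flight
log.steer("background task completed");                      // turn 10 opens mid-step
log.append("tool", "<file contents>", "Read:158");           // result is tagged turn 10

// Turn 9's next request sees the tool call but not its result.
const turn9History = log.projectTurn(9);
const turn9HasResult = turn9History.some((r) => r.role === "tool" && r.toolCallId === "Read:158");
```

In this model the result is never lost from storage, only from turn 9's projection, which would be consistent with the same 400 reappearing after a restart if the projection logic is what runs on resume.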

## Debug artifacts

Exported debug zip:

```
/Users/irfandi/.local/share/opencode/worktree/91fb77f4c6c843e75d4b0755a138a24dd7e2e7fe/clever-cabin/session_c44fe4ac-ae8b-4e56-a54a-4d37be7a1c47.zip
```

Contains:
- `manifest.json`
- `state.json`
- `logs/kimi-code.log`
- `logs/global/kimi-code.log`
- `agents/main/wire.jsonl` (8757 lines of wire-protocol events)

Please let me know if you need the zip uploaded somewhere or additional logs.

## Suggested fixes / mitigations

1. **Defer `turn.steer` injection** until the current step's tool calls have completed and their results are committed to the same context slot.
2. **Validate tool_call/tool_result pairing** in `project()` / context reconstruction before sending to the API; drop or synthesize orphaned pairs rather than sending malformed history.
3. **On session resume**, run `cleanupOrphanedToolCalls()` / `trimTrailingOpenToolExchange()` even when the orphaned tool calls are no longer at the very end of the history (e.g. because a new user prompt was appended after them).
4. **Prevent compaction from persisting corrupted context**: if an API call fails with this specific 400, do not compact until the context is repaired.
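Fixes 2 and 3 could be combined into a single repair pass over the whole history, not just its tail. The sketch below is one possible shape under stated assumptions: the message types are a simplified OpenAI-compatible format, and `repairToolExchanges` and the placeholder text are hypothetical, not the `cleanupOrphanedToolCalls()` or `trimTrailingOpenToolExchange()` functions from PR #664.

```typescript
type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type ToolMessage = { role: "tool"; tool_call_id: string; content: string };
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | ToolMessage;

// Hypothetical placeholder body for results that were never recorded.
const MISSING_RESULT_TEXT = "[tool result unavailable: call was interrupted]";

// Rebuilds the history so that each assistant tool call is immediately followed
// by exactly one tool message: the recorded result if one exists anywhere in
// the history, otherwise a synthesized placeholder. Tool messages that match
// no call are dropped.
function repairToolExchanges(history: Message[]): Message[] {
  // Index recorded results by id so a displaced result can be moved back.
  const resultsById = new Map<string, ToolMessage>();
  for (const m of history) {
    if (m.role === "tool") resultsById.set(m.tool_call_id, m);
  }

  const repaired: Message[] = [];
  for (const m of history) {
    if (m.role === "tool") continue; // re-emitted next to its call below
    repaired.push(m);
    if (m.role !== "assistant") continue;
    for (const call of m.tool_calls ?? []) {
      const placeholder: ToolMessage = {
        role: "tool",
        tool_call_id: call.id,
        content: MISSING_RESULT_TEXT,
      };
      repaired.push(resultsById.get(call.id) ?? placeholder);
    }
  }
  return repaired;
}
```

Relocating a real result is preferred over synthesizing one so the model does not lose information it already paid for; the placeholder is only a last resort. Running this before compaction as well would address fix 4, since compaction would then never see the malformed sequence.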

## Additional context

This happened after the user chose to resume an active goal. The session had already accumulated many turns (turn 7.x, 8.x, 9.x) before the failure. The `Read:158` tool call ID is the Kimi/Moonshot API-level ID for the 158th `Read` tool call in the session; it is distinct from Kimi Code's internal wire UUIDs.

