Summary
When using the Messages Batch API with forced tool_use (tool_choice={"type":"tool","name":"..."}) on claude-opus-4-7, the returned tool input is frequently wrapped in a single key $PARAMETER_NAME instead of containing the schema's actual property names at the top level. The same prompt and schema, sent via client.messages.create (sync), returns the correct top-level keys 100% of the time.
Environment
- SDK: anthropic (Python), latest stable
- Model: claude-opus-4-7
- Endpoint: client.messages.batches.create / .retrieve / .results
- Tool config: tool_choice={"type":"tool","name":"save_topic_synthesis"} (forced)
Expected vs Observed
Expected:
tool_use.input == {
  "tldr": "...",
  "mechanism_summary": "...",
  "evidence_strength": "emerging",
  ...
}
Observed (most items):
tool_use.input == {
  "$PARAMETER_NAME": {
    "tldr": "...",
    "mechanism_summary": "...",
    ...
  }
}
The literal string $PARAMETER_NAME does not appear anywhere in our prompt, tool schema, or system message.
Distribution from a single batch (10 messages, all stop_reason=tool_use)
| Request | Top-level keys | Output tokens |
| --- | --- | --- |
| 1 | $PARAMETER_NAME (full inner) | 4573 |
| 2 | $PARAMETER_NAME (full inner) | 5668 |
| 3 | $PARAMETER_NAME (full inner) | 5221 |
| 4 | $PARAMETER_NAME (full inner) | 5513 |
| 5 | $PARAMETER_NAME (full inner) | 4853 |
| 6 | mixed: $PARAMETER_NAME + top-level tldr/mechanism_summary | 5525 |
| 7 | topic (different wrap key, single-key dict) | 4929 |
| 8 | $PARAMETER_NAME (empty inner dict) | 49 |
| 9 | top-level (correct) | 5353 |
| 10 | top-level (correct) | 5915 |
8/10 had a non-conforming top-level shape; 2/10 conformed. The pattern repeated in a second, independent batch (the same 10 messages re-submitted on a different day after schema/prompt changes, so the inputs differed): again, 1 item came back stuck ($PARAMETER_NAME: {}, ~50 output tokens, smallest input).
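The categories in the table above can be assigned mechanically. The following is a minimal sketch, assuming the payload is a plain dict and the schema's property names are known; the function name `classify_tool_input` and the label strings are illustrative, not part of the SDK.

```python
from typing import Any, Dict, Set


def classify_tool_input(tool_input: Dict[str, Any], schema_props: Set[str]) -> str:
    """Label the top-level shape of a tool_use.input payload."""
    keys = set(tool_input)
    if not keys:
        return "empty"
    known = keys & schema_props    # keys declared in the schema
    unknown = keys - schema_props  # keys the schema never declared

    if not unknown:
        return "conforming"
    if known:
        return "mixed"  # e.g. request 6: wrapper key plus some real fields
    if len(unknown) == 1:
        inner = tool_input[next(iter(unknown))]
        if isinstance(inner, dict):
            return "wrapped" if inner else "empty-wrapper"
    return "other"


# Only the three property names shown in this report; the real schema has more.
PROPS = {"tldr", "mechanism_summary", "evidence_strength"}

print(classify_tool_input({"tldr": "...", "mechanism_summary": "..."}, PROPS))  # conforming
print(classify_tool_input({"$PARAMETER_NAME": {"tldr": "..."}}, PROPS))         # wrapped
print(classify_tool_input({"$PARAMETER_NAME": {}}, PROPS))                      # empty-wrapper
```

Because the check is keyed on the schema's property names rather than on the literal string $PARAMETER_NAME, it also catches other wrapper keys such as topic.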
Possibly related: tool_use that stops early with empty wrapper
Item 8 above returned stop_reason="tool_use" but emitted only 49 output tokens, and the $PARAMETER_NAME wrapper was an empty dict. This recurred across our two batches: the failing item was consistently the one with the smallest prompt input (~10K tokens vs. ~17K average). Both the stop_reason and the presence of a tool_use block indicate that the response is reported as a successful completion.
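Since stop_reason alone cannot flag these items, a consumer has to inspect the payload and the token count. Below is a rough sketch of such a check, assuming the message exposes `stop_reason`, `usage.output_tokens`, and `content` blocks as the SDK's message objects do. The function name `looks_stuck` and the 200-token threshold are assumptions chosen for illustration.

```python
from types import SimpleNamespace
from typing import Any, Iterable

# Assumption: an arbitrary cut-off. The report only says the stuck item emitted
# ~50 output tokens while the other items emitted roughly 4.5K to 6K.
MIN_PLAUSIBLE_OUTPUT_TOKENS = 200


def looks_stuck(message: Any, required: Iterable[str]) -> bool:
    """True when a 'successful' tool_use response carries no usable payload."""
    required = set(required)
    if message.stop_reason != "tool_use":
        return False
    tool_blocks = [b for b in message.content if b.type == "tool_use"]
    if not tool_blocks:
        return False
    payload = tool_blocks[0].input
    # Look one level down if the payload is a single non-schema key around a dict.
    if len(payload) == 1:
        key, sole = next(iter(payload.items()))
        if key not in required and isinstance(sole, dict):
            payload = sole
    has_no_required_field = not any(name in payload for name in required)
    too_short = message.usage.output_tokens < MIN_PLAUSIBLE_OUTPUT_TOKENS
    return has_no_required_field and too_short


# Stand-in object shaped like item 8 from the table (not a real API response).
stuck = SimpleNamespace(
    stop_reason="tool_use",
    usage=SimpleNamespace(output_tokens=49),
    content=[SimpleNamespace(type="tool_use", input={"$PARAMETER_NAME": {}})],
)
print(looks_stuck(stuck, ["tldr", "mechanism_summary"]))  # True
```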
Reproducer (sketch)
import anthropic

client = anthropic.Anthropic()

schema = {
    "type": "object",
    "properties": {
        "tldr": {"type": "string", "description": "..."},
        "mechanism_summary": {"type": "string"},
        "evidence_strength": {"type": "string", "enum": ["consolidated", "emerging", "speculative"]},
        # ...several more required string/array fields...
    },
    "required": ["tldr", "mechanism_summary", "evidence_strength", ...],
}

batch = client.messages.batches.create(requests=[
    {
        "custom_id": f"item-{i}",
        "params": {
            "model": "claude-opus-4-7",
            "max_tokens": 8000,
            "system": [{"type": "text", "text": "<system prompt>", "cache_control": {"type": "ephemeral"}}],
            "tools": [{
                "name": "save_topic_synthesis",
                "description": "Save structured composition.",
                "input_schema": schema,
            }],
            "tool_choice": {"type": "tool", "name": "save_topic_synthesis"},
            "messages": [{"role": "user", "content": "<long prompt with ~15-20K tokens of structured input>"}],
        },
    }
    for i in range(10)
])

# Wait for batch.processing_status == "ended", then:
#   for line in client.messages.batches.results(batch.id):
#       inspect line.result.message.content[0].input
# Expect: most items wrap content in {"$PARAMETER_NAME": {...}}
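The wait-and-inspect step described in the trailing comments could look roughly like this. It is a sketch, not the exact script used for this report: the helper name `collect_batch_inputs` and the 30-second poll interval are assumptions, and the client is passed in so the function only relies on the `batches.retrieve` and `batches.results` calls listed under Environment.

```python
import time
from typing import Any, Dict


def collect_batch_inputs(client: Any, batch_id: str, poll_seconds: float = 30.0) -> Dict[str, Any]:
    """Poll until the batch has ended, then map custom_id -> tool_use.input (or None)."""
    # No overall timeout here; a production version would probably want one.
    while client.messages.batches.retrieve(batch_id).processing_status != "ended":
        time.sleep(poll_seconds)

    inputs: Dict[str, Any] = {}
    for entry in client.messages.batches.results(batch_id):
        if entry.result.type != "succeeded":
            inputs[entry.custom_id] = None  # errored, canceled, or expired
            continue
        tool_blocks = [b for b in entry.result.message.content if b.type == "tool_use"]
        inputs[entry.custom_id] = tool_blocks[0].input if tool_blocks else None
    return inputs
```

With the reproducer above, this would be called as `collect_batch_inputs(client, batch.id)`, and the returned dict can then be checked for top-level keys per item.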
Impact
Without client-side defensive unwrapping, the majority of batch responses fail strict downstream validation (Pydantic, JSON Schema validators) because the expected fields are nested one level too deep. This forces every batch consumer to add a workaround like:
if len(input) == 1:
    sole = next(iter(input.values()))
    if isinstance(sole, dict) and "<sentinel_field>" in sole:
        input = sole
Compounded with the stuck-early issue, the practical failure rate on a fresh batch is ~10% even with that workaround.
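A variant of that workaround can be keyed on the schema's property names instead of a single sentinel field, which makes it independent of the wrapper key's spelling. This is a sketch under that assumption; `unwrap_tool_input` is a hypothetical helper, and it deliberately leaves the "mixed" shape (request 6) untouched because there is no obviously safe way to merge it.

```python
from typing import Any, Dict, Set


def unwrap_tool_input(tool_input: Dict[str, Any], schema_props: Set[str]) -> Dict[str, Any]:
    """Return the inner dict when the payload is one non-schema key wrapping a dict."""
    if len(tool_input) != 1:
        return tool_input
    key, value = next(iter(tool_input.items()))
    if key in schema_props or not isinstance(value, dict):
        return tool_input  # a genuine single-field payload, leave it alone
    return value  # may be {} for stuck items; required-field validation should catch that


# Only the three property names shown in this report; the real schema has more.
PROPS = {"tldr", "mechanism_summary", "evidence_strength"}

print(unwrap_tool_input({"$PARAMETER_NAME": {"tldr": "..."}}, PROPS))  # {'tldr': '...'}
print(unwrap_tool_input({"tldr": "..."}, PROPS))                       # {'tldr': '...'}
```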
Workaround in place on our side
Detect len(input)==1 and unwrap; when the sentinel field is missing (<sentinel>=None, i.e. stuck output), re-issue the same request as a synchronous messages.create and accept the result. This pattern is documented and committed in our repo, if useful as a reference.
Ask
Could the SDK / Batch API surface either (a) normalize the response shape to match sync mode, or (b) document the wrap behavior so consumers know to expect it? Currently the SDK type hints suggest top-level fields per the declared schema, which is misleading.
Update (2026-05-29): retested on claude-opus-4-8 + minimal reproducer
Re-tested after the 4.8 release. The $PARAMETER_NAME single-key wrapper no longer reproduces on claude-opus-4-8 (0/10 on the original workload that previously showed 8/10). However, the root issue — tool-input serialization leaking into tool_use.input — persists in a new form, and a minimal reproducer narrows the trigger.
Trigger isolated: a nested array<object> sub-schema. A flattened schema (scalars / arrays-of-scalars only) was clean in 28/28 runs across both models and both transports. The nested schema breaks on both.
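To check whether a given tool schema contains the triggering construct, a small walker over the schema is enough. The sketch below assumes plain JSON Schema dicts using only `type`, `properties`, and `items`; the helper name and the `findings` / `claim` / `source` field names are hypothetical and are not taken from the actual schema.

```python
from typing import Any, Dict, List


def find_object_arrays(schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return paths of properties typed as array whose items are objects."""
    hits: List[str] = []
    for name, sub in schema.get("properties", {}).items():
        here = f"{path}.{name}" if path else name
        if sub.get("type") == "array":
            items = sub.get("items", {})
            if isinstance(items, dict) and items.get("type") == "object":
                hits.append(here)
                hits.extend(find_object_arrays(items, here + "[]"))
        elif sub.get("type") == "object":
            hits.extend(find_object_arrays(sub, here))
    return hits


# Hypothetical schemas for illustration only.
FLAT = {
    "type": "object",
    "properties": {
        "tldr": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}
NESTED = {
    "type": "object",
    "properties": {
        "tldr": {"type": "string"},
        "findings": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"claim": {"type": "string"}, "source": {"type": "string"}},
            },
        },
    },
}

print(find_object_arrays(FLAT))    # []
print(find_object_arrays(NESTED))  # ['findings']
```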
Failure mode by version:
- opus-4-7: the top level is wrapped in a single non-schema key: $PARAMETER_NAME, topic, or input (the last not in the original report).
- opus-4-8: internal tool-call markup (<parameter name="..."> / </parameter>) leaks into a string field value, absorbing the following sibling field, so a required field silently disappears while the object still looks well-formed.
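Because the 4.8 failure leaves a well-formed object, it is easy to miss without an explicit check. One possible detector, sketched here, scans string values for the markup quoted above and lists absent required fields. The names `find_leaks` and `check_payload` and the sample payload are illustrative assumptions, and only top-level required fields are checked.

```python
import re
from typing import Any, Dict, Iterable, List

# Assumption: matches only the markup quoted in this report.
LEAK_PATTERN = re.compile(r'<parameter\s+name="[^"]*">|</parameter>')


def find_leaks(value: Any, path: str = "") -> List[str]:
    """Return paths of string values that contain leaked tool-call markup."""
    if isinstance(value, str):
        return [path] if LEAK_PATTERN.search(value) else []
    if isinstance(value, dict):
        return [p for k, v in value.items()
                for p in find_leaks(v, f"{path}.{k}" if path else k)]
    if isinstance(value, list):
        return [p for i, v in enumerate(value) for p in find_leaks(v, f"{path}[{i}]")]
    return []


def check_payload(payload: Dict[str, Any], required: Iterable[str]) -> Dict[str, List[str]]:
    """Report missing top-level required fields and any leaked markup."""
    return {
        "missing_required": [name for name in required if name not in payload],
        "leaked_markup_at": find_leaks(payload),
    }


# Hypothetical payload: mechanism_summary was absorbed into the tldr string.
sample = {
    "tldr": 'Short summary.</parameter>\n<parameter name="mechanism_summary">Mechanism text.',
    "evidence_strength": "emerging",
}
print(check_payload(sample, ["tldr", "mechanism_summary", "evidence_strength"]))
```

A payload that reports both a missing required field and leaked markup in the preceding field matches the absorption pattern described for opus-4-8.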
Corrections to the original report:
- Not batch-only. It also reproduces via client.messages.create (sync), at a lower rate than batch. The original "sync 100% correct" was likely a small-sample / schema-specific effect.
- Wrapper-key set on 4.7 includes input in addition to $PARAMETER_NAME and topic.
Environment: SDK anthropic 0.104.1, anthropic-version: 2023-06-01. Reproducer script and per-request IDs available on request.