Batch API: tool_use input wrapped in $PARAMETER_NAME placeholder key (claude-opus-4-7) #1607

@ffishman

Summary

When using the Messages Batch API with forced tool_use (tool_choice={"type":"tool","name":"..."}) on claude-opus-4-7, the returned tool input is frequently wrapped in a single key $PARAMETER_NAME instead of containing the schema's actual property names at the top level. The same prompt and schema, sent via client.messages.create (sync), returns the correct top-level keys 100% of the time.

Environment

  • SDK: anthropic (Python), latest stable
  • Model: claude-opus-4-7
  • Endpoint: client.messages.batches.create / .retrieve / .results
  • Tool config: tool_choice={"type":"tool","name":"save_topic_synthesis"} (forced)

Expected vs Observed

Expected:

tool_use.input == {
  "tldr": "...",
  "mechanism_summary": "...",
  "evidence_strength": "emerging",
  ...
}

Observed (most items):

tool_use.input == {
  "$PARAMETER_NAME": {
    "tldr": "...",
    "mechanism_summary": "...",
    ...
  }
}

The literal string $PARAMETER_NAME does not appear anywhere in our prompt, tool schema, or system message.

Distribution from a single batch (10 messages, all stop_reason=tool_use)

| Request | Top-level keys | Output tokens |
|---|---|---|
| 1 | $PARAMETER_NAME (full inner) | 4573 |
| 2 | $PARAMETER_NAME (full inner) | 5668 |
| 3 | $PARAMETER_NAME (full inner) | 5221 |
| 4 | $PARAMETER_NAME (full inner) | 5513 |
| 5 | $PARAMETER_NAME (full inner) | 4853 |
| 6 | mixed: $PARAMETER_NAME + top-level tldr/mechanism_summary | 5525 |
| 7 | topic (different wrap key, single-key dict) | 4929 |
| 8 | $PARAMETER_NAME (empty inner dict) | 49 |
| 9 | top-level (correct) | 5353 |
| 10 | top-level (correct) | 5915 |

8/10 had a non-conforming top-level shape; 2/10 conformed. The pattern repeated in a second, independent batch whose inputs differed (the same 10 items re-submitted on a different day after schema/prompt changes): again one item came back stuck ($PARAMETER_NAME: {}, ~50 output tokens, smallest input).
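For anyone who wants to reproduce the "Top-level keys" column on their own batch, here is a minimal sketch of the classification. The function name and the label strings are ours (hypothetical), not part of the SDK; the function only compares the top-level keys of tool_use.input against the property names declared in the tool's input_schema.

```python
from collections.abc import Mapping
from typing import Any


def classify_tool_input(tool_input: Mapping, schema: Mapping) -> str:
    """Label the top-level shape of a tool_use input relative to its schema."""
    expected = set(schema.get("properties", {}))
    keys = set(tool_input)
    known = keys & expected      # top-level keys the schema declares
    unknown = keys - expected    # top-level keys the schema does not declare

    if not unknown:
        # Every top-level key is declared in the schema (or there are none).
        return "conforming" if known else "empty"
    if known:
        # Declared fields sit next to at least one undeclared key.
        return "mixed"
    if len(unknown) == 1:
        inner: Any = tool_input[next(iter(unknown))]
        if isinstance(inner, Mapping):
            # A single undeclared key holding a dict: the wrapper pattern.
            return "wrapped" if inner else "empty-wrapper"
    return "unknown"
```

Applied to the batch above, requests 1-5 and 7 would be labelled "wrapped", request 6 "mixed", request 8 "empty-wrapper", and requests 9-10 "conforming".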

Possibly related: tool_use that stops early with empty wrapper

Item 8 above returned stop_reason="tool_use" but emitted only 49 output tokens, and its $PARAMETER_NAME wrapper was an empty dict. This recurs across our two batches: the failing item is consistently the one with the smallest prompt (~10K input tokens vs. a ~17K average). Both the stop_reason and the presence of a tool_use block suggest the model considers the call successfully completed.

Reproducer (sketch)

import time

import anthropic

client = anthropic.Anthropic()

schema = {
    "type": "object",
    "properties": {
        "tldr": {"type": "string", "description": "..."},
        "mechanism_summary": {"type": "string"},
        "evidence_strength": {"type": "string", "enum": ["consolidated", "emerging", "speculative"]},
        # ...several more required string/array fields...
    },
    "required": [
        "tldr",
        "mechanism_summary",
        "evidence_strength",
        # ...plus the remaining required fields...
    ],
}

batch = client.messages.batches.create(requests=[
    {
        "custom_id": f"item-{i}",
        "params": {
            "model": "claude-opus-4-7",
            "max_tokens": 8000,
            "system": [{"type": "text", "text": "<system prompt>", "cache_control": {"type": "ephemeral"}}],
            "tools": [{
                "name": "save_topic_synthesis",
                "description": "Save structured composition.",
                "input_schema": schema,
            }],
            "tool_choice": {"type": "tool", "name": "save_topic_synthesis"},
            "messages": [{"role": "user", "content": "<long prompt with ~15-20K tokens of structured input>"}],
        },
    }
    for i in range(10)
])

# Poll until the batch has finished processing (the 60 s interval is arbitrary).
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

# Print the top-level keys of each returned tool input.
for line in client.messages.batches.results(batch.id):
    if line.result.type != "succeeded":
        continue
    tool_input = line.result.message.content[0].input
    print(line.custom_id, list(tool_input.keys()))
# Expect: most items wrap content in {"$PARAMETER_NAME": {...}}

Impact

Without client-side defensive unwrapping, the majority of batch responses fail strict downstream validation (Pydantic, JSON Schema validators) because the expected fields are nested one level too deep. This forces every batch consumer to add a workaround like:

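# tool_input is the tool_use block's .input dict (renamed to avoid shadowing the input() builtin)
if len(tool_input) == 1:
    sole = next(iter(tool_input.values()))
    if isinstance(sole, dict) and "<sentinel_field>" in sole:
        tool_input = sole  # lift the wrapped dict to the top level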

Compounded with the stuck-early issue, the practical failure rate on a fresh batch is ~10% even with that workaround.

Workaround in place on our side

Detect a single-key input dict and unwrap it; if the sentinel field is still missing afterwards (stuck output), re-issue the same request as a synchronous messages.create call and accept the result. This pattern is committed in our repo and can serve as a reference if useful.
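A minimal sketch of that workaround, not the exact code in our repo. The helper names unwrap_tool_input and resolve_item are hypothetical, as is the retry_sync callable, which stands in for whatever re-issues the request synchronously and returns the new tool input. One deviation from the description above: the sketch also runs the retried result through the same unwrap check instead of accepting it blindly, which is our assumption about what a careful consumer would want.

```python
from typing import Callable, Optional


def unwrap_tool_input(tool_input: dict, sentinel_field: str) -> Optional[dict]:
    """Return a usable input dict, or None if the item looks stuck."""
    if sentinel_field in tool_input:
        return tool_input  # already has the expected top-level shape
    if len(tool_input) == 1:
        sole = next(iter(tool_input.values()))
        if isinstance(sole, dict) and sentinel_field in sole:
            return sole  # single-key wrapper: lift the inner dict
    return None  # empty wrapper or unrecognized shape


def resolve_item(
    tool_input: dict,
    sentinel_field: str,
    retry_sync: Callable[[], dict],
) -> dict:
    """Unwrap a batch result, falling back to one synchronous retry."""
    unwrapped = unwrap_tool_input(tool_input, sentinel_field)
    if unwrapped is not None:
        return unwrapped
    # Stuck or unrecognized output: re-issue once and check the new result.
    retried = unwrap_tool_input(retry_sync(), sentinel_field)
    if retried is None:
        raise ValueError("tool input still malformed after synchronous retry")
    return retried
```

In practice retry_sync could be something like `lambda: client.messages.create(**params).content[0].input`, with params being the same dict that was submitted in the batch request. Note that a mixed result (wrapper key next to real top-level fields) passes through unchanged here, extra key included.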

Ask

Could the SDK / Batch API surface either (a) normalize the response shape to match sync mode, or (b) document the wrap behavior so consumers know to expect it? Currently the SDK type hints suggest top-level fields per the declared schema, which is misleading.


Update (2026-05-29): retested on claude-opus-4-8 + minimal reproducer

Re-tested after the 4.8 release. The $PARAMETER_NAME single-key wrapper no longer reproduces on claude-opus-4-8 (0/10 on the original workload that previously showed 8/10). However, the root issue — tool-input serialization leaking into tool_use.input — persists in a new form, and a minimal reproducer narrows the trigger.

Trigger isolated: a nested array<object> sub-schema. A flattened schema (scalars / arrays-of-scalars only) was clean in 28/28 runs across both models and both transports. The nested schema breaks on both.
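To check whether a given tool schema falls into the affected category, something like the following sketch may help. The function name is hypothetical, and it only walks "properties" and "items"; schemas that hide the nested array behind $ref, anyOf, or similar keywords would need extra handling.

```python
from typing import Any


def has_array_of_objects(schema: Any) -> bool:
    """Return True if any sub-schema is an array whose items are objects."""
    if not isinstance(schema, dict):
        return False
    items = schema.get("items")
    if (
        schema.get("type") == "array"
        and isinstance(items, dict)
        and items.get("type") == "object"
    ):
        return True
    # Recurse into nested property schemas and array item schemas.
    children = list(schema.get("properties", {}).values())
    if isinstance(items, dict):
        children.append(items)
    return any(has_array_of_objects(child) for child in children)
```

A schema with only scalar and array-of-scalar properties returns False; one with a property of the form {"type": "array", "items": {"type": "object", ...}} at any depth returns True.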

Failure mode by version:

  • opus-4-7: top-level wrapped in a single non-schema key — $PARAMETER_NAME, topic, or input (the last not in the original report).
  • opus-4-8: internal tool-call markup (<parameter name="..."> / </parameter>) leaks into a string field value, absorbing the following sibling field — so a required field silently disappears while the object still looks well-formed.
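Because the 4.8 failure leaves the object looking well-formed, a shape check alone will not catch it. Below is a rough sketch of a detector, with hypothetical helper names: it scans every string value for the markup quoted above and separately lists required top-level fields that are absent. The regex is our guess at a reasonable pattern for `<parameter name="...">` and `</parameter>`; other leaked forms may exist.

```python
import re
from typing import Any

# Matches the markup quoted in the report: <parameter name="..."> or </parameter>.
_LEAK = re.compile(r'</?parameter(?:\s+name="[^"]*")?\s*>')


def find_markup_leaks(value: Any, path: str = "") -> list[str]:
    """Return the paths of string values containing leaked parameter markup."""
    if isinstance(value, str):
        return [path or "<root>"] if _LEAK.search(value) else []
    if isinstance(value, dict):
        items = [(f"{path}.{k}" if path else str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(f"{path}[{i}]", v) for i, v in enumerate(value)]
    else:
        return []  # numbers, booleans, None: nothing to scan
    return [hit for child_path, child in items for hit in find_markup_leaks(child, child_path)]


def missing_required(tool_input: dict, schema: dict) -> list[str]:
    """List top-level required fields absent from the tool input."""
    return [name for name in schema.get("required", []) if name not in tool_input]
```

A result where find_markup_leaks reports a field and missing_required reports its following sibling matches the absorption pattern described for opus-4-8.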

Corrections to the original report:

  1. Not batch-only. It also reproduces via client.messages.create (sync), at a lower rate than batch. The original "sync 100% correct" was likely a small-sample / schema-specific effect.
  2. Wrapper-key set on 4.7 includes input in addition to $PARAMETER_NAME and topic.

Environment: SDK anthropic 0.104.1, anthropic-version: 2023-06-01. Reproducer script and per-request IDs available on request.
