
Self-hosted agent_toolset read tool raises UnicodeDecodeError on binary files (images/PDFs) instead of returning content blocks #1637

@arcaputo3


Summary

The self-hosted agent_toolset_20260401 read tool (src/anthropic/lib/tools/agent_toolset.py, beta_read_tool) decodes every file as UTF-8 via target.read_text(). Reading a binary file (image or PDF) raises an uncaught UnicodeDecodeError, surfaced to the model as a raw tool error — even though the tool-result contract already supports image/document content blocks.

So a Managed Agents agent running in a self-hosted environment (client.beta.environments.work.worker(...).handle_item() / SessionToolRunner) cannot read an image or PDF. This bites the document skills (docx/pdf/pptx/xlsx) that render slides/pages to images for visual QA — every such read fails. (The hosted product and Claude Code's Read both handle images, so this is specific to the open-source self-hosted toolset.)

Version

anthropic==0.103.1. Still present on main / v0.105.2: beta_read_tool is read_text()-only there too. I couldn't find an existing issue for it.

Repro

  1. Run a self-hosted CMA agent configured with the agent_toolset_20260401 toolset (a StandardSandbox / agent_toolset_20260401 tool).

  2. The agent creates (or already has) an image, e.g. /workspace/slide-1.jpg.

  3. The agent calls read(file_path="/workspace/slide-1.jpg").

  4. The tool raises (surfaced as the tool result):

    UnicodeDecodeError('utf-8', b'\xff\xd8\xff\xe0...', 0, 1, 'invalid start byte')
    

    (\xff\xd8 is the JPEG SOI marker.) The same happens for PDFs (%PDF).
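The failure can be reproduced without the SDK at all. This is a minimal sketch, not the toolset's code: it writes a few header bytes to a temp file and reads them back with Path.read_text(encoding="utf-8"). The encoding is passed explicitly here so the result does not depend on the platform default; the toolset's call as quoted passes none, and the traceback above shows it decoding as UTF-8. The helper name read_as_text and the sample headers are illustrative assumptions.

```python
import tempfile
from pathlib import Path


def read_as_text(data: bytes, name: str) -> str:
    """Write `data` to a temp file, then read it back the way a text-only
    read tool does: decode the whole file as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / name
        path.write_bytes(data)
        try:
            path.read_text(encoding="utf-8")
            return "decoded"
        except UnicodeDecodeError as exc:
            return f"UnicodeDecodeError: {exc.reason} (byte {exc.start})"


# First bytes of a JPEG/JFIF file: SOI marker (FF D8), then APP0 (FF E0).
JPEG_HEAD = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
# PDF header followed by the customary high-byte comment line.
PDF_HEAD = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"

print(read_as_text(JPEG_HEAD, "slide-1.jpg"))  # UnicodeDecodeError: invalid start byte (byte 0)
print(read_as_text(PDF_HEAD, "page-1.pdf"))
print(read_as_text(b"plain text\n", "notes.txt"))  # decoded
```

The PDF case fails a few bytes in rather than at byte 0, because the %PDF header itself is ASCII; it is the binary marker line and compressed streams that break UTF-8 decoding.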

Root cause

beta_read_tool's inner read:

text = target.read_text()   # UTF-8; raises UnicodeDecodeError on binary

Only ToolError and OSError are caught; UnicodeDecodeError (a ValueError) propagates uncaught.
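To see why the existing handler misses it, here is a simplified sketch of the catch shape described above. ToolError is a local stand-in class, not the SDK's, and read_tool decodes bytes directly instead of calling read_text(); only the except clause mirrors the issue.

```python
class ToolError(Exception):
    """Hypothetical stand-in for the toolset's ToolError (not the SDK class)."""


def read_tool(raw: bytes) -> str:
    """Simplified read: decodes bytes directly instead of calling read_text(),
    but handles errors the way the issue describes."""
    try:
        return raw.decode("utf-8")
    except (ToolError, OSError) as exc:
        return f"error: {exc}"


def escapes_handler(raw: bytes) -> bool:
    """True if decoding `raw` raises past read_tool's except clause."""
    try:
        read_tool(raw)
    except UnicodeDecodeError:
        return True
    return False


# UnicodeDecodeError -> UnicodeError -> ValueError; it is not an OSError.
print(UnicodeDecodeError.__mro__[1:3])  # (<class 'UnicodeError'>, <class 'ValueError'>)
print(escapes_handler(b"\xff\xd8\xff\xe0"))  # True
print(escapes_handler(b"hello"))  # False
```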

Why this is straightforward to fix

The tool-result type already supports content blocks, and the runner forwards them:

  • BetaFunctionToolResultType = Union[str, Iterable[BetaContent]] (_beta_functions.py)
  • ToolError's own docstring shows an image block example
  • _beta_session_runner._to_session_content already forwards image / document / search_result blocks through to the session

So read can simply return an image/document block for binary files.

Suggested fix

Detect image/PDF files (by extension and/or magic bytes) and return a base64 content block instead of decoding as text:

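import base64  # needed for the encoding below

# _binary_media_type is a new helper returning a media type, or None for text.
# image/jpeg|png|gif|webp -> "image"; application/pdf -> "document"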
media = _binary_media_type(target)
if media is not None:
    data = base64.standard_b64encode(target.read_bytes()).decode("ascii")
    kind = "document" if media == "application/pdf" else "image"
    return [{"type": kind, "source": {"type": "base64", "media_type": media, "data": data}}]
text = target.read_text()
...
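The snippet leaves _binary_media_type undefined. One way it could look is the sketch below, which sniffs magic bytes only; the names sniff_media_type and _MAGIC are hypothetical, and nothing here is from the SDK. The signatures used are the standard ones for JPEG (FF D8 FF), PNG, GIF87a/GIF89a, PDF (%PDF-) and WebP (a RIFF container tagged WEBP).

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes for the formats listed in the suggested fix.
_MAGIC = (
    (b"\xff\xd8\xff", "image/jpeg"),  # SOI marker + start of next marker
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
)


def sniff_media_type(head: bytes) -> Optional[str]:
    """Map a file's first bytes to a media type, or None if unrecognised."""
    for magic, media in _MAGIC:
        if head.startswith(magic):
            return media
    # WebP is a RIFF container: b"RIFF" + 4-byte little-endian size + b"WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    return None


def _binary_media_type(target: Path) -> Optional[str]:
    """Hypothetical helper for the fix above: inspect only the first 16 bytes."""
    with target.open("rb") as fh:
        return sniff_media_type(fh.read(16))


print(sniff_media_type(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # image/jpeg
print(sniff_media_type(b"%PDF-1.7\n"))  # application/pdf
print(sniff_media_type(b"RIFF\x24\x00\x00\x00WEBPVP8 "))  # image/webp
print(sniff_media_type(b"def main():\n"))  # None
```

Trusting the header rather than the extension means a mislabelled text file named slide.jpg still goes down the text path instead of being sent as a broken image; an extension check could be layered on top if preferred.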

A separate (larger) size cap for binary makes sense, since images routinely exceed the 256 KB READ_MAX_BYTES text cap. Happy to open a PR if useful.
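A rough sketch of what separate caps might look like. READ_MAX_BINARY_BYTES, its 5 MiB value, and the helper names are assumptions for illustration, not values from the toolset. Base64 also inflates the payload the model receives by about 4/3 (4 * ceil(n / 3) characters for n raw bytes), which is worth keeping in mind when picking the binary cap.

```python
from typing import Optional

READ_MAX_BYTES = 256 * 1024  # text cap quoted in the issue
# Hypothetical binary cap; an illustrative value, not one from the toolset.
READ_MAX_BINARY_BYTES = 5 * 1024 * 1024


def base64_len(n: int) -> int:
    """Length of padded standard base64 for n raw bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


def size_error(n_bytes: int, is_binary: bool) -> Optional[str]:
    """Return an error message if the file exceeds its cap, else None."""
    cap = READ_MAX_BINARY_BYTES if is_binary else READ_MAX_BYTES
    if n_bytes > cap:
        kind = "binary" if is_binary else "text"
        return f"{kind} file is {n_bytes} bytes; the limit is {cap} bytes"
    return None


one_mb_slide = 1024 * 1024
print(size_error(one_mb_slide, is_binary=False))  # text file is 1048576 bytes; the limit is 262144 bytes
print(size_error(one_mb_slide, is_binary=True))  # None
print(base64_len(one_mb_slide))  # 1398104
```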

Interim workaround

For anyone else self-hosting CMA: we monkeypatch agent_toolset.beta_read_tool in our worker before handle_item() to do exactly the above (binary → base64 image/document block, text → delegate to the original read). Restores visual QA of rendered output for the document skills.
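The issue doesn't show beta_read_tool's signature or how it is registered (it may be a decorated tool object rather than a plain function), so this sketch covers only the wrapping logic. It assumes a hypothetical text read of shape (file_path: str) -> str and returns a new callable with the binary branch in front. Installing it into agent_toolset depends on how the real tool is defined and is left out. binary_aware, text_only_read and the magic-byte table are illustrative names.

```python
import base64
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List, Union

ToolResult = Union[str, List[Dict[str, Any]]]

# Compact magic-byte table (JPEG, PNG, GIF, PDF); extend as needed.
_MAGIC = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF8": "image/gif",
    b"%PDF-": "application/pdf",
}


def binary_aware(original_read: Callable[[str], ToolResult]) -> Callable[[str], ToolResult]:
    """Wrap a text-only read: images/PDFs become base64 content blocks,
    everything else is delegated to `original_read` unchanged."""

    def read(file_path: str) -> ToolResult:
        # Assumes file_path is already resolved; the real tool may map it to a workspace.
        with open(file_path, "rb") as fh:
            head = fh.read(16)
        media = next((m for magic, m in _MAGIC.items() if head.startswith(magic)), None)
        if media is None:
            return original_read(file_path)
        kind = "document" if media == "application/pdf" else "image"
        data = base64.standard_b64encode(Path(file_path).read_bytes()).decode("ascii")
        return [{"type": kind, "source": {"type": "base64", "media_type": media, "data": data}}]

    return read


def text_only_read(file_path: str) -> str:
    """Hypothetical stand-in for the original read_text()-only tool body."""
    return Path(file_path).read_text(encoding="utf-8")


read = binary_aware(text_only_read)
with tempfile.TemporaryDirectory() as tmp:
    img = Path(tmp, "slide-1.jpg")
    img.write_bytes(b"\xff\xd8\xff\xe0rest-of-jpeg")
    txt = Path(tmp, "notes.md")
    txt.write_text("hello\n", encoding="utf-8")
    print(read(str(img))[0]["source"]["media_type"])  # image/jpeg
    print(repr(read(str(txt))))  # 'hello\n'
```

Sniffing only the first 16 bytes keeps the text path cheap: text files are still read once, by the original function.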
