fix: handle binary files in read tool (images/PDFs) by vihaan-kk · Pull Request #1638 · anthropics/anthropic-sdk-python

vihaan-kk · 2026-06-02T23:39:28Z

Changes

The beta_read_tool in agent_toolset.py called target.read_text() on every file, which decodes the bytes as UTF-8. Reading a binary file (image or PDF) raised a UnicodeDecodeError. Only ToolError and OSError were caught, and UnicodeDecodeError is a ValueError, so it propagated uncaught and reached the model as a raw tool error.
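A minimal reproduction of the failure mode described above, outside the SDK. The function name read_with_oserror_only and the file name pixel.png are made up for illustration, and the encoding is passed explicitly so the result does not depend on the locale; the real tool may differ in both respects.

```python
import tempfile
from pathlib import Path


def read_with_oserror_only(path: Path) -> str:
    """Hypothetical stand-in for the old text-only read path."""
    try:
        return path.read_text(encoding="utf-8")
    except OSError as exc:
        # UnicodeDecodeError is a ValueError, not an OSError,
        # so this handler never sees it.
        return f"read failed: {exc}"


def demo() -> str:
    """Write PNG signature bytes to a temp file and try to read them as text."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "pixel.png"
        target.write_bytes(b"\x89PNG\r\n\x1a\n")  # 0x89 is not a valid UTF-8 start byte
        try:
            read_with_oserror_only(target)
        except UnicodeDecodeError as exc:
            return type(exc).__name__
    return "no error"
```

Calling demo() shows the exception escaping the except OSError handler, which is the behavior this PR addresses.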

Fix

Added a _binary_media_type helper that detects binary files by extension. When a binary file is detected, read returns a base64-encoded image or document content block instead of attempting text decoding. The existing text path is unchanged.

Supported types:

Images: .jpg, .jpeg, .png, .gif, .webp → image block
Documents: .pdf → document block

Testing

All existing tests pass (3216 Pydantic v2, 3119 Pydantic v1 on Python 3.9; 3253 on Python 3.14)
./scripts/lint passes clean (pyright, mypy, ruff)
./scripts/format applied

Notes

The view_range slicing logic is text-only and is correctly bypassed for binary files since the binary path returns early
The return type annotation on read was updated from str to Any to reflect that binary files return a list content block
The 256 KiB size cap still applies to binary files, since the size check runs before the binary check; a separate larger cap for images could be a follow-up
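For context on the size cap, base64 turns every 3 input bytes into 4 output characters, so a file at the cap produces a noticeably larger payload. The helper name base64_encoded_len is made up for this sketch; only the 256 KiB figure comes from the notes above.

```python
import base64


def base64_encoded_len(n_bytes: int) -> int:
    """Length of standard (padded) base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


CAP_BYTES = 256 * 1024  # the 256 KiB cap mentioned above

# A file exactly at the cap encodes to 349,528 base64 characters.
encoded_at_cap = base64_encoded_len(CAP_BYTES)
```

This is plain arithmetic rather than anything SDK-specific, but it may help when choosing a separate cap for binary responses.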

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds binary-aware handling to the read tool so it can return base64-encoded images/PDFs instead of attempting to decode them as text.

Changes:

Introduces a binary media type lookup (_BINARY_MEDIA_TYPES + _binary_media_type).
Updates read to return base64 “image”/“document” blocks for supported binary types.
Changes read return annotation from str to Any.


vihaan-kk · 2026-06-02T23:56:22Z

+            media = _binary_media_type(target)
+            if media is not None:
+                data = base64.standard_b64encode(target.read_bytes()).decode("ascii")
+                kind = "document" if media == "application/pdf" else "image"
+                return [{"type": kind, "source": {"type": "base64", "media_type": media, "data": data}}]
            text = target.read_text()


Good point, noted this in the PR description as a follow-up. A separate lower size cap for base64 responses makes sense but felt out of scope for this fix.

                    f"read: {file_path} is {st.st_size} bytes, exceeds {limit}-byte limit. "
                    "Use bash (head/tail/sed) to read a slice."
                )
+            media = _binary_media_type(target)
+            if media is not None:
+                data = base64.standard_b64encode(target.read_bytes()).decode("ascii")
+                kind = "document" if media == "application/pdf" else "image"
+                return [{"type": kind, "source": {"type": "base64", "media_type": media, "data": data}}]


fix: handle binary files in read tool (images/PDFs)

66bf366

Copilot AI review requested due to automatic review settings June 2, 2026 23:39

vihaan-kk requested a review from a team as a code owner June 2, 2026 23:39

Copilot AI reviewed Jun 2, 2026


fix: catch UnicodeDecodeError for unrecognized binary files

ae6a5dc
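The diff for this second commit is not shown in the conversation. One plausible shape for such a fallback, assuming the tool converts decode failures into its own error type, is sketched below. The ToolError class here is a local stand-in and read_text_or_tool_error is a hypothetical name; neither is taken from the SDK source.

```python
from pathlib import Path


class ToolError(Exception):
    """Local stand-in for the SDK's ToolError (assumption for this sketch)."""


def read_text_or_tool_error(path: Path) -> str:
    """Read text, turning decode and I/O failures into a ToolError."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        # Reached for binary files whose extension is not in the media type lookup.
        raise ToolError(
            f"read: {path.name} is not valid UTF-8 text (byte offset {exc.start})"
        ) from exc
    except OSError as exc:
        raise ToolError(f"read: {exc}") from exc
```

With a handler like this, a file such as an unrecognized .bin blob yields a structured tool error instead of an uncaught exception.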

vihaan-kk wants to merge 2 commits into anthropics:main from vihaan-kk:fix/read-tool-unicode-decode-error
