Fix "JSON.generate: UTF-8 string passed as BINARY" warning for text attachments #762

Open
andreaslillebo wants to merge 3 commits into
crmne:main from
andreaslillebo:text-attachment-encoding-tag

@andreaslillebo
Contributor

Fix "JSON.generate: UTF-8 string passed as BINARY" warning for text attachments

Problem

Sending a text attachment with non-ASCII content trips a deprecation warning that becomes a hard error in json 3.0:

File.write('hei.txt', 'Hei på deg')

chat = RubyLLM.chat
chat.ask 'What does this say?', with: 'hei.txt'
# warning: JSON.generate: UTF-8 string passed as BINARY, this will raise an encoding error in json 3.0

Cause

The path the bytes take from hei.txt to JSON.generate:

# lib/ruby_llm/attachment.rb
def load_content_from_path
  @content = File.binread(@source)                                          # => ASCII-8BIT
end

# lib/ruby_llm/attachment.rb
def for_llm
  case type
  when :text
    "<file name='#{filename}' mime_type='#{mime_type}'>#{content}</file>"   # => ASCII-8BIT
  else
    "data:#{mime_type};base64,#{encoded}"
  end
end

# Faraday's :json request middleware calls JSON.generate on the request body.
# json/common.rb:445: warning: JSON.generate: UTF-8 string passed as BINARY, ...

File.binread returns a string tagged ASCII-8BIT (binary). That tag survives for_llm's string interpolation and reaches the request body, where JSON.generate emits the deprecation warning (and will raise in json 3.0). The other three content loaders (load_content_from_io, load_content_from_active_storage, fetch_content) also produce binary-tagged content and hit the same chain.
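
The tag propagation described above can be reproduced without a file or the library. The sketch below is illustrative only: `simulated_binread` and `wrap_text` are hypothetical helpers, not RubyLLM methods, and `String#b` is used as a stand-in for what `File.binread` returns.

```ruby
# Hypothetical stand-in for File.binread: String#b returns a copy of the
# string tagged as binary (ASCII-8BIT) with the same bytes.
def simulated_binread(text)
  text.b
end

# Hypothetical stand-in for the :text branch of Attachment#for_llm.
def wrap_text(filename, content)
  "<file name='#{filename}'>#{content}</file>"
end

original = "Hei p\u00E5 deg" # "Hei på deg", written with an escape so the literal is always UTF-8
content  = simulated_binread(original)
wrapped  = wrap_text('hei.txt', content)

puts content.encoding == Encoding::BINARY # => true, same tag File.binread would apply
puts wrapped.encoding == Encoding::BINARY # => true, the tag survives interpolation
puts content.bytes == original.bytes      # => true, only the tag differs, not the bytes
```

The interpolated result takes the binary tag because the surrounding literal pieces are ASCII-only while the content carries non-ASCII bytes.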

Solution

Re-tag the content as UTF-8 once the mime type is known to be text-like:

@content&.force_encoding(Encoding::UTF_8) if text?
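
As a rough illustration of what the re-tag does, the sketch below applies the same `force_encoding` call to a binary-tagged string and serializes it. `retag_as_utf8` and its `text_like:` keyword are hypothetical names standing in for the one-line fix and for `text?`; they are not part of RubyLLM.

```ruby
require 'json'

# Hypothetical helper mirroring the fix: re-tag in place, only for text-like content.
# force_encoding changes the encoding tag and leaves the bytes untouched.
def retag_as_utf8(content, text_like:)
  content&.force_encoding(Encoding::UTF_8) if text_like
  content
end

raw    = "Hei p\u00E5 deg".b                 # binary-tagged, as the loaders return it
tagged = retag_as_utf8(raw, text_like: true)

puts tagged.encoding == Encoding::UTF_8      # => true
puts tagged.valid_encoding?                  # => true, the bytes were valid UTF-8 all along
puts JSON.parse(JSON.generate(text: tagged))['text'] == "Hei p\u00E5 deg" # => true

# Non-text content keeps its binary tag.
image = retag_as_utf8("\xFF\xD8".b, text_like: false)
puts image.encoding == Encoding::BINARY      # => true
```

Note that `force_encoding` does not validate anything: if a file labelled as text held bytes that are not valid UTF-8, the string would be tagged UTF-8 but `valid_encoding?` would return false.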

Test

Adds a non-ASCII fixture (spec/fixtures/multilingual.txt) and a regression test:

it 'serializes non-ASCII text/* attachment content as JSON' do
  attachment = RubyLLM::Attachment.new('spec/fixtures/multilingual.txt')

  expect { JSON.generate(text: attachment.content) }.not_to output.to_stderr
end

The test fails before the fix with the json gem's deprecation warning, and passes after.

andreaslillebo and others added 3 commits May 7, 2026 11:18
…ttachments

All four content-loading paths in Attachment return ASCII-8BIT-tagged
strings (File.binread, binmode tempfile reads, ActiveStorage downloads,
Faraday response bodies). For text/* attachments the bytes are valid
UTF-8 but carry the binary tag, which propagates into the request body
and trips json's deprecation warning (hard error in json 3.0).

Re-tag the content as UTF-8 once the mime type is known to be text-like.
Disable Metrics/PerceivedComplexity inline on Attachment#content,
matching the established pattern in lib/ruby_llm/error.rb, models.rb,
and stream_accumulator.rb. Use described_class in the new spec.
@codecov

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.05%. Comparing base (4942d6c) to head (3485a5f).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #762   +/-   ##
=======================================
  Coverage   87.05%   87.05%           
=======================================
  Files         119      119           
  Lines        5594     5595    +1     
  Branches     1407     1408    +1     
=======================================
+ Hits         4870     4871    +1     
  Misses        724      724           
