Stream Anthropic responses uncompressed #771

Open
xymbol wants to merge 1 commit into crmne:main from xymbol:fix/anthropic-streaming-compression

xymbol (Contributor) commented May 12, 2026

What this does

By default, Net::HTTP advertises gzip support and transparently inflates compressed responses. Cloudflare gzips Anthropic's SSE stream but flushes its deflate state infrequently, so chunks are delivered in 2 bursts and the first chunk arrives at ~15s on a 22s response.

Setting Accept-Encoding: identity on streaming requests bypasses Net::HTTP's inflater. Scoped to Anthropic streaming; non-streaming responses still benefit from gzip.
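For readers unfamiliar with this Net::HTTP behavior, here is a minimal sketch using plain Net::HTTP. It is not the diff in this PR. The helper name `build_messages_request` and the endpoint URL are assumptions for illustration only. It shows that a request built without an explicit Accept-Encoding header gets a gzip-capable default and is flagged for automatic decoding, while a request that sets the header itself is not.

```ruby
require "net/http"
require "uri"

# Hypothetical helper for illustration; not part of RubyLLM or this PR's diff.
# The URL is only a placeholder target; no connection is opened here.
def build_messages_request(stream:)
  uri = URI("https://api.anthropic.com/v1/messages")
  headers = { "Content-Type" => "application/json" }
  # Supplying Accept-Encoding explicitly tells Net::HTTP that the caller
  # handles content encoding, so it will not inflate the response itself.
  headers["Accept-Encoding"] = "identity" if stream
  Net::HTTP::Post.new(uri, headers)
end

buffered  = build_messages_request(stream: false)
streaming = build_messages_request(stream: true)

# Default request: Net::HTTP fills in an Accept-Encoding that includes gzip
# and marks the request for automatic decoding.
puts "buffered:  #{buffered['Accept-Encoding']} decode=#{buffered.decode_content.inspect}"

# Streaming request: the header is exactly what we set, and automatic
# decoding is off.
puts "streaming: #{streaming['Accept-Encoding']} decode=#{streaming.decode_content.inspect}"
```

Keeping the header conditional on `stream` mirrors the scoping described above: only streaming requests give up compression.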

Measured on claude-haiku-4-5, 1500-word completion. Sparkline: each char = 1s, digit = chunks delivered, _ = zero.

         TTFT     chunks/sec
before   15.1s    9____9
after    1.17s    84528454645464555__92
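The sparkline encoding described above can be sketched as follows. This is an illustrative reconstruction, not the script used for the measurement. The method name `sparkline` is hypothetical, and capping each one-second bucket at 9 is an assumption made so that every second stays a single character.

```ruby
# Hypothetical helper: render chunk arrival times as a one-char-per-second
# sparkline. `arrival_times` are seconds elapsed since the request started.
def sparkline(arrival_times)
  return "" if arrival_times.empty?

  # One bucket per elapsed second, up to and including the last arrival.
  counts = Array.new(arrival_times.max.floor + 1, 0)
  arrival_times.each { |t| counts[t.floor] += 1 }

  # Digit = chunks delivered in that second, "_" = none.
  # Assumption: counts above 9 are shown as 9 to keep one char per second.
  counts.map { |n| n.zero? ? "_" : [n, 9].min.to_s }.join
end

puts sparkline([0.1, 0.2, 2.5])
```

In a real measurement, the arrival times would be collected by recording a monotonic clock reading each time the streaming callback fires, then subtracting the request start time.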

Type of change

  • Bug fix

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • Existing Anthropic cassettes replay clean with the patched code — VCR's default matcher is [:method, :uri], and Net::HTTP still inflates a recorded gzipped response on its own. Re-recording with rake vcr:record[anthropic] is optional and not included here; the only load-bearing diff would be the Accept-Encoding request header value.
  • New spec at spec/ruby_llm/providers/anthropic/streaming_spec.rb asserts the request header (fails without the fix, passes with).
  • No documentation changes needed.

AI-generated code

  • I used AI tools to help diagnose and write this
  • I have reviewed and understand all generated code

API changes

  • No API changes

Related

anthropics/anthropic-sdk-ruby#182 — the official Anthropic Ruby SDK has the same Net::HTTP auto-inflate bug and applies the same one-header fix (Accept-Encoding: identity) on its streaming endpoints.


codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.06%. Comparing base (4942d6c) to head (5f7a121).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #771   +/-   ##
=======================================
  Coverage   87.05%   87.06%           
=======================================
  Files         119      119           
  Lines        5594     5596    +2     
  Branches     1407     1407           
=======================================
+ Hits         4870     4872    +2     
  Misses        724      724           


@xymbol xymbol force-pushed the fix/anthropic-streaming-compression branch 3 times, most recently from 8e01ac9 to e400fd4 Compare May 12, 2026 15:49
Net::HTTP auto-inflates the upstream gzip, which buffers SSE chunks
until Cloudflare flushes its deflate state — turning ~100 events into 2
bursts and pushing first-chunk arrival from ~1s to ~15s on a 22s
response.

Set Accept-Encoding: identity on streaming requests. Non-streaming
responses keep gzip.
@xymbol xymbol force-pushed the fix/anthropic-streaming-compression branch from e400fd4 to 5f7a121 Compare May 12, 2026 15:52