
Auto-chunk TEI embedding requests to respect server batch size limit#4029

Merged
aponcedeleonch merged 1 commit into main from aponcedeleonch/tei-batch-chunking
Mar 6, 2026

Conversation

@aponcedeleonch (Member) commented Mar 6, 2026

Summary

TEI (Text Embeddings Inference), which the Optimizer uses for embeddings, does not automatically batch requests that exceed its max_client_batch_size — it simply rejects them with a 422 error. This bug hadn't surfaced because we were indexing fewer than 32 tools (the default server limit). Once that threshold was exceeded, upserts started failing.

This PR fixes the issue by:

  • Querying the TEI /info endpoint at client creation to discover max_client_batch_size
  • Splitting EmbedBatch requests into chunks that respect the server limit
  • Falling back gracefully to a default of 32 when /info is unavailable
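The discovery-plus-chunking flow described above might look roughly like the following Go sketch. Function and identifier names (`fetchMaxBatchSize`, `chunkBatch`, `defaultMaxBatchSize`) are illustrative assumptions, not the PR's actual code; only the `/info` endpoint and the `max_client_batch_size` field come from the TEI API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

const defaultMaxBatchSize = 32 // TEI's default max_client_batch_size

// fetchMaxBatchSize queries the TEI /info endpoint for max_client_batch_size,
// falling back to the default when the endpoint is unreachable or the value
// is missing or invalid. Illustrative sketch, not the PR's implementation.
func fetchMaxBatchSize(baseURL string) int {
	resp, err := http.Get(baseURL + "/info")
	if err != nil {
		return defaultMaxBatchSize
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return defaultMaxBatchSize
	}
	var info struct {
		MaxClientBatchSize int `json:"max_client_batch_size"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil || info.MaxClientBatchSize <= 0 {
		return defaultMaxBatchSize
	}
	return info.MaxClientBatchSize
}

// chunkBatch splits inputs into consecutive slices of at most limit items,
// so each slice stays within the server's batch size limit.
func chunkBatch(inputs []string, limit int) [][]string {
	if limit <= 0 {
		limit = defaultMaxBatchSize
	}
	var chunks [][]string
	for start := 0; start < len(inputs); start += limit {
		end := start + limit
		if end > len(inputs) {
			end = len(inputs)
		}
		chunks = append(chunks, inputs[start:end])
	}
	return chunks
}

func main() {
	// Stub TEI server advertising a batch limit of 64 via /info.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"max_client_batch_size": 64}`)
	}))
	defer srv.Close()

	limit := fetchMaxBatchSize(srv.URL)
	chunks := chunkBatch(make([]string, 70), limit)
	fmt.Println(limit, len(chunks)) // 64 2

	// Unreachable server: fall back to the default of 32.
	fmt.Println(fetchMaxBatchSize("http://127.0.0.1:1")) // 32
}
```

Caching the limit once at client creation (rather than per request) keeps the hot embedding path free of extra round-trips.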

Test plan

  • All existing TEI client tests pass
  • New tests for batch chunking (single batch, exact fit, multi-chunk, many chunks)
  • New test verifying early stop on chunk error
  • New tests for /info endpoint fetching (success, zero value, missing field, server error, invalid JSON, connection refused)
  • New test for newTEIClient with /info integration and fallback behavior

🤖 Generated with Claude Code

github-actions bot added the size/M Medium PR: 300-599 lines changed label Mar 6, 2026
codecov bot commented Mar 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.59%. Comparing base (edafa62) to head (ab76655).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4029      +/-   ##
==========================================
- Coverage   68.61%   68.59%   -0.03%     
==========================================
  Files         444      444              
  Lines       45187    45253      +66     
==========================================
+ Hits        31007    31040      +33     
- Misses      11780    11813      +33     
  Partials     2400     2400              


TEI (Text Embeddings Inference) does not automatically batch embedding
requests — it rejects requests that exceed max_client_batch_size with a
422 error. This wasn't caught earlier because the Optimizer was indexing
fewer than 32 tools (the default limit). Once we exceeded that threshold,
upserts started failing.

Query the TEI /info endpoint at client creation to discover
max_client_batch_size, then split EmbedBatch requests into chunks
that fit within the limit. Falls back to a default of 32 when
/info is unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aponcedeleonch force-pushed the aponcedeleonch/tei-batch-chunking branch from 00fd472 to ab76655 on March 6, 2026 11:41
github-actions bot added and removed the size/M Medium PR: 300-599 lines changed label Mar 6, 2026
aponcedeleonch merged commit b33a55c into main Mar 6, 2026
58 of 59 checks passed
aponcedeleonch deleted the aponcedeleonch/tei-batch-chunking branch March 6, 2026 16:39
