Summary
The proxy returns a 500 error for embedding requests to gemini_cli models (e.g., gemini-embedding-001). These requests are routed to the chat completion endpoint instead of an embedding endpoint.
Technical Details
- Routing Bug: In rotator_library/client.py, the _execute_with_retry method hardcodes provider_plugin.acompletion for custom providers, ignoring whether the original call was for embeddings.
- Missing Implementation: GeminiCliProvider in gemini_cli_provider.py does not implement aembedding.
- Endpoint Mismatch: Embedding requests are therefore sent to :streamGenerateContent, which rejects embedding models with 400/404; the proxy surfaces this to the client as a 500 error.
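The routing bug, and the api_call check proposed under Suggested Fix, can be illustrated with a minimal, self-contained sketch. It assumes that _execute_with_retry receives the original litellm call (acompletion or aembedding) as api_call and can key off its `__name__`. The stand-in functions, DemoProvider, select_plugin_method, and SUPPORTED_CALLS are hypothetical and not taken from rotator_library.

```python
import asyncio
from typing import Any, Awaitable, Callable

AsyncCall = Callable[..., Awaitable[Any]]

# Hypothetical: calls the retry wrapper knows how to delegate to a custom plugin.
SUPPORTED_CALLS = {"acompletion", "aembedding"}


# Stand-ins for litellm.acompletion / litellm.aembedding (assumption: one of
# these is what _execute_with_retry receives as api_call).
async def acompletion(**kwargs: Any) -> Any:
    raise NotImplementedError


async def aembedding(**kwargs: Any) -> Any:
    raise NotImplementedError


class DemoProvider:
    """Hypothetical custom provider plugin exposing both entry points."""

    async def acompletion(self, **kwargs: Any) -> str:
        return f"chat:{kwargs['model']}"

    async def aembedding(self, **kwargs: Any) -> str:
        return f"embedding:{kwargs['model']}"


def select_plugin_method(provider_plugin: Any, api_call: AsyncCall) -> AsyncCall:
    """Return the plugin method matching the original call, not always acompletion."""
    call_name = getattr(api_call, "__name__", "")
    if call_name not in SUPPORTED_CALLS:
        raise ValueError(f"unsupported api_call: {call_name!r}")
    method = getattr(provider_plugin, call_name, None)
    if method is None:
        # Fail loudly instead of sending an embedding request to a chat endpoint.
        raise NotImplementedError(
            f"{type(provider_plugin).__name__} does not implement {call_name}"
        )
    return method


if __name__ == "__main__":
    plugin = DemoProvider()
    method = select_plugin_method(plugin, aembedding)
    print(asyncio.run(method(model="gemini-embedding-001")))  # embedding:gemini-embedding-001
```

Raising NotImplementedError when a plugin lacks aembedding produces an explicit error message instead of forwarding the request to :streamGenerateContent. If the name-based lookup proves fragile, an identity check such as `api_call is litellm.aembedding` would serve the same purpose.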
Steps to Reproduce
curl http://localhost:8000/v1/embeddings \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"input": "test", "model": "gemini_cli/gemini-embedding-001"}'
Suggested Fix
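- Update _execute_with_retry in client.py to check the api_call type before delegating, so embedding requests go to the plugin's aembedding instead of always calling acompletion.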
- Implement aembedding in GeminiCliProvider using the Google :embedContent endpoint.
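A possible shape for GeminiCliProvider.aembedding, sketched under several assumptions. It targets the public Generative Language API (`https://generativelanguage.googleapis.com/v1beta/models/{model}:embedContent`), which may not be the base URL or auth scheme that the gemini_cli credentials actually work with. It issues one :embedContent call per input string. The HTTP transport is injected as a hypothetical `post` coroutine, so the request/response mapping can be exercised without network access. The helper names (build_embed_request, to_openai_response, PostJson) are illustrative only.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Union

# Assumption: public Generative Language API; the gemini_cli provider may need
# a different base URL or auth scheme for its OAuth credentials.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"

# Hypothetical injected transport: (url, headers, json_body) -> parsed JSON.
PostJson = Callable[[str, Dict[str, str], Dict[str, Any]], Awaitable[Dict[str, Any]]]


def build_embed_request(model_name: str, text: str) -> Dict[str, Any]:
    """Body for models/{model}:embedContent with a single text part."""
    return {"model": f"models/{model_name}", "content": {"parts": [{"text": text}]}}


def to_openai_response(model: str, vectors: List[List[float]]) -> Dict[str, Any]:
    """Wrap raw vectors in the OpenAI /v1/embeddings response shape."""
    return {
        "object": "list",
        "data": [
            {"object": "embedding", "index": i, "embedding": vec}
            for i, vec in enumerate(vectors)
        ],
        "model": model,
        # Token usage is not computed in this sketch.
        "usage": {"prompt_tokens": 0, "total_tokens": 0},
    }


async def aembedding(
    model: str, inputs: Union[str, List[str]], token: str, post: PostJson
) -> Dict[str, Any]:
    """Hypothetical GeminiCliProvider.aembedding: one :embedContent call per input."""
    # Assumption: the model may still carry the "gemini_cli/" routing prefix.
    model_name = model.split("/", 1)[-1]
    texts = [inputs] if isinstance(inputs, str) else list(inputs)
    url = f"{BASE_URL}/models/{model_name}:embedContent"
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    vectors: List[List[float]] = []
    for text in texts:
        response = await post(url, headers, build_embed_request(model_name, text))
        # embedContent returns {"embedding": {"values": [...]}}.
        vectors.append(response["embedding"]["values"])
    return to_openai_response(model, vectors)


if __name__ == "__main__":

    async def fake_post(
        url: str, headers: Dict[str, str], body: Dict[str, Any]
    ) -> Dict[str, Any]:
        # Stand-in transport returning a canned embedContent-style response.
        return {"embedding": {"values": [0.1, 0.2, 0.3]}}

    result = asyncio.run(
        aembedding("gemini_cli/gemini-embedding-001", "test", "<token>", fake_post)
    )
    print(result["data"][0]["embedding"])  # [0.1, 0.2, 0.3]
```

For list inputs, Google's :batchEmbedContents endpoint would avoid one round trip per string; the per-input loop only keeps the sketch close to the :embedContent endpoint named in the suggested fix.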