Add LiteLLM provider into responses_api_models #990

Open

imxj wants to merge 2 commits into NVIDIA-NeMo:main from imxj:feat/support_litellm_responses
Conversation


@imxj imxj commented Apr 1, 2026

Summary

  • Add a new responses_api_models/litellm_model server for LiteLLM proxy endpoints
  • LiteLLMModelServer extends openai_model's SimpleModelServer, overriding only responses() to normalize LiteLLM proxy quirks
  • Normalize hybrid chat.completion responses to the standard Responses API format (LiteLLM may downgrade /v1/responses internally)
  • Fix the Pydantic validation error caused by reasoning.effort="none" (a string) where native Responses API responses expect null
  • openai_model remains clean -- no proxy-specific workarounds
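
The subclassing approach described above can be sketched as follows. This is a hypothetical illustration only: SimpleModelServer's real interface is not shown in this PR, so the method signature, the dict-based payload, and the stand-in base class are all assumptions; only the class names, the responses() override, and the effort="none" fix come from the PR description.

```python
class SimpleModelServer:
    """Stand-in for openai_model's server (assumed interface)."""

    def responses(self, payload: dict) -> dict:
        # The real server would forward this to the upstream /v1/responses API.
        return payload


class LiteLLMModelServer(SimpleModelServer):
    """Overrides only responses() to smooth over LiteLLM proxy quirks."""

    def responses(self, payload: dict) -> dict:
        raw = super().responses(payload)
        # LiteLLM proxies may emit reasoning.effort="none" (a string) where
        # the Responses API schema expects null, which fails Pydantic
        # validation downstream; coerce it here.
        reasoning = raw.get("reasoning")
        if isinstance(reasoning, dict) and reasoning.get("effort") == "none":
            reasoning["effort"] = None
        return raw
```

Keeping the override this narrow is what lets openai_model stay free of proxy-specific workarounds.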

Test

  • Tested with GPT-5.4 via a LiteLLM-backed endpoint -- native response format, reasoning fix applied
  • Tested with Opus 4.6 via a LiteLLM-backed endpoint -- hybrid chat.completion response normalized
  • End-to-end rollout collection verified with both models
  • Added 11 unit tests for litellm_model (normalization + server integration)
  • Existing openai_model tests still pass after the revert


copy-pr-bot bot commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@imxj imxj force-pushed the feat/support_litellm_responses branch 2 times, most recently from 805aaef to f0558e5, on April 1, 2026 00:42
@imxj imxj changed the title from "Add LiteLLM response normalization for Responses API proxy compatibility" to "Add LiteLLM provider into responses_api_models" on Apr 7, 2026
imxj and others added 2 commits April 6, 2026 18:52
LiteLLM proxies may return chat.completion format or hybrid response
objects when called via /v1/responses. This normalizes those responses
to the expected Responses API shape so downstream NeMoGymResponse
validation succeeds. Also fixes reasoning.effort="none" validation
error for native Responses API responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Jin Xu <jinx@nvidia.com>
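
The chat.completion-to-Responses normalization the commit message describes might look roughly like the sketch below. The field names are assumptions based on the public OpenAI chat.completion and Responses API shapes; the actual NeMoGymResponse schema and helper names in this PR are not shown here, so treat this as illustrative only.

```python
def normalize_to_responses(raw: dict) -> dict:
    """Convert a chat.completion-style payload into a Responses-style one.

    Native Responses payloads pass through unchanged; only payloads that a
    LiteLLM proxy downgraded to chat.completion format are rewritten.
    """
    if raw.get("object") != "chat.completion":
        return raw  # already in the native Responses API shape

    message = raw["choices"][0]["message"]
    return {
        "object": "response",
        "id": raw.get("id"),
        "model": raw.get("model"),
        "output": [
            {
                "type": "message",
                "role": message.get("role", "assistant"),
                "content": [
                    {"type": "output_text", "text": message.get("content", "")}
                ],
            }
        ],
    }
```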
Revert openai_model/app.py to its clean state and move the LiteLLM
response normalization logic into a new responses_api_models/litellm_model
that extends SimpleModelServer. This keeps proxy-specific workarounds
(reasoning.effort="none" fix, chat.completion->response normalization)
isolated from the native OpenAI model server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Jin Xu <jinx@nvidia.com>
@imxj imxj force-pushed the feat/support_litellm_responses branch from 105e7c6 to 307b580 on April 7, 2026 01:52
