The OpenAI Chat Completions frontend provides full compatibility with the OpenAI Chat Completions API specification. This is the most commonly used frontend, compatible with most OpenAI SDKs, coding agents (Cursor, Windsurf, Cline), and LLM-aware applications.
## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/v1/chat/completions` | Create a chat completion |
| GET | `/v1/models` | List available models |
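As a quick orientation, the two endpoints resolve to URLs like the following. The base URL here is an assumption (substitute your proxy deployment's address and port):

```python
# Sketch of the endpoint URLs; "http://localhost:8080" is a placeholder base URL.
BASE_URL = "http://localhost:8080"

CHAT_COMPLETIONS_URL = f"{BASE_URL}/v1/chat/completions"  # POST: create a chat completion
LIST_MODELS_URL = f"{BASE_URL}/v1/models"                 # GET: list available models
```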
## Supported Request Parameters

### Required Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | string | Model identifier |
| `messages` | array | Array of message objects |
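A minimal valid request body contains only these two parameters. This sketch builds the JSON payload; `my-model` is a placeholder identifier, not a real model name:

```python
import json

# Smallest well-formed request body: just `model` and `messages`.
body = {
    "model": "my-model",  # placeholder model identifier
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}
payload = json.dumps(body)  # serialized body for the POST request
```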
### Optional Parameters

#### Generation Control

| Parameter | Type | Description |
|-----------|------|-------------|
| `max_tokens` | integer | Maximum tokens to generate |
| `max_completion_tokens` | integer | Maximum completion tokens (newer parameter) |
| `temperature` | number | Sampling temperature (0.0–2.0) |
| `top_p` | number | Nucleus sampling parameter (0.0–1.0) |
| `n` | integer | Number of completions to generate |
| `stop` | string/array | Stop sequences |
| `presence_penalty` | number | Presence penalty (−2.0 to 2.0) |
| `frequency_penalty` | number | Frequency penalty (−2.0 to 2.0) |
| `logit_bias` | object | Token bias adjustments |
| `logprobs` | boolean | Return log probabilities |
| `top_logprobs` | integer | Number of top logprobs to return |
| `seed` | integer | Random seed for reproducibility |
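A request body combining the generation-control parameters might look like this sketch. The model name and parameter values are illustrative choices, not recommendations:

```python
# Illustrative request body exercising the generation-control parameters.
body = {
    "model": "my-model",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Write a haiku about rivers."}],
    "max_tokens": 64,           # cap on generated tokens
    "temperature": 0.7,         # valid range: 0.0-2.0
    "top_p": 0.9,               # valid range: 0.0-1.0
    "n": 2,                     # request two completions
    "stop": ["\n\n"],           # stop at the first blank line
    "presence_penalty": 0.5,    # valid range: -2.0 to 2.0
    "frequency_penalty": 0.5,   # valid range: -2.0 to 2.0
    "logprobs": True,
    "top_logprobs": 5,          # top-5 alternatives per position
    "seed": 42,                 # best-effort reproducibility
}
```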
#### Tool Calling

| Parameter | Type | Description |
|-----------|------|-------------|
| `tools` | array | Array of tool/function definitions |
| `tool_choice` | string/object | Tool selection: `none`, `auto`, `required`, or a specific tool |
| `parallel_tool_calls` | boolean | Allow parallel tool calls |
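A tool-calling request pairs a `tools` array of function definitions with a `tool_choice` policy. In this sketch, `get_weather` and its schema are hypothetical examples:

```python
# Hypothetical tool definition: a single function with a JSON Schema for its arguments.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative function name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

body = {
    "model": "my-model",  # placeholder model identifier
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",        # let the model decide whether to call a tool
    "parallel_tool_calls": False, # force at most one call per turn
}
```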
#### Response Format

| Parameter | Type | Description |
|-----------|------|-------------|
| `response_format` | object | Response format specification |
| `response_format.type` | string | `text`, `json_object`, or `json_schema` |
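The strictest of the three types, `json_schema`, constrains output to a caller-supplied schema. The schema below (`answer` with a single string field) is a made-up example:

```python
# Sketch of a structured-output request using the `json_schema` response format.
body = {
    "model": "my-model",  # placeholder model identifier
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "response_format": {
        "type": "json_schema",  # alternatives: "text", "json_object"
        "json_schema": {
            "name": "answer",  # hypothetical schema name
            "schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
                "required": ["answer"],
            },
        },
    },
}
```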
#### Streaming

| Parameter | Type | Description |
|-----------|------|-------------|
| `stream` | boolean | Enable streaming responses |
| `stream_options` | object | Streaming configuration |
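With `stream: true`, the response arrives as Server-Sent Events, each `data:` line carrying a JSON chunk until a `[DONE]` sentinel. A minimal parser sketch, with a hypothetical two-chunk stream for illustration:

```python
import json

def parse_sse(raw: str) -> list:
    """Collect the JSON chunk objects from a chat-completions SSE body."""
    chunks = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunks.append(json.loads(payload))
    return chunks

# Hypothetical raw stream body (two content deltas, then the sentinel).
raw = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n'
    "\n"
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n'
    "\n"
    "data: [DONE]\n"
)

# Reassemble the streamed text from the per-chunk deltas.
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse(raw))
```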
#### Advanced

| Parameter | Type | Description |
|-----------|------|-------------|
| `user` | string | End-user identifier |
| `service_tier` | string | Service tier preference |
| `reasoning_effort` | string | Reasoning effort for o-series models |
| `modalities` | array | Output modalities (text, audio) |
| `audio` | object | Audio output configuration |
| `prediction` | object | Predicted output for speculative decoding |
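A sketch combining a few of the advanced parameters; all values here are illustrative, and `reasoning_effort` applies only when the selected model supports it:

```python
# Illustrative request body using advanced parameters.
body = {
    "model": "my-model",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Refactor this function."}],
    "user": "user-123",          # end-user identifier for attribution
    "reasoning_effort": "high",  # o-series reasoning models only
    "prediction": {              # predicted output to speed up speculative decoding
        "type": "content",
        "content": "def refactored():\n    pass\n",
    },
}
```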
#### Proxy-Specific

| Parameter | Type | Description |
|-----------|------|-------------|
| `session_id` | string | Session identifier for proxy tracking |
| `agent` | string | Agent identifier |
| `extra_body` | object | Additional parameters passed to the backend |
`extra_body` also supports proxy-only routing hints. For composite selectors that use `[max_context=N]`, set `extra_body.request_context_tokens` to an exact token count for the current request instead of relying on heuristic token estimation.
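Putting the proxy-specific parameters together, a request might look like this sketch (identifiers and the token count are placeholder values):

```python
# Sketch of a request carrying proxy-specific fields alongside the standard ones.
body = {
    "model": "my-model",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Summarize the attached log."}],
    "session_id": "sess-123",  # proxy-side session tracking (placeholder)
    "agent": "my-agent",       # agent identifier (placeholder)
    "extra_body": {
        # Exact token count for [max_context=N] composite selectors,
        # bypassing the proxy's heuristic token estimation.
        "request_context_tokens": 12345,
    },
}
```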