The proxy supports WebSocket transport for OpenAI's Responses API, enabling low-latency, persistent connections optimized for tool-call-heavy workflows.
WebSocket mode provides:
- Up to 40% faster execution for workflows with 20+ tool calls
- Persistent connections with 60-minute session support
- Connection-local caching for `previous_response_id` optimization
- Reduced overhead for multi-turn conversations
WebSocket transport is available for:
- OpenAI Responses API (`wss://api.openai.com/v1/responses`) - Official OpenAI WebSocket endpoint
- OpenAI Codex Connector (`wss://chatgpt.com/backend-api/codex/responses`) - Opportunistic support for the ChatGPT backend API
Both backends share the same WebSocket client infrastructure and can be enabled independently via configuration.
```yaml
# config/config.yaml
responses_api:
  websocket:
    frontend_enabled: true          # Allow clients to connect via WebSocket
    frontend_path: "/v1/responses"  # WebSocket endpoint path
    connection_timeout: 3600        # 60 minutes
    max_connections: 100            # Maximum concurrent connections
```

Configure your OpenAI backend to use WebSocket transport:
```yaml
backends:
  - backend_type: openai-responses
    api_key: ${OPENAI_API_KEY}
    use_websocket: true  # Enable WebSocket backend transport
```

Or programmatically:
```python
from src.connectors.openai import OpenAIConnector

connector = OpenAIConnector(client, config)
connector.enable_websocket(True)
```

The Codex connector also supports WebSocket transport:
```yaml
backends:
  openai_codex:
    enabled: true
    extra:
      codex:
        websocket:
          enabled: true  # Enable WebSocket for Codex backend
```

Or via environment variable:

```bash
OPENAI_CODEX_WEBSOCKET_ENABLED=1
```

Note: The Codex WebSocket implementation uses opportunistic support: it connects to `wss://chatgpt.com/backend-api/codex/responses` and automatically falls back to HTTP/SSE if WebSocket is unavailable.
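The opportunistic-fallback pattern can be sketched as follows. This is an illustration only: `codex_websocket_request` and `codex_http_request` are hypothetical placeholders, not the proxy's actual functions.

```python
import asyncio

async def codex_websocket_request(payload):
    # Placeholder: a real implementation would open the WebSocket and
    # exchange response.create / response.done events.
    raise ConnectionError("WebSocket unavailable")

async def codex_http_request(payload):
    # Placeholder: a real implementation would POST to the HTTP/SSE endpoint.
    return {"transport": "http", "payload": payload}

async def send_with_fallback(payload):
    """Try WebSocket first; fall back to HTTP/SSE on connection failure."""
    try:
        return await codex_websocket_request(payload)
    except (ConnectionError, OSError):
        return await codex_http_request(payload)

result = asyncio.run(send_with_fallback({"input": "hello"}))
print(result["transport"])  # "http" when the WebSocket path is unavailable
```

The key point is that the fallback is transparent to the caller: the same payload is delivered either way, and only the transport differs.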
```python
import asyncio
import json

import websockets

async def use_responses_websocket():
    # Connect to proxy WebSocket endpoint
    async with websockets.connect(
        "ws://localhost:8000/v1/responses",
        extra_headers={
            "Authorization": "Bearer YOUR_PROXY_KEY",
        },
    ) as websocket:
        # Send response.create event
        request_event = {
            "type": "response.create",
            "model": "gpt-4o",
            "input": "Analyze this codebase and suggest improvements",
            "tools": [
                {
                    "type": "function",
                    "function": {
                        "name": "read_file",
                        "description": "Read a file from the codebase",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "path": {"type": "string"}
                            },
                        },
                    },
                }
            ],
        }
        await websocket.send(json.dumps(request_event))

        # Receive streaming events
        while True:
            message = await websocket.recv()
            event = json.loads(message)
            event_type = event.get("type")

            if event_type == "response.content_part.delta":
                # Handle content delta
                delta = event.get("delta", {})
                content = delta.get("content", "")
                print(content, end="", flush=True)
            elif event_type == "response.done":
                # Response complete
                response = event.get("response", {})
                print(f"\n\nResponse ID: {response.get('id')}")
                break
            elif event_type == "error":
                # Handle error
                error = event.get("error", {})
                print(f"Error: {error.get('message')}")
                break

asyncio.run(use_responses_websocket())
```

Use `previous_response_id` to continue conversations:
```python
async def multi_turn_conversation():
    async with websockets.connect(
        "ws://localhost:8000/v1/responses",
        extra_headers={"Authorization": "Bearer YOUR_PROXY_KEY"},
    ) as websocket:
        # First turn
        request1 = {
            "type": "response.create",
            "model": "gpt-4o",
            "input": "What is 2+2?",
        }
        await websocket.send(json.dumps(request1))

        # Get response ID
        while True:
            event = json.loads(await websocket.recv())
            if event.get("type") == "response.done":
                first_response_id = event["response"]["id"]
                break

        # Second turn with previous_response_id
        request2 = {
            "type": "response.create",
            "model": "gpt-4o",
            "input": "Now multiply that by 3",
            "previous_response_id": first_response_id,  # Continue conversation
        }
        await websocket.send(json.dumps(request2))
        # Process second response...
```

Create a new response or continue from a previous response:
```json
{
  "type": "response.create",
  "model": "gpt-4o",
  "input": "Your prompt here",
  "previous_response_id": "resp_123",
  "tools": [],
  "max_output_tokens": 1000
}
```

Note: Transport-specific fields like `stream` and `background` are not used in WebSocket mode.
Streaming content delta:
```json
{
  "type": "response.content_part.delta",
  "delta": {
    "content": "Hello"
  }
}
```

Output item completed:
```json
{
  "type": "response.output_item.done",
  "item": {
    "id": "msg_123",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "output_text", "text": "Complete message"}]
  }
}
```

Response generation complete:
```json
{
  "type": "response.done",
  "response": {
    "id": "resp_123",
    "object": "response",
    "created": 1234567890,
    "model": "gpt-4o",
    "output": [...],
    "usage": {
      "prompt_tokens": 10,
      "completion_tokens": 20,
      "total_tokens": 30
    }
  }
}
```

Error occurred:
```json
{
  "type": "error",
  "status": 400,
  "error": {
    "code": "previous_response_not_found",
    "message": "Previous response with id 'resp_123' not found.",
    "param": "previous_response_id"
  }
}
```

The `previous_response_id` is not in the connection-local cache:
- Cause: Cache miss, eviction, or using ID from different connection
- Solution: Start a new response chain or use HTTP fallback
Connection has been open for 60 minutes (OpenAI limit):
- Cause: Connection timeout reached
- Solution: Close and reconnect, then continue with `previous_response_id` if responses are stored
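A reconnect-and-resume flow could look like the sketch below. The names here (`build_resume_request`, the `connect` factory argument) are illustrative assumptions, not the proxy's API; note that a fresh connection starts with an empty connection-local cache, so resuming relies on the previous response being stored upstream.

```python
import json

def build_resume_request(last_response_id, prompt, model="gpt-4o"):
    """Build a response.create event that continues a stored response chain."""
    return {
        "type": "response.create",
        "model": model,
        "input": prompt,
        "previous_response_id": last_response_id,
    }

async def reconnect_and_continue(connect, last_response_id, prompt):
    """Reopen the connection after a timeout and resume the conversation.

    `connect` is a factory returning an async context manager, e.g. a
    functools.partial around websockets.connect with URL and headers bound.
    """
    async with connect() as ws:
        await ws.send(json.dumps(build_resume_request(last_response_id, prompt)))
        while True:
            event = json.loads(await ws.recv())
            if event.get("type") in ("response.done", "error"):
                return event
```

Passing the connection factory in, rather than hard-coding the URL, keeps the resume logic reusable across the direct and proxy endpoints.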
WebSocket transport is most beneficial for:
- Tool-call-heavy workflows: 20+ tool calls per conversation
- Agentic coding tasks: Multiple file reads, writes, and analysis
- Long-running sessions: Extended back-and-forth with the model
| Workflow Type | HTTP/SSE | WebSocket | Improvement |
|---|---|---|---|
| Simple query | ~200ms | ~180ms | ~10% |
| 5 tool calls | ~2.5s | ~2.0s | ~20% |
| 20 tool calls | ~10s | ~6s | ~40% |
Improvements vary based on network latency and tool complexity
- Connect: Client opens WebSocket connection to `/v1/responses`
- Authenticate: Bearer token sent in `Authorization` header
- Session: Connection-local cache maintained for 60 minutes
- Requests: Multiple `response.create` events can be sent
- Responses: Streaming events sent back for each request
- Timeout: After 60 minutes, connection must be closed and reopened
- Disconnect: Client closes connection or timeout reached
- Connection duration: Maximum 60 minutes (OpenAI limit)
- No multiplexing: One request at a time per connection (sequential)
- Cache scope:
previous_response_idcache is connection-local only - Store compatibility: Use
store=falseor Zero Data Retention (ZDR) for privacy
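Because each connection handles one request at a time, a client that shares a connection across concurrent tasks needs to serialize its sends. This is an illustrative sketch using an in-memory fake connection, not the proxy's client API:

```python
import asyncio

class SequentialSession:
    """Serialize requests over a shared connection-like object with a lock."""
    def __init__(self, ws):
        self._ws = ws
        self._lock = asyncio.Lock()

    async def request(self, event):
        async with self._lock:  # one in-flight request per connection
            return await self._ws.roundtrip(event)

class FakeConnection:
    """Stand-in for a WebSocket session (illustration only)."""
    def __init__(self):
        self.order = []

    async def roundtrip(self, event):
        self.order.append(("start", event))
        await asyncio.sleep(0)  # yield, as a real send/recv would
        self.order.append(("done", event))
        return event

async def main():
    conn = FakeConnection()
    session = SequentialSession(conn)
    await asyncio.gather(session.request("a"), session.request("b"))
    return conn.order

order = asyncio.run(main())
# Each request completes before the next starts:
print(order)  # [('start', 'a'), ('done', 'a'), ('start', 'b'), ('done', 'b')]
```

Without the lock, the two tasks would interleave their sends and receive each other's events.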
Test WebSocket transport with the included demo script:
```bash
# Test direct OpenAI connection
python scripts/demo_responses_websocket.py --mode direct

# Test through proxy
python scripts/demo_responses_websocket.py --mode proxy --proxy-url ws://localhost:8000/v1/responses

# Multi-turn conversation
python scripts/demo_responses_websocket.py --mode direct --turns 3
```

- Ensure `responses_api.websocket.frontend_enabled: true` in config
- Verify proxy is running and accessible
- Check firewall rules for WebSocket connections
- Verify API key is valid
- Check `Authorization` header format: `Bearer YOUR_KEY`
- Ensure key has access to the Responses API
- Response may have been evicted from cache
- Different connection than original request
- Start new conversation or use HTTP fallback