The LLM Interactive Proxy includes a WebSocket server that implements the Codebuff protocol, allowing Codebuff clients to connect and route their LLM requests through the proxy's backend infrastructure.
Codebuff is a coding agent platform that uses AI models to assist with software development. The proxy's Codebuff backend compatibility feature enables:
- WebSocket-based communication using the Codebuff protocol
- Session management with heartbeat monitoring
- Streaming LLM responses
- File context initialization
- Topic-based subscription management
- Integration with all existing proxy backends (OpenAI, Anthropic, Gemini, etc.)
Add the following to your config.yaml:
```yaml
codebuff:
  enabled: true                    # Enable the Codebuff WebSocket server
  websocket_path: "/ws"            # WebSocket endpoint path
  heartbeat_timeout_seconds: 60    # Client heartbeat timeout
  session_cleanup_hours: 1         # Inactive session cleanup interval
  max_connections: 1000            # Maximum concurrent connections
  max_message_size_bytes: 1048576  # Maximum message size (1 MB)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable/disable the Codebuff WebSocket server |
| `websocket_path` | string | `"/ws"` | WebSocket endpoint path |
| `heartbeat_timeout_seconds` | integer | `60` | Seconds without a ping before connection timeout |
| `session_cleanup_hours` | integer | `1` | Hours of inactivity before session cleanup |
| `max_connections` | integer | `1000` | Maximum concurrent WebSocket connections |
| `max_message_size_bytes` | integer | `1048576` | Maximum message size in bytes (1 MB) |
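The defaults in the table can be mirrored in a small loader for sanity-checking a parsed `codebuff:` section. The dataclass and helper below are illustrative sketches, not the proxy's actual configuration code:

```python
# Sketch: applying the documented defaults to a parsed config section.
# CodebuffConfig and load_codebuff_config are illustrative names.
from dataclasses import dataclass


@dataclass
class CodebuffConfig:
    enabled: bool = False
    websocket_path: str = "/ws"
    heartbeat_timeout_seconds: int = 60
    session_cleanup_hours: int = 1
    max_connections: int = 1000
    max_message_size_bytes: int = 1_048_576  # 1 MB


def load_codebuff_config(raw: dict) -> CodebuffConfig:
    """Build a config from the `codebuff:` section, falling back to defaults."""
    known = CodebuffConfig.__dataclass_fields__
    return CodebuffConfig(**{k: v for k, v in raw.items() if k in known})


# Unspecified keys keep their documented defaults.
cfg = load_codebuff_config({"enabled": True, "max_connections": 200})
```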
Start the proxy with Codebuff enabled:
```bash
# Using a configuration file
python -m src.core.cli --config config/my_config.yaml

# Or with CLI parameters
python -m src.core.cli --default-backend openai:gpt-4o
```

The WebSocket server will be available at `ws://localhost:8000/ws` (or your configured host/port).
Configure your Codebuff client to connect to the proxy:
```bash
# Example Codebuff client configuration
codebuff --backend-url ws://localhost:8000/ws
```

- Route all Codebuff traffic through the proxy: Start the proxy with `codebuff.enabled: true` and point your Codebuff client to `ws://localhost:8000/ws`.
- Force a specific backend/model: Configure `default_backend` in your config file (for example `openai:gpt-4o`) so every Codebuff prompt uses the selected provider.
- Enable debugging captures: Start the proxy with wire capture enabled and inspect traffic using `scripts/inspect_cbor_capture.py` to debug model/tool issues.
- Tighten limits: Adjust `heartbeat_timeout_seconds` and `max_message_size_bytes` to align with your infrastructure constraints.
- Standardizing enterprise access: Keep Codebuff clients unmodified while enforcing backend routing, quotas, and audit logging centrally.
- Tooling validation: Safely test new tool-call behaviors (like file editing) by capturing and replaying streams without touching client code.
- Session-aware assistance: Preserve file context and conversation history for multi-turn coding flows, even when Codebuff reconnects.
- Compliance and observability: Capture usage, logs, and wire data in one place to support governance and debugging.
The Codebuff protocol uses JSON messages over WebSocket. All messages follow a structured format with type-based routing.
- Connect: Client establishes a WebSocket connection to `/ws`
- Identify: Client sends an `identify` message with a session ID
- Ping: Client sends periodic `ping` messages for heartbeat
- Actions: Client sends `action` messages (prompt, init, subscribe, etc.)
- Responses: Server sends `ack` and `action` responses
- Disconnect: Connection closes, session is cleaned up
Identify Message

```json
{
  "type": "identify",
  "txid": 1,
  "clientSessionId": "session-123"
}
```

Ping Message

```json
{
  "type": "ping",
  "txid": 2
}
```

Prompt Action

```json
{
  "type": "action",
  "txid": 3,
  "data": {
    "type": "prompt",
    "promptId": "prompt-456",
    "prompt": "Write a hello world function",
    "fingerprintId": "client-fingerprint",
    "model": "gpt-4",
    "sessionState": {},
    "toolResults": []
  }
}
```

Init Action

```json
{
  "type": "action",
  "txid": 4,
  "data": {
    "type": "init",
    "fingerprintId": "client-fingerprint",
    "fileContext": {
      "files": [
        {"path": "main.py", "content": "..."}
      ]
    }
  }
}
```

Subscribe/Unsubscribe

```json
{
  "type": "subscribe",
  "txid": 5,
  "topics": ["updates", "notifications"]
}
```

Acknowledgment

```json
{
  "type": "ack",
  "txid": 1,
  "success": true,
  "error": null
}
```

Response Chunk (Streaming)

```json
{
  "type": "action",
  "data": {
    "type": "response-chunk",
    "userInputId": "prompt-456",
    "chunk": "def hello_world():\n"
  }
}
```

Prompt Response (Final)

```json
{
  "type": "action",
  "data": {
    "type": "prompt-response",
    "promptId": "prompt-456",
    "sessionState": {},
    "toolCalls": null,
    "toolResults": null,
    "output": null
  }
}
```

Error Response

```json
{
  "type": "action",
  "data": {
    "type": "prompt-error",
    "userInputId": "prompt-456",
    "message": "Backend unavailable",
    "error": "Connection timeout",
    "remainingBalance": null
  }
}
```

Each WebSocket connection maintains a session with:
- Unique session ID
- Conversation history
- File context (from init action)
- Subscription topics
- Last-seen timestamp for heartbeat monitoring
Sessions are automatically cleaned up after the configured inactivity period.
Clients must send periodic `ping` messages to keep the connection alive. If no ping is received within `heartbeat_timeout_seconds`, the connection is terminated and the session is cleaned up.
LLM responses are streamed in real time as `response-chunk` actions. Each chunk includes:
- The user input ID for correlation
- A text chunk from the LLM response

When streaming completes, a final `prompt-response` action is sent with the complete session state.
Clients can initialize a session with file context using the `init` action. This context is stored in the session and made available for subsequent prompt actions, allowing the LLM to understand the codebase.
Clients can subscribe to topics to receive targeted messages. The subscription system supports:
- Multiple topics per client
- Dynamic subscribe/unsubscribe
- Automatic cleanup on disconnect
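A minimal sketch of such a registry (illustrative, not the proxy's code):

```python
# Sketch: a topic-subscription registry supporting the features listed above.
class Subscriptions:
    def __init__(self):
        self._topics: dict[str, set[str]] = {}  # session_id -> subscribed topics

    def subscribe(self, session_id: str, topics: list[str]) -> None:
        self._topics.setdefault(session_id, set()).update(topics)

    def unsubscribe(self, session_id: str, topics: list[str]) -> None:
        self._topics.get(session_id, set()).difference_update(topics)

    def subscribers(self, topic: str) -> set[str]:
        # Who should receive a message published to `topic`?
        return {sid for sid, topics in self._topics.items() if topic in topics}

    def drop(self, session_id: str) -> None:
        # Automatic cleanup on disconnect.
        self._topics.pop(session_id, None)


subs = Subscriptions()
subs.subscribe("session-123", ["updates", "notifications"])
subs.subscribe("session-456", ["updates"])
subs.unsubscribe("session-456", ["updates"])
```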
The Codebuff backend integrates seamlessly with the proxy's existing infrastructure:
- Backend Factory: Routes requests to appropriate backends based on model name
- Format Conversion: Converts between Codebuff and OpenAI message formats
- Middleware: Applies all existing response middleware
- Accounting: Tracks usage and attributes costs to fingerprint IDs
- Error Handling: Uses the proxy's exception hierarchy
- Heartbeat Timeout: Connection terminated after `heartbeat_timeout_seconds` without a ping
- Max Connections: New connections rejected when `max_connections` is reached
- Message Size: Messages exceeding `max_message_size_bytes` are rejected
- JSON Parsing: Invalid JSON returns an `ack` with `success=false`
- Schema Validation: Invalid message structure returns an `ack` with error details
- Unknown Message Type: Unrecognized types return an `ack` with an error
- Backend Unavailable: Returns `prompt-error` with backend status
- Unsupported Model: Returns `prompt-error` indicating the model is not supported
- Streaming Error: Returns `prompt-error` with error details
- Timeout: Returns `prompt-error` after the request times out
The Codebuff backend logs:
- Connection events (connect, disconnect)
- Message types and session IDs
- Errors with full context
- Heartbeat timeouts
Sensitive information (auth tokens, full message contents) is never logged.
Key metrics to monitor:
- Active WebSocket connections
- Messages per second
- Average response time
- Error rate
- Session count
- Memory usage per connection
In the current MVP implementation:
- Auth tokens are accepted but not validated
- Fingerprint IDs are tracked for attribution
- Usage values are dummy placeholders
Future versions will add:
- Token validation against user database
- Rate limiting per user
- Real usage tracking and billing
All inputs are validated:
- JSON schema validation for all messages
- Sanitization of user-provided strings
- Model name validation against allowed list
- Message size limits enforced
Logs exclude:
- Auth tokens
- Full message contents (only types and IDs logged)
- Sensitive file contents
- User credentials
Problem: Client cannot connect to WebSocket
Solutions:
- Verify `codebuff.enabled: true` in the configuration
- Check that the proxy is running and accessible
- Verify the WebSocket path matches the client configuration
- Check that firewall rules allow WebSocket connections
Problem: Connections frequently timeout
Solutions:
- Increase `heartbeat_timeout_seconds` in the configuration
- Verify the client is sending `ping` messages regularly
- Check network stability between client and server
Problem: Messages rejected with validation errors
Solutions:
- Verify message format matches protocol specification
- Check JSON is valid and properly formatted
- Ensure all required fields are present
- Review error message for specific validation failures
Problem: Prompts fail with backend errors
Solutions:
- Verify backend is configured and accessible
- Check API keys are set correctly
- Ensure model name is supported by backend
- Review backend-specific documentation
The current MVP implementation has the following limitations:
Not Implemented:
- Tool call support (tool-call-request/response flow)
- File access support (read-files request/response)
- MCP tool support
- Real authentication and authorization
- Real usage tracking and billing
- Session persistence across restarts
- WebSocket compression
These features are planned for future releases.
```python
import asyncio
import json

import websockets


async def codebuff_client():
    uri = "ws://localhost:8000/ws"
    async with websockets.connect(uri) as websocket:
        # 1. Identify
        await websocket.send(json.dumps({
            "type": "identify",
            "txid": 1,
            "clientSessionId": "my-session"
        }))
        response = await websocket.recv()
        print(f"Identify: {response}")

        # 2. Initialize with file context
        await websocket.send(json.dumps({
            "type": "action",
            "txid": 2,
            "data": {
                "type": "init",
                "fingerprintId": "my-client",
                "fileContext": {
                    "files": [{"path": "main.py", "content": "# My code"}]
                }
            }
        }))
        response = await websocket.recv()
        print(f"Init: {response}")

        # 3. Send prompt
        await websocket.send(json.dumps({
            "type": "action",
            "txid": 3,
            "data": {
                "type": "prompt",
                "promptId": "prompt-1",
                "prompt": "Write a hello world function",
                "fingerprintId": "my-client",
                "model": "gpt-4",
                "sessionState": {},
                "toolResults": []
            }
        }))

        # 4. Receive the streaming response
        while True:
            response = await websocket.recv()
            data = json.loads(response)
            if data["type"] == "action":
                action_type = data["data"]["type"]
                if action_type == "response-chunk":
                    print(data["data"]["chunk"], end="", flush=True)
                elif action_type == "prompt-response":
                    print("\nComplete!")
                    break
                elif action_type == "prompt-error":
                    print(f"\nError: {data['data']['message']}")
                    break


asyncio.run(codebuff_client())
```

IMPORTANT LEGAL NOTICE - READ CAREFULLY BEFORE USING THE CODEBUFF-COMPATIBLE BACKEND
- Non-Affiliation: This project is an independent open-source initiative. It is not affiliated with, endorsed by, authorized by, or in any way officially connected to Codebuff or any of their subsidiaries or affiliates. All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.
- No Liability: The authors, contributors, and maintainers of this project hold no responsibility or liability for any consequences arising from the use of this backend in violation of these rules, or for any violations of third-party Terms of Service resulting from such use.
- User Responsibility: You accept full responsibility for ensuring your use of this tool complies with all applicable laws and third-party agreements.
- Compliance with Provider Terms: Users of the Codebuff-compatible backend connector are strictly required to respect all related Terms of Service (ToS) and other agreements with Codebuff and any backend providers. You are solely responsible for verifying that your use of this software is compatible with those agreements.
- Indemnification: You agree to indemnify, defend, and hold harmless the authors and contributors of this project from and against any and all claims, liabilities, damages, losses, or expenses, including legal fees and costs, arising out of or in any way connected with your access to or use of the Codebuff-compatible backend.
If you do not agree to these terms, do not use the Codebuff-compatible backend interface.
- Configuration Guide - Complete configuration reference
- Backend Overview - Available backends
- Wire Capture - Debugging WebSocket traffic
- Session Management - Session configuration