The LLM Interactive Proxy includes a comprehensive usage tracking system that monitors all traffic passing through the proxy. This feature provides detailed insights into token consumption, request patterns, performance metrics, and costs across all backends and models.
The usage tracking system captures metrics at four measurement points to provide complete visibility into both original (verbatim) and modified (mutated) traffic:
- Client to Proxy (CTP): Original request from client before any proxy modifications
- Proxy to Backend (PTB): Modified request sent to backend after proxy transformations
- Backend to Proxy (BTP): Original response from backend before proxy modifications
- Proxy to Client (PTC): Modified response sent to client after proxy transformations
This multi-point tracking enables you to:
- Compare what clients send vs. what backends receive (mutation impact on prompts)
- Compare what backends return vs. what clients receive (mutation impact on responses)
- Reconcile proxy calculations with backend billing (proxy vs. backend-reported usage)
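As a sketch of the first two comparisons, the per-leg token counts can be diffed directly. The record field names below (`ctp_prompt_tokens`, etc.) are illustrative assumptions, not the proxy's actual record schema:

```python
# Sketch: compare prompt/completion tokens across legs for one request.
# Field names are illustrative assumptions, not the proxy's schema.
def mutation_overhead(record: dict) -> dict:
    """Return token deltas introduced by the proxy in each direction."""
    return {
        # Extra prompt tokens added before forwarding (CTP -> PTB)
        "prompt_delta": record["ptb_prompt_tokens"] - record["ctp_prompt_tokens"],
        # Completion tokens changed before returning (BTP -> PTC)
        "completion_delta": record["ptc_completion_tokens"] - record["btp_completion_tokens"],
    }

record = {
    "ctp_prompt_tokens": 900,   # what the client sent
    "ptb_prompt_tokens": 950,   # after command injection
    "btp_completion_tokens": 300,
    "ptc_completion_tokens": 300,
}
print(mutation_overhead(record))  # {'prompt_delta': 50, 'completion_delta': 0}
```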
- Verbatim tokens: Original token counts before proxy modifications
- Mutated tokens: Token counts after proxy transformations (command injection, content filtering, etc.)
- Backend-reported tokens: Token counts reported by the LLM provider for billing reconciliation
- Extended token details: Reasoning tokens (thinking models), cached tokens, audio tokens
- Request and response counts per backend, model, and frontend
- HTTP status code tracking and error rate monitoring
- Tool call tracking with tool names and counts
- Session and turn tracking for conversation analysis
- Time to First Token (TTFT) for streaming responses
- Proxy processing time (overhead measurement)
- Total request duration
- Statistical aggregations: min, max, average, p50, p95, p99
- Backend-reported costs per request (when available)
- Upstream inference costs for BYOK (Bring Your Own Key) scenarios
- Cost aggregation by backend, model, session, and time period
Filter and aggregate statistics by:
- Backend type (openai, anthropic, gemini, etc.)
- Model name
- Frontend type (OpenAI API, Anthropic API, etc.)
- Traffic leg (CTP, PTB, BTP, PTC)
- User agent or application
- Proxy user
- Date and time dimensions (day, week, month, hour of day)
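These dimensions correspond to query parameters on the `/v1/usage/stats` endpoint, so a filtered query URL can be assembled with standard tooling. A minimal Python sketch (the base URL assumes the default localhost bind):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8000/v1/usage/stats"  # default localhost bind

def stats_url(**filters) -> str:
    """Build a filtered stats query; keyword names mirror the API's parameters."""
    return f"{BASE}?{urlencode(filters)}" if filters else BASE

url = stats_url(backend_type="openai", model="gpt-4", hour_of_day=9)
print(url)
# http://localhost:8000/v1/usage/stats?backend_type=openai&model=gpt-4&hour_of_day=9
```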
Usage tracking is enabled by default. Configure it in your config.yaml:
```yaml
usage_tracking:
  enabled: true
  persistence_path: "./var/usage_data.json"
  flush_interval_seconds: 30.0
  max_records_in_memory: 100000
```
- `enabled`: Enable or disable usage tracking (default: `true`)
- `persistence_path`: Path to store usage data (default: `"./var/usage_data.json"`)
- `flush_interval_seconds`: How often to persist data to disk (default: `30.0`)
- `max_records_in_memory`: Maximum records to keep in memory before archival (default: `100000`)
You can also configure via environment variables:
```bash
export USAGE_TRACKING_ENABLED=true
export USAGE_TRACKING_PERSISTENCE_PATH="./var/usage_data.json"
export USAGE_TRACKING_FLUSH_INTERVAL_SECONDS=30.0
export USAGE_TRACKING_MAX_RECORDS_IN_MEMORY=100000
```
Query usage statistics via the REST API.
Security Note: By default, the proxy binds to `127.0.0.1:8000` (localhost only) for security. To allow external access, explicitly configure the host with `--host 0.0.0.0` or set `host: "0.0.0.0"` in your configuration file. When exposing the API externally, ensure proper authentication is enabled.
`GET /v1/usage/stats`

Query parameters:
- `backend_type`: Filter by backend (e.g., `openai`, `anthropic`)
- `model`: Filter by model name (e.g., `gpt-4`, `claude-3-5-sonnet`)
- `frontend_type`: Filter by frontend API type
- `leg`: Filter by traffic leg (`CTP`, `PTB`, `BTP`, `PTC`)
- `user_agent`: Filter by user agent string
- `proxy_user`: Filter by proxy user identifier
- `start_date`: Start of date range (ISO 8601 format)
- `end_date`: End of date range (ISO 8601 format)
- `hour_of_day`: Filter by hour (0-23)
- `day_of_week`: Filter by day (0=Monday, 6=Sunday)
Example:
```bash
# Get stats for OpenAI GPT-4 in the last 24 hours
curl "http://localhost:8000/v1/usage/stats?backend_type=openai&model=gpt-4&start_date=2025-12-02T00:00:00Z"
```
Response:
```json
{
  "request_count": 150,
  "response_count": 148,
  "unique_sessions": 12,
  "total_turns": 150,
  "total_prompt_tokens": 45000,
  "total_completion_tokens": 15000,
  "total_tokens": 60000,
  "tokens_per_session": 5000.0,
  "completion_tokens_per_second": 2.5,
  "total_tokens_per_second": 10.0,
  "total_tool_calls": 45,
  "ttft_stats": {
    "count": 148,
    "min_ms": 120.5,
    "max_ms": 2500.0,
    "avg_ms": 450.2,
    "p50_ms": 380.0,
    "p95_ms": 1200.0,
    "p99_ms": 2000.0
  },
  "proxy_processing_stats": {
    "count": 148,
    "min_ms": 5.2,
    "max_ms": 50.0,
    "avg_ms": 12.5,
    "p50_ms": 10.0,
    "p95_ms": 25.0,
    "p99_ms": 40.0
  },
  "duration_stats": {
    "count": 148,
    "min_ms": 500.0,
    "max_ms": 15000.0,
    "avg_ms": 3500.0,
    "p50_ms": 3000.0,
    "p95_ms": 8000.0,
    "p99_ms": 12000.0
  },
  "status_code_counts": {
    "200": 148,
    "429": 2
  },
  "filters": {
    "backend_type": "openai",
    "model": "gpt-4"
  },
  "time_window_seconds": 86400.0
}
```
`GET /v1/usage/recent`

Query parameters:
- `limit`: Maximum number of records to return (default: 100, max: 1000)
- `session_id`: Filter by session ID
- All filter parameters from `/v1/usage/stats`
Example:
```bash
# Get last 50 usage records for a specific session
curl "http://localhost:8000/v1/usage/recent?session_id=session-123&limit=50"
```
`GET /v1/usage/export`

Query parameters:
- `start_date`: Start of date range (required)
- `end_date`: End of date range (required)
- All filter parameters from `/v1/usage/stats`
Example:
```bash
# Export all usage data for December 2025
curl "http://localhost:8000/v1/usage/export?start_date=2025-12-01T00:00:00Z&end_date=2025-12-31T23:59:59Z" > usage_december.json
```
Compare proxy-calculated tokens with backend-reported tokens to identify discrepancies:
```bash
# Get stats with backend-reported usage
curl "http://localhost:8000/v1/usage/stats?backend_type=openai&model=gpt-4"
```
The response includes both proxy-calculated tokens and backend-reported usage for comparison.
Track Time to First Token (TTFT) and identify slow responses:
```bash
# Get stats for the last hour
curl "http://localhost:8000/v1/usage/stats?start_date=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)"
```
Check the `ttft_stats` section for percentile breakdowns.
Analyze costs by backend, model, and time period:
```bash
# Get stats for Anthropic Claude models
curl "http://localhost:8000/v1/usage/stats?backend_type=anthropic"
```
Backend-reported costs are included when available from the provider.
Track which tools are being called and how often:
```bash
# Get recent records with tool calls
curl "http://localhost:8000/v1/usage/recent?limit=100" | jq '.records[] | select(.tool_call_count > 0)'
```
Analyze token consumption per session:
```bash
# Get stats for a specific session
curl "http://localhost:8000/v1/usage/stats?session_id=session-123"
```
Check `tokens_per_session` for average consumption.
Track HTTP status codes to identify issues:
```bash
# Get status code breakdown
curl "http://localhost:8000/v1/usage/stats" | jq '.status_code_counts'
```
Usage data is stored in memory for fast access and periodically persisted to disk:
- In-memory storage: Thread-safe data structure for low-latency recording
- Periodic persistence: Data is flushed to disk every 30 seconds (configurable)
- Startup recovery: Previously persisted data is loaded on proxy startup
- Graceful shutdown: Data is persisted when the proxy shuts down
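The pattern above can be modeled as a lock-guarded buffer with an explicit flush. This is an illustrative sketch, not the proxy's actual implementation:

```python
# Illustrative model of the recording/persistence pattern: a lock guards the
# in-memory buffer; flush() writes a JSON snapshot, as a periodic timer or
# shutdown hook would.
import json
import os
import tempfile
import threading

class UsageStore:
    def __init__(self, path: str):
        self.path = path
        self._lock = threading.Lock()
        self._records: list[dict] = []

    def record(self, rec: dict) -> None:
        # Thread-safe, low-latency append on the hot path
        with self._lock:
            self._records.append(rec)

    def flush(self) -> None:
        # Snapshot under the lock, then write outside it
        with self._lock:
            snapshot = list(self._records)
        with open(self.path, "w") as f:
            json.dump({"records": snapshot}, f)

store = UsageStore(os.path.join(tempfile.gettempdir(), "usage_data.json"))
store.record({"model": "gpt-4", "total_tokens": 120})
store.flush()
```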
By default, usage data is stored at `./var/usage_data.json`. This file contains:
```json
{
  "version": 1,
  "last_flush": "2025-12-03T10:30:00Z",
  "records": [...],
  "sessions": [...],
  "aggregated_stats": {...}
}
```
Configure `max_records_in_memory` to control how many records are kept in memory. When this limit is reached, older records are archived or removed based on your retention policy.
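One simple eviction policy matching this description is a fixed-size buffer that drops the oldest records first; a sketch (the proxy's actual archival policy may differ):

```python
from collections import deque

MAX_RECORDS = 3  # stand-in for max_records_in_memory
buffer: deque[dict] = deque(maxlen=MAX_RECORDS)

# Append five records into a buffer capped at three
for i in range(5):
    buffer.append({"turn": i})

# The two oldest records were evicted automatically
print([r["turn"] for r in buffer])  # [2, 3, 4]
```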
Usage records do NOT include:
- Message content (prompts or completions)
- API keys or authentication tokens
- Personal identifiable information (PII)
Usage records DO include:
- Token counts and timing metrics
- Model and backend identifiers
- Session IDs and turn numbers
- User agent and proxy user (if provided)
- Tool names (but not tool arguments)
The usage API endpoints are protected by the same authentication mechanism as the main proxy API. Ensure proper API key authentication is configured.
To disable usage tracking entirely:
```yaml
usage_tracking:
  enabled: false
```
Or via environment variable:
```bash
export USAGE_TRACKING_ENABLED=false
```
When disabled:
- No usage data is recorded
- API endpoints return empty results
- No performance overhead from tracking
If memory usage is high, reduce `max_records_in_memory`:
```yaml
usage_tracking:
  max_records_in_memory: 50000
```
If disk writes are slow, increase `flush_interval_seconds`:
```yaml
usage_tracking:
  flush_interval_seconds: 60.0
```
If usage data is missing after restart:
- Check that `persistence_path` is writable
- Verify the file exists at the configured path
- Check proxy logs for persistence errors
- Wire Capture: Record full HTTP requests/responses for debugging
- Replacement Metrics: Track model replacement activation rates
- Session Management: Intelligent session handling and state management
For developers integrating usage tracking into custom controllers or middleware, see the Usage Tracking Integration Guide.