This document describes the usage tracking infrastructure for developers who need to integrate usage tracking into custom controllers, middleware, or backend connectors.
For end users, see the Usage Tracking User Guide for configuration, API usage, and monitoring.
The usage tracking system provides comprehensive monitoring of all traffic passing through the proxy, enabling detailed analysis, billing reconciliation, and performance monitoring. The system tracks usage at four measurement points to provide full observability of both verbatim (original) and mutated (modified) traffic.
This guide covers:
- Service architecture and dependency injection
- Integration patterns for controllers and middleware
- Helper functions for common use cases
- Testing strategies
- Performance considerations
Usage tracking is configured through the UsageTrackingConfig class in AppConfig:
from src.core.config.app_config import AppConfig
config = AppConfig.from_env()
# Access usage tracking configuration
print(config.usage_tracking.enabled) # True by default
print(config.usage_tracking.persistence_path) # "./var/usage_data.json"
print(config.usage_tracking.flush_interval_seconds) # 30.0
print(config.usage_tracking.max_records_in_memory) # 100000

Configuration options:
- enabled: Whether detailed usage tracking is enabled (default: True)
- persistence_path: Path for the persistence file (default: "./var/usage_data.json")
- flush_interval_seconds: Interval, in seconds, between periodic persistence runs (default: 30.0)
- max_records_in_memory: Maximum number of records kept in memory (default: 100000)
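The defaults above can be mirrored in a small frozen dataclass. The sketch below is illustrative only: the class name `UsageTrackingSettings` and its parsing rule are hypothetical, and `from_env` reads only `USAGE_TRACKING_ENABLED` (the variable documented for disabling usage tracking). The real `UsageTrackingConfig` may load and validate values differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageTrackingSettings:
    """Hypothetical stand-in that mirrors the documented UsageTrackingConfig defaults."""

    enabled: bool = True
    persistence_path: str = "./var/usage_data.json"
    flush_interval_seconds: float = 30.0
    max_records_in_memory: int = 100_000

    @classmethod
    def from_env(cls) -> "UsageTrackingSettings":
        """Build settings from the environment; only USAGE_TRACKING_ENABLED is read here."""
        raw = os.environ.get("USAGE_TRACKING_ENABLED")
        if raw is None:
            return cls()
        # Assumed parsing rule: common "false" spellings disable tracking, anything else enables it.
        return cls(enabled=raw.strip().lower() not in {"0", "false", "no", "off"})
```

Freezing the dataclass keeps settings immutable after startup, which matches how the options are described: read once, then consumed by the services.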
Three main services are registered in the DI container:
InMemoryUsageStore: thread-safe storage with periodic persistence to disk.
from src.core.services.in_memory_usage_store import InMemoryUsageStore
store = service_provider.get_required_service(InMemoryUsageStore)

IUsageRecordingService: records usage metrics at request/response boundaries.
from src.core.domain.traffic_leg import TrafficLeg
from src.core.interfaces.usage_recording_interface import IUsageRecordingService
usage_service = service_provider.get_required_service(IUsageRecordingService)
# Record a request
record_id = await usage_service.record_request(
    session_id="session-123",
    backend_type="openai",
    model="gpt-4",
    frontend_type="openai",
    leg=TrafficLeg.CLIENT_TO_PROXY,
    prompt_tokens=100,
    user_agent="MyApp/1.0",
    proxy_user="user@example.com",
)
# Complete the record with response data
await usage_service.record_response(
    record_id=record_id,
    completion_tokens=50,
    http_status_code=200,
    tool_call_count=2,
    tool_names=["search", "calculate"],
    ttft_ms=150.0,
    proxy_processing_ms=10.0,
    total_duration_ms=500.0,
)

IStatisticsService: aggregates usage statistics with filtering.
from src.core.interfaces.statistics_service_interface import IStatisticsService
from src.core.domain.statistics_filter import StatisticsFilter
stats_service = service_provider.get_required_service(IStatisticsService)
# Get aggregated statistics
# Get aggregated statistics (avoid shadowing the built-in filter())
stats_filter = StatisticsFilter(backend_type="openai", model="gpt-4")
stats = await stats_service.get_aggregated_stats(stats_filter)
print(f"Total requests: {stats.request_count}")
print(f"Total tokens: {stats.total_tokens}")
print(f"Tokens per session: {stats.tokens_per_session}")

The UsageTrackingMiddleware captures timing and user context at the request/response boundaries:
from src.core.app.middleware.usage_tracking_middleware import UsageTrackingMiddleware
# Middleware is automatically registered when usage_tracking.enabled = True
# It captures:
# - Request start time
# - User-agent header
# - Proxy user header
# - Response end time
# - Total duration

The middleware stores timing and user context in request.state for downstream use:
- request.state.request_start_time: Request start timestamp
- request.state.user_agent: User-agent string
- request.state.proxy_user: Proxy user identifier
- request.state.response_end_time: Response end timestamp
- request.state.total_duration_ms: Total request duration in milliseconds
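The middleware's source is not shown here, so the following is only a framework-free sketch of how such fields could be captured around a downstream handler. The function `track_usage_context` and the header name `x-proxy-user` are hypothetical, and a plain dict stands in for the framework's case-insensitive headers object.

```python
import asyncio
import time
from types import SimpleNamespace
from typing import Any, Awaitable, Callable


async def track_usage_context(
    request: Any,
    call_next: Callable[[Any], Awaitable[Any]],
    proxy_user_header: str = "x-proxy-user",  # hypothetical header name
) -> Any:
    """Populate request.state with the fields listed above (framework-free sketch)."""
    request.state.request_start_time = time.time()
    request.state.user_agent = request.headers.get("user-agent")
    request.state.proxy_user = request.headers.get(proxy_user_header)
    start = time.perf_counter()  # monotonic clock for the duration measurement
    try:
        return await call_next(request)
    finally:
        # Runs even if the downstream handler raises, so timing is always recorded.
        request.state.response_end_time = time.time()
        request.state.total_duration_ms = (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    async def handler(req: Any) -> str:
        await asyncio.sleep(0.01)  # simulate backend latency
        return "ok"

    req = SimpleNamespace(headers={"user-agent": "MyApp/1.0"}, state=SimpleNamespace())
    print(asyncio.run(track_usage_context(req, handler)), round(req.state.total_duration_ms))
```

Using `time.perf_counter()` for the duration avoids errors from wall-clock adjustments, while `time.time()` still provides absolute timestamps for the start and end fields.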
Helper functions are provided for controllers to record usage:
from starlette.requests import Request  # assumption: the app is Starlette/FastAPI-based
from src.core.app.helpers.usage_recording_helper import (
    record_request_usage,
    record_response_usage,
    extract_tool_calls_from_response,
    extract_backend_reported_usage,
)
from src.core.domain.traffic_leg import TrafficLeg
from src.core.interfaces.usage_recording_interface import IUsageRecordingService
# In a controller:
async def handle_request(
    request: Request,
    usage_service: IUsageRecordingService,
    session_id: str,
):
    # Record the request
    record_id = await record_request_usage(
        usage_service=usage_service,
        request=request,
        session_id=session_id,
        backend_type="openai",
        model="gpt-4",
        frontend_type="openai",
        leg=TrafficLeg.CLIENT_TO_PROXY,
        prompt_tokens=100,
    )
    # Process the request (process_request stands in for your own handling logic)
    response = await process_request(...)
    # Extract tool calls and backend-reported usage
    tool_call_count, tool_names = extract_tool_calls_from_response(response)
    backend_usage = extract_backend_reported_usage(response)
    # Record the response
    await record_response_usage(
        usage_service=usage_service,
        request=request,
        record_id=record_id,
        completion_tokens=50,
        http_status_code=200,
        tool_call_count=tool_call_count,
        tool_names=tool_names,
        backend_reported_usage=backend_usage,
    )
    return response

The system tracks usage at four measurement points:
- CLIENT_TO_PROXY (CTP): Verbatim tokens from client request before proxy modifications
- PROXY_TO_BACKEND (PTB): Mutated tokens sent to backend after proxy modifications
- BACKEND_TO_PROXY (BTP): Verbatim tokens from backend response before proxy modifications
- PROXY_TO_CLIENT (PTC): Mutated tokens sent to client after proxy modifications
Additionally, backend-reported usage is captured separately for reconciliation.
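Because CTP and BTP are verbatim while PTB and PTC are mutated, pairing the legs shows how many tokens the proxy added or removed in each direction. The sketch below uses a local stand-in enum (`TrafficLegSketch`, with assumed string values) and a hypothetical `mutation_deltas` function rather than the real `TrafficLeg` domain model.

```python
from enum import Enum
from typing import Mapping


class TrafficLegSketch(str, Enum):
    """Local stand-in for TrafficLeg: member names follow this guide, values are assumed."""

    CLIENT_TO_PROXY = "ctp"
    PROXY_TO_BACKEND = "ptb"
    BACKEND_TO_PROXY = "btp"
    PROXY_TO_CLIENT = "ptc"


def mutation_deltas(tokens_by_leg: Mapping[TrafficLegSketch, int]) -> dict[str, int]:
    """Tokens added (positive) or removed (negative) by the proxy in each direction."""
    return {
        # Request direction: mutated PTB tokens minus verbatim CTP tokens.
        "request": tokens_by_leg[TrafficLegSketch.PROXY_TO_BACKEND]
        - tokens_by_leg[TrafficLegSketch.CLIENT_TO_PROXY],
        # Response direction: mutated PTC tokens minus verbatim BTP tokens.
        "response": tokens_by_leg[TrafficLegSketch.PROXY_TO_CLIENT]
        - tokens_by_leg[TrafficLegSketch.BACKEND_TO_PROXY],
    }
```

For example, if the proxy injects a 30-token system prompt into a 100-token client request, PTB is 130 and the request delta is +30.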
Usage statistics can be queried via REST API endpoints (implemented in src/core/app/routes/usage_routes.py):
- GET /v1/usage/stats: Get aggregated statistics with filtering
- GET /v1/usage/recent: Get recent usage records
- GET /v1/usage/export: Export usage data as JSON
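Filtered aggregation, as exposed by the stats endpoint, can be sketched as "select matching records, then sum". Everything below (`Record`, `Filter`, `Stats`, `aggregate`) is a hypothetical stand-in for the real `StatisticsFilter` and `IStatisticsService`, and the definition of `tokens_per_session` (mean total tokens over distinct sessions) is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    session_id: str
    backend_type: str
    model: str
    prompt_tokens: int
    completion_tokens: int


@dataclass(frozen=True)
class Filter:
    backend_type: Optional[str] = None
    model: Optional[str] = None

    def matches(self, r: Record) -> bool:
        # A None field means "do not filter on this attribute".
        return (self.backend_type is None or r.backend_type == self.backend_type) and (
            self.model is None or r.model == self.model
        )


@dataclass(frozen=True)
class Stats:
    request_count: int
    total_tokens: int
    tokens_per_session: float


def aggregate(records: Iterable[Record], flt: Filter) -> Stats:
    selected = [r for r in records if flt.matches(r)]
    total = sum(r.prompt_tokens + r.completion_tokens for r in selected)
    sessions = {r.session_id for r in selected}
    return Stats(
        request_count=len(selected),
        total_tokens=total,
        # Assumed definition: mean total tokens over distinct sessions (0.0 if none).
        tokens_per_session=total / len(sessions) if sessions else 0.0,
    )
```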
To integrate usage tracking in a controller:
- Resolve the IUsageRecordingService from the DI container
- Use the helper functions to record request/response usage
- Extract tool calls and backend-reported usage from responses
- Record timing information from request.state
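Extracting tool calls and backend-reported usage depends on the response format. The sketch below assumes an OpenAI-style chat completion dict (tool calls under `choices[].message.tool_calls[].function.name`, token counts under `usage`). The function names are hypothetical and do not reproduce the signatures of `extract_tool_calls_from_response` or `extract_backend_reported_usage`, which may accept other response types.

```python
from typing import Any, Mapping, Optional


def count_tool_calls(response: Mapping[str, Any]) -> tuple[int, list[str]]:
    """Count tool calls across all choices of an OpenAI-style chat completion dict."""
    names: list[str] = []
    for choice in response.get("choices") or []:
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            names.append((call.get("function") or {}).get("name", ""))
    return len(names), names


def backend_usage(response: Mapping[str, Any]) -> Optional[dict[str, int]]:
    """Return the backend's own token counts, or None if it reported none."""
    usage = response.get("usage")
    if not usage:
        return None
    return {
        key: int(usage[key])
        for key in ("prompt_tokens", "completion_tokens", "total_tokens")
        if usage.get(key) is not None
    }
```

Returning `None` when no `usage` block is present keeps "backend reported nothing" distinct from "backend reported zero tokens", which matters for reconciliation.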
Example controller:
from starlette.requests import Request  # assumption: the app is Starlette/FastAPI-based
from src.core.app.helpers.usage_recording_helper import (
    record_request_usage,
    record_response_usage,
)
from src.core.domain.traffic_leg import TrafficLeg
from src.core.interfaces.usage_recording_interface import IUsageRecordingService
class MyController:
    def __init__(self, usage_service: IUsageRecordingService):
        self._usage_service = usage_service
    async def handle_chat_completion(
        self,
        request: Request,
        session_id: str,
        backend_type: str,
        model: str,
        prompt_tokens: int,
    ):
        # Record the request
        record_id = await record_request_usage(
            usage_service=self._usage_service,
            request=request,
            session_id=session_id,
            backend_type=backend_type,
            model=model,
            frontend_type="openai",
            leg=TrafficLeg.CLIENT_TO_PROXY,
            prompt_tokens=prompt_tokens,
        )
        # Process the request (_process_request stands in for your own handling logic)
        response = await self._process_request(...)
        # _count_completion_tokens is a hypothetical helper; use your own token counting
        completion_tokens = self._count_completion_tokens(response)
        # Record the response
        await record_response_usage(
            usage_service=self._usage_service,
            request=request,
            record_id=record_id,
            completion_tokens=completion_tokens,
            http_status_code=200,
        )
        return response

The usage tracking infrastructure is covered by the following tests:
- Property-based tests: tests/property/test_usage_tracking_domain_properties.py
- Service tests: tests/property/test_usage_recording_service_properties.py
- Integration tests: tests/integration/test_usage_tracking_integration.py
- Unit tests: tests/unit/test_statistics_aggregation_service.py
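As an illustration of the mock-based testing style (not code taken from the files above), the sketch below uses `unittest.mock.AsyncMock` as a stand-in for `IUsageRecordingService` and also checks the rule that a recording failure must not break request handling. The handler `handle` and its keyword arguments are hypothetical; pytest would collect the two `test_` functions as-is.

```python
import asyncio
import logging
from unittest.mock import AsyncMock

logger = logging.getLogger(__name__)


async def handle(usage_service, payload: str) -> str:
    """Hypothetical handler: records usage but never lets a recording failure escape."""
    record_id = None
    try:
        record_id = await usage_service.record_request(session_id="s1", prompt_tokens=len(payload))
    except Exception:
        logger.exception("Failed to record request usage")
    response = payload.upper()  # stand-in for real request processing
    if record_id is not None:
        try:
            await usage_service.record_response(record_id=record_id, completion_tokens=len(response))
        except Exception:
            logger.exception("Failed to record response usage")
    return response


def test_records_both_phases() -> None:
    service = AsyncMock()
    service.record_request.return_value = "rec-1"
    assert asyncio.run(handle(service, "hi")) == "HI"
    service.record_request.assert_awaited_once()
    service.record_response.assert_awaited_once_with(record_id="rec-1", completion_tokens=2)


def test_recording_failure_does_not_break_request() -> None:
    service = AsyncMock()
    service.record_request.side_effect = RuntimeError("store unavailable")
    assert asyncio.run(handle(service, "hi")) == "HI"
    service.record_response.assert_not_awaited()
```

`assert_awaited_once` is stricter than `assert_called_once`: it fails if the coroutine was created but never awaited, which is a common bug with async services.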
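Run all four test files:
./.venv/Scripts/python.exe -m pytest tests/property/test_usage_tracking_domain_properties.py tests/property/test_usage_recording_service_properties.py tests/integration/test_usage_tracking_integration.py tests/unit/test_statistics_aggregation_service.py -v
The interpreter path above is for a Windows virtual environment; on Linux or macOS use ./.venv/bin/python instead.

To disable usage tracking, set enabled: false in the configuration: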
usage_tracking:
  enabled: false

Or via environment variable:

export USAGE_TRACKING_ENABLED=false

When disabled, the services will not be registered in the DI container, and the middleware will not be added to the application.
The in-memory store keeps up to max_records_in_memory records (default: 100,000). At approximately 1-2 KB per record, a store filled to the default limit uses roughly 100-200 MB of memory.
To reduce memory usage:
- Decrease max_records_in_memory
- Decrease flush_interval_seconds to persist more frequently
- Implement custom archival logic for old records
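Custom archival logic could, for example, move records older than a cutoff to an append-only JSON Lines file and keep only the recent ones in memory. The `UsageRow` type and `archive_older_than` function below are hypothetical; the real `UsageRecord` model has more fields, and the archive format is an assumption.

```python
import json
import time
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class UsageRow:
    record_id: str
    timestamp: float  # seconds since the epoch
    total_tokens: int


def archive_older_than(
    records: list[UsageRow],
    max_age_seconds: float,
    archive_path: Path,
    now: Optional[float] = None,
) -> list[UsageRow]:
    """Append records older than max_age_seconds to a JSON Lines file; return the rest."""
    cutoff = (time.time() if now is None else now) - max_age_seconds
    old = [r for r in records if r.timestamp < cutoff]
    if old:
        archive_path.parent.mkdir(parents=True, exist_ok=True)
        # Append mode keeps earlier archive runs intact.
        with archive_path.open("a", encoding="utf-8") as fh:
            for r in old:
                fh.write(json.dumps(asdict(r)) + "\n")
    return [r for r in records if r.timestamp >= cutoff]
```

The `now` parameter exists so the cutoff can be fixed in tests; in production it defaults to the current time.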
All usage tracking services are thread-safe:
- InMemoryUsageStore uses threading.RLock for concurrent access
- Services can be safely called from multiple request handlers
- No external locking is required when using the services
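A store with these properties can be sketched as an `RLock`-guarded, size-bounded deque. The class `BoundedUsageStore` is hypothetical, not the actual `InMemoryUsageStore`: it drops the oldest record once `max_records` is reached and writes the JSON file via a temporary file plus rename so a crash mid-write does not leave a truncated file.

```python
import json
import threading
from collections import deque
from pathlib import Path
from typing import Any


class BoundedUsageStore:
    """Hypothetical sketch of an RLock-guarded, size-bounded record store."""

    def __init__(self, max_records: int = 100_000) -> None:
        # RLock (reentrant) lets a locked method safely call another locked method.
        self._lock = threading.RLock()
        # deque(maxlen=...) silently drops the oldest record once full.
        self._records: deque = deque(maxlen=max_records)

    def add(self, record: dict[str, Any]) -> None:
        with self._lock:
            self._records.append(record)

    def __len__(self) -> int:
        with self._lock:
            return len(self._records)

    def snapshot(self) -> list[dict[str, Any]]:
        with self._lock:
            return list(self._records)

    def flush(self, path: Path) -> None:
        # Copy under the lock, then write outside it so writers are not blocked on disk I/O.
        data = self.snapshot()
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(data), encoding="utf-8")
        tmp.replace(path)  # atomic rename when tmp and path share a filesystem
```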
Usage tracking adds minimal overhead:
- Request recording: <1ms per request
- Response recording: <1ms per response
- Statistics aggregation: <10ms for typical queries
- Persistence: Asynchronous, does not block request handling
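One way to keep persistence off the request path is a background task that wakes every `flush_interval_seconds` and hands the blocking file write to a worker thread. The sketch below is an assumption about the approach, not the store's actual code; `periodic_flush` and `_write_json` are hypothetical names.

```python
import asyncio
import json
from pathlib import Path
from typing import Any, Callable


def _write_json(path: Path, data: list[Any]) -> None:
    """Blocking write; called via asyncio.to_thread so it never runs on the event loop."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")


async def periodic_flush(
    snapshot: Callable[[], list[Any]],
    path: Path,
    interval_seconds: float,
    stop: asyncio.Event,
) -> None:
    """Persist snapshot() every interval_seconds; flush once more after stop is set."""
    while True:
        try:
            await asyncio.wait_for(stop.wait(), timeout=interval_seconds)
            break  # stop requested before the interval elapsed
        except asyncio.TimeoutError:
            # Interval elapsed: write the current records in a worker thread.
            await asyncio.to_thread(_write_json, path, snapshot())
    # Final flush so records added after the last tick are not lost on shutdown.
    await asyncio.to_thread(_write_json, path, snapshot())
```

Waiting on the stop event with a timeout, rather than a plain `asyncio.sleep`, lets shutdown proceed immediately instead of waiting out the remainder of the interval.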
Always use the helper functions instead of calling services directly:
# Good
await record_request_usage(usage_service, request, ...)
# Avoid
record_id = await usage_service.record_request(...)

Always extract and record backend-reported usage for reconciliation:
backend_usage = extract_backend_reported_usage(response)
await record_response_usage(
    usage_service=usage_service,
    request=request,
    record_id=record_id,
    backend_reported_usage=backend_usage,
    # ...plus completion_tokens, http_status_code, and the other response fields
)

Usage tracking should never break request processing:
try:
    await record_request_usage(...)
except Exception:
    # Log with the traceback, then continue processing the request
    logger.exception("Failed to record usage")

Always test usage tracking integration:
async def test_controller_records_usage(usage_service_mock):
    # usage_service_mock should be an AsyncMock; async tests need a plugin such as pytest-asyncio
    controller = MyController(usage_service=usage_service_mock)
    await controller.handle_chat_completion(...)
    # Verify usage was recorded
    usage_service_mock.record_request.assert_called_once()
    usage_service_mock.record_response.assert_called_once()

If usage is not being recorded:
- Check that usage_tracking.enabled is true in the configuration
- Verify the services are registered in the DI container
- Check the logs for errors during service initialization
- Ensure the helper functions are being called in controllers
If token counts look inaccurate:
- Verify tokenization is using the correct model
- Check that verbatim/mutated tokens are captured at the correct points
- Compare proxy-calculated vs backend-reported tokens
- Review the token extraction logic in the helper functions
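Comparing proxy-calculated and backend-reported tokens is easier with an explicit tolerance, since different tokenizers rarely agree exactly. In the hypothetical sketch below, the proxy counts would presumably come from the PROXY_TO_BACKEND and BACKEND_TO_PROXY legs (what the backend actually saw), and the 5% default tolerance is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenDiscrepancy:
    field: str
    proxy: int
    backend: int

    @property
    def relative_error(self) -> float:
        # Relative to the backend's count; max(..., 1) avoids division by zero.
        return abs(self.proxy - self.backend) / max(self.backend, 1)


def reconcile(
    proxy_counts: dict[str, int],
    backend_counts: dict[str, int],
    tolerance: float = 0.05,
) -> list[TokenDiscrepancy]:
    """Return fields where proxy and backend counts differ by more than tolerance."""
    out = []
    # Only fields reported by both sides can be compared.
    for field in sorted(proxy_counts.keys() & backend_counts.keys()):
        d = TokenDiscrepancy(field, proxy_counts[field], backend_counts[field])
        if d.relative_error > tolerance:
            out.append(d)
    return out
```

For example, a proxy count of 100 prompt tokens against a backend-reported 112 is about a 10.7% difference and would be flagged, while 50 against 51 completion tokens (about 2%) would not.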
If memory usage is high:
- Check the max_records_in_memory configuration
- Monitor the record count with len(store._records) (a private attribute, so treat this as a debugging aid)
- Implement custom archival for old records
- Consider using SQLite persistence for large deployments
- User Guide: Usage Tracking and Statistics
- Design Document: .kiro/specs/detailed-usage-tracking/design.md
- Requirements Document: .kiro/specs/detailed-usage-tracking/requirements.md
- API Routes: src/core/app/routes/usage_routes.py
- Services: src/core/services/usage_recording_service.py, src/core/services/statistics_aggregation_service.py
- Domain Models: src/core/domain/usage_record.py, src/core/domain/traffic_leg.py