Monitoring and Analytics Overview

The LLM Interactive Proxy provides comprehensive monitoring and analytics capabilities to help you understand usage patterns, track costs, optimize performance, and ensure backend availability.

Available Features

Backend Health Checks

Backend Health Checks monitors the health of backend API endpoints and automatically excludes unhealthy backends from request routing.

Key Capabilities:

ICMP ping checks for network reachability
HTTP probe checks for application-level health
Circuit breaker to exclude unhealthy backends from routing
Real-time backend notifications on health changes
REST API for health status monitoring

Use Cases:

Automatic failover when backend APIs become unavailable
Multi-region deployment with intelligent routing
Proactive monitoring and alerting
Preventing requests to unreachable endpoints

Quick Start:

# Check health status (-s suppresses curl's progress meter when piping to jq)
curl -s "http://localhost:8000/internal/health" | jq '.endpoint_health'

# View summary
curl -s "http://localhost:8000/internal/health" | jq '.endpoint_health.summary'
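
The circuit-breaker capability listed above can be illustrated with a small sketch. It is not the proxy's implementation: the `BackendHealth` class and `routable` function are hypothetical, and the sketch assumes a backend is excluded after `failure_threshold` consecutive failed probes and readmitted on the first successful one.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Hypothetical per-backend probe state; not the proxy's actual class."""

    failure_threshold: int = 3
    consecutive_failures: int = 0

    def record_probe(self, ok: bool) -> None:
        # A successful probe resets the counter; a failed one increments it.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

    @property
    def healthy(self) -> bool:
        # Assumption: the circuit opens once the threshold is reached.
        return self.consecutive_failures < self.failure_threshold


def routable(backends: dict[str, BackendHealth]) -> list[str]:
    """Return the names of backends that should still receive traffic."""
    return sorted(name for name, state in backends.items() if state.healthy)
```

With the default threshold of 3, a backend drops out of `routable(...)` after its third consecutive failed probe and returns after one success.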

Usage Tracking and Statistics

Usage Tracking and Statistics provides detailed monitoring of all traffic passing through the proxy.

Key Capabilities:

Multi-point token tracking (verbatim and mutated)
Performance metrics (TTFT, latency, throughput)
Cost tracking and billing reconciliation
Request/response monitoring
Tool call tracking
Multi-dimensional filtering and aggregation

Use Cases:

Billing reconciliation with provider invoices
Performance monitoring and optimization
Cost analysis by backend, model, and time period
Tool usage analysis
Session and conversation analysis
Error rate monitoring

Quick Start:

# Get aggregated statistics
curl "http://localhost:8000/v1/usage/stats?backend_type=openai&model=gpt-4"

# Get recent usage records
curl "http://localhost:8000/v1/usage/recent?limit=50"

# Export usage data
curl "http://localhost:8000/v1/usage/export?start_date=2025-12-01T00:00:00Z&end_date=2025-12-31T23:59:59Z"

Replacement Metrics

Replacement Metrics tracks activation rates and effectiveness of the random model replacement feature.

Key Capabilities:

Track replacement activation rates
Monitor turn counts before replacement
Analyze opt-out patterns
Measure replacement effectiveness

Wire Capture

Wire Capture records full HTTP requests and responses for debugging.

Key Capabilities:

Record complete request/response cycles
Analyze traffic patterns
Debug integration issues
Replay captured sessions

Comparison Matrix

Feature	Token Tracking	Cost Tracking	Performance Metrics	Request Details	Real-time API	Health Monitoring
Health Checks	-	-	✓ (latency)	-	✓	✓ (ping, HTTP)
Usage Tracking	✓ (4 points)	✓	✓ (TTFT, latency)	✓ (full details)	✓	-
Replacement Metrics	-	-	-	✓ (replacements)	✓	-
Wire Capture	-	-	-	✓ (full payload)	-	-

Configuration

Enable All Monitoring Features

# Health checks (enabled by default)
health_check:
  enabled: true
  circuit_breaker_enabled: true
  notify_backends: true
  ping:
    enabled: true
    interval_seconds: 30.0
    failure_threshold: 3
  http:
    enabled: true
    interval_seconds: 60.0
    failure_threshold: 2

# Usage tracking (enabled by default)
usage_tracking:
  enabled: true
  persistence_path: "./var/usage_data.json"
  flush_interval_seconds: 30.0

# Wire capture (disabled by default)
wire_capture:
  enabled: true
  output_dir: "./var/wire_captures"

Environment Variables

# Usage tracking
export USAGE_TRACKING_ENABLED=true
export USAGE_TRACKING_PERSISTENCE_PATH="./var/usage_data.json"

# Wire capture
export WIRE_CAPTURE_ENABLED=true
export WIRE_CAPTURE_OUTPUT_DIR="./var/wire_captures"
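
A helper script that needs the same settings might read these variables as sketched below. How the proxy itself parses them is not documented here, so the accepted truthy strings and the fallback paths are assumptions. The helper names `env_flag` and `monitoring_settings` are hypothetical.

```python
import os

# Assumption: which strings count as "true" is a guess, not the proxy's rule.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, environ=None) -> bool:
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


def monitoring_settings(environ=None) -> dict:
    """Collect the monitoring-related variables shown above into one dict."""
    env = os.environ if environ is None else environ
    return {
        # Usage tracking is described as enabled by default.
        "usage_tracking_enabled": env_flag("USAGE_TRACKING_ENABLED", True, env),
        "usage_tracking_path": env.get(
            "USAGE_TRACKING_PERSISTENCE_PATH", "./var/usage_data.json"
        ),
        # Wire capture is described as disabled by default.
        "wire_capture_enabled": env_flag("WIRE_CAPTURE_ENABLED", False, env),
        "wire_capture_dir": env.get("WIRE_CAPTURE_OUTPUT_DIR", "./var/wire_captures"),
    }
```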

Security Considerations

Default Binding

For security, the proxy binds to 127.0.0.1:8000 (localhost only) by default. This means:

REST API endpoints are only accessible from the local machine
External access requires explicit configuration
Authentication is enforced when binding to external interfaces

To allow external access:

# config.yaml
host: "0.0.0.0"  # Bind to all interfaces
auth:
  disable_auth: false  # Keep authentication enabled

Or via command line:

python -m src.core.cli --host 0.0.0.0

Warning: When exposing the API externally, always ensure authentication is enabled to prevent unauthorized access to usage data.
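
A deployment script could guard against the risky combination (external binding with authentication disabled) before starting the proxy. The sketch below is one possible pre-flight check, not part of the proxy. `is_local_only` and `check_binding` are hypothetical names, and treating unknown hostnames as external is an assumption chosen to fail safe.

```python
import ipaddress


def is_local_only(host: str) -> bool:
    """True if the bind address is reachable only from the local machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: treat other hostnames as potentially external.
        return False


def check_binding(host: str, disable_auth: bool) -> None:
    """Raise if an externally reachable address is paired with disabled auth."""
    if not is_local_only(host) and disable_auth:
        raise ValueError(f"refusing to bind {host} with authentication disabled")
```

Calling `check_binding("0.0.0.0", disable_auth=True)` raises, while the default `127.0.0.1` binding passes regardless of the authentication setting.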

Usage Examples

Query Aggregated Statistics

# Get statistics for a specific backend
curl "http://localhost:8000/v1/usage/stats?backend_type=openai"

# Get statistics with multiple filters
curl "http://localhost:8000/v1/usage/stats?backend_type=anthropic&model=claude-3-5-sonnet&start_date=2025-12-01T00:00:00Z"
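
When filters are built programmatically, it helps to let a library handle the URL encoding of timestamps. The following sketch builds the same query URLs with Python's standard library. The helper name `usage_stats_url` is hypothetical, and the filter names are the ones used in the curl examples above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # default binding described in this document


def usage_stats_url(base_url: str = BASE_URL, **filters) -> str:
    """Build a /v1/usage/stats URL, skipping filters whose value is None."""
    params = {key: value for key, value in filters.items() if value is not None}
    query = urlencode(params)  # percent-encodes characters such as ':'
    return f"{base_url}/v1/usage/stats" + (f"?{query}" if query else "")
```

For example, `usage_stats_url(backend_type="anthropic", start_date="2025-12-01T00:00:00Z")` yields a URL whose `start_date` colons are percent-encoded.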

Export Usage Data

# Export usage data for a date range
curl "http://localhost:8000/v1/usage/export?start_date=2025-12-01T00:00:00Z&end_date=2025-12-31T23:59:59Z" > usage_data.json

Monitor Performance

# Check TTFT and latency metrics
curl -s "http://localhost:8000/v1/usage/stats" | jq '.ttft_stats, .duration_stats'

Use Cases

1. Billing Reconciliation

Use Usage Tracking to compare proxy-calculated tokens with backend-reported tokens:

curl -s "http://localhost:8000/v1/usage/stats?backend_type=openai" | jq '{
  proxy_tokens: .total_tokens,
  backend_tokens: .backend_reported_total_tokens,
  difference: (.total_tokens - .backend_reported_total_tokens)
}'
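
The same comparison can be done in a script once the statistics response has been parsed as JSON. This sketch assumes only the two field names used in the jq filter above. The `reconcile` function and the `difference_pct` field are hypothetical additions for illustration.

```python
def reconcile(stats: dict) -> dict:
    """Compare proxy-calculated tokens with backend-reported tokens."""
    proxy = stats["total_tokens"]
    backend = stats["backend_reported_total_tokens"]
    difference = proxy - backend
    # Relative difference against the backend figure; None if it reported nothing.
    pct = difference * 100.0 / backend if backend else None
    return {
        "proxy_tokens": proxy,
        "backend_tokens": backend,
        "difference": difference,
        "difference_pct": pct,
    }
```

A persistent non-zero `difference_pct` is a cue to look at tokenizer mismatches or at requests the backend billed differently.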

2. Performance Monitoring

Track Time to First Token (TTFT) and identify slow responses:

curl -s "http://localhost:8000/v1/usage/stats" | jq '.ttft_stats'
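
For readers unfamiliar with the metrics, the sketch below shows how TTFT, total latency, and throughput relate to three timestamps of a streamed response. The `RequestTiming` record is hypothetical, and the throughput definition (output tokens divided by generation time after the first token) is an assumption; the proxy may define it differently.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical timing record; all times in seconds on one clock."""

    started: float        # request sent
    first_token: float    # first streamed token received
    finished: float       # last token received
    output_tokens: int

    @property
    def ttft(self) -> float:
        """Time to First Token."""
        return self.first_token - self.started

    @property
    def latency(self) -> float:
        """Total request duration."""
        return self.finished - self.started

    @property
    def tokens_per_second(self) -> float:
        """Assumed throughput: tokens per second of generation time."""
        generation = self.finished - self.first_token
        return self.output_tokens / generation if generation > 0 else 0.0
```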

3. Cost Analysis

Analyze costs by backend and model:

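# cost_per_request is null when no requests match the filter (avoids a jq division-by-zero error)
curl -s "http://localhost:8000/v1/usage/stats?backend_type=anthropic" | jq '{
  total_cost: .total_cost,
  requests: .request_count,
  cost_per_request: (if .request_count > 0 then .total_cost / .request_count else null end)
}'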
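
To break costs down by backend and model in one pass, exported records can be grouped client-side. The sketch below assumes each record is a dict with `backend_type`, `model`, and `cost` keys. These field names are hypothetical, since the export schema is not shown in this document.

```python
from collections import defaultdict


def cost_by_backend_and_model(records) -> dict:
    """Sum cost and request count per (backend_type, model) pair."""
    totals = defaultdict(lambda: {"total_cost": 0.0, "requests": 0})
    for record in records:
        key = (record["backend_type"], record["model"])
        totals[key]["total_cost"] += record["cost"]
        totals[key]["requests"] += 1
    # Every group has at least one request, so the division is safe.
    return {
        key: {**group, "cost_per_request": group["total_cost"] / group["requests"]}
        for key, group in totals.items()
    }
```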

4. Debugging Integration Issues

Use Wire Capture to record and analyze problematic requests:

# Enable wire capture
python -m src.core.cli --wire-capture-enabled

# Analyze captured traffic
python scripts/inspect_cbor_capture.py var/wire_captures_cbor/capture_file.cbor --detect-issues

5. Model Replacement Analysis

Track replacement effectiveness with Replacement Metrics:

curl "http://localhost:8000/v1/replacement/metrics"

Best Practices

1. Regular Monitoring

Set up automated monitoring to track key metrics:

Token consumption trends
Error rates (HTTP status codes)
Performance degradation (TTFT increases)
Cost anomalies
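
One simple way to automate such checks is to compare the latest value of a metric against its recent history. The sketch below flags a value that lies more than a chosen number of standard deviations above the historical mean. The `is_anomalous` function and the three-sigma default are illustrative assumptions, not something the proxy provides.

```python
from statistics import mean, pstdev


def is_anomalous(history: list, latest: float, sigmas: float = 3.0) -> bool:
    """Flag `latest` if it is far above the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sd = pstdev(history)
    if sd == 0:
        # Perfectly flat history: any increase counts as a deviation.
        return latest > mu
    return latest > mu + sigmas * sd
```

The same check applies to daily cost totals, TTFT samples, or error counts. Only upward deviations are flagged, because those are the ones listed above.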

2. Data Retention

Configure appropriate retention policies:

usage_tracking:
  max_records_in_memory: 100000  # Adjust based on traffic volume
  flush_interval_seconds: 30.0   # Balance between durability and performance
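
To see why `max_records_in_memory` bounds memory use, consider a buffer that discards its oldest record once the cap is reached. This is a sketch of that idea using a fixed-size deque. The `BoundedUsageBuffer` class is hypothetical, and oldest-first eviction is an assumption about the proxy's behaviour.

```python
from collections import deque


class BoundedUsageBuffer:
    """Hypothetical in-memory store that keeps only the newest records."""

    def __init__(self, max_records: int):
        self._records = deque(maxlen=max_records)

    def add(self, record: dict) -> None:
        # When the deque is full, appending drops the oldest entry.
        self._records.append(record)

    def snapshot(self) -> list:
        """Return the buffered records, oldest first."""
        return list(self._records)

    def __len__(self) -> int:
        return len(self._records)
```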

3. Privacy Considerations

Usage tracking does NOT record:

Message content (prompts or completions)
API keys or authentication tokens
Personally identifiable information (PII)

4. Performance Impact

All monitoring features are designed for minimal overhead:

Usage tracking: <1ms per request
Wire capture: ~2-5ms per request (when enabled)
Replacement metrics: <0.1ms per request

Troubleshooting

High Memory Usage

If usage tracking consumes too much memory:

Reduce max_records_in_memory
Increase flush_interval_seconds
Implement custom archival logic

Missing Data

If usage data is missing after restart:

Check that persistence_path is writable
Verify that the file exists at the configured path
Check the proxy logs for persistence errors

Slow Queries

If statistics queries are slow:

Add more specific filters to reduce result set
Use date range filters to limit scope
Consider implementing database persistence for large deployments
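
One way to apply the date-range advice is to split a long period into day-sized windows and query each separately. The sketch below generates `start_date`/`end_date` pairs in the timestamp format used by the examples in this document. The `daily_windows` helper is hypothetical, and whether the endpoint treats `end_date` as inclusive or exclusive is not stated here, so adjacent windows may need adjusting.

```python
from datetime import datetime, timedelta, timezone


def _iso(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC timestamp ending in 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def daily_windows(start: datetime, end: datetime):
    """Yield (start_date, end_date) strings covering start..end in daily steps.

    Both arguments are expected to be timezone-aware.
    """
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=1), end)
        yield _iso(cursor), _iso(upper)
        cursor = upper
```

Each pair can then be passed as the `start_date` and `end_date` query parameters of a separate, smaller request.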
