
Connection Activity Monitoring

The LLM Proxy provides real-time visibility into the active connections on its backend connectors. This feature is useful for debugging, monitoring traffic patterns, and understanding system load.

Configuration

Activity monitoring is disabled by default for performance reasons. When enabled, it tracks active connections with minimal overhead using thread-safe atomic operations.

Enabling Activity Tracking

Activity tracking can be enabled via CLI flag, environment variable, or config file. The precedence order is: CLI flag > environment variable > config file.

CLI Flag (Highest Priority)

.venv/Scripts/python.exe -m src.core.cli --enable-activity-tracking

Environment Variable

# Linux/macOS
export ENABLE_ACTIVITY_TRACKING=1

# Windows PowerShell
$env:ENABLE_ACTIVITY_TRACKING = "1"

# Windows CMD
set ENABLE_ACTIVITY_TRACKING=1

Config File (config/config.yaml)

enable_activity_tracking: true
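
As a rough illustration, the precedence rule can be expressed as follows. This is a sketch, not the proxy's actual resolution code; the function name and defaults are assumptions.

import os

def resolve_activity_tracking(cli_flag, config):
    """Illustrative precedence: CLI flag > environment variable > config file."""
    if cli_flag is not None:  # --enable-activity-tracking was passed
        return cli_flag
    env = os.environ.get("ENABLE_ACTIVITY_TRACKING")
    if env is not None:
        return env.lower() not in ("", "0", "false")  # "1" enables
    return bool(config.get("enable_activity_tracking", False))  # default: off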

Verifying Configuration

You can verify that activity tracking is enabled by checking the diagnostics endpoint:

curl http://localhost:8000/v1/diagnostics | jq '.activity_tracking_enabled'

Returns true if enabled, false if disabled.

Performance Considerations

When disabled (the default), activity tracking has zero overhead: the tracker is not registered in the dependency-injection (DI) container and no tracking code runs.

When enabled, the overhead is minimal, as the sketch after this list illustrates:

  • O(1) counter updates using threading.Lock
  • Shallow copies for snapshots to minimize lock contention
  • No per-chunk logging unless DEBUG level is enabled
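
The update path amounts to little more than the following. This is a minimal sketch with hypothetical names, not the tracker's actual implementation.

import threading

class ByteCounter:
    """Hypothetical per-connection counter guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._bytes_rx = 0
        self._bytes_tx = 0

    def add_rx(self, n):
        # O(1): one integer addition under a short-lived lock
        with self._lock:
            self._bytes_rx += n

    def add_tx(self, n):
        with self._lock:
            self._bytes_tx += n

    def snapshot(self):
        # Shallow copy taken under the lock, so readers hold it only briefly
        with self._lock:
            return {"bytes_rx": self._bytes_rx, "bytes_tx": self._bytes_tx}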

Use Cases

  1. Debugging: Identify stuck or long-running requests by monitoring connection duration
  2. Traffic Analysis: Understand which backends and models are most active
  3. Load Monitoring: Track concurrent connections to detect capacity issues
  4. Performance Tuning: Identify slow backends by comparing RX/TX rates

Usage Examples

Query Activity via curl

# Get full diagnostics with activity data
curl http://localhost:8000/v1/diagnostics | jq '.instances[].activity'

# Get just the activity summary
curl http://localhost:8000/v1/diagnostics/activity

Monitor with the CLI Tool

# Real-time monitoring
.venv/Scripts/python.exe scripts/inspect_activity.py --watch

# One-time check with custom endpoint
.venv/Scripts/python.exe scripts/inspect_activity.py http://myproxy:9000

Programmatic Access

# Obtain the process-wide tracker and read a point-in-time snapshot
from src.core.services.connection_activity_tracker import get_activity_tracker

tracker = get_activity_tracker()
snapshot = tracker.get_global_snapshot()  # thread-safe copy of the counters

print(f"Active connections: {snapshot.total_active_connections}")
print(f"Total RX: {snapshot.total_bytes_rx} bytes")
print(f"Total TX: {snapshot.total_bytes_tx} bytes")

Overview

Activity monitoring tracks:

  • Active Connections: Streaming and non-streaming requests currently in flight
  • RX/TX Counters: Bytes received from backend and transmitted to client
  • Session Isolation: Each connection is tracked by its unique session ID
  • Per-Backend Metrics: Activity is aggregated per backend instance

Accessing Activity Data

Via Diagnostics API

The /v1/diagnostics endpoint includes activity information for each backend instance:

curl http://localhost:8000/v1/diagnostics

Response includes:

{
  "timestamp": 1701792000.123,
  "instances": [
    {
      "name": "openai.1",
      "connector_type": "openai",
      "is_rate_limited": false,
      "is_functional": true,
      "validation_errors": [],
      "models": [...],
      "activity": {
        "active_connections": 2,
        "connections": [
          {
            "session_id": "abc123...",
            "connection_type": "streaming",
            "started_at": 1701791990.456,
            "duration_seconds": 9.667,
            "model": "gpt-4",
            "bytes_rx": 15360,
            "bytes_tx": 14820
          }
        ],
        "total_bytes_rx": 15360,
        "total_bytes_tx": 14820
      }
    }
  ],
  "global_activity": {
    "total_active_connections": 2,
    "total_bytes_rx": 32000,
    "total_bytes_tx": 31500
  }
}
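
As an example of putting this payload to work, the following script flags long-running connections. The endpoint and field names come from the response above; the 30-second threshold is an arbitrary choice for illustration.

import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/v1/diagnostics") as resp:
    diag = json.load(resp)

for inst in diag["instances"]:
    for conn in inst.get("activity", {}).get("connections", []):
        if conn["duration_seconds"] > 30:  # arbitrary threshold
            print(f'{inst["name"]}: session {conn["session_id"]} '
                  f'({conn["model"]}) running for {conn["duration_seconds"]:.1f}s')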

Lightweight Activity Endpoint

For polling scenarios, use the dedicated activity endpoint:

curl http://localhost:8000/v1/diagnostics/activity

Response:

{
  "total_active_connections": 2,
  "total_bytes_rx": 32000,
  "total_bytes_tx": 31500
}
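
A simple poller built on this endpoint might look like the sketch below; the URL and the five-second interval are illustrative.

import json
import time
import urllib.request

URL = "http://localhost:8000/v1/diagnostics/activity"

while True:
    with urllib.request.urlopen(URL) as resp:
        data = json.load(resp)
    print(f'{data["total_active_connections"]} active, '
          f'rx={data["total_bytes_rx"]} B, tx={data["total_bytes_tx"]} B')
    time.sleep(5)  # poll every 5 seconds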

CLI Tool: inspect_activity.py

A command-line tool is provided for real-time activity monitoring:

# One-time snapshot
.venv/Scripts/python.exe scripts/inspect_activity.py

# Watch mode (auto-refresh)
.venv/Scripts/python.exe scripts/inspect_activity.py --watch

# Custom refresh interval (2 seconds)
.venv/Scripts/python.exe scripts/inspect_activity.py --watch --interval 2

# Raw JSON output
.venv/Scripts/python.exe scripts/inspect_activity.py --raw

# Custom proxy URL
.venv/Scripts/python.exe scripts/inspect_activity.py http://localhost:9000

Watch Mode

When the rich library is installed, watch mode provides a live-updating table:

┌─────────────────────────────── Connection Activity ────────────────────────────────┐
│ Active Connections: 2  |  Total RX: 15.0 KB  |  Total TX: 14.5 KB  |  Updated: 14:30:05 │
└────────────────────────────────────────────────────────────────────────────────────┘

                              Active Connections
┌───────────┬───────────────┬───────────┬─────────────────┬──────────┬─────────┬─────────┐
│ Backend   │ Session       │ Type      │ Model           │ Duration │      RX │      TX │
├───────────┼───────────────┼───────────┼─────────────────┼──────────┼─────────┼─────────┤
│ openai.1  │ abc123def4... │ streaming │ gpt-4           │    12.3s │ 15.0 KB │ 14.5 KB │
│ anthropic │ xyz789abc1... │ streaming │ claude-3-sonnet │     5.7s │  8.2 KB │  8.0 KB │
└───────────┴───────────────┴───────────┴─────────────────┴──────────┴─────────┴─────────┘

Refreshing every 1.0s... (Ctrl+C to stop)

Implementation Details

Connection Types

  • streaming: Server-Sent Events (SSE) streaming responses
  • non_streaming: Standard JSON responses

Byte Counters

  • bytes_rx: Bytes received from the backend LLM provider
  • bytes_tx: Bytes transmitted to the client

Note: Byte counters reflect raw data transfer and may differ from token counts.

Session Isolation

Each connection is uniquely identified by:

  • Backend instance name (e.g., openai.1)
  • Session ID from the request

This prevents double-counting when the same session makes multiple requests.
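
Conceptually, the tracker's registry is keyed on that pair. The following is a sketch of the shape, with hypothetical names, not the actual internal layout.

import time

# Hypothetical registry: one entry per (backend instance, session ID) pair
key = ("openai.1", "abc123")
active = {
    key: {
        "connection_type": "streaming",
        "started_at": time.time(),
        "bytes_rx": 0,
        "bytes_tx": 0,
    },
}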

Automatic Cleanup

Connections are automatically removed when:

  • The request completes (streaming ends or response is sent)
  • An error occurs during processing
  • The connection times out (configurable, default 5 minutes; see the sketch after this list)
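
A periodic sweep along these lines would implement the timeout. The 300-second default matches the text above; the function and the registry shape are the same hypothetical sketch used in the Session Isolation section.

import time

STALE_AFTER = 300.0  # 5-minute default timeout

def sweep_stale(active, now=None):
    """Drop registry entries whose started_at exceeds the timeout."""
    now = now if now is not None else time.time()
    stale = [k for k, conn in active.items()
             if now - conn["started_at"] > STALE_AFTER]
    for k in stale:
        del active[k]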

Troubleshooting

No Activity Shown

  1. Verify the proxy is running and accessible
  2. Check that requests are being routed through the proxy
  3. Ensure backends are initialized (check is_functional in diagnostics)

Stale Connections

If connections appear stuck:

  • Check backend health in /v1/diagnostics
  • Review proxy logs for connection errors
  • The stale connection cleanup runs periodically (5-minute default timeout)

High Connection Count

A high number of active connections may indicate:

  • Slow backend responses
  • Client not consuming streams
  • Network issues causing request backlog

Security

The diagnostics endpoint is restricted to localhost; remote requests receive a 403 Forbidden response.