| 1 | +# KickTalk OpenTelemetry Setup |
| 2 | + |
| 3 | +This directory contains the OpenTelemetry observability stack for monitoring the KickTalk application: distributed tracing, metrics collection, and log aggregation. |
| 4 | + |
| 5 | +## Architecture |
| 6 | + |
| 7 | +- **OpenTelemetry Collector**: Receives, processes, and exports telemetry data |
| 8 | +- **Jaeger**: Distributed tracing backend and UI |
| 9 | +- **Prometheus**: Metrics storage and querying |
| 10 | +- **Grafana**: Visualization and dashboards |
| 11 | +- **Redis**: Optional caching for telemetry data |
| 12 | + |
| 13 | +## Quick Start |
| 14 | + |
| 15 | +1. **Start the observability stack:** |
| 16 | + ```bash |
| 17 | + docker-compose -f docker-compose.otel.yml up -d |
| 18 | + ``` |
| 19 | + |
| 20 | +2. **Access the services:** |
| 21 | + - **Grafana Dashboard**: http://localhost:3000 (admin/admin) |
| 22 | + - **Jaeger UI**: http://localhost:16686 |
| 23 | + - **Prometheus**: http://localhost:9090 |
| 24 | + - **OTEL Collector Health**: http://localhost:13133 |
| 25 | + |
| 26 | +3. **Configure KickTalk** to send telemetry to: |
| 27 | + - **OTLP gRPC**: `http://localhost:4317` |
| 28 | + - **OTLP HTTP**: `http://localhost:4318` |
| 29 | + |
| 30 | +## Configuration |
| 31 | + |
| 32 | +### OTEL Collector (`collector-config.yml`) |
| 33 | + |
| 34 | +The collector is configured to: |
| 35 | +- **Receive** telemetry via OTLP (gRPC/HTTP) |
| 36 | +- **Process** data with batching, memory limiting, and attribute filtering |
| 37 | +- **Export** traces to Jaeger, metrics to Prometheus, and logs to files |
| 38 | + |
| 39 | +Key features: |
| 40 | +- **Privacy-focused**: Automatically filters sensitive data (tokens, auth info) |
| 41 | +- **Resource attribution**: Adds service.name, version, environment tags |
| 42 | +- **Performance optimized**: Batching and memory limits configured |
| 43 | + |
| 44 | +### Prometheus (`prometheus.yml`) |
| 45 | + |
| 46 | +Scrapes metrics from: |
| 47 | +- OTEL Collector internal metrics |
| 48 | +- KickTalk application metrics (port 9464) |
| 49 | +- Jaeger metrics for tracing health |
| 50 | + |
| 51 | +### Grafana Dashboards |
| 52 | + |
| 53 | +Pre-configured dashboards for: |
| 54 | +- **KickTalk Overview**: Application health, connections, message throughput |
| 55 | +- **Memory & Performance**: Resource usage, API response times |
| 56 | +- **Connection Health**: WebSocket stability, reconnection rates |
| 57 | + |
| 58 | +## Application Integration |
| 59 | + |
| 60 | +To integrate KickTalk with this observability stack, the application needs to: |
| 61 | + |
| 62 | +1. **Install OTEL SDK** packages for Node.js/Electron |
| 63 | +2. **Configure exporters** to send data to `localhost:4317` (OTLP gRPC) or `localhost:4318` (OTLP HTTP) |
| 64 | +3. **Implement metrics** for key application events |
| 65 | +4. **Add tracing** to critical code paths |
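
As a rough illustration of step 2, the sketch below resolves which collector endpoint to use. It assumes the standard OpenTelemetry environment variables `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_PROTOCOL` and falls back to the ports listed in Quick Start. The helper `resolveOtlpTarget`, its types, and the choice of gRPC as the default are hypothetical and are not part of KickTalk or the OpenTelemetry SDK.

```typescript
// Hypothetical helper, not part of KickTalk or the OpenTelemetry SDK.
type OtlpProtocol = "grpc" | "http/protobuf" | "http/json";

interface OtlpTarget {
  protocol: OtlpProtocol;
  endpoint: string;
}

// Fallback endpoints taken from the ports listed in this README.
const DEFAULT_ENDPOINTS: Record<OtlpProtocol, string> = {
  "grpc": "http://localhost:4317",
  "http/protobuf": "http://localhost:4318",
  "http/json": "http://localhost:4318",
};

// Pick protocol and endpoint from the environment.
// Assumption: unknown or missing protocol values fall back to gRPC.
function resolveOtlpTarget(env: Record<string, string | undefined>): OtlpTarget {
  const requested = env["OTEL_EXPORTER_OTLP_PROTOCOL"];
  const protocol: OtlpProtocol =
    requested === "http/protobuf" || requested === "http/json" ? requested : "grpc";
  const endpoint = env["OTEL_EXPORTER_OTLP_ENDPOINT"] ?? DEFAULT_ENDPOINTS[protocol];
  return { protocol, endpoint };
}
```

In a Node.js or Electron main process this could be called as `resolveOtlpTarget(process.env)` and the result handed to whichever exporter the application configures.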
| 66 | + |
| 67 | +## Metrics to Implement |
| 68 | + |
| 69 | +### Connection Metrics |
| 70 | +- `kicktalk_websocket_connections_active` - Active WebSocket connections |
| 71 | +- `kicktalk_websocket_reconnections_total` - Connection reconnection events |
| 72 | +- `kicktalk_connection_errors_total` - Connection failure events |
| 73 | + |
| 74 | +### Message Metrics |
| 75 | +- `kicktalk_messages_sent_total` - Messages sent by user |
| 76 | +- `kicktalk_messages_received_total` - Messages received from chat |
| 77 | +- `kicktalk_message_send_duration_seconds` - Message send latency |
| 78 | + |
| 79 | +### Resource Metrics |
| 80 | +- `kicktalk_memory_usage_bytes` - Application memory consumption |
| 81 | +- `kicktalk_cpu_usage_percent` - CPU utilization |
| 82 | +- `kicktalk_open_handles_total` - File/socket handles |
| 83 | + |
| 84 | +### API Metrics |
| 85 | +- `kicktalk_api_request_duration_seconds` - API response times |
| 86 | +- `kicktalk_api_requests_total` - API request counts by endpoint/status |
| 87 | + |
| 88 | +## Traces to Implement |
| 89 | + |
| 90 | +### User Actions |
| 91 | +- Message sending flow (input → validation → API → confirmation) |
| 92 | +- Chatroom joining/leaving |
| 93 | +- Settings changes |
| 94 | + |
| 95 | +### System Operations |
| 96 | +- WebSocket connection establishment |
| 97 | +- API calls (Kick, 7TV) |
| 98 | +- Emote loading and caching |
| 99 | + |
| 100 | +### Error Scenarios |
| 101 | +- Connection failures and recovery |
| 102 | +- API timeouts and retries |
| 103 | +- Memory leak detection points |
| 104 | + |
| 105 | +## Privacy & Security |
| 106 | + |
| 107 | +The collector configuration includes privacy protections: |
| 108 | +- **Automatic filtering** of authentication tokens |
| 109 | +- **Local-only operation** by default |
| 110 | +- **Configurable data retention** periods |
| 111 | +- **No PII collection** in standard metrics |
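
The filtering described above happens in the collector. As a complementary sketch only, the application could also drop sensitive attributes before they leave the process. The function `redactAttributes` and the key pattern below are hypothetical; they do not mirror the rules in `collector-config.yml`, which are not shown here.

```typescript
type AttributeValue = string | number | boolean;
type Attributes = Record<string, AttributeValue>;

// Hypothetical key pattern; adjust to the attributes the application actually emits.
const SENSITIVE_KEY = /(token|authorization|cookie|password|secret)/i;

// Return a copy of the attributes with sensitive keys dropped (not masked).
function redactAttributes(attrs: Attributes): Attributes {
  const safe: Attributes = {};
  for (const key of Object.keys(attrs)) {
    if (SENSITIVE_KEY.test(key)) {
      continue; // skip anything that looks like a credential
    }
    safe[key] = attrs[key];
  }
  return safe;
}
```

Dropping the key entirely, rather than replacing the value with a placeholder, avoids leaking even the fact that a credential was present.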
| 112 | + |
| 113 | +## Development Usage |
| 114 | + |
| 115 | +### View Real-time Metrics |
| 116 | +```bash |
| 117 | +# Watch collector logs |
| 118 | +docker-compose -f docker-compose.otel.yml logs -f otel-collector |
| 119 | + |
| 120 | +# Query Prometheus directly |
| 120 | +curl 'http://localhost:9090/api/v1/query?query=up' |
| 122 | + |
| 123 | +# Check collector health |
| 124 | +curl http://localhost:13133 |
| 125 | +``` |
| 126 | + |
| 127 | +### Custom Dashboards |
| 128 | + |
| 129 | +Add custom dashboard JSON files to `otel/grafana/dashboards/` and they'll be automatically loaded into Grafana. |
| 130 | + |
| 131 | +### Testing Telemetry |
| 132 | + |
| 133 | +Send test traces/metrics to the collector: |
| 134 | +```bash |
| 135 | +# Test OTLP HTTP endpoint |
| 136 | +curl -X POST http://localhost:4318/v1/traces \ |
| 137 | + -H "Content-Type: application/json" \ |
| 138 | + -d '{"resourceSpans":[...]}' |
| 139 | +``` |
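
The request body above is elided. As a minimal sketch of what an OTLP/JSON trace payload can look like, the helper below builds one span under a single resource. The function names, the `manual-test` scope name, and the 50 ms span duration are illustrative assumptions, not values taken from this repository.

```typescript
// Random lowercase hex string of the given byte length (not cryptographically strong).
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    const n = Math.floor(Math.random() * 256);
    out += ("0" + n.toString(16)).slice(-2);
  }
  return out;
}

// Build a minimal OTLP/JSON trace payload containing one span.
// Timestamps are nanoseconds since the Unix epoch, encoded as strings.
function buildTestTrace(serviceName: string, spanName: string, nowMs: number = Date.now()) {
  const endMs = Math.floor(nowMs);
  const startMs = endMs - 50; // assumed 50 ms span duration
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: "service.name", value: { stringValue: serviceName } }],
        },
        scopeSpans: [
          {
            scope: { name: "manual-test" }, // hypothetical scope name
            spans: [
              {
                traceId: randomHex(16), // 32 hex characters
                spanId: randomHex(8), // 16 hex characters
                name: spanName,
                kind: 1, // SPAN_KIND_INTERNAL
                startTimeUnixNano: `${startMs}000000`,
                endTimeUnixNano: `${endMs}000000`,
              },
            ],
          },
        ],
      },
    ],
  };
}
```

The output of `JSON.stringify(buildTestTrace("kicktalk", "test-span"))` can be used as the `-d` body of the `curl` command; the service name `kicktalk` is a placeholder.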
| 140 | + |
| 141 | +## Production Considerations |
| 142 | + |
| 143 | +For production deployment: |
| 144 | +- Use external Prometheus/Jaeger instances |
| 145 | +- Configure authentication for Grafana |
| 146 | +- Set up alerting rules in Prometheus |
| 147 | +- Implement log rotation and retention policies |
| 148 | +- Consider using OTEL Collector in agent/gateway mode |
| 149 | + |
| 150 | +## Stopping the Stack |
| 151 | + |
| 152 | +```bash |
| 153 | +docker-compose -f docker-compose.otel.yml down |
| 154 | +``` |
| 155 | + |
| 156 | +To remove all data: |
| 157 | +```bash |
| 158 | +docker-compose -f docker-compose.otel.yml down -v |
| 159 | +``` |