Real-time monitoring, debugging, and operational guides
Discogsography provides comprehensive monitoring and observability features to track system health, performance, and processing progress. This guide covers dashboards, debugging tools, metrics, and operational procedures.
The web-based dashboard provides real-time monitoring of all system components through a WebSocket-powered interface.
# Start all services
docker-compose up -d
# Access dashboard
open http://localhost:8003
Service Health:
- Real-time status of all microservices
- Health check endpoints (✅ healthy, ❌ unhealthy)
- Uptime tracking for each service
- Auto-refresh via WebSocket updates
Pipeline Services Monitored by Dashboard:
Discogs pipeline:
- Extractor (http://extractor-discogs:8000/health)
- Graphinator (http://graphinator:8001/health)
- Tableinator (http://tableinator:8002/health)
MusicBrainz pipeline (auto-hidden when not deployed):
- Extractor MusicBrainz (http://extractor-musicbrainz:8000/health; separate container, each extractor listens on port 8000 inside its own container)
- Brainzgraphinator (http://brainzgraphinator:8011/health)
- Brainztableinator (http://brainztableinator:8010/health)
Other service health endpoints (not monitored by Dashboard, available for manual checks):
- Dashboard (http://localhost:8003/health)
- API (http://localhost:8004/health or http://localhost:8005/health)
- Explore (http://localhost:8007/health)
- Insights (http://localhost:8009/health; internal only in Docker Compose, not exposed to the host)
RabbitMQ Queue Metrics:
- Message counts per queue (artists, labels, releases, masters)
- Consumer counts - active consumers per queue
- Message rates - messages/second throughput
- Queue depth trends - historical visualization
- Stall detection - alerts when queues stop processing
Neo4j Metrics:
- Node counts by type (Artist, Label, Release, Master, Genre, Style)
- Relationship counts
- Database size
- Connection pool status
PostgreSQL Metrics:
- Record counts per table
- Table sizes and index sizes
- Connection pool status
- Query performance stats
Activity Log:
- Recent events from all services
- Processing updates with timestamps
- Error notifications with severity levels
- Filterable by service and log level
- Auto-scroll for live updates
The dashboard includes a login-gated admin panel at /admin for managing extractions and dead-letter queues. The monitoring dashboard at / remains public.
# Create an admin account (one-time setup)
docker exec -it discogsography-api-1 admin-setup \
--email admin@example.com --password <min-8-chars>
# Access admin panel
open http://localhost:8003/admin
See the Admin Guide for full details.
- Extraction Control: Trigger a full reprocessing of Discogs data (POST /admin/api/extractions/trigger). Manual triggers always force reprocessing regardless of existing state markers.
- MusicBrainz Extraction Control: Trigger a full reprocessing of MusicBrainz data (POST /admin/api/extractions/trigger-musicbrainz). Only shown when the MusicBrainz pipeline is deployed.
- Extraction History: View past extractions with status, duration, record counts, and error messages. Auto-refreshes every 30 seconds.
- DLQ Management: Purge dead-letter queues when messages are known-bad or after fixing the root cause of processing failures.
The admin panel frontend (/admin) is served from the dashboard service. Admin API calls are proxied through the dashboard to the API service: the dashboard's admin_proxy router forwards requests with the JWT Authorization header and returns responses unchanged. Authentication and authorization are handled entirely by the API service.
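As a sketch of how this works in practice, the Python snippet below authenticates and triggers a full re-extraction. Only POST /admin/api/extractions/trigger is documented above; the login endpoint path and response shape are assumptions, so check the Admin Guide for the actual auth flow.

```python
# Hedged sketch: trigger a Discogs re-extraction through the admin API proxy.
# Requires the `requests` package.
import requests

BASE = "http://localhost:8003"  # admin calls are proxied through the dashboard

# Hypothetical login endpoint returning {"access_token": "..."} -- adjust to
# the real auth flow described in the Admin Guide.
token = requests.post(
    f"{BASE}/admin/api/login",
    json={"email": "admin@example.com", "password": "change-me-please"},
    timeout=10,
).json()["access_token"]

# Documented trigger endpoint; the dashboard forwards the JWT to the API service.
resp = requests.post(
    f"{BASE}/admin/api/extractions/trigger",
    headers={"Authorization": f"Bearer {token}"},
    timeout=10,
)
print(resp.status_code, resp.json())
```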
The dashboard uses WebSocket for real-time updates:
// Connect to WebSocket
const ws = new WebSocket('ws://localhost:8003/ws');
// Receive updates
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log(data);
};
Update Types:
- service_health: Service status changes
- queue_metrics: Queue depth and consumer updates
- database_stats: Database record counts
- activity_log: New log entries
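For scripted consumers, a minimal Python sketch is shown below. It assumes the third-party websockets package and that each update carries its kind in a type field, as the list above suggests.

```python
# Minimal WebSocket consumer for dashboard updates.
# pip install websockets
import asyncio
import json

import websockets


async def watch_dashboard() -> None:
    async with websockets.connect("ws://localhost:8003/ws") as ws:
        async for raw in ws:  # each message is a JSON-encoded update
            data = json.loads(raw)
            kind = data.get("type")  # assumed field name, per the list above
            if kind == "service_health":
                print("health:", data)
            elif kind == "queue_metrics":
                print("queues:", data)
            else:
                print(kind, data)


asyncio.run(watch_dashboard())
```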
# Check for errors in all service logs
just check-errors
# Or directly with Python
uv run python utilities/check_errors.py
Output:
- Counts errors by service
- Shows recent error messages
- Groups similar errors
- Highlights critical issues
# Real-time queue monitoring
just monitor
# Or directly with Python
uv run python utilities/monitor_queues.py
Output:
┌──────────────────────────────────────────────────────────────┐
│                    RabbitMQ Queue Monitor                    │
└──────────────────────────────────────────────────────────────┘

Queue: artists_queue
├─ Messages:  1,234
├─ Consumers: 2
├─ Rate:      45.2 msg/s
└─ Status:    ✅ Processing

Queue: releases_queue
├─ Messages:  5,678
├─ Consumers: 2
├─ Rate:      123.4 msg/s
└─ Status:    ✅ Processing

...
# Comprehensive system monitoring
just system-monitor
# Or directly with Python
uv run python utilities/system_monitor.py
Features:
- CPU and memory usage per service
- Disk I/O statistics
- Network throughput
- Database connection counts
- Processing rates and bottlenecks
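To illustrate what such monitoring involves (this is a rough sketch, not the actual utilities/system_monitor.py implementation), a per-container CPU/memory snapshot using the Docker SDK for Python could look like:

```python
# One-shot CPU/memory snapshot per running container.
# pip install docker
import docker

client = docker.from_env()
for container in client.containers.list():
    stats = container.stats(stream=False)  # single stats sample as a dict
    cpu = stats["cpu_stats"]["cpu_usage"]["total_usage"]
    pre_cpu = stats["precpu_stats"].get("cpu_usage", {}).get("total_usage", 0)
    system = stats["cpu_stats"].get("system_cpu_usage", 0)
    pre_system = stats["precpu_stats"].get("system_cpu_usage", 0)
    ncpus = stats["cpu_stats"].get("online_cpus", 1)
    delta, sys_delta = cpu - pre_cpu, system - pre_system
    cpu_pct = (delta / sys_delta) * ncpus * 100 if sys_delta > 0 else 0.0
    mem_mib = stats["memory_stats"].get("usage", 0) / 1024 / 1024
    print(f"{container.name:40s} CPU {cpu_pct:5.1f}%  MEM {mem_mib:8.1f} MiB")
```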
# All services
just logs
# Or with docker-compose
docker-compose logs -f
# Specific service
docker-compose logs -f extractor-discogs extractor-musicbrainz
docker-compose logs -f graphinator
docker-compose logs -f tableinator
docker-compose logs -f dashboard
# Errors only
docker-compose logs | grep "ERROR"
docker-compose logs | grep "β"
# Warnings and errors
docker-compose logs | grep -E "(WARNING|ERROR)"
docker-compose logs | grep -E "(β οΈ|β)"
# Success messages
docker-compose logs | grep "β
"
# Database queries (DEBUG level only)
docker-compose logs dashboard | grep "π Executing"Each service tracks and logs processing statistics:
🚀 Starting Extractor
📥 Downloading artists data dump
📊 Processed 10,000 artists (1,234 msg/s)
📊 Processed 50,000 artists (1,456 msg/s)
✅ Completed artists processing: 100,000 total
Key Metrics:
- Records/second processing rate
- Total records processed
- Skipped records (duplicates)
- Failed records
- Download speed (MB/s)
🔗 Connected to Neo4j
🐰 Connected to RabbitMQ
📊 Processing artists queue
📊 Created 1,000 Artist nodes (234 nodes/s)
💾 Updated 50 existing Artist nodes
✅ Completed processing
Key Metrics:
- Nodes created/updated per second
- Relationships created per second
- Transaction batch sizes
- Queue processing rates
- Deduplication hits
🐘 Connected to PostgreSQL
🐰 Connected to RabbitMQ
📊 Processing releases queue
📊 Inserted 5,000 releases (567 records/s)
⏩ Skipped 123 duplicates
✅ Completed processing
Key Metrics:
- Records inserted/second
- Duplicate records skipped
- Batch insert sizes
- Index creation time
- Table sizes
// Node counts by type
MATCH (n)
RETURN labels(n)[0] as type, count(n) as count
ORDER BY count DESC;
// Relationship counts
MATCH ()-[r]->()
RETURN type(r) as relationship, count(r) as count
ORDER BY count DESC;
// Database size
CALL apoc.meta.stats() YIELD labels, relTypesCount, nodeCount, relCount;
-- Record counts
SELECT 'artists' as table_name, COUNT(*) FROM artists
UNION ALL SELECT 'labels', COUNT(*) FROM labels
UNION ALL SELECT 'releases', COUNT(*) FROM releases
UNION ALL SELECT 'masters', COUNT(*) FROM masters;
-- Table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Active connections
SELECT count(*) FROM pg_stat_activity
WHERE datname = 'discogsography';
Access the RabbitMQ Management UI:
open http://localhost:15672
Login: discogsography / discogsography
Available Metrics:
- Queue depth (messages ready)
- Consumer count per queue
- Message rates (publish/deliver)
- Connection counts
- Channel counts
- Memory usage
API Access:
# Queue overview
curl -u discogsography:discogsography \
http://localhost:15672/api/queues
# Specific queue
curl -u discogsography:discogsography \
http://localhost:15672/api/queues/%2F/artists_queue
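The same API is easy to script. A small Python sketch using requests (the field names are standard RabbitMQ management API fields):

```python
# Summarize queue depth and consumer counts via the RabbitMQ management API,
# using the credentials shown above. Requires `requests`.
import requests

resp = requests.get(
    "http://localhost:15672/api/queues",
    auth=("discogsography", "discogsography"),
    timeout=10,
)
resp.raise_for_status()
for queue in resp.json():
    print(
        f"{queue['name']:30s} "
        f"ready={queue.get('messages_ready', 0):>8} "
        f"unacked={queue.get('messages_unacknowledged', 0):>6} "
        f"consumers={queue.get('consumers', 0)}"
    )
```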
The Insights service runs scheduled batch analytics and exposes results via the API proxy. Monitor its operation through:
Health Check (internal only; port 8009 is not exposed to the host in Docker Compose):
# From within the Docker network:
docker exec discogsography-insights-1 curl -s http://localhost:8009/health
# Response: {"status": "healthy"}Computation Status:
# Check latest computation run via API proxy
curl http://localhost:8004/api/insights/status
Key Metrics:
- Computation run timestamps and duration
- Result counts per insight type (top-artists, genre-trends, label-longevity, this-month, data-completeness)
- Redis cache hit/miss rates (when REDIS_HOST is configured)
- Schedule interval (INSIGHTS_SCHEDULE_HOURS, default: 24h)
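A hedged example of an operational check built on the status endpoint above; the last_run field name and ISO 8601 format are assumptions, so adjust to the actual /api/insights/status payload:

```python
# Warn when the last insights computation looks stale.
# Requires `requests`; field names in the response are assumed.
from datetime import datetime, timezone

import requests

status = requests.get("http://localhost:8004/api/insights/status", timeout=10).json()
# "last_run" as an ISO 8601 timestamp is an assumption about the payload.
last_run = datetime.fromisoformat(status["last_run"].replace("Z", "+00:00"))
age_hours = (datetime.now(timezone.utc) - last_run).total_seconds() / 3600
if age_hours > 24:  # INSIGHTS_SCHEDULE_HOURS default
    print(f"⚠️ insights data is {age_hours:.1f}h old")
else:
    print(f"✅ last computation ran {age_hours:.1f}h ago")
```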
Log Monitoring:
# Watch Insights service logs
docker-compose logs -f insights
# Check for computation completions
docker-compose logs insights | grep "Computation"# Connect to Redis
docker-compose exec redis redis-cli
# Get info
INFO stats
INFO memory
INFO keyspace
# Monitor commands
MONITOR
# Get key count
DBSIZE
# Check specific keys
KEYS dashboard:*
TTL dashboard:service_health
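The same keys can be inspected from Python with redis-py. This sketch assumes Redis is reachable from the host on the default port 6379, which may not be the case in every Compose setup:

```python
# Inspect the dashboard's Redis cache keys with redis-py.
# pip install redis
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
# scan_iter avoids blocking the server the way KEYS can on large keyspaces
for key in r.scan_iter("dashboard:*"):
    print(f"{key:40s} ttl={r.ttl(key)}s")
```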
All services expose HTTP health check endpoints:
# Extractor
curl http://localhost:8000/health
# Response: {"status": "healthy"}
# Graphinator
curl http://localhost:8001/health
# Response: {"status": "healthy"}
# Tableinator
curl http://localhost:8002/health
# Response: {"status": "healthy"}
# Dashboard
curl http://localhost:8003/health
# Response: {"status": "healthy"}
# API (health check port)
curl http://localhost:8005/health
# Response: {"status": "healthy", "service": "api", ...}
# Explore
curl http://localhost:8007/health
# Response: {"status": "healthy"}
# Brainztableinator
curl http://localhost:8010/health
# Response: {"status": "healthy"}
# Brainzgraphinator
curl http://localhost:8011/health
# Response: {"status": "healthy"}
#!/bin/bash
# check-all-health.sh
services=(
"Extractor:8000"
"Graphinator:8001"
"Tableinator:8002"
"Dashboard:8003"
"API:8005"
"Explore:8007"
"Brainztableinator:8010"
"Brainzgraphinator:8011"
)
for service in "${services[@]}"; do
name="${service%%:*}"
port="${service##*:}"
response=$(curl -s http://localhost:$port/health)
if [[ $response == *"healthy"* ]]; then
echo "β
$name is healthy"
else
echo "β $name is unhealthy"
fi
done
Neo4j:
# Check connectivity
curl http://localhost:7474
# Query test
echo "RETURN 1 as test;" | \
cypher-shell -u neo4j -p discogsography
PostgreSQL:
# Check connectivity
PGPASSWORD=discogsography psql \
-h localhost -p 5433 -U discogsography \
-d discogsography -c "SELECT 1;"RabbitMQ:
# Check management API
curl -u discogsography:discogsography \
http://localhost:15672/api/overview
The dashboard automatically detects when processing stalls:
Conditions:
- Queue has messages but no consumption for 5+ minutes
- Consumer count is 0 but messages exist
- Message rate drops to 0 unexpectedly
Actions:
- Alert displayed on dashboard
- Log entry with ⚠️ emoji
- Optional webhook notification (configure in dashboard code)
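To illustrate the logic, here is a standalone sketch that applies the first stall condition above using the RabbitMQ management API. The polling interval is illustrative, and this is separate from the dashboard's built-in detection:

```python
# Alert when a queue has messages waiting but no deliveries for 5+ minutes.
# Requires `requests`; uses the management API credentials shown earlier.
import time

import requests

STALL_SECONDS = 5 * 60
progress: dict[str, tuple[int, float]] = {}  # queue -> (deliver count, timestamp)

while True:
    queues = requests.get(
        "http://localhost:15672/api/queues",
        auth=("discogsography", "discogsography"),
        timeout=10,
    ).json()
    now = time.time()
    for q in queues:
        name = q["name"]
        delivered = q.get("message_stats", {}).get("deliver_get", 0)
        if name not in progress or delivered != progress[name][0]:
            progress[name] = (delivered, now)  # consumption advanced
        elif q.get("messages_ready", 0) > 0 and now - progress[name][1] > STALL_SECONDS:
            print(f"⚠️ {name} appears stalled: messages waiting, no deliveries for 5+ min")
    time.sleep(30)
```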
Errors are automatically tracked and reported:
# Recent errors across all services
just check-errors
# Errors by service
docker-compose logs graphinator | grep "β"
# Critical errors
docker-compose logs | grep "CRITICAL"Extend the dashboard for custom alerts:
# dashboard/dashboard.py
async def check_custom_condition():
    """Custom alert condition."""
    # some_metric and threshold are placeholders for your own values;
    # broadcast_alert is assumed to be the dashboard's WebSocket broadcast helper.
    if some_metric > threshold:
        await broadcast_alert({
            "type": "custom_alert",
            "severity": "warning",
            "message": "Custom condition triggered",
        })
# Health check all services individually
curl http://localhost:8000/health # Extractor
curl http://localhost:8001/health # Graphinator
curl http://localhost:8002/health # Tableinator
curl http://localhost:8003/health # Dashboard
curl http://localhost:8005/health # API (health check port)
curl http://localhost:8007/health # Explore (health check port)
# Set LOG_LEVEL environment variable
export LOG_LEVEL=DEBUG
# Restart services
docker-compose down
docker-compose up -d
# Or for specific service
LOG_LEVEL=DEBUG uv run python dashboard/dashboard.py
Debug Level Includes:
- Database query logging with parameters
- Internal state transitions
- Cache hits/misses
- Message processing details
- Connection lifecycle events
# All services
docker-compose logs -f
# Specific service with timestamp
docker-compose logs -f --timestamps graphinator
# Filter for errors
docker-compose logs -f | grep -E "(ERROR|β)"# RabbitMQ management UI
open http://localhost:15672
# Or use CLI monitoring
just monitor
Look for:
- Messages accumulating (consumers not keeping up)
- Zero consumers (service not connected)
- High unacked count (processing errors)
# Neo4j
curl http://localhost:7474
# PostgreSQL
PGPASSWORD=discogsography psql -h localhost -p 5433 \
-U discogsography -d discogsography -c "SELECT 1;"
# System monitoring
just system-monitor
# Database query performance (Neo4j)
MATCH (n) RETURN count(n);
PROFILE MATCH (a:Artist {name: "Pink Floyd"}) RETURN a;
# PostgreSQL query performance
EXPLAIN ANALYZE
SELECT data FROM artists WHERE data->>'name' = 'Pink Floyd';
Set the LOG_LEVEL environment variable:
| Level | Use Case | Output |
|---|---|---|
| DEBUG | Development | All logs + query details |
| INFO | Production | Normal operations (default) |
| WARNING | Production alerts | Warnings and errors only |
| ERROR | Critical only | Errors only |
| CRITICAL | Emergencies | Critical errors only |
All services use structlog with JSON output. Each log entry is a JSON object containing structured fields. Example output:
{"service": "graphinator", "environment": "production", "event": "π Starting service", "level": "info", "timestamp": "2025-01-15T10:30:45.123456Z"}
{"service": "graphinator", "environment": "production", "event": "π Connected to Neo4j", "level": "info", "timestamp": "2025-01-15T10:30:46.234567Z"}
{"service": "graphinator", "environment": "production", "event": "π° Connected to RabbitMQ", "level": "info", "timestamp": "2025-01-15T10:30:47.345678Z"}See Logging Guide for complete logging documentation.
Track records/second for each service:
# Watch logs for processing stats
docker-compose logs -f | grep "π"
# Expected rates
# - Extractor: 20,000-400,000+ records/s
# Docker stats
docker stats
# Specific service
docker stats discogsography-graphinator-1
# System monitor utility
just system-monitor
Neo4j:
// Query performance profiling
PROFILE MATCH (a:Artist)-[:BY]-(r:Release)
WHERE a.name = "Pink Floyd"
RETURN r.title, r.year;
# Slow query log (check container logs)
docker-compose logs neo4j | grep "slow query"
PostgreSQL:
-- Active queries
SELECT pid, query, state, query_start
FROM pg_stat_activity
WHERE datname = 'discogsography'
AND state = 'active';
-- Slow queries (requires pg_stat_statements extension)
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
Related Documentation:
- Troubleshooting Guide - Common issues and solutions
- Performance Guide - Performance optimization
- Logging Guide - Detailed logging documentation
- Architecture Overview - System architecture
- Database Resilience - Connection patterns
Last Updated: 2026-04-03