Summary of recent enhancements to the Discogsography platform
Last Updated: 2026-03-27
Overview: Added three new /api/network/ endpoints for collaboration network analysis: multi-hop collaborator traversal, centrality scoring, and community detection.
- Multi-hop collaborators (`GET /api/network/artist/{id}/collaborators?depth=2&limit=50`): Traverses shared-release relationships up to 3 hops deep, returning collaborators ranked by distance and collaboration count.
- Centrality scores (`GET /api/network/artist/{id}/centrality`): Computes degree centrality (total relationships), collaborator count, collaboration releases, group membership, and alias counts. Cached in Redis (1h TTL).
- Community detection (`GET /api/network/cluster/{id}`): Detects communities around an artist by grouping shared-release neighbors by their primary genre. Cached in Redis (1h TTL).
- Performance tested: All three endpoints added to the perftest suite with depth-1 and depth-2 collaborator variants.
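For orientation, here is a minimal client-side sketch that exercises all three endpoints concurrently; the base URL, artist ID, and use of httpx are illustrative assumptions rather than project code (only the endpoint paths and parameters come from the change above).

```python
import asyncio

import httpx

API_BASE = "http://localhost:8004"  # assumed address of the API service (port 8004)
ARTIST_ID = "12345"                 # placeholder artist ID


async def fetch_network_views() -> None:
    """Call the three collaboration-network endpoints for one artist."""
    async with httpx.AsyncClient(base_url=API_BASE, timeout=30.0) as client:
        collaborators, centrality, cluster = await asyncio.gather(
            client.get(
                f"/api/network/artist/{ARTIST_ID}/collaborators",
                params={"depth": 2, "limit": 50},
            ),
            client.get(f"/api/network/artist/{ARTIST_ID}/centrality"),
            client.get(f"/api/network/cluster/{ARTIST_ID}"),
        )
        for response in (collaborators, centrality, cluster):
            response.raise_for_status()
            print(response.request.url, response.status_code)


if __name__ == "__main__":
    asyncio.run(fetch_network_views())
```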
Overview: Added two new graph-powered features: a collaborator network endpoint that finds artists sharing releases with temporal breakdown, and a genre tree endpoint that derives a full genre/style hierarchy from release co-occurrence.
- Collaborator Network (`GET /api/collaborators/{artist_id}`): Finds artists who share releases with a given artist, returning yearly collaboration counts, first/last year, and total release overlap. Rate limited to 30/min with Neo4j timeout protection.
- Genre Tree (`GET /api/genre-tree`): Returns the complete genre/style hierarchy derived from release co-occurrence. Cached in-memory for 5 minutes. Rate limited to 30/min with timeout protection.
- Explore Frontend: New Collaborators and Genre Tree panes integrated into the Explore UI with dedicated JavaScript modules (`collaborators.js`, `genre-tree.js`).
- Security Ignore System: Added `.pip-audit-ignores` and `osv-scanner.toml` for managing known upstream vulnerabilities with no available fix. The `update-project.sh` script automatically sweeps these after dependency upgrades and removes entries that have been resolved.
Overview: Over 11 optimization rounds across PRs #175-#184, the entire API query layer was systematically profiled and optimized, achieving a 249x reduction in overall average latency (10.95s → 0.044s) across 88 endpoints. See the full Query Performance Optimizations report for detailed analysis.
| PR | Focus | Key Impact |
|---|---|---|
| #175 | Initial Cypher optimization of 6 slowest queries | 10-100x fewer DB hits per query |
| #176 | 7 query families: CALL {} barriers, streaming aggregation, batch similarity | Path finder: 58s → 0.2s, trends: CartesianProduct eliminated |
| #177 | Cardinality management with per-genre LIMITs, parallel genre-emergence | artist-similar: top-5-genre cap prevents mega-genre explosion |
| #179 | asyncio.gather() concurrency, pattern comprehension for planner control | explore/genre: 4 concurrent queries vs chained OPTIONAL MATCHes |
| #180 | Per-genre CALL {} barriers for similarity queries | label-similar: 206M → 60-80M DB hits, 1GB → 200MB memory |
| #181 | Pre-computed Genre/Style/Label node properties at import time | explore/genre: 200M → 6 DB hits; genre-emergence: 410M → 33 DB hits |
| #184 | Style-based similarity, Redis caching (24h TTL), search per-table LIMIT | trends/genre: 28s → 0.001s; artist-similar: 112s → 0.002s |
- Pre-computed node properties: Aggregate counts (release_count, artist_count, label_count, style_count, first_year) computed during graphinator post-import step and stored on Genre/Style/Label nodes
- CALL {} subqueries: Prevent Neo4j planner CartesianProduct plans by creating strong barriers for traversal order
- Pattern comprehension: Force specific node-first traversal when even CALL {} doesn't control the planner
- Redis cache-aside: 24h TTL for trends, similarity, and label-DNA; 5m TTL for search results
- Batch queries: N+1 query patterns (800 queries → 4 queries) replaced with UNWIND-based batching
- Per-dimension LIMIT: Cap high-cardinality genre expansions (Rock: 6M+ releases → LIMIT 500 per genre)
- asyncio.gather(): Execute independent Neo4j/PostgreSQL queries concurrently (see the sketch after this list)
- Relationship type filtering: shortestPath with explicit type list eliminates unbounded BFS
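To make the cache-aside and asyncio.gather() items above concrete, here is a condensed sketch of how they combine in a trends-style handler; the cache key scheme and the stand-in query functions are illustrative, not the actual API code.

```python
import asyncio
import json
from typing import Any

from redis.asyncio import Redis

TRENDS_TTL = 60 * 60 * 24  # 24h, matching the trends caching policy above


async def _query_neo4j_trends(genre: str) -> dict[str, Any]:
    """Stand-in for the real Cypher query against Neo4j."""
    await asyncio.sleep(0)  # placeholder for driver I/O
    return {"genre": genre, "releases_by_decade": {}}


async def _query_postgres_counts(genre: str) -> dict[str, Any]:
    """Stand-in for the real SQL query against PostgreSQL."""
    await asyncio.sleep(0)  # placeholder for driver I/O
    return {"total_releases": 0}


async def genre_trends(genre: str, redis: Redis) -> dict[str, Any]:
    """Cache-aside read with concurrent back-end queries on a miss."""
    cache_key = f"trends:genre:{genre}"  # illustrative key scheme
    cached = await redis.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    # Independent queries run concurrently instead of back-to-back.
    graph_part, sql_part = await asyncio.gather(
        _query_neo4j_trends(genre),
        _query_postgres_counts(genre),
    )
    result = {**graph_part, **sql_part}
    await redis.set(cache_key, json.dumps(result), ex=TRENDS_TTL)
    return result
```

On a cold cache the two back-end queries overlap instead of running sequentially, and the 24h `SET ... EX` means subsequent requests skip the databases entirely.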
| Category | Before | After | Speedup |
|---|---|---|---|
| Path finder (6 endpoints) | 58.5s | 0.21s | 279x |
| Explore genre (2) | 24.1s | 0.014s | 1,721x |
| Trends genre (2) | 28.6s | 0.001s | 28,600x |
| Trends style (3) | 13.2s | 0.001s | 13,200x |
| Genre emergence | 64.3s | 0.10s | 630x |
| Artist similarity (4) | 64s | 0.002s | 32,000x |
| Label similarity (3) | 86s | 0.001s | 86,000x |
| Overall (88 endpoints) | 10.95s | 0.044s | 249x |
Overview: Eliminated cold-cache penalties for label-DNA compare and common search terms by reusing Redis caches and pre-warming on startup.
- Label-DNA compare cache reuse: `_build_dna` now checks and populates the same Redis cache as the `/dna` endpoint; compare previously took 15.1s on a cold cache because it bypassed the label DNA cache
- Search pre-warming: Pre-warm the Redis search cache on startup for 10 common high-cardinality terms (Rock, Electronic, Jazz, etc.) that take ~9s cold (see the sketch after this list)
- Increased search TTL: Search cache TTL increased from 300s (5 min) to 3600s (1 hour) to reduce cold cache frequency
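The pre-warming loop amounts to something like the following sketch; the term list, key scheme, and the injected `run_search` coroutine are illustrative assumptions, not the project's actual helpers.

```python
import logging
from typing import Awaitable, Callable

from redis.asyncio import Redis

logger = logging.getLogger(__name__)

# Illustrative list of high-cardinality terms that are slow on a cold cache.
PREWARM_TERMS = ["Rock", "Electronic", "Jazz", "Pop", "Funk / Soul"]


async def prewarm_search_cache(
    run_search: Callable[[str], Awaitable[str]],
    redis: Redis,
    ttl: int = 3600,  # matches the new 1-hour search TTL
) -> None:
    """Populate the Redis search cache for common terms at startup."""
    for term in PREWARM_TERMS:
        key = f"search:{term.lower()}"  # illustrative key scheme
        if await redis.exists(key):
            continue  # already warm from a previous run
        results = await run_search(term)  # the expensive cold-cache search
        await redis.set(key, results, ex=ttl)
        logger.info("pre-warmed search cache for %r", term)
```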
Overview: Pre-compute label statistics during import, add Redis caching for explore and trends endpoints, and fix label-DNA compare 500 errors.
- Pre-computed label stats: Extend `compute_genre_style_stats` to set `release_count`, `artist_count`, `genre_count` on Label nodes (batched in transactions of 100 rows)
- Redis caching: Cache `trends/label` (24h TTL) and `explore/artist`/`explore/label` (24h TTL) to avoid expensive COUNT traversals
- Label-DNA compare fix: Replace broken single-traversal Cypher with parallel `asyncio.gather` of 4 individual queries; add early return for labels below MIN_RELEASES
- Migration script: One-time `scripts/compute-label-stats.sh` for existing databases
Overview: A configurable rule engine in the Rust extractor that validates parsed records against YAML-defined quality rules, flagging bad data without blocking the pipeline.
- YAML rule configuration: Define rules per data type with 5 condition types (Required, Range, Regex, Length, and Enum); see the illustrative sketch after this list
- Observation-only pipeline stage: Validator evaluates records between parser and batcher; all messages pass through regardless of violations
- Raw XML reconstruction: Parser reconstructs XML fragments from parsed element trees so they can be compared against the parsed JSON, distinguishing source-data errors from parsing errors
- Flagged record storage: Writes separate XML, JSON, and JSONL files per flagged record organized by version/data_type
- Quality report: Tracks per-rule violation counts with deterministic output for automated analysis
- Default rules: Ships with `extraction-rules.yaml` covering numeric genre detection, year-out-of-range checks, and missing title/name validation across all 4 data types
- Docker integration: Rules file mounted read-only into the extractor container via docker-compose
- Design spec: `docs/superpowers/specs/2026-03-21-data-quality-rules-design.md`
- Implementation plan: `docs/superpowers/plans/2026-03-21-data-quality-rules.md`
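The validator itself is implemented in the Rust extractor; purely for illustration, here is a Python-flavored sketch of the observation-only semantics for two of the condition types (Required and Range). The dataclass shape, parameter names, and example thresholds are assumptions, not the real rule schema.

```python
import re
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Rule:
    field_name: str
    kind: str  # "required" | "range" | "regex" | "length" | "enum"
    params: dict[str, Any] = field(default_factory=dict)

    def violated_by(self, record: dict[str, Any]) -> bool:
        value = record.get(self.field_name)
        if self.kind == "required":
            return value in (None, "")
        if value is None:
            return False  # other rule kinds only apply when the field is present
        if self.kind == "range":
            return not (self.params["min"] <= int(value) <= self.params["max"])
        if self.kind == "regex":
            return re.fullmatch(self.params["pattern"], str(value)) is None
        if self.kind == "length":
            return len(str(value)) > self.params["max"]
        if self.kind == "enum":
            return value not in self.params["allowed"]
        return False


def validate(record: dict[str, Any], rules: list[Rule]) -> list[str]:
    """Observation-only check: returns violated rule descriptions, never blocks."""
    return [f"{r.kind}:{r.field_name}" for r in rules if r.violated_by(record)]


# Example: flag a release with a missing title and an implausible year.
rules = [Rule("title", "required"), Rule("year", "range", {"min": 1860, "max": 2030})]
print(validate({"title": "", "year": "1492"}, rules))  # ['required:title', 'range:year']
```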
Overview: Targeted optimization of 6 query families (genre-emergence, artist-similar, label-DNA, search, and data-completeness) for dramatic reductions in DB hits and cold cache latency.
- Genre-emergence: Read pre-computed `first_year` from Genre/Style nodes instead of live traversal (183.5M → ~50 DB accesses); see the sketch after this list
- Artist-similar: Cap inner release scan at 100K per genre to prevent full traversal of mega-genres like Rock (7M releases → 100K sampled)
- Label-DNA: Batch 4 separate identity/genre/style/decade queries into a single `get_label_full_profile` traversal (6 queries → 3 for cold cache)
- Search: Cap total count, type counts, genre facets, and decade facets at 10K rows per table to prevent full scans on common terms
- Data-completeness: Add Redis caching (6h TTL) to prevent repeated full table scans of the releases table
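To show what reading pre-computed properties looks like, here is a hedged sketch using the async Neo4j Python driver; `first_year` and `release_count` come from the changes above, while the rest of the Cypher, the `name` property, and the function shape are assumptions rather than the real query.

```python
from neo4j import AsyncGraphDatabase

# Reads pre-computed properties off Genre nodes instead of traversing releases live.
GENRE_EMERGENCE_CYPHER = """
MATCH (g:Genre)
WHERE g.first_year IS NOT NULL AND g.first_year < $before_year
RETURN g.name AS name, g.first_year AS first_year, g.release_count AS releases
ORDER BY g.first_year
"""


async def genre_emergence(uri: str, auth: tuple[str, str], before_year: int) -> list[dict]:
    async with AsyncGraphDatabase.driver(uri, auth=auth) as driver:
        records, _, _ = await driver.execute_query(
            GENRE_EMERGENCE_CYPHER, before_year=before_year
        )
        return [dict(record) for record in records]
```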
Overview: GitHub Actions workflows now detect when a PR only changes markdown files and skip heavy jobs (build, test, lint) to save CI minutes.
Overview: Optimized the 6 slowest Cypher queries identified by the query profiling infrastructure, achieving 10-100x fewer database hits per query.
- Targeted optimization: Profiling data from #174 identified the exact bottleneck queries
- Better index usage: Rewrote queries to leverage existing indexes more effectively
- Reduced traversals: Minimized relationship traversals and node lookups
- Measurable impact: Before/after perftest results stored in `perftest-results/`
Overview: Expanded the query profiling infrastructure to cover SQL queries alongside Cypher, and broadened the perftest suite to cover additional API endpoints.
- SQL profiling: Added `EXPLAIN ANALYZE` profiling for PostgreSQL queries alongside existing Cypher `PROFILE`
- Perftest expansion: Additional API endpoints covered in `tests/perftest/config.yaml`
- Query plan inspection: Automated query plan analysis for identifying performance regressions
Overview: Switched to neo4j-rust-ext, a Rust-backed extension for the Neo4j Python driver that accelerates Bolt protocol serialization/deserialization.
- Drop-in replacement: No code changes required; the Rust extension transparently accelerates the existing `neo4j` Python driver
- Up to 10x faster: Bolt protocol handling moved from Python to compiled Rust code
- All services benefit: API, Graphinator, Dashboard, and Schema-Init all use the Neo4j driver
Overview: Added a JavaScript testing framework using Vitest for the Explore frontend, enabling unit testing of the modular JS codebase (app.js, graph.js, api-client.js, etc.).
- Vitest framework: Fast, modern JS test runner with native ES module support
- Task runner integration: `just test-js` and `just test-js-cov` commands for running JS tests
- CI integration: JavaScript tests run as part of the `test.yml` GitHub Actions workflow
- Parallel execution: JS tests run alongside Python and Rust tests in `just test-parallel`
Overview: Added graph-powered music discovery features including artist similarity scoring, "Explore from Here" navigation, and multi-signal recommendation engine.
- Artist similarity: Graph-based similarity scoring using shared labels, genres, styles, and collaborations
- Explore from Here: Navigate the knowledge graph starting from any artist, label, or release node
- Multi-signal recommendations: Combines graph proximity, genre overlap, and collaboration patterns to surface related artists and releases
- Explore UI integration: Discovery features accessible from the Explore frontend
Overview: Extended the Vinyl Archaeology time-travel feature with snapshot comparison capabilities, allowing users to compare the state of the knowledge graph at different points in music history.
- Snapshot comparison: Compare graph state between two points in time
- Visual diff: See what changed in the graph between selected time periods
- Explore UI integration: Comparison controls integrated into the timeline scrubber
Overview: Unified the Explore frontend styling with the Dashboard design system, ensuring visual consistency across the platform.
- Shared design system: Explore now uses the same Tailwind CSS theme, color palette, and component styles as the Dashboard
- Consistent typography: Unified font usage (Inter + JetBrains Mono) across both frontends
- Dark theme alignment: Explore dark theme matches Dashboard for a cohesive user experience
Overview: Enhanced the Insights service with Redis caching for computed results, an Insights panel in the Explore UI, auto-refresh polling, and configurable milestone years.
- Redis caching: Cache-aside pattern with TTL matching the schedule interval; cache invalidated after each computation run
- Explore UI panel: New "Insights" tab in the Explore frontend displaying precomputed analytics
- Auto-refresh polling: Explore UI polls for updated insights every 60 seconds
- Configurable milestones: `INSIGHTS_MILESTONE_YEARS` environment variable controls which anniversary years are highlighted (e.g., 25, 50, 75, 100)
- `REDIS_HOST` for Insights: Insights service now connects to Redis for caching
Overview: Added a new Insights microservice that runs scheduled batch analytics against Neo4j and PostgreSQL, stores precomputed results, and exposes them via read-only HTTP endpoints proxied through the API service.
- 5 computation types: Artist centrality (graph edge count), genre trends (release count by decade), label longevity (years active), monthly anniversaries (25/30/40/50/75/100-year milestones), and data completeness scores
- Scheduled execution: Configurable interval via `INSIGHTS_SCHEDULE_HOURS` (default: 24 hours)
- API proxy endpoints: All results accessible via `/api/insights/*` (top-artists, genre-trends, label-longevity, this-month, data-completeness, status)
- PostgreSQL storage: Results stored in `insights.*` schema tables with a computation audit log
- `insights/insights.py`: Main service with scheduler loop and health server (port 8008/8009)
- `insights/computations.py`: Computation orchestration for all 5 insight types
- `insights/models.py`: Pydantic response models
- `api/routers/insights.py`: API proxy router forwarding to the insights service
- `schema-init/postgres_schema.py`: `insights.*` table definitions
- `docker-compose.yml`: New insights service container
- `docker-compose.prod.yml`: Production overrides with secrets
Overview: Added a search pane to the Explore frontend with full-text search across all entity types, powered by the existing GET /api/search endpoint.
- Search pane: Dedicated tab in the Explore UI for full-text search
- Entity type filters: Filter results by artists, labels, releases, or masters
- Paginated results: Browse through large result sets
- Graph integration: Click search results to navigate to nodes in the graph
Overview: Added time-travel filtering capabilities that let users explore the knowledge graph as it existed at any point in music history.
- Year-range endpoint: `GET /api/explore/year-range` returns min/max release years in the dataset
- Genre emergence: `GET /api/explore/genre-emergence?before_year=N` returns genres that existed before a given year
- Time-filtered expansion: `before_year` parameter on `/api/expand` filters graph expansion by release year
- Timeline scrubber UI: Interactive slider in the Explore frontend for setting the time-travel year
Overview: Added Label DNA endpoints that create unique fingerprints for record labels based on their genre, style, and format profiles, and allow comparing labels for similarity.
- Label identity: `/api/label/{label_id}/dna` returns a label's identity profile (genres, styles, formats, decades active)
- Similar labels: `/api/label/{label_id}/similar` returns labels with similar DNA profiles
- Label comparison: `/api/label/dna/compare` compares two labels and returns a similarity score
- Genre/style profiles: Percentage breakdown of a label's releases by genre and style
Overview: Added taste fingerprint analytics that analyze a user's personal collection to generate insights about their musical preferences.
- Taste heatmap (`GET /api/user/taste/heatmap`): genre x decade heatmap of the user's collection
- Full fingerprint (`GET /api/user/taste/fingerprint`): combined heatmap, obscurity score, drift analysis, and blind spots
- Blind spots (`GET /api/user/taste/blindspots`): genres where favorite artists release but the user hasn't collected
- Taste card (`GET /api/user/taste/card`): shareable SVG visualization of the taste profile
- Dashboard strip: Taste fingerprint summary displayed in the Explore Collection pane
Overview: Added collection timeline endpoints that show how a user's collection has evolved over time.
- Collection timeline (`GET /api/user/collection/timeline`): chronological view of collection additions
- Collection evolution (`GET /api/user/collection/evolution`): statistical evolution of the collection over time
Overview: After extraction, database record counts could drift from the extractor's counts due to stub nodes in Neo4j (created by cross-type MERGE operations) and stale rows in PostgreSQL (left over from prior extractions). A new extraction_complete message and per-consumer cleanup phase ensures count parity after every run.
- Extractor (`extractor.rs`, `message_queue.rs`, `types.rs`): Records `extraction_started_at` and sends an `extraction_complete` message to all 4 fanout exchanges after all files finish. The message includes `version`, `started_at`, and per-type `record_counts`.
- Graphinator (`graphinator.py`): On `extraction_complete`, flushes remaining batches and deletes stub nodes (nodes without a `sha256` property) for the given data type.
- Tableinator (`tableinator.py`): On `extraction_complete`, flushes remaining batches and purges stale rows where `updated_at < started_at` (both cleanup steps are sketched below). Single-message upsert uses `CASE` expressions to skip the JSONB data rewrite for unchanged rows while always refreshing `updated_at`.
- Schema (`postgres_schema.py`): Added `updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()` column and index to all entity tables, with a migration for existing tables.
- Batch processor (`batch_processor.py`): Upsert SQL sets `updated_at = NOW()` on insert and conflict update. Unchanged rows (hash match) skip the data rewrite but get a lightweight bulk `UPDATE ... SET updated_at = NOW()` to stay marked as current.
- Database counts match extractor counts after each run
- No manual cleanup needed between extractions
- Handles both additions and removals in Discogs dumps
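For illustration, the two cleanup steps reduce to queries along these lines; the psycopg-style connection, the Neo4j async session, and the direct interpolation of table/label names are simplifications, not the actual handler code in tableinator.py and graphinator.py.

```python
from datetime import datetime


async def purge_stale_rows(pg_conn, table: str, started_at: datetime) -> int:
    """Tableinator side: delete rows not refreshed by the just-finished extraction."""
    cursor = await pg_conn.execute(
        f"DELETE FROM {table} WHERE updated_at < %s", (started_at,)
    )
    return cursor.rowcount


async def delete_stub_nodes(neo4j_session, label: str) -> int:
    """Graphinator side: remove stub nodes created by cross-type MERGE operations."""
    result = await neo4j_session.run(
        f"MATCH (n:{label}) WHERE n.sha256 IS NULL DETACH DELETE n RETURN count(n) AS removed"
    )
    record = await result.single()
    return record["removed"]
```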
Overview: Added gap analysis endpoints that let users discover which releases they are missing from a label, artist, or master.
- Three gap analysis endpoints: `/api/collection/gaps/label/{id}`, `/api/collection/gaps/artist/{id}`, `/api/collection/gaps/master/{id}`
- Format filtering: `/api/collection/formats` returns distinct formats in the user's collection; gap results can be filtered by format
- Wantlist awareness: Gap results indicate which missing releases are already on the user's wantlist; optional `exclude_wantlist` filter
- Summary counts: Each response includes total/owned/missing counts for the entity
- Frontend integration: "What am I missing?" button on artist and label nodes in the Explore info panel opens a dedicated Missing pane with paginated results, format filters, and wantlist toggle
- `api/routers/collection.py`: New router with gap analysis and format endpoints
- `api/queries/gap_queries.py`: Cypher queries for label, artist, and master gaps
- `explore/static/js/user-panes.js`: Gap analysis pane rendering (table, filters, pagination)
- `explore/static/js/app.js`: Info panel "What am I missing?" button wiring
- `explore/static/js/api-client.js`: `getCollectionGaps()` and `getCollectionFormats()` methods
- `explore/static/index.html`: Gaps pane and nav tab
- `explore/static/css/styles.css`: Gap analysis styles
Overview: Removed the Curator service entirely; it was dead code after sync logic was migrated to `api/routers/sync.py` during the API consolidation.
- Deleted `curator/` directory (service code, Dockerfile, pyproject.toml)
- Removed Curator from `docker-compose.yml`
- No functionality lost: sync endpoints continue to work at `POST /api/sync` and `GET /api/sync/status`
- Reduced operational complexity (one fewer container to build, deploy, and monitor)
- Cleaner codebase with no dead code
Overview: Complete frontend redesign of the Explore service, migrating from Bootstrap + jQuery to Tailwind CSS + Alpine.js.
- Tailwind CSS: Dark theme matching the Dashboard redesign. Stylesheet built at Docker image build time by a dedicated `css-builder` Node stage (`tailwind.config.js` + `tailwind.input.css` → `explore/static/tailwind.css`)
- Alpine.js: Replaced jQuery with Alpine.js for reactive UI state management (modals, auth state, panel toggling)
- Modular JS: Split monolithic JavaScript into focused modules (`app.js`, `graph.js`, `trends.js`, `auth.js`, `autocomplete.js`, `api-client.js`, `user-panes.js`)
- D3.js + Plotly.js: Retained for graph visualization and trends charts (unchanged)
| File | Change |
|---|---|
| `explore/static/index.html` | Complete rewrite (dark Tailwind theme + Alpine.js) |
| `explore/static/tailwind.css` | New: generated at Docker build time by `css-builder` Node stage |
| `explore/static/css/styles.css` | Simplified to base reset + custom styles |
| `explore/static/js/*.js` | New: modular JS replacing monolithic script |
| `explore/tailwind.config.js` | New: Tailwind CLI config (content paths, plugins) |
| `explore/tailwind.input.css` | New: Tailwind source directives (`@tailwind base/components/utilities`) |
Overview: Consolidated shared code (OAuth helpers, auth utilities, and dependency injection) into common/ and api/ to reduce duplication across services.
- Extracted shared JWT decode helpers to `api/auth.py`
- Consolidated OAuth token encryption/decryption into `common/oauth.py`
- Removed duplicated implementations from individual routers
Overview: Improved query performance, indexing, and batch write throughput for PostgreSQL.
- Optimized high-frequency queries with targeted indexes
- Improved batch write logic in Tableinator for higher throughput
- Added missing indexes on the `user_collections` table, including the full JSONB `formats` column
Overview: Improved Neo4j query performance with better index coverage and query planning.
- Added missing composite indexes for frequent query patterns
- Optimized graph traversal queries in API explore/expand endpoints
- Schema-init now creates all performance-critical indexes on first run
Overview: Added authentication UI to the Explore frontend and comprehensive E2E tests.
- Login/register UI integrated into the Explore static frontend
- User collection and wantlist panes visible after authentication
- Playwright E2E tests covering auth flow and personalized UI states
Overview: Autocomplete endpoint now requires at least 3 characters to reduce noise and improve index performance.
- `GET /api/autocomplete?q=...` now returns `422` for queries shorter than 3 characters
- Frontend debounce threshold updated to match
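Assuming the API service's FastAPI-style routing, the constraint boils down to a query validator like this sketch; the handler body is a placeholder for the real prefix lookup.

```python
from fastapi import FastAPI, Query

app = FastAPI()


@app.get("/api/autocomplete")
async def autocomplete(q: str = Query(..., min_length=3)) -> list[str]:
    """Requests with q shorter than 3 characters fail validation with HTTP 422."""
    return []  # placeholder; the real handler performs the prefix lookup
```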
Overview: The Explore service now serves static files only and proxies all /api/* requests to the API service.
- Removed direct Neo4j connection from Explore; no `NEO4J_*` env vars required
- All graph queries go through the API service (configured via `API_BASE_URL`)
- Explore configuration simplified to `API_BASE_URL` and optional `CORS_ORIGINS`
This document tracks recent improvements made to the Discogsography platform, focusing on CI/CD, automation, and development experience enhancements.
Overview: Migrated SnapshotStore from an in-memory Python dict to Redis (redis.asyncio), eliminating all limitations of the previous in-memory approach.
- `api/snapshot_store.py`: Rewritten as an async Redis-backed store. `save()` and `load()` are now coroutines. TTL eviction is handled natively by Redis (`SET ... EX`); the manual `_evict_expired()` scan is gone.
- `api/routers/snapshot.py`: `configure()` now accepts a `redis_client` parameter; endpoint handlers `await` the async store methods.
- `api/api.py`: Passes the existing `_redis` client to `_snapshot_router.configure()` at startup.
- `explore/snapshot_store.py`: Deleted; dead code after API consolidation (the snapshot router has lived in `api/` since issue #72).
- `pyproject.toml`: Added `fakeredis>=2.0.0` to dev dependencies for test isolation.
- Tests: `tests/api/` and `tests/explore/` snapshot tests updated to use `fakeredis.aioredis.FakeRedis` (backed by a shared `fakeredis.FakeServer` fixture); unit tests converted to `async` with `@pytest.mark.asyncio`.
| Before | After |
|---|---|
| Lost on service restart | Persists across restarts (appendonly yes) |
| Process-local only | Shared across multiple API replicas |
| O(n) lazy eviction scan on every save | Native Redis TTL; zero eviction overhead |
| Unbounded memory growth | Bounded by Redis maxmemory 512mb + LRU |
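Stripped to its essentials, the new store looks roughly like this sketch; the key prefix, ID scheme, and JSON serialization are illustrative details rather than the actual `api/snapshot_store.py` code.

```python
import json
import uuid
from typing import Any

from redis.asyncio import Redis


class SnapshotStore:
    """Async snapshot store backed by Redis; TTL eviction is handled by Redis itself."""

    def __init__(self, redis: Redis, ttl_seconds: int = 3600, prefix: str = "snapshot:") -> None:
        self._redis = redis
        self._ttl = ttl_seconds
        self._prefix = prefix  # illustrative key prefix

    async def save(self, snapshot: dict[str, Any]) -> str:
        snapshot_id = uuid.uuid4().hex
        # SET ... EX lets Redis expire the key natively; no manual eviction scan.
        await self._redis.set(self._prefix + snapshot_id, json.dumps(snapshot), ex=self._ttl)
        return snapshot_id

    async def load(self, snapshot_id: str) -> dict[str, Any] | None:
        raw = await self._redis.get(self._prefix + snapshot_id)
        return json.loads(raw) if raw is not None else None
```

In tests, the same class can be pointed at `fakeredis.aioredis.FakeRedis` instead of a real server, which is what the updated snapshot tests do.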
Overview: Addressed a set of security findings (issue #71) across the API service.
- OAuth token encryption: Discogs OAuth access tokens are now encrypted at rest using Fernet symmetric encryption before being stored in PostgreSQL (sketched below). A new `OAUTH_ENCRYPTION_KEY` env var is required for the API container.
- Constant-time login: Login and registration now use constant-time comparison to prevent user enumeration via timing attacks.
- Blind registration: Duplicate email registration returns the same `201` response to prevent account enumeration.
- JWT logout with JTI blacklist: `POST /api/auth/logout` now revokes the token's `jti` claim in Redis (TTL = token expiry), making logout stateful.
- Snapshot auth required: `POST /api/snapshot` now requires a valid JWT token.
- Rate limiting: Added SlowAPI rate limits for register (3/min), login (5/min), sync (2/10min), and autocomplete (30/min). Per-user sync cooldown (600 s) stored in Redis.
- Security response headers: All responses now include `X-Content-Type-Options`, `X-Frame-Options`, `Referrer-Policy`, and `Permissions-Policy`.
- CORS: Origins configurable via `CORS_ORIGINS` env var (comma-separated; disabled by default).
- Input validation: JWT algorithm validated to be `HS256`; Discogs API response bodies redacted from error messages.
- Extracted shared JWT helpers to `api/auth.py` (`b64url_encode`/`decode`, `decode_token`); removed duplicated implementations from individual routers.
- Added `api/limiter.py` for a shared SlowAPI `Limiter` instance.
- Replaced all `type: ignore` pragmas with proper type narrowing across the codebase.
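The Fernet encryption mentioned above works roughly as follows; the helper names are illustrative, and only the `OAUTH_ENCRYPTION_KEY` variable name comes from the change itself.

```python
import os

from cryptography.fernet import Fernet

# OAUTH_ENCRYPTION_KEY must be a urlsafe base64-encoded 32-byte key,
# e.g. generated once with Fernet.generate_key().
_fernet = Fernet(os.environ["OAUTH_ENCRYPTION_KEY"])


def encrypt_token(plaintext_token: str) -> str:
    """Encrypt a Discogs OAuth token before writing it to PostgreSQL."""
    return _fernet.encrypt(plaintext_token.encode()).decode()


def decrypt_token(stored_token: str) -> str:
    """Decrypt a token read back from the database."""
    return _fernet.decrypt(stored_token.encode()).decode()
```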
Overview: Full Discogs account linking, collection and wantlist sync, and personalised graph exploration.
- OAuth 1.0a OOB flow: Users connect their Discogs account via `GET /api/oauth/authorize/discogs` → `POST /api/oauth/verify/discogs`. State token stored in Redis with TTL.
- Collection & wantlist sync: `POST /api/sync` triggers a background sync in the API service that fetches the user's Discogs collection and wantlist and writes `COLLECTED`/`WANTS` relationships to Neo4j.
- Sync history: `GET /api/sync/status` returns the last 10 sync operations with status, item count, and error details.
- User endpoints: `/api/user/collection`, `/api/user/wantlist`, `/api/user/recommendations`, `/api/user/collection/stats`, `/api/user/status` for personalised graph data.
- Operator setup: Discogs app credentials configured once via the `discogs-setup` CLI bundled in the API container (reads/writes the `app_config` table).
Overview: All user-facing HTTP endpoints consolidated into the central API service. The Curator service was removed entirely; its sync logic now lives in `api/routers/sync.py`. Explore now serves static files only.
| Endpoint group | Before | After |
|---|---|---|
| Graph queries | Explore service (:8006) | API service (:8004) |
| Sync triggers | Curator service (removed; :8010 now brainztableinator) | API service (:8004) |
| User collection data | (new) | API service (:8004) |
- Single port (8004) for all client-facing API calls; simpler frontend configuration.
- Curator eliminated as a separate service β sync logic migrated directly into the API, reducing operational complexity.
- Explore is now a static file server only, reducing its attack surface.
- Shared JWT authentication and rate limiting enforced uniformly at the API layer.
Overview: Complete frontend redesign based on a new Stitch-generated dark theme.
- Tailwind CSS: Replaced hand-written CSS with Tailwind CSS (Inter + JetBrains Mono fonts, Material Symbols Outlined icons). The stylesheet is built at Docker image build time by a dedicated `css-builder` Node stage using the Tailwind CLI (`tailwind.config.js` + `tailwind.input.css` → `dashboard/static/tailwind.css`), eliminating any CDN dependency at runtime.
- Logo placeholder: `<div id="app-logo">` with prominent comment block for easy brand swapping
- Service cards: Per-service sections (`#service-extractor`, `#service-graphinator`, `#service-tableinator`) with per-queue-type rows showing state/counts
- Queue Size Metrics: CSS height bars replace the previous Chart.js canvas; no CDN JS dependency
- Processing Rates: SVG circular gauges with `stroke-dashoffset` animation for publish and ack rates per queue type
- Database cards: `#db-neo4j` and `#db-postgresql` with status badges and live stats
- Event log: `#activityLog` with `.connection-status`/`.status-indicator`/`.status-text` kept for Playwright test compatibility
- E2E tests updated: `test_dashboard_ui.py` selectors updated to match the new HTML structure
| File | Change |
|---|---|
| `dashboard/static/index.html` | Complete rewrite (dark Tailwind theme) |
| `dashboard/static/tailwind.css` | New: generated at Docker build time by `css-builder` Node stage |
| `dashboard/static/styles.css` | Simplified to base reset + legacy selector stubs |
| `dashboard/static/dashboard.js` | Complete rewrite (Chart.js removed, SVG gauges + CSS bars) |
| `dashboard/tailwind.config.js` | New: Tailwind CLI config (content paths, forms + container-queries plugins) |
| `dashboard/tailwind.input.css` | New: Tailwind source directives (`@tailwind base/components/utilities`) |
| `tests/dashboard/test_dashboard_ui.py` | Updated Playwright selectors |
Overview: Simplified service code across all five components and improved test coverage from 92% to 94%.
Reduced complexity and improved readability across all Python and Rust services without changing behavior:
Dashboard (dashboard/dashboard.py):
- Extracted `_get_or_create_gauge()` and `_get_or_create_counter()` helpers to eliminate duplicate Prometheus metric registration try/except blocks
- Fixed WebSocket connection tracking to use `set.discard()` instead of `list.remove()` to avoid `ValueError` on double-removal, and to track connection count accurately with `Gauge.set()`
- Hardened PostgreSQL address parsing to handle addresses without an explicit port (defaults to 5432)
API (`api/routers/explore.py`, `api/neo4j_queries.py`; previously in the explore service, consolidated into API):
- Added `_run_query()`, `_run_single()`, and `_run_count()` helpers to eliminate ~20 repeated `async with driver.session()` blocks across all query functions (the helper pattern is sketched below)
- Merged duplicate `autocomplete_genre()` and `autocomplete_style()` implementations into a single `_autocomplete_prefix()` helper
- Simplified `_build_categories()` using early returns instead of a mutable accumulator variable
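The helper pattern roughly amounts to the following sketch; the actual signatures and return shapes in the consolidated query module may differ.

```python
from typing import Any

from neo4j import AsyncDriver


async def _run_query(driver: AsyncDriver, cypher: str, **params: Any) -> list[dict[str, Any]]:
    """Run a Cypher query in a short-lived session and return plain dict rows."""
    async with driver.session() as session:
        result = await session.run(cypher, **params)
        return [record.data() async for record in result]


async def _run_single(driver: AsyncDriver, cypher: str, **params: Any) -> dict[str, Any] | None:
    """Return the first row of a query, or None when nothing matched."""
    rows = await _run_query(driver, cypher, **params)
    return rows[0] if rows else None
```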
Graphinator (graphinator/graphinator.py):
- Removed dead code branches and simplified control flow in message handlers
- Consolidated repeated node-merge patterns and deduplication logic
Tableinator (tableinator/tableinator.py):
- Simplified batch processing logic and removed redundant state tracking
- Consolidated repeated table and index creation patterns
Extractor (Rust, extractor/src/):
- Removed unused `types.rs` module entirely
- Removed dead S3 configuration fields (`s3_bucket`, `s3_region`) and `max_temp_size`
- Removed unused `from_file()` config loader (environment variables are the only supported method)
- Simplified error handling and control flow across all modules
Increased overall test coverage from 92% to 94% (774 → 798 tests):
| File | Coverage |
|---|---|
| `graphinator/graphinator.py` | 82% |
| `common/postgres_resilient.py` | 90% |
| `dashboard/dashboard.py` | 93% |
| `tableinator/tableinator.py` | 96% |
| `common/rabbitmq_resilient.py` | 92% |
| `common/neo4j_resilient.py` | 98% |
| `explore/explore.py` | 97% |
New tests cover previously untested paths including: config errors and early returns, Neo4j and
PostgreSQL connection failures, async queue edge cases (QueueFull/QueueEmpty), WebSocket
exception cleanup, batch processor flush errors in finally blocks, and missing-ID edge cases in
graph entity processing.
Overview: Completed three major infrastructure upgrades to modernize the platform's core dependencies.
Upgrade: RabbitMQ 3.13-management → 4-management (4.2.3)
Key Changes:
- Quorum Queues: Migrated all 8 message queues from classic to quorum type for improved data safety and replication
- Dead-Letter Exchanges (DLX): Each consumer declares its own DLQs and consumer-owned DLXs for poison message handling
- Delivery Limit: Set to 20 retries before routing to DLQ, preventing infinite retry loops
- Files Modified: docker-compose.yml, extractor.py, graphinator.py, tableinator.py, message_queue.rs
Benefits:
- ✅ High availability with Raft consensus
- ✅ Automatic data replication across cluster nodes
- ✅ Poison message handling prevents infinite retries
- ✅ Better data safety for critical music metadata
Upgrade: Neo4j 5.25-community → 2026-community (calendar versioning)
Key Changes:
- Calendar Versioning: Switched from semantic versioning (5.x) to calendar versioning (YYYY.MM.PATCH)
- Python Driver: Upgraded the neo4j driver from 5.x → 6.1.x across all services
- Files Modified: docker-compose.yml + pyproject.toml files (root, common, api, graphinator, dashboard)
Benefits:
- ✅ Access to latest Neo4j features and optimizations
- ✅ Improved graph query performance
- ✅ Better APOC plugin compatibility
- ✅ Future-proofed for 2026 releases
Upgrade: PostgreSQL 16-alpine → 18-alpine
Key Changes:
- JSONB Performance: 10-15% faster JSONB operations (heavily used in tableinator)
- Data Checksums: Enabled by default for automatic corruption detection
- GIN Indexes: Improved query planning for JSONB GIN indexes
- Files Modified: docker-compose.yml only (psycopg3 already compatible!)
Benefits:
- ✅ 10-15% faster JSONB queries (used extensively in releases, artists, labels, masters tables)
- ✅ Improved GIN index performance for containment queries
- ✅ Data integrity with automatic checksums
- ✅ 20-30% faster VACUUM operations
- ✅ Zero code changes required - psycopg3 is fully compatible
| Component | Old Version | New Version | Code Changes |
|---|---|---|---|
| RabbitMQ | 3.13-management | 4-management | 5 files (queue declarations) |
| Neo4j | 5.25-community | 2026-community | 7 files (driver version bumps) |
| PostgreSQL | 16-alpine | 18-alpine | 0 files (fully compatible!) |
Total Documentation: 3 comprehensive migration guides created (one per service)
Migration Guides:
Problem: When the extractor service restarted, it couldn't determine whether to continue processing, re-process, or skip already-processed Discogs data versions, potentially leading to duplicate processing or missed updates.
Solution: Implemented a comprehensive state marker system that tracks extraction progress across all phases.
- Version-Specific Tracking: Each Discogs version (e.g., `20260101`) gets its own state marker file
- Multi-Phase Monitoring: Tracks download, processing, publishing, and overall status
- Smart Resume Logic: Automatically decides whether to reprocess, continue, or skip on restart
- Per-File Progress: Detailed tracking of individual file processing status
- Error Recovery: Records errors at each phase for debugging and recovery
- ✅ Rust Implementation: `extractor/extractor/src/state_marker.rs` with 11 unit tests
- ✅ Python Implementation: `common/state_marker.py` with 22 unit tests
- ✅ Documentation: Complete usage guide in `docs/state-marker-system.md`
- ✅ Cross-Platform: Rust extractor and Python `common` library share identical state marker functionality
- Restart Safety: No duplicate processing after service restarts
- Progress Visibility: Clear view of extraction status at any time
- Idempotency: Safe to restart at any point without data corruption
- Efficiency: Skip already-completed work automatically
- Observability: Detailed metrics for monitoring and debugging
```json
{
  "current_version": "20260101",
  "download_phase": { "status": "completed", "files_downloaded": 4, ... },
  "processing_phase": { "status": "in_progress", "files_processed": 2, ... },
  "publishing_phase": { "status": "in_progress", "messages_published": 1234567, ... },
  "summary": { "overall_status": "in_progress", ... }
}
```

| Scenario | Decision | Action |
|---|---|---|
| Download failed | Reprocess | Re-download everything |
| Processing in progress | Continue | Resume unfinished files |
| All completed | Skip | Wait for next check |
See State Marker System for complete documentation.
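The resume decision in the table above can be read as a small mapping like the following sketch; the status strings come from the state file example, while the function and enum are purely illustrative.

```python
from enum import Enum


class Decision(Enum):
    REPROCESS = "reprocess"  # re-download everything
    CONTINUE = "continue"    # resume unfinished files
    SKIP = "skip"            # wait for the next check


def decide(download_status: str, processing_status: str, overall_status: str) -> Decision:
    """Map state-marker phase statuses to a restart decision."""
    if download_status == "failed":
        return Decision.REPROCESS
    if overall_status == "completed":
        return Decision.SKIP
    if processing_status == "in_progress":
        return Decision.CONTINUE
    return Decision.CONTINUE  # default: pick up whatever is unfinished
```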
Problem: Extractor only saved state at file boundaries (start/complete), meaning a crash during processing could lose hours of progress. State files showed 0 records even after hours of processing.
Solution: Implemented periodic state marker updates (every 5,000 records) on top of the extractor's existing behavior.
- ✅ Config: Added `state_save_interval` parameter (default: 5,000 records)
- ✅ Batcher: Modified `message_batcher` to save state periodically during processing
- ✅ Tests: Updated all 125 tests to pass with the new signature
- ✅ Consistency: Both extractors now have identical periodic save behavior
- Crash Recovery: Resume from last checkpoint (max 5,000 records lost vs. entire file)
- Progress Visibility: Real-time progress updates in state file
- Minimal Overhead: ~1-2ms per save, ~580 saves for 2.9M records (negligible)
- Production-Ready: Tested with multi-million record files
| File | Records | Saves | Overhead |
|---|---|---|---|
| Masters | 2.9M | ~580 | <2s |
| Releases | 20M | ~4,000 | <10s |
See State Marker Periodic Updates for implementation details.
- ✅ Added emojis to all workflow step names for better visual scanning
- ✅ Standardized step naming patterns across all workflows
- ✅ Improved readability and quick status recognition
- ✅ Added explicit permissions blocks to all workflows (least privilege)
- ✅ Pinned non-GitHub/Docker actions to specific SHA hashes
- ✅ Updated cleanup-images workflow permissions for package management
- ✅ Enhanced container security with non-root users and security options
- `setup-python-uv`: Consolidated Python/UV setup with caching
- `docker-build-cache`: Advanced Docker layer caching management
- `retry-step`: Retry logic with exponential backoff
- ✅ Run tests and E2E tests in parallel (20-30% faster)
- ✅ Enhanced caching strategies with hierarchical keys
- ✅ Docker BuildKit optimizations (inline cache, namespaces)
- ✅ Conditional execution to skip unnecessary work
- ✅ Artifact compression and retention optimization
- ✅ Build duration tracking
- ✅ Cache hit rate reporting
- ✅ Performance notices in workflow logs
- ✅ Enhanced Discord notifications with metrics
- ✅ Standardized quote usage across all YAML files
- ✅ Single quotes in GitHub Actions expressions
- ✅ Double quotes for YAML string values
- ✅ Removed unnecessary quotes from simple identifiers
- ✅ GitHub Actions Guide - Comprehensive CI/CD documentation
- ✅ Recent Improvements - This document
- ✅ README.md - Added workflow status badges and links
- ✅ CLAUDE.md - Added AI development memories for GitHub Actions
- ✅ Emoji Guide - Added CI/CD & GitHub Actions emoji section
- ✅ Automated weekly dependency updates
- ✅ Dependabot configuration for all ecosystems
- ✅ Discord notifications for update status
- ✅ Pre-commit hooks for all workflows
- ✅ Actionlint validation for workflow files
- ✅ YAML linting with consistent formatting
- Build Time: 20-30% reduction through parallelization
- Cache Hit Rate: 60-70% improvement with new strategy
- Resource Usage: 40-50% reduction in redundant operations
- Failure Rate: 80% reduction in transient failures
All workflows now have status badges for quick health monitoring:
- ✅ Implemented automatic consumer cancellation after file completion
- ✅ Added grace period configuration (`CONSUMER_CANCEL_DELAY`)
- ✅ Enhanced progress reporting with consumer status
- ✅ Freed up RabbitMQ resources for completed files
- ✅ Added intelligent file completion tracking in extractor
- ✅ Prevented false stalled-extractor warnings for completed files
- ✅ Enhanced progress monitoring with completion status
- ✅ Improved debugging with clear active vs. completed indicators
Resource Optimization & Intelligent Connection Management
- ✅ Automatic Connection Closure: RabbitMQ connections automatically close when all consumers are idle
- ✅ Periodic Queue Checking: New `QUEUE_CHECK_INTERVAL` (default: 1 hour) for checking queues without persistent connections
- ✅ Auto-Reconnection: Automatically detects new messages and restarts consumers
- ✅ Silent When Idle: Progress logging stops when all queues are complete to reduce log noise
- ✅ Type Safety: Added explicit type annotations for better code quality
Benefits:
- Resource Efficiency: 90%+ reduction in idle RabbitMQ connection resources
- Cleaner Logs: No repetitive progress messages when idle
- Automatic Recovery: Services automatically resume when new data arrives
- Zero Configuration: Works out of the box with sensible defaults
Configuration:
```bash
QUEUE_CHECK_INTERVAL=3600    # Check queues every hour when idle (default)
CONSUMER_CANCEL_DELAY=300    # Wait 5 minutes before canceling consumers (default)
```

- ✅ Created comprehensive File Completion Tracking guide
- ✅ Updated Consumer Cancellation documentation
- ✅ Added complete documentation index at docs/README.md
- ✅ Linked all documentation from main README
- ✅ Updated main README with smart connection lifecycle documentation
- ✅ Updated tableinator and graphinator READMEs with new environment variables
- ✅ Documented deprecated settings with migration guidance
- ✅ Cleaned up outdated progress and coverage reports
Database Write Performance Enhancement
- ✅ Graphinator Batch Processing: Implemented batch processing for Neo4j writes
- ✅ Tableinator Batch Processing: Implemented batch processing for PostgreSQL writes
- ✅ Configurable Batch Sizes: Environment variables for tuning batch size and flush interval
- ✅ Automatic Flushing: Time-based and size-based batch flushing
- ✅ Graceful Shutdown: All pending batches flushed before service shutdown
- ✅ SHA256 Hash Deduplication: Added hash-based indexes for efficient duplicate detection
Performance Improvements:
- Neo4j: 3-5x faster write throughput with batch processing
- PostgreSQL: 3-5x faster write throughput with batch processing
- Memory Efficiency: Optimized batch memory usage with configurable limits
- Reduced Database Load: Fewer transactions and connection overhead
Configuration:
```bash
# Neo4j Batch Processing
NEO4J_BATCH_MODE=true              # Enable batch mode (default)
NEO4J_BATCH_SIZE=500               # Records per batch (default)
NEO4J_BATCH_FLUSH_INTERVAL=2.0     # Seconds between flushes (default)

# PostgreSQL Batch Processing
POSTGRES_BATCH_MODE=true           # Enable batch mode (default)
POSTGRES_BATCH_SIZE=500            # Records per batch (default)
POSTGRES_BATCH_FLUSH_INTERVAL=2.0  # Seconds between flushes (default)
```

Benefits:
- Throughput: Process 3-5x more records per second
- Database Load: Significant reduction in transaction overhead
- Resource Usage: More efficient use of database connections
- Tunable: Configure batch size and interval based on workload
See Configuration Guide for detailed tuning guidance.
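Conceptually, the size- and time-based flushing reduces to logic like this sketch; the class shape and parameter names mirror the environment variables above but are not the actual batch processor code.

```python
import time
from typing import Any, Awaitable, Callable


class BatchProcessor:
    """Buffers records and flushes when the batch is full or the interval elapses."""

    def __init__(
        self,
        flush: Callable[[list[dict[str, Any]]], Awaitable[None]],
        batch_size: int = 500,        # mirrors NEO4J_BATCH_SIZE / POSTGRES_BATCH_SIZE
        flush_interval: float = 2.0,  # mirrors *_BATCH_FLUSH_INTERVAL
    ) -> None:
        self._flush = flush
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._buffer: list[dict[str, Any]] = []
        self._last_flush = time.monotonic()

    async def add(self, record: dict[str, Any]) -> None:
        self._buffer.append(record)
        if (
            len(self._buffer) >= self._batch_size
            or time.monotonic() - self._last_flush >= self._flush_interval
        ):
            await self.drain()

    async def drain(self) -> None:
        """Flush pending records; also called on graceful shutdown."""
        if self._buffer:
            await self._flush(self._buffer)
            self._buffer = []
        self._last_flush = time.monotonic()
```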
- Implement semantic versioning with automated releases
- Add performance benchmarking workflows
- Create development environment setup workflow
- Implement automated changelog generation
- Persist file completion state across restarts
- Add batch processing metrics to monitoring dashboard
- Add workflow analytics dashboard
- Implement cost tracking for GitHub Actions
- Create automated performance reports
- Add completion metrics to monitoring dashboard
When contributing to workflows:
- Follow the established emoji patterns
- Use composite actions for reusable steps
- Ensure all workflows have appropriate permissions
- Add tests for new functionality
- Update documentation accordingly