CEMS Architecture

Overview

Everything lives in PostgreSQL with pgvector — no Redis, Qdrant, or external vector DB needed.

Table	Purpose
`memory_documents`	Documents with content, user/team scoping, categories, tags, soft-delete
`memory_chunks`	Chunked content with 1536-dim vector embeddings (HNSW index) + full-text search (tsvector)
`users` / `teams`	Authentication via bcrypt-hashed API keys

text-embedding-3-small via OpenRouter (1536 dimensions). Batch support for bulk operations.

Multi-stage retrieval pipeline:

Query → Understanding → Synthesis → HyDE → Retrieval → RRF Fusion → Filtering → Scoring → Assembly → Results

Stage	What it does
1. Query Understanding	LLM routes to vector or hybrid strategy
2. Query Synthesis	LLM expands query into 2-5 search terms
3. HyDE	Generates hypothetical ideal answer for better matching
4. Candidate Retrieval	pgvector HNSW (vector) + tsvector (BM25 full-text)
5. RRF Fusion	Reciprocal Rank Fusion combines result lists
6. Relevance Filtering	Removes results below threshold
7. Scoring Adjustments	Time decay, priority boost, project-scoped boost
8. Token-Budgeted Assembly	Greedy selection within token budget (default: 2000)

Search modes: vector (fast, 0 LLM calls), hybrid (thorough, 3-4 LLM calls), auto (smart routing).

Scheduled via APScheduler — runs in-process, no separate worker needed.

Job	Schedule	Purpose
Consolidation	Nightly 3 AM	Merge semantic duplicates (cosine >= 0.92)
Observation Reflection	Nightly 3:30 AM	Condense observations per project
Summarization	Weekly Sun 4 AM	Compress old memories, prune stale
Re-indexing	Monthly 1st 5 AM	Rebuild embeddings, archive dead memories

The observer (cems-observer) runs as a background process on the client machine:

Polls ~/.claude/projects/*/ JSONL transcript files every 30 seconds
When 50KB of new content accumulates, sends it to the server
Server extracts high-level observations via Gemini 2.5 Flash
Observations like "User deploys via Coolify" or "Project uses PostgreSQL" are stored as memories

The MCP wrapper (port 8766) exposes CEMS as an MCP server with 6 tools:

Tool	Description
`memory_add`	Store a memory
`memory_search`	Search with the full retrieval pipeline
`memory_get`	Retrieve full document by ID
`memory_forget`	Delete or archive a memory
`memory_update`	Update memory content
`memory_maintenance`	Trigger maintenance jobs