TriMem/
├── main.py # entry — wires the three tiers together
├── config.py # all knobs (LLM, embedding, parallelism, profiles)
├── core/
│ ├── memory_builder.py # window → atomic facts (compression + synthesis)
│ ├── hybrid_retriever.py # planning + parallel multi-view retrieval
│ ├── answer_generator.py # factual / inference prompt routing, profile-aware
│ └── profile_manager.py # progressive per-entity profile builder
├── database/
│ ├── vector_store.py # LanceDB-backed atomic-fact store
│ ├── profile_store.py # SQLite KV store for persona profiles
│ └── dialogue_store.py # raw-dialogue store with context windows
├── models/memory_entry.py # MemoryEntry / Dialogue pydantic models
├── utils/ # LLM client + embedding model
├── test_locomo10.py # LoCoMo benchmark runner
└── polish_prompts_with_claude_code.md # TextGrad-style prompt evolution loop
Requirements: Python 3.10+, an OpenAI-compatible API key.
git clone https://github.com/tmlr-group/TriMem.git
cd TriMem
pip install -r requirements.txtEdit config.py to set your provider, key, and model:
OPENAI_API_KEY = "sk-..."
OPENAI_BASE_URL = "https://openrouter.ai/api/v1" # or None for OpenAI default
LLM_MODEL = "openai/gpt-4.1-mini"
EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B" # local, no API call
EMBEDDING_DIMENSION = 1024
ENABLE_PROFILES = True # turn on the persona tier
PROFILE_STORE_PATH = "./profile_store.db"from main import TriMemSystem
system = TriMemSystem(clear_db=True)
# Ingest dialogues
system.add_dialogue("Alice", "Bob, let's meet at Starbucks tomorrow at 2pm",
"2025-11-15T14:30:00")
system.add_dialogue("Bob", "Sure, I'll bring the market analysis report",
"2025-11-15T14:31:00")
# Flush the buffered window (atomic facts + profiles are written here)
system.finalize()
# Factual question — answered from atomic facts + source turns
print(system.ask("When and where will Alice and Bob meet?"))
# Inference question — also draws on Alice/Bob's persona profiles
print(system.ask("Would Bob enjoy a strategy board game?"))What happens internally on finalize():
- The dialogue window is compressed into atomic memory entries and written to the vector store.
ProfileManager.update_profiles(entries)groups the new entries bypersonsand updates each persona profile with one LLM call.- Raw dialogues remain in
DialogueStoreand are reachable viasource_dialogue_ids.
On ask(...):
- The hybrid retriever plans, generates queries, and merges semantic + lexical + symbolic results.
- Persons appearing in the retrieved facts are looked up in the profile store (no LLM call).
- The answer generator chooses factual or inference mode based on the question, then synthesizes the final answer using all three tiers.
The most relevant settings (full list in config.py):
| Section | Setting | Purpose |
|---|---|---|
| LLM | LLM_MODEL, OPENAI_BASE_URL, ENABLE_THINKING, USE_STREAMING |
Provider + decoding |
| Memory build | WINDOW_SIZE, OVERLAP_SIZE, SOURCE_CONTEXT_WINDOW |
Sliding window + how many surrounding turns to pull when answering |
| Retrieval | SEMANTIC_TOP_K, KEYWORD_TOP_K, STRUCTURED_TOP_K |
Top-k per view |
| Planning | ENABLE_PLANNING, ENABLE_REFLECTION, MAX_REFLECTION_ROUNDS |
Intent-aware retrieval planning loop |
| Parallelism | ENABLE_PARALLEL_PROCESSING, MAX_PARALLEL_WORKERS, ENABLE_PARALLEL_RETRIEVAL, MAX_RETRIEVAL_WORKERS |
Window/query concurrency |
| Profiles | ENABLE_PROFILES, PROFILE_STORE_PATH |
Toggle the persona tier |
LoCoMo benchmark runner lives in test_locomo10.py:
python test_locomo10.py # full benchmark
python test_locomo10.py --num-samples 5 # quick subset
python test_locomo10.py --result-file my.json # custom output pathEnd-to-end quality of TriMem is dominated by two prompts:
- P_ext — the fact-extraction prompt that turns a dialogue window into atomic memory units (in
core/memory_builder.py). - P_prof — the entity profile construction prompt that updates each persona profile from newly extracted facts (in
core/profile_manager.py).
Rather than hand-tuning these indefinitely, TriMem refines them with a TextGrad-style loop. The evaluation result JSON plays the role of the loss, and a strong LLM (we use Claude Opus 4.6) plays the role of the textual backward pass — diagnosing systematic failure patterns and rewriting P_ext / P_prof in place. The full procedure lives in polish_prompts.md.
If you find this environment useful, please consider citing our work:
@article{sun2026trimem,
title = {Rethinking How to Remember: Beyond Atomic Facts in Lifelong LLM Agent Memory},
author = {Jingwei Sun and Jianing Zhu and Jiangchao Yao and Tongliang Liu and Bo Han},
journal = {arXiv preprint arXiv:2605.19952},
year = {2026}
}
Dependencies: Qwen3-Embedding for retrieval, LanceDB for vector storage, LoCoMo for evaluation.