We've implemented cascading temporal summaries for AI agent memory, based on Anthropic's context engineering best practices. This provides agents with multi-scale historical context without overwhelming the LLM's attention budget.
**Immediate Window (default: 64 memories)**
- Most recent memories in full detail
- Shown verbatim in agent prompts
- Acts as few-shot examples for base models
- Provides rich context for decision-making
**Short-term Summary (memories 65-128 from the end)**
- LLM-generated 2-3 sentence summary
- Covers memories that aged out of immediate window
- Provides mid-range historical context
- Generated at 129 memories, then every ~64 memories thereafter
- Updates in clean chunks (not on every single memory)
- Historical versions saved with timestamps for future analysis
**Long-term Summary (all older memories)**
- Progressively compacted summary
- Waterfalls older short-term summaries together
- Preserves distant historical patterns
- Traces back to earliest memories (never loses history)
- Persisted to vault (user://agents/{name}/summaries/)
Every compaction cycle (triggered at 129 memories, then every ~64 thereafter):
1. Long-term summary ← LLM(short_term_summary + longterm_summary)
2. Short-term summary ← LLM(memories outside immediate window)
3. Immediate context ← 64 newest memories (unchanged)
4. Save both summaries to vault for persistence
5. Save timestamped copy of short-term summary for historical analysis
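The cycle above can be sketched in a few lines of Python (a hedged sketch, not the actual `memory.gd` code; `llm_summarize` is a hypothetical helper wrapping the summarizer LLM profile):

```python
IMMEDIATE_WINDOW = 64  # newest memories kept verbatim
RECENT_WINDOW = 64     # memories covered by the short-term summary

def compact(memories, short_term, long_term, llm_summarize):
    """Run one waterfall compaction cycle over a list of memory strings."""
    # 1. Waterfall: fold the old short-term summary into the long-term one.
    if short_term:
        long_term = llm_summarize((long_term + "\n" if long_term else "") + short_term)
    # 2. Re-summarize what has aged out of the immediate window.
    aged_out = memories[:-IMMEDIATE_WINDOW]
    short_term = llm_summarize("\n".join(aged_out[-RECENT_WINDOW:]))
    # 3. The newest memories stay verbatim.
    immediate = memories[-IMMEDIATE_WINDOW:]
    return immediate, short_term, long_term
```

Because step 1 consumes the previous short-term summary before step 2 overwrites it, each cycle pushes one chunk of history down a level without ever dropping it.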
This creates a cascading effect where:
- Recent details stay crisp (64 memories in full)
- Mid-range context gets summarized (next 64 memories → paragraph)
- Distant history gets progressively compressed (waterfall squashing)
- Compaction triggers at 129 memories (immediate + recent windows filled)
- Then every ~64 new memories (not on every single memory)
- Long-term summary traces back to earliest memories (never loses history)
- Summaries persist across restarts (vault storage)
- Historical short-term summaries preserved with timestamps
All settings in vault/config/memory_compaction.md:
- `immediate_window: 64` - Most recent memories kept in full detail
- `recent_window: 64` - Size of the short-term summary window
- `profile: "summarizer"` - LLM profile used for summaries
Runs automatically at 129 memories, then every ~64 memories thereafter:
First compaction: immediate_window + recent_window + 1 = 64 + 64 + 1 = 129 memories
Next compactions: every recent_window (64) new memories
When an agent reaches 129 memories, compaction triggers asynchronously:
- Waterfalls old short-term summary into long-term (progressive squashing)
- Generates new short-term summary from memories outside immediate window
- Saves both summaries to vault for persistence
- Saves timestamped copy of short-term summary for historical analysis
Then repeats every ~64 new memories (193, 257, 321...).
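The trigger schedule can be expressed directly (a sketch; the real `should_compact()` in `memory.gd` may differ in detail):

```python
IMMEDIATE_WINDOW = 64  # memories kept verbatim
RECENT_WINDOW = 64     # memories per short-term summary chunk

def should_compact(memory_count, last_compacted_at=0):
    """First trigger once both windows overflow; then every RECENT_WINDOW memories."""
    first_trigger = IMMEDIATE_WINDOW + RECENT_WINDOW + 1  # 129
    if memory_count < first_trigger:
        return False
    return memory_count - last_compacted_at >= RECENT_WINDOW
```

This reproduces the documented schedule: no compaction at 128, first at 129, then at 193, 257, 321, and so on.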
This ensures agents get summarization in clean chunks without regenerating summaries on every single new memory, while maintaining long-term summaries that trace back to their earliest memories.
Force compaction regardless of threshold:
@compact-memories
View memory stats and summaries:
@memory-status
**`Core/components/memory.gd`**
- Added `recent_summary`, `longterm_summary`, `last_compaction_time` properties
- Implemented `compact_memories_async()` with waterfall logic
- Added `get_recent_context()` method returning summaries + immediate memories
- Added `should_compact()` threshold check
- Integrated compaction trigger in `add_memory()`
**`Core/components/thinker.gd`**
- Updated `_build_context()` to use `get_recent_context()`
- Modified `_construct_prompt()` to include summaries in the prompt
- Added documentation about multi-scale memory context
**`Core/components/actor.gd`**
- Added `@compact-memories` command to manually trigger compaction
**`vault/config/memory_compaction.md`**
- Configuration file with window sizes and LLM profile
Summarization prompts optimized for base models (like comma-v0.1-2t):
```
"Summarize these memories in 2-3 sentences:\n\n"
+ [memory contents]
+ "\nSUMMARY:\n"
```

- Direct task framing (no chat wrapper)
- Uses the `/api/generate` endpoint
- Stop tokens configured server-side
- Works equally well with instruct/chat models
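Building that prompt is plain string concatenation (a sketch of the framing above; `build_summary_prompt` is an illustrative name, not the actual function):

```python
def build_summary_prompt(memories):
    """Direct task framing for base models: no chat template, explicit cue token."""
    return (
        "Summarize these memories in 2-3 sentences:\n\n"
        + "\n".join(memories)
        + "\nSUMMARY:\n"
    )
```

Ending on the `SUMMARY:` cue lets a base model complete the summary directly, while a server-side stop token terminates generation before the model drifts into continuing the memory list.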
Agent prompts now include summaries before the transcript:
```
You are AgentName. [profile]

BASIC COMMANDS:
- [command list]

## Relevant Private Notes
[contextual notes]

## Older Memories (Summary)
[longterm_summary if exists]

## Recent Past (Summary)
[recent_summary if exists]

---
[immediate memories in full detail]
---

You are AgentName in LocationName. [description]
Exits: [exits]
Also here: [occupants]

~AgentName>
```
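That section ordering can be sketched as a small assembly function (illustrative names, loosely mirroring `_construct_prompt()` in `thinker.gd`; the closing location block is omitted for brevity):

```python
def construct_prompt(name, profile, commands, notes, longterm, recent, immediate):
    """Order matters: stable context first, summaries next, verbatim transcript last."""
    parts = [f"You are {name}. {profile}",
             "BASIC COMMANDS:\n" + "\n".join(commands)]
    if notes:
        parts.append("## Relevant Private Notes\n" + notes)
    if longterm:
        parts.append("## Older Memories (Summary)\n" + longterm)
    if recent:
        parts.append("## Recent Past (Summary)\n" + recent)
    # Immediate memories go last, fenced by --- separators, so they read
    # as the live transcript the model should continue.
    parts.append("---\n" + "\n".join(immediate) + "\n---")
    return "\n\n".join(parts)
```

Skipping empty sections keeps young agents (no summaries yet) from seeing empty headers, and placing summaries before the transcript keeps the few-shot examples adjacent to the completion point.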
This structure:
- Provides historical context via summaries
- Keeps recent memories as few-shot examples
- Maintains base model compatibility
- Prevents context rot from excessive history
To test the system:
1. Create an agent and have them accumulate 129+ memories (immediate + recent windows filled)
2. Watch for the first compaction logs:
   ```
   [Memory] AgentName: Starting compaction (129 memories, 64 immediate, 64 recent)
   [Memory] AgentName: Short-term summary updated (127 chars)
   [Memory] AgentName: Compaction complete (first summary)
   ```
3. Continue adding memories (to 193+) to see waterfall compaction:
   ```
   [Memory] AgentName: Starting compaction (193 memories, 64 immediate, 64 recent)
   [Memory] AgentName: Long-term summary updated (215 chars)
   [Memory] AgentName: Short-term summary updated (134 chars)
   [Memory] AgentName: Compaction complete (waterfall)
   ```
4. Use `@memory-status` to verify summaries exist
5. Check the vault folder for timestamped short-term summaries (`recent-YYYYMMDD-HHMMSS.md`)
6. Restart the agent to verify summaries load from the vault
7. Use `@compact-memories` to manually trigger compaction
Short-term summaries are now embedded and stored in VectorStore, enabling semantic retrieval of relevant past experiences:
- Embedding: When a short-term summary is generated, it's automatically embedded using Ollama
- Storage: Stored in VectorStore (same as notes) with metadata:
  - `type: "memory_summary"`
  - `timestamp`: when the summary was created
  - `memory_range_start/end`: which memories it covers (e.g., 65-128)
  - `summary_text`: the actual summary content
- Retrieval: `get_relevant_summaries_for_context()` finds summaries semantically similar to the current situation
- Display: Thinker prompts include a "RELATED PAST EXPERIENCES" section with the top 2 relevant summaries
- Better than individual memories: Summaries are LLM-compressed coherent narratives (~64 memories → 2-3 sentences)
- Complements notes: Notes = explicit knowledge, Summaries = experiential history
- Contextual recall: Agent can remember "I've been in this situation before" even if it's not in immediate window
- Token efficient: Summaries are already compressed, perfect for RAG
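At its core, the retrieval step is cosine similarity over stored summary vectors (a sketch assuming pre-computed embeddings; the actual code uses Ollama embeddings and the VectorStore rather than these illustrative names):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def relevant_summaries(query_vec, stored, top_k=2):
    """stored: list of (embedding, metadata) pairs, metadata holding 'summary_text'."""
    ranked = sorted(stored, key=lambda s: cosine(query_vec, s[0]), reverse=True)
    return [meta["summary_text"] for _, meta in ranked[:top_k]]
```

The query vector is the embedding of the agent's current situation, so the top-k summaries surface past experiences that resemble it even when those memories fell out of the immediate window long ago.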
Example prompt excerpt:

```
POTENTIALLY RELATED PRIVATE NOTES:
- garden: I planted some flowers here last week

RELATED PAST EXPERIENCES:
- I explored the garden and greenhouse, discovering rare plants and tools (memories 65-128)
- I spent time with friends in the courtyard, playing games and sharing stories (memories 129-192)

LONG TERM MEMORY SUMMARY
...
```
- Attention Budget: Summaries are token-efficient vs. full memories
- Historical Context: Agents remember distant events via summaries
- Progressive Disclosure: Most recent = most detail, older = compressed
- Clean Chunks: Summaries regenerate every ~64 memories (not per-memory)
- Progressive Squashing: Long-term summary traces back to earliest memories
- Persistence: Summaries survive restarts via vault storage
- Historical Record: Timestamped short-term summaries preserved for analysis
- RAG-Enabled: Summaries embedded for semantic retrieval of past experiences
- Base Model Support: Works with comma-v0.1-2t and other base models
- Universal: Effective for instruct/chat models too
- Automatic: Runs in background, no manual intervention needed
Potential improvements:
- Semantic clustering before summarization
- Different summary lengths based on importance
- User-configurable compaction strategies
- Summary quality metrics and validation
- Periodic re-summarization of long-term summary
- Anthropic: Effective context engineering for AI agents
- Project CLAUDE.md: Context engineering patterns
- Memory component implementation: `Core/components/memory.gd:407-580`
- Thinker integration: `Core/components/thinker.gd:259-401`