Intelligent context compaction reduces prompt size by replacing stale tool outputs with explicit stubs, helping stay within token limits while preserving conversation integrity.
Context compaction is an automatic feature that identifies and removes superseded tool outputs from message histories before they are sent to LLM backends. When the LLM reads the same file multiple times, runs the same command, or performs repeated searches, compaction replaces older outputs with concise stubs that explain what was removed. This reduces token usage and helps prevent context window overflow while maintaining full transparency about what content was compacted.
The feature is particularly useful for:
- Long-running agent conversations that repeatedly access the same files
- Code editing workflows where the agent views the same file across multiple iterations
- Debugging sessions with repeated command executions
- Large codebases with frequent file reads
The compaction mechanism follows these steps:

1. **Detect Stale Tool Results**: Analyzes message history to identify tool outputs that have been superseded by newer results for the same resource. A "resource" is typically a file path, command signature, or search query.

2. **Track Resource Identity**: Correlates tool results across the conversation using normalized identifiers (file paths, command signatures, etc.). For file reading tools, pagination parameters (offset/limit) are included to distinguish reads of different file portions.

3. **Preserve Latest Results**: Always keeps the most recent tool result for each resource intact, ensuring the LLM has the latest state.

4. **Replace With Stubs**: Replaces older, stale outputs with explicit stub messages that clearly state what was removed and why:

   ```
   [COMPACTED] Previous output for /path/to/file.py (2500 bytes) was removed because a newer result for this resource exists later in the conversation.
   ```

5. **Transparent to LLM**: Stub messages are visible to the LLM and explicitly explain the compaction, maintaining conversation context while reducing token count.

6. **Fail-Open**: If compaction encounters any error, the original message history is forwarded unchanged with appropriate logging.
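The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's actual implementation: the message-dictionary shape and the `resource` key are assumptions modeled on the examples in this document, and the stub wording follows the default template shown above.

```python
def compact_history(messages, stub_template=(
    "[COMPACTED] Previous output for {resource} ({size} bytes) was removed "
    "because a newer result for this resource exists later in the conversation."
)):
    """Replace stale tool outputs with stubs, keeping the latest result per resource."""
    try:
        # Pass 1: record the index of the most recent result for each resource.
        latest = {}
        for i, msg in enumerate(messages):
            if msg.get("role") == "tool" and "resource" in msg:
                latest[msg["resource"]] = i

        # Pass 2: stub out any tool result superseded by a later one.
        compacted = []
        for i, msg in enumerate(messages):
            if (msg.get("role") == "tool" and "resource" in msg
                    and latest[msg["resource"]] != i):
                stub = stub_template.format(
                    resource=msg["resource"], size=len(msg["content"]))
                compacted.append({**msg, "content": stub})
            else:
                compacted.append(msg)
        return compacted
    except Exception:
        # Fail-open: on any error, forward the original history unchanged.
        return messages
```

For example, if the same file is read twice, only the first (older) read becomes a stub; the second stays intact, so the LLM always sees the latest state.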
Enable compaction with minimal configuration in config.yaml:

```yaml
compaction:
  enabled: true
  token_threshold: 100000  # Start compacting at 100K tokens
```

| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Master switch for compaction |
| `token_threshold` | int | `100000` | Estimated token count that triggers compaction |
| Field | Type | Default | Description |
|---|---|---|---|
| `token_threshold` | int | `100000` | Threshold that triggers compaction. When the estimated token count exceeds this value, compaction is performed. |
| `max_tokens` | int | `150000` | Hard limit. If compaction cannot reduce the token count below this value, a warning is logged. |
Token Budget Example:

```yaml
compaction:
  enabled: true
  token_threshold: 80000  # Start compacting at 80K tokens
  max_tokens: 120000      # Warn if we can't reduce below 120K
```

Control which tool types are eligible for compaction using categories and allow/deny lists.
Available Tool Categories:

- `file_read`: File reading operations (`read_file`, `cat`, `file_read`, etc.)
- `file_write`: File writing operations (`write_file`, `edit_file`, `apply_diff`, etc.)
- `view_file`: File viewing operations (`view_file`, `view_file_outline`, etc.)
- `command_execution`: Command execution (`run_command`, `execute_command`, `bash`, `terminal`, etc.)
- `search`: Search operations (`grep_search`, `codebase_search`, `ripgrep`, `find`, etc.)
- `list_directory`: Directory listing (`list_dir`, `ls`, etc.)
- `test_execution`: Test execution (`run_pytest`, `run_tests`, `pytest`, etc.)
- `other`: Any other tool type
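Tool names are mapped to categories by name patterns. A minimal sketch of such a mapping, with the pattern table assumed from the list above (the project's real pattern rules may be more elaborate):

```python
# Hypothetical pattern table, derived from the category list in this document.
CATEGORY_PATTERNS = {
    "file_read": ("read_file", "cat", "file_read"),
    "file_write": ("write_file", "edit_file", "apply_diff"),
    "view_file": ("view_file", "view_file_outline"),
    "command_execution": ("run_command", "execute_command", "bash", "terminal"),
    "search": ("grep_search", "codebase_search", "ripgrep", "find"),
    "list_directory": ("list_dir", "ls"),
    "test_execution": ("run_pytest", "run_tests", "pytest"),
}

def categorize_tool(name: str) -> str:
    """Return the compaction category for a tool name; unknown tools -> 'other'."""
    for category, patterns in CATEGORY_PATTERNS.items():
        if any(name.startswith(p) for p in patterns):
            return category
    return "other"
```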
Policy Fields:
| Field | Type | Default | Description |
|---|---|---|---|
| `allowed_tool_categories` | list[str] | `[]` | Categories eligible for compaction. Empty list = all categories allowed (except those in the denied list). |
| `denied_tool_categories` | list[str] | `[]` | Categories never compacted. Takes precedence over the allowed list. |
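The allow/deny semantics in the table reduce to a small predicate. A sketch (illustrative; the real implementation may differ):

```python
def is_compactable(category: str, allowed: list[str], denied: list[str]) -> bool:
    """Deny list wins; an empty allow list means 'all categories allowed'."""
    if category in denied:
        return False  # Denied categories are never compacted.
    return not allowed or category in allowed
```

The early deny check is what gives the deny list precedence even when a category appears in both lists.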
Examples:
Compact only file reads and searches:

```yaml
compaction:
  enabled: true
  token_threshold: 100000
  allowed_tool_categories:
    - file_read
    - view_file
    - search
```

Compact everything except file writes and command execution:

```yaml
compaction:
  enabled: true
  token_threshold: 100000
  denied_tool_categories:
    - file_write
    - command_execution
```

| Field | Type | Default | Description |
|---|---|---|---|
| `max_stubs_per_resource` | int | `1` | Maximum stub messages to keep per resource (future use) |
| `preserve_last_n_results` | int | `1` | Always keep this many recent results per resource |
| `stub_template` | string | Default template | Template for generating stub messages |
| `redact_resource_identifiers` | bool | `false` | Redact file paths/commands in stubs (see below) |
The `redact_resource_identifiers` option controls whether sensitive information appears in compaction stubs.

Default (recommended for most cases):

```yaml
compaction:
  enabled: true
  redact_resource_identifiers: false  # Keep paths visible for debugging
```

Security-sensitive environments:

```yaml
compaction:
  enabled: true
  redact_resource_identifiers: true  # Redact paths containing secrets
```

Trade-off Summary:

| Setting | Security | Debuggability | Best For |
|---|---|---|---|
| `false` (default) | Medium | High | Development, debugging, non-sensitive environments |
| `true` | High | Medium | Production with sensitive file paths or API keys |
What gets redacted:

When enabled, stubs apply the project's redact_text() function to resource identifiers, which redacts:

- API keys matching patterns like `ak-ant...`, `sk-...`, `ak-proj...`
- Sensitive strings detected by the redaction rules
- File paths containing these patterns
Example outputs:

With `redact_resource_identifiers: false`:

```
[COMPACTED] Previous output for /home/user/project/config.py (2500 bytes) was removed because a newer result for this resource exists later in the conversation.
```

With `redact_resource_identifiers: true`:

```
[COMPACTED] Previous output for /home/user/***/***.py (2500 bytes) was removed because a newer result for this resource exists later in the conversation.
```
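The project's redact_text() is not shown here, but a minimal regex-based sketch in the same spirit can reproduce the example outputs above. The key patterns and the masking strategy are assumptions; the actual redaction rules may differ:

```python
import re
from pathlib import PurePosixPath

# Hypothetical key patterns, modeled on the examples above (ak-ant..., sk-..., ak-proj...).
_KEY_PATTERN = re.compile(r"\b(?:ak-ant|sk-|ak-proj)\S*")

def redact_identifier(resource: str) -> str:
    """Mask API-key-like substrings, then mask all but the leading path components."""
    resource = _KEY_PATTERN.sub("***", resource)
    parts = resource.split("/")
    if len(parts) > 3:
        suffix = PurePosixPath(parts[-1]).suffix  # Keep the extension for context.
        masked = parts[:3] + ["***"] * (len(parts) - 4) + ["***" + suffix]
        return "/".join(masked)
    return resource
```

Under these assumptions, `/home/user/project/config.py` becomes `/home/user/***/***.py`, matching the stub shown above.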
Enable or disable compaction via command line:
```shell
# Enable compaction regardless of config
./.venv/Scripts/python.exe -m src.core.cli --enable-context-compaction
```

Note: the CLI override only supports enabling the feature; there is no flag for disabling. Use config.yaml to disable compaction.
Currently, no environment variables are supported for compaction configuration. All settings must be specified in config.yaml.
When compaction occurs, metrics are included in structured logs:
```json
{
  "compacted_messages": 5,
  "bytes_saved": 2500,
  "tokens_saved_estimate": 625,
  "original_messages": 12,
  "was_compacted": true,
  "failed_open": false,
  "stale_resources": "view_file:/a.py,view_file:/b.py"
}
```

Available Metrics:
| Metric | Type | Description |
|---|---|---|
| `compaction_messages_compacted` | int | Number of messages replaced with stubs |
| `compaction_bytes_saved` | int | Approximate bytes reduced by compaction |
| `compaction_tokens_saved_estimate` | int | Estimated tokens saved (roughly bytes / 4) |
| `compaction_original_count` | int | Number of messages before compaction |
| `compaction_stale_resources_count` | int | Number of unique resources detected as stale |
| `compaction_failed_open` | int | 1 if compaction failed and returned original messages, 0 otherwise |
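Because these metrics arrive as structured JSON log records, they can be aggregated with a few lines of Python. The field names follow the JSON example above; the helper itself is hypothetical:

```python
import json

def summarize_compaction(log_lines):
    """Sum savings across structured compaction log records."""
    total_bytes = total_tokens = events = 0
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip non-JSON (plain-text) log lines.
        if record.get("was_compacted"):
            events += 1
            total_bytes += record.get("bytes_saved", 0)
            total_tokens += record.get("tokens_saved_estimate", 0)
    return {"events": events, "bytes_saved": total_bytes,
            "tokens_saved_estimate": total_tokens}
```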
Example Log Output:

```
2025-12-28 23:40:39 [INFO] Compacted conversation history (5 messages, 2500 bytes saved, ~625 tokens saved)
```
Structured Log Fields:
```json
{
  "compacted_count": 5,
  "bytes_saved": 2500,
  "tokens_saved_estimate": 625,
  "original_message_count": 12,
  "was_compacted": true,
  "failed_open": false,
  "stale_resources": "view_file:/a.py,view_file:/b.py"
}
```

If compaction cannot reduce the token count below `max_tokens`, a warning is logged:
```
2025-12-28 23:40:39 [WARNING] Context compaction completed but token estimate (125000) still exceeds max_tokens (120000)
```
What the warning means:
- Compaction ran and removed all possible stale tool outputs
- The resulting token count is still above the `max_tokens` limit
- The request will proceed, but it may risk context window overflow at the backend
Actions to take when warnings appear:
- Increase `max_tokens`: Allow a higher threshold if the backend supports it
- Decrease `token_threshold`: Trigger compaction earlier to allow more aggressive reduction
- Add tools to `allowed_tool_categories`: Enable compaction for more tool types
- Review conversation size: Consider session management or manual history pruning
- Check backend context limits: Ensure `max_tokens` matches the backend's actual limits
Example configuration to address warnings:
```yaml
compaction:
  enabled: true
  token_threshold: 60000  # Start compacting earlier
  max_tokens: 140000      # Higher limit if the backend supports it
```

Compaction is transparent in wire captures:
- Wire captures record the post-compaction request exactly as sent to the backend
- Stubs are visible in the captured messages
- Original content is not recorded (it was already replaced)
- Use wire captures to verify what the backend actually received
Inspecting compaction in wire captures:

```shell
./.venv/Scripts/python.exe scripts/inspect_cbor_capture.py var/wire_captures_cbor/<capture-file>
```

Look for stub messages like:

```
role: tool
content: "[COMPACTED] Previous output for /path/file.py (2500 bytes) was removed..."
```
Symptom: You see warnings about token estimates exceeding max_tokens after compaction.
Cause: Compaction removed all stale tool outputs, but the remaining messages still exceed the hard limit.
Solutions:

1. **Increase `max_tokens`** (if the backend supports it):

   ```yaml
   compaction:
     enabled: true
     token_threshold: 100000
     max_tokens: 180000  # Increased from 150000
   ```

2. **Decrease `token_threshold`** to start compacting earlier:

   ```yaml
   compaction:
     enabled: true
     token_threshold: 70000  # Decreased from 100000
     max_tokens: 150000
   ```

3. **Add more tools to `allowed_tool_categories`**:

   ```yaml
   compaction:
     enabled: true
     allowed_tool_categories:
       - file_read
       - view_file
       - search
       - test_execution  # Add this
       - list_directory  # Add this
   ```

4. **Review and reduce conversation size**:
   - Implement session management limits
   - Consider manually summarizing long conversations
   - Use tools that avoid repeated large file reads

5. **Verify backend context limits**: Ensure `max_tokens` matches your backend's actual context window size.
Typical Performance Impact:
- Latency: Compaction adds <10ms p95 latency (typically 1-5ms)
- Token estimation: Uses character count / 4 approximation (fast, no blocking operations)
- Resource tracking: O(n) where n is the number of tool messages
- Memory: Minimal overhead; only tracks resource identities, not content
Why it's fast:
- Token estimation uses simple string length division
- Resource identity extraction uses cached lookups
- No external API calls or blocking I/O
- Early exit if no stale outputs detected
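The character-count heuristic mentioned above is essentially a one-liner; a sketch (the message shape is an assumption based on the examples in this document):

```python
def estimate_tokens(messages) -> int:
    """Approximate token count as total character length divided by 4."""
    return sum(len(m.get("content") or "") for m in messages) // 4
```

This trades accuracy (typically within about 20% of the real token count) for speed: no tokenizer is loaded and no blocking work is done.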
When to disable compaction:
- If every millisecond matters and compaction overhead is unacceptable
- If you never exceed token thresholds
- If your conversations never have repeated tool operations
Check logs for compaction events:
```yaml
# Enable DEBUG logging to see detailed compaction information
# In config.yaml:
logging:
  level: "DEBUG"

# Then look for messages like:
# "History compaction triggered: token_estimate=110000 > threshold=100000"
# "Detected 3 stale resources: view_file:/a.py, read_file:/b.py, grep_search:pattern"
```

**Verify which resources are being compacted:**
Review structured log output for the `stale_resources` field:
```json
{
  "stale_resources": "view_file:/src/main.py,view_file:/src/util.py,read_file:config.json"
}
```

Use `redact_resource_identifiers: false` for debugging:
```yaml
compaction:
  enabled: true
  redact_resource_identifiers: false  # Ensure full paths are visible
```

Example debug session:
1. Enable DEBUG logging
2. Set `redact_resource_identifiers: false`
3. Run your conversation
4. Look for log messages showing:
   - Which resources were detected as stale
   - How many bytes/tokens were saved
   - Which tool results were compacted
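The `stale_resources` value shown earlier is a comma-separated list of `tool:identifier` pairs; splitting it for analysis is straightforward. A small helper (hypothetical; the format is inferred from the log examples in this document):

```python
def parse_stale_resources(value: str) -> list[tuple[str, str]]:
    """Split 'tool:/path,tool:/path' into (tool, identifier) pairs."""
    pairs = []
    for item in filter(None, value.split(",")):
        # partition on the first ':' so identifiers may themselves contain ':'.
        tool, _, identifier = item.partition(":")
        pairs.append((tool, identifier))
    return pairs
```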
Verify stub content:
Inspect wire captures or logs to see stub messages:

```shell
./.venv/Scripts/python.exe scripts/inspect_cbor_capture.py <capture-file>
```

Look for tool result messages with a `[COMPACTED]` prefix.
Symptom: Compaction appears enabled but no compaction occurs.
Check list:

1. **Verify enabled flag**:

   ```yaml
   compaction:
     enabled: true  # Must be explicitly true
   ```

2. **Check token threshold**:

   ```
   # Look for log messages like:
   # "History compaction skipped: token_estimate=50000 < threshold=100000"
   ```

   If you're under the threshold, compaction won't run. Try lowering `token_threshold` for testing.

3. **Verify stale outputs exist**:
   - Compaction only replaces superseded tool outputs
   - If each resource appears only once, there's nothing to compact
   - Check for repeated file reads, searches, or commands

4. **Check allow/deny policies**:

   ```yaml
   compaction:
     allowed_tool_categories: []  # Empty = allow all (except denied)
     denied_tool_categories: []   # Empty = deny none
   ```

   If tools are in `denied_tool_categories`, they won't be compacted.

5. **Look for error logs**:

   ```
   # Search for:
   # "History compaction failed"
   ```

   If compaction fails, it fails open and continues with the original messages.
Minimal configuration to enable compaction with default settings:
```yaml
# config.yaml
compaction:
  enabled: true
  token_threshold: 100000  # Start compacting at 100K tokens
```

Behavior:

- Compacts all tool categories except file writes and command execution (the default `CompactionConfig.default()` policy)
- Shows full file paths in stubs (`redact_resource_identifiers: false`)
- Warns if it can't reduce below 150K tokens
Configuration for production environments with sensitive file paths:
```yaml
# config.yaml
compaction:
  enabled: true
  token_threshold: 100000
  max_tokens: 150000
  redact_resource_identifiers: true  # Redact sensitive paths
  # Compact only safe operations
  allowed_tool_categories:
    - file_read
    - view_file
    - search
    - test_execution
  # Never compact writes or commands
  denied_tool_categories:
    - file_write
    - command_execution
```

Behavior:

- Only compacts file reads, views, searches, and test execution
- Never compacts file writes or command execution
- Redacts API keys and sensitive strings in stubs
- Suitable for environments with secrets in file paths
Configuration for large codebases with frequent repeated operations:
```yaml
# config.yaml
compaction:
  enabled: true
  token_threshold: 50000  # Start compacting earlier (50K)
  max_tokens: 100000      # Warn at 100K instead of 150K
  # Compact nearly everything except writes
  allowed_tool_categories:
    - file_read
    - view_file
    - search
    - list_directory
    - test_execution
    - command_execution
  denied_tool_categories:
    - file_write
  # command_execution is not in the denied list, so it will be compacted
```

Behavior:

- Compacts most tool types, including command execution
- Starts compacting at 50K tokens (more aggressive)
- Warns earlier if compaction can't reduce size
- Good for large codebases with many repeated reads
Configuration targeting specific tool categories only:
```yaml
# config.yaml
compaction:
  enabled: true
  token_threshold: 100000
  # Only compact file reading operations
  allowed_tool_categories:
    - file_read
    - view_file
  # Everything else is preserved
  denied_tool_categories:
    - command_execution
    - test_execution
    - search
    - list_directory
```

Behavior:

- Only compacts file read operations (the most common case)
- Preserves all command execution, tests, searches, and directory listings
- Useful when you want to reduce file read duplication but preserve command history
Customize stub message format (advanced):
```yaml
# config.yaml
compaction:
  enabled: true
  token_threshold: 100000
  stub_template: |
    [COMPACTED: {resource}] Removed {size} bytes of previous output.
    A more recent result for this resource is available later in the conversation.
```

Behavior:

- Uses a custom stub template with formatted output
- Shows the resource identifier and byte size
- Explains that a newer result exists
When agents work on complex tasks over multiple turns, they often read the same files repeatedly. Context compaction automatically removes older file reads, keeping only the most recent version:
```yaml
compaction:
  enabled: true
  token_threshold: 100000
```

Benefits:

- Reduces token usage by 20-40% in long conversations
- Prevents context window overflow
- Maintains conversation continuity
During code editing sessions, agents frequently view the same files across iterations. Compaction ensures only the latest file state is sent:
```yaml
compaction:
  enabled: true
  token_threshold: 80000
  allowed_tool_categories:
    - file_read
    - view_file
```

Benefits:

- Keeps context focused on the current file state
- Reduces redundant file content
- Improves agent decision-making with fresher data
Debugging often involves repeated command executions and file reads. Compaction preserves command outputs while removing duplicate file reads:
```yaml
compaction:
  enabled: true
  token_threshold: 100000
  denied_tool_categories:
    - command_execution  # Preserve all command outputs
```

Benefits:

- Preserves command execution history
- Removes duplicate file reads
- Maintains debugging context
When exploring large codebases, agents read many files. Compaction helps manage context size:
```yaml
compaction:
  enabled: true
  token_threshold: 50000  # Start compacting earlier
  max_tokens: 100000
```

Benefits:

- Allows exploration of larger codebases
- Prevents context overflow
- Maintains exploration context
1. **Start with conservative thresholds**: Begin with `token_threshold: 100000` and `max_tokens: 150000`, then adjust based on your needs.
2. **Monitor overflow warnings**: Review logs regularly for overflow warnings and adjust thresholds accordingly.
3. **Keep redaction OFF for debugging**: Use `redact_resource_identifiers: false` (the default) during development to easily identify which resources are being compacted.
4. **Enable redaction in production**: If your file paths contain sensitive information (API keys, secrets), enable `redact_resource_identifiers: true` in production.
5. **Review metrics regularly**: Check the `compaction_bytes_saved` and `compaction_tokens_saved_estimate` metrics to measure effectiveness.
6. **Test before deploying**: Enable DEBUG logging and verify compaction behavior in a non-production environment first.
7. **Combine with session management**: For very long conversations, use session management alongside compaction to manage context size effectively.
8. **Align `max_tokens` with backend limits**: Ensure your `max_tokens` setting matches or is slightly below your backend's actual context window size.
9. **Don't compact writes by default**: File write operations should typically be preserved (they are in the default `denied_tool_categories`).
10. **Monitor performance impact**: While compaction is fast (<10ms), verify that it doesn't affect your specific use case's latency requirements.
A: No. Only older, superseded tool outputs are compacted. The most recent result for each resource is always preserved. User messages, assistant messages, system prompts, and unique tool results are never compacted.
A: Yes. Stubs explicitly state what was removed and why. For example:
[COMPACTED] Previous output for /path/file.py (2500 bytes) was removed because a newer result for this resource exists later in the conversation.
The LLM can understand that a previous version existed and was superseded.
A: Wire captures record the post-compaction request exactly as sent to the backend. Stubs are visible in captures, but the original compacted content is not (since it was replaced before the request was sent). This gives you an accurate record of what the backend received.
A: Fail-open: Original messages are forwarded unchanged, and an error is logged. This ensures that compaction errors never block legitimate requests.
A: Token estimation uses character count / 4 as an approximation. This is fast and reasonably accurate for most text (typically within 20% of actual token count). The estimate is sufficient for triggering compaction and warning about overflow, but not precise enough for cost calculations.
A: By default, no. Command execution is in the default `denied_tool_categories` list because command outputs often contain unique information that shouldn't be discarded. However, you can remove it from the denied list if you are sure you want to compact command results.
A: File reads are tracked by their normalized file path. For pagination-aware tools (like `view_file` with `StartLine`/`EndLine`), pagination parameters are included as secondary keys to distinguish reads of different file portions.
A: File paths are fully normalized (forward slashes, no trailing slashes, lowercase drive letters on Windows). Different paths are treated as different resources. Only reads of the exact same path are considered duplicates.
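The normalization and keying rules described in the two answers above can be sketched as follows. The helper names and the key format are hypothetical; only the rules themselves (forward slashes, no trailing slash, lowercase drive letters, pagination as a secondary key) come from this document:

```python
def normalize_path(path: str) -> str:
    """Forward slashes, no trailing slash, lowercase drive letter on Windows."""
    path = path.replace("\\", "/").rstrip("/") or "/"
    if len(path) >= 2 and path[1] == ":":
        path = path[0].lower() + path[1:]  # Lowercase a Windows drive letter.
    return path

def resource_key(tool: str, path: str, offset=None, limit=None) -> str:
    """Build a resource identity; pagination distinguishes file portions."""
    key = f"{tool}:{normalize_path(path)}"
    if offset is not None or limit is not None:
        key += f"#{offset}:{limit}"
    return key
```

Two reads of the same file with different offsets thus produce different keys and are never treated as duplicates of each other.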
A: Yes. Check structured logs for the `stale_resources` field, which lists the resources that were detected as stale and compacted. You can also inspect wire captures to see stub messages with `[COMPACTED]` prefixes.
A: Yes. Compaction works with any tool that produces result messages with `role="tool"` and a `tool_call_id`. Tools are categorized by name patterns, and unknown tools fall into the `other` category.
A: These are complementary features. Context window enforcement blocks requests that exceed model limits, while compaction actively reduces token usage by removing stale content. Use both together for robust context management.
A: Not directly. Compaction applies globally based on the `enabled` flag in config.yaml. If you need backend-specific behavior, consider using separate proxy instances with different configurations.
A: The denied list takes precedence. If a category is in both lists, it will not be compacted. This provides a safety mechanism: denied categories are never compacted, regardless of the allowed list.
A: Yes. Compaction is designed with fail-open behavior: if anything goes wrong, the original messages are used. However, test thoroughly in your environment first and monitor metrics to ensure it's effective for your use case.
- Context Window Enforcement - Enforce per-model context window limits
- Session Management - Intelligent session continuity and history management
- Wire Captures - Inspect requests and responses with CBOR captures
- Monitoring Overview - Track compaction metrics and performance