Split system prompt into static/dynamic blocks for better caching #10938

Open
sestinj wants to merge 6 commits into main from nate/split-system-prompt

Conversation

@sestinj
Contributor

@sestinj sestinj commented Feb 28, 2026

Summary

  • Restructure constructSystemMessage() to return SystemMessageBlock[] instead of a single string, separating static content (core identity/instructions) from dynamic content (environment info, git status)
  • Static content in block 1 is identical across all users/projects, enabling Anthropic prompt cache hits globally. Semi-static user rules go in block 2, and per-session dynamic content goes in block 3
  • The Anthropic API adapter and AnthropicCachingStrategies.ts already handle system as an array of blocks with independent cache_control, so this change automatically improves cache utilization without API-side changes
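The block ordering described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `{ type, text }` fields mirror the block shape the commit message mentions, while `PromptInputs`, `STATIC_INSTRUCTIONS`, and `constructSystemMessageSketch` are hypothetical names introduced only for this example.

```typescript
// Assumed block shape, mirroring the {type:"text", text:string} form named in the PR.
interface SystemMessageBlock {
  type: "text";
  text: string;
}

// Hypothetical inputs; the real function gathers these from services.
interface PromptInputs {
  userRules: string; // e.g. rules from AGENTS.md or config YAML
  cwd: string;
  gitStatus: string;
  platform: string;
  date: string;
}

// Placeholder for the static identity/instructions text.
const STATIC_INSTRUCTIONS =
  "You are a coding agent. Follow the tool-use guidelines.";

function constructSystemMessageSketch(
  inputs: PromptInputs,
): SystemMessageBlock[] {
  // Block 1: static text with no per-user or per-project interpolation.
  const blocks: SystemMessageBlock[] = [
    { type: "text", text: STATIC_INSTRUCTIONS },
  ];

  // Block 2: semi-static user rules, omitted when there are none.
  if (inputs.userRules.trim() !== "") {
    blocks.push({ type: "text", text: inputs.userRules });
  }

  // Block 3: per-session environment info goes last so it cannot
  // change the bytes of the blocks before it.
  blocks.push({
    type: "text",
    text: [
      `cwd: ${inputs.cwd}`,
      `platform: ${inputs.platform}`,
      `date: ${inputs.date}`,
      `git status: ${inputs.gitStatus}`,
    ].join("\n"),
  });

  return blocks;
}
```

Under these assumptions, two sessions in different projects produce a byte-identical first block, which is the property the caching argument relies on.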

Test plan

  • All 25 systemMessage.test.ts tests pass, including new tests for block structure
  • Updated all downstream consumers (streamChatResponse, compaction helpers, subagent executor, serve command) to handle the new array format
  • Updated all test mocks to use block format
  • Pre-existing test failures (yaml module resolution) confirmed to exist on main branch

🤖 Generated with Continue


Continue Tasks: ❌ 7 failed


Summary by cubic

Split the system prompt into static, rules, and dynamic blocks to improve Anthropic prompt cache hits and reduce prompt tokens. Removed the directory tree context, added prompt cache telemetry, and updated services/tests to handle blocks.

  • New Features

    • constructSystemMessage returns SystemMessageBlock[] (static → user rules → dynamic).
    • Added flattenSystemMessage for places that need a single string.
    • convertFromUnifiedHistoryWithSystemMessage accepts a string or block array.
    • PostHog telemetry now includes cache_read_tokens, cache_write_tokens, and a prompt_cache_metrics event.
  • Refactors

    • Updated CLI services, streaming, compaction helpers, subagent executor, and tests to use blocks; flatten to string for token counting and subagents.
    • Removed directory tree from the prompt; kept git status and environment info.
    • Fixed auto compaction nullish check and lint rule violations; cleaned up unused directives and formatting.
    • Updated Vitest global mocks to return blocks and expose flattenSystemMessage.
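For consumers that still need a single string (token counting, subagents), a flattening helper could look like the sketch below. The PR's actual flattenSystemMessage signature and separator are not shown on this page, so `flattenSystemMessageSketch`, `toBlocksSketch`, and the blank-line separator are assumptions made for illustration.

```typescript
type SystemMessageBlock = { type: "text"; text: string };

// Hypothetical helper: join block texts into one string.
// The "\n\n" separator is an assumption, not taken from the PR.
function flattenSystemMessageSketch(
  message: string | SystemMessageBlock[],
): string {
  if (typeof message === "string") {
    return message;
  }
  return message.map((block) => block.text).join("\n\n");
}

// Hypothetical helper for the opposite direction: accept either a
// plain string or a block array and always return blocks.
function toBlocksSketch(
  message: string | SystemMessageBlock[],
): SystemMessageBlock[] {
  return typeof message === "string"
    ? [{ type: "text", text: message }]
    : message;
}
```

Accepting `string | SystemMessageBlock[]` at the boundary lets older callers keep passing strings while newer ones pass blocks.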

Written for commit 400c329. Summary will update on new commits.

sestinj and others added 4 commits February 28, 2026 13:14
Remove the `getDirectoryStructure()` function and its embedding in the
system prompt. This was walking up to 500 files and embedding them as a
static tree in every API request, adding ~3,500-5,000 tokens per call.

The LLM already has tools (listFiles, Glob, Grep) to discover files on
demand, making the embedded tree redundant. Claude Code does not include
directory structure in its system prompt for the same reason.

This also improves prompt cache hit rates since the system prompt no
longer varies by project directory contents.

Generated with [Continue](https://continue.dev)

Co-Authored-By: Continue <noreply@continue.dev>

Add cacheReadTokens and cacheWriteTokens to the existing apiRequest
PostHog event, and emit a new prompt_cache_metrics event with
cache_hit_rate, cache_read_tokens, cache_write_tokens,
total_prompt_tokens, tool_count, and model. This populates the
existing Prompt Cache Performance dashboard (ID: 1310089).
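One way the prompt_cache_metrics payload might be derived from an Anthropic usage object is sketched here. The usage field names follow the Anthropic Messages API; the hit-rate formula (cache reads divided by all prompt tokens) and the helper name `buildPromptCacheMetrics` are assumptions, since the commit does not state how cache_hit_rate is computed.

```typescript
// Usage fields as reported by the Anthropic Messages API.
interface AnthropicUsage {
  input_tokens: number; // uncached prompt tokens
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// Event properties listed in the commit message.
interface PromptCacheMetrics {
  cache_hit_rate: number;
  cache_read_tokens: number;
  cache_write_tokens: number;
  total_prompt_tokens: number;
  tool_count: number;
  model: string;
}

// Hypothetical helper; the formula below is an assumed definition.
function buildPromptCacheMetrics(
  usage: AnthropicUsage,
  toolCount: number,
  model: string,
): PromptCacheMetrics {
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const cacheWrite = usage.cache_creation_input_tokens ?? 0;
  // Total prompt size is uncached input plus cache reads and writes.
  const total = usage.input_tokens + cacheRead + cacheWrite;

  return {
    // Guard against division by zero for empty prompts.
    cache_hit_rate: total === 0 ? 0 : cacheRead / total,
    cache_read_tokens: cacheRead,
    cache_write_tokens: cacheWrite,
    total_prompt_tokens: total,
    tool_count: toolCount,
    model,
  };
}
```

The resulting object could then be passed as the properties of a PostHog capture call for the prompt_cache_metrics event.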

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

These files belong to a different PR and were accidentally included.

Generated with [Continue](https://continue.dev)

Co-Authored-By: Continue <noreply@continue.dev>

Restructure constructSystemMessage() to return an array of content blocks
instead of a single string. This separates:

- Block 1 (static): Core identity and behavior instructions - identical
  across all users/projects, maximizing Anthropic cache hit rates
- Block 2 (semi-static): User rules from AGENTS.md, config YAML - same
  within a session but differs per project
- Block 3 (dynamic): Environment info (cwd, git status, platform, date) -
  changes per session

The Anthropic API adapter already handles system message content as an
array of {type:"text", text:string} blocks, and the caching strategies
in AnthropicCachingStrategies.ts cache each block independently. By
putting static content first, it gets cached and reused globally while
dynamic content at the end doesn't invalidate the cached prefix.
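To illustrate why the ordering matters, the sketch below maps blocks onto the `system` array of an Anthropic Messages request and marks the leading blocks with `cache_control: { type: "ephemeral" }`, which is how the Anthropic API denotes a cache breakpoint. Which blocks AnthropicCachingStrategies.ts actually marks is not shown here, so `withCacheBreakpoints` and its `cacheableCount` parameter are hypothetical.

```typescript
type SystemMessageBlock = { type: "text"; text: string };

// Shape of a text block in the Anthropic `system` array.
interface AnthropicSystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical helper: mark the first `cacheableCount` blocks as
// cache breakpoints and leave the trailing blocks unmarked.
function withCacheBreakpoints(
  blocks: SystemMessageBlock[],
  cacheableCount: number,
): AnthropicSystemBlock[] {
  return blocks.map((block, index): AnthropicSystemBlock =>
    index < cacheableCount
      ? { ...block, cache_control: { type: "ephemeral" } }
      : { ...block },
  );
}

// Example: static and rules blocks are cacheable, dynamic is not.
const systemParam = withCacheBreakpoints(
  [
    { type: "text", text: "static instructions" },
    { type: "text", text: "user rules" },
    { type: "text", text: "cwd: /tmp/project" },
  ],
  2,
);
```

Because caching matches on the prompt prefix, a change to the last block leaves the first two entries byte-identical, so their cached prefix can still be reused.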

Co-Authored-By: Continue <noreply@continue.dev>
@sestinj sestinj requested a review from a team as a code owner February 28, 2026 21:35
@sestinj sestinj requested review from RomneyDa and removed request for a team February 28, 2026 21:35
@dosubot dosubot bot added the size:XL label Feb 28, 2026
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 17 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="extensions/cli/src/stream/streamChatResponse.autoCompaction.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.autoCompaction.ts:167">
P2: The new truthy check skips an explicitly provided empty system message. This changes behavior from the previous nullish-coalescing logic and can override a deliberate empty string with the default system message. Use a nullish check instead.</violation>
</file>
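The difference the reviewer describes can be reproduced in isolation. This is a minimal sketch with hypothetical names (`DEFAULT_SYSTEM_MESSAGE`, `resolveTruthy`, `resolveNullish`); it is not the code at the cited location.

```typescript
// Hypothetical default used only for this illustration.
const DEFAULT_SYSTEM_MESSAGE = "default system message";

// Truthy check: an explicitly provided "" is falsy, so it is
// silently replaced by the default.
function resolveTruthy(provided?: string): string {
  return provided ? provided : DEFAULT_SYSTEM_MESSAGE;
}

// Explicit undefined check: only a missing value falls back to the
// default, so a deliberate empty string is preserved.
function resolveNullish(provided?: string): string {
  return provided === undefined ? DEFAULT_SYSTEM_MESSAGE : provided;
}
```

With these definitions, `resolveTruthy("")` yields the default while `resolveNullish("")` yields the empty string, which is the behavioral change the violation flags.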

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

sestinj and others added 2 commits February 28, 2026 16:20
Apply prettier formatting to PR-changed files and fix the truthy check
for providedSystemMessage to use !== undefined, preserving behavior for
explicitly empty system messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Flip negated condition in autoCompaction to satisfy no-negated-condition rule
- Remove unused eslint-disable complexity directives
- Update vitest.setup.ts global mock to include flattenSystemMessage export
  and return SystemMessageBlock[] instead of plain string

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Labels

size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Status: Todo

1 participant