Add automatic provider prompt caching and cache-hit metrics #1403
willccbb wants to merge 5 commits into
Approvability verdict: Needs human review. This PR introduces a new feature that automatically enables prompt caching for supported providers by default, modifying API requests and changing how token usage is calculated and reported. New features with runtime behavior changes enabled by default warrant human review.
…refix-caching

Conflicts:
- `verifiers/scripts/tui.py`
- `verifiers/utils/metric_utils.py`
- `verifiers/utils/save_utils.py`
- `verifiers/utils/usage_utils.py`
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 10e0030.
```python
if not isinstance(raw_prompt_cache, bool):
    raise ValueError("'prompt_cache' must be a boolean when provided.")
return raw_prompt_cache
```
Skills files not updated after eval changes
Low Severity
Changes to `verifiers/scripts/eval.py` (adding the `prompt_cache` config option and `build_prompt_cache_enabled`) and `docs/evaluation.md` (documenting prompt caching behavior) are listed as triggers for skills updates, but no corresponding updates to `skills/evaluate-environments/SKILL.md` or other affected skill files appear in this PR.
Triggered by project rule: BugBot Instructions
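The review excerpt above checks an optional `prompt_cache` value. As a minimal, self-contained sketch of that check, assuming an absent value (`None`) means "not provided" and falls back to the documented default of `true` (the name `resolve_prompt_cache` is hypothetical and does not come from the PR):

```python
from typing import Any


def resolve_prompt_cache(raw_prompt_cache: Any, default: bool = True) -> bool:
    """Hypothetical helper: validate an optional `prompt_cache` setting.

    Assumption: None means the key was not provided, so `default` is used.
    """
    if raw_prompt_cache is None:
        return default
    # Strict type check: ints like 0/1 and strings like "false" are rejected
    # (isinstance(1, bool) is False, so integers do not slip through).
    if not isinstance(raw_prompt_cache, bool):
        raise ValueError("'prompt_cache' must be a boolean when provided.")
    return raw_prompt_cache
```

The strict check means a quoted TOML value such as `prompt_cache = "false"` fails loudly instead of being treated as a truthy string.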


Summary
- `input_tokens` reflects non-cache-hit prompt tokens

Testing
Note
Medium Risk
Medium risk because it changes request construction for Anthropic/OpenRouter (injecting `cache_control`) and alters token/usage accounting (`input_tokens` now excludes cache hits), which can affect downstream metrics and cost/usage reporting.

Overview
Adds automatic prompt caching for supported providers by inferring the provider from `api_base_url` + `client_type` and applying the correct request mutation (Anthropic top-level `cache_control`, OpenRouter `extra_body.cache_control`, OpenAI implicit no-op) via a new `prompt_cache_utils` hook wired into `Client.get_response()`.

Introduces a `prompt_cache` opt-out flag (default `true`) plumbed from the endpoint registry and eval TOML into `ClientConfig`/`EndpointClientConfig`, with TOML validation and precedence rules.

Surfaces cache-hit accounting end-to-end by adding `cached_input_tokens` to `Usage`/`TokenUsage`, updating OpenAI/Anthropic usage parsing to split cached vs. uncached prompt tokens, and propagating the new field through state saving, metrics, CLI/TUI display, and documentation/tests.
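The overview describes inferring the provider from `api_base_url` plus `client_type` and then mutating the request per provider. The sketch below illustrates only that shape: the helper names `infer_provider` and `apply_prompt_cache`, the host-matching rules, and the `{"type": "ephemeral"}` payload are assumptions, not the PR's `prompt_cache_utils` code.

```python
from __future__ import annotations

import copy
from typing import Any
from urllib.parse import urlparse

# Assumed cache_control payload; the exact value the PR sends is not shown.
EPHEMERAL = {"type": "ephemeral"}


def _matches(host: str, domain: str) -> bool:
    # Match the domain itself or any subdomain (api.openai.com, etc.).
    return host == domain or host.endswith("." + domain)


def infer_provider(api_base_url: str, client_type: str) -> str | None:
    """Hypothetical detection rules; the PR's actual heuristics are not shown."""
    host = (urlparse(api_base_url).hostname or "").lower()
    if _matches(host, "openrouter.ai"):
        return "openrouter"
    if client_type == "anthropic" or _matches(host, "anthropic.com"):
        return "anthropic"
    if _matches(host, "openai.com"):
        return "openai"
    return None  # Unknown endpoint: leave the request alone.


def apply_prompt_cache(
    request: dict[str, Any], provider: str | None, enabled: bool = True
) -> dict[str, Any]:
    """Return a copy of `request` with provider-specific caching hints added."""
    out = copy.deepcopy(request)
    if not enabled or provider is None:
        return out
    if provider == "anthropic":
        # Anthropic: top-level cache_control on the request.
        out.setdefault("cache_control", dict(EPHEMERAL))
    elif provider == "openrouter":
        # OpenRouter through an OpenAI-compatible client: use extra_body.
        extra = out.setdefault("extra_body", {})
        extra.setdefault("cache_control", dict(EPHEMERAL))
    # OpenAI caches long prompts implicitly, so nothing is added.
    return out
```

Returning a deep copy keeps the caller's request untouched, and `setdefault` leaves any `cache_control` the caller already set in place.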
Note
Add automatic prompt caching and cache-hit metrics for OpenAI, Anthropic, and OpenRouter
- Records provider-reported cache hits as `cached_input_tokens` on `Usage`, subtracting them from reported `input_tokens` to avoid double-counting.
- Adds `CachedInputTokensMetric` and propagates `cached_input_tokens` through `StateUsageTracker`, `TokenUsage`, `RolloutOutput`, and metadata so cache hits are tracked end-to-end.
- Caching is on by default; opt out with `prompt_cache = false` in TOML config or `ClientConfig`. Invalid non-boolean values raise a `ValueError`.
- `input_tokens` in `Usage` and `TokenUsage` now excludes cache-hit tokens; consumers relying on `input_tokens` for total prompt size should add `cached_input_tokens`.

Macroscope summarized 10e0030.
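The accounting change is easiest to see against raw provider payloads. OpenAI's `prompt_tokens` includes cached tokens (reported under `prompt_tokens_details.cached_tokens`), while Anthropic reports cache reads separately as `cache_read_input_tokens`. A hedged sketch of the split, using a hypothetical `UsageSketch` class rather than the PR's `Usage`, and assuming Anthropic cache-write tokens count as uncached input:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class UsageSketch:
    """Hypothetical stand-in for the PR's Usage type."""

    input_tokens: int  # uncached prompt tokens only
    output_tokens: int
    cached_input_tokens: int = 0

    @property
    def total_prompt_tokens(self) -> int:
        # Consumers that need full prompt size add the two back together.
        return self.input_tokens + self.cached_input_tokens

    def __add__(self, other: UsageSketch) -> UsageSketch:
        # Accumulate across calls, e.g. over the turns of one rollout.
        return UsageSketch(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
            self.cached_input_tokens + other.cached_input_tokens,
        )


def from_openai(usage: dict[str, Any]) -> UsageSketch:
    # OpenAI's prompt_tokens already includes cached tokens, so subtract them.
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens") or 0
    return UsageSketch(
        input_tokens=usage["prompt_tokens"] - cached,
        output_tokens=usage["completion_tokens"],
        cached_input_tokens=cached,
    )


def from_anthropic(usage: dict[str, Any]) -> UsageSketch:
    # Anthropic's input_tokens already excludes cache reads.
    # Assumption: cache-write tokens are counted as uncached input.
    writes = usage.get("cache_creation_input_tokens") or 0
    return UsageSketch(
        input_tokens=usage["input_tokens"] + writes,
        output_tokens=usage["output_tokens"],
        cached_input_tokens=usage.get("cache_read_input_tokens") or 0,
    )
```

For example, an OpenAI response with `prompt_tokens=1000` and `cached_tokens=800` yields `input_tokens=200` and `cached_input_tokens=800`; `total_prompt_tokens` recovers the 1000 that consumers previously read from `input_tokens`.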