
Add automatic provider prompt caching and cache-hit metrics#1403

Open
willccbb wants to merge 5 commits into main from codex/leverage-host-prefix-caching

Conversation

@willccbb
Member

@willccbb willccbb commented May 17, 2026

Summary

  • Automatically apply prompt-cache defaults for supported providers inferred from endpoint URL and client type
  • Surface cache-hit usage in token accounting so input_tokens counts only prompt tokens that were not served from the cache
  • Keep rollout scheduling unchanged and remove pre-firing logic
  • Update eval display, TUI, docs, and tests for the new cache accounting shape

Testing

  • Focused unit tests for prompt-cache policy inference, provider request mutation, usage parsing, and serialized output fallback
  • Broader client/runtime regression tests for OpenAI, Anthropic, and eval lifecycle paths
  • Lint and formatting checks passed

Note

Medium Risk
This PR changes request construction for Anthropic and OpenRouter (injecting cache_control) and alters token/usage accounting (input_tokens now excludes cache hits), which can affect downstream metrics and cost/usage reporting.

Overview
Adds automatic prompt caching behavior for supported providers by inferring provider from api_base_url + client_type, and applying the correct request mutation (Anthropic top-level cache_control, OpenRouter extra_body.cache_control, OpenAI implicit no-op) via a new prompt_cache_utils hook wired into Client.get_response().
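
The inference-plus-mutation flow described above can be sketched roughly as follows. This is an illustration only, not the code in prompt_cache_utils: the function names, the client_type strings, the host matching, and the {"type": "ephemeral"} payload are all assumptions.

```python
from copy import deepcopy
from urllib.parse import urlparse

# Assumed payload; the value the PR actually injects is not shown in this thread.
EPHEMERAL = {"type": "ephemeral"}


def _host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def infer_provider(api_base_url: str, client_type: str) -> str:
    """Hypothetical helper: guess the provider from endpoint host and client type."""
    host = (urlparse(api_base_url).hostname or "").lower()
    if _host_matches(host, "openrouter.ai"):
        return "openrouter"
    if client_type.startswith("anthropic") or _host_matches(host, "anthropic.com"):
        return "anthropic"
    if _host_matches(host, "openai.com"):
        return "openai"
    return "unknown"


def apply_prompt_cache(request: dict, provider: str, enabled: bool = True) -> dict:
    """Hypothetical helper: return a copy of the request with cache settings applied."""
    out = deepcopy(request)
    if not enabled:
        return out
    if provider == "anthropic":
        # Top-level cache_control, as described in the overview.
        out.setdefault("cache_control", dict(EPHEMERAL))
    elif provider == "openrouter":
        # OpenRouter receives the same payload under extra_body.
        out.setdefault("extra_body", {}).setdefault("cache_control", dict(EPHEMERAL))
    # OpenAI is an implicit no-op; unknown providers are left untouched.
    return out
```

Using setdefault means a cache_control value the caller already supplied is not overwritten, and copying the request keeps the mutation from leaking back into shared sampling arguments.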

Introduces a prompt_cache opt-out flag (default true) plumbed from endpoint registry and eval TOML into ClientConfig/EndpointClientConfig, with TOML validation and precedence rules.
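
A minimal sketch of how the opt-out flag might be validated and resolved. The precedence order shown here (per-endpoint value first, then the global value, then the default of true) is an assumption, since the actual rules are not spelled out in this thread; parse_prompt_cache and resolve_prompt_cache are hypothetical names.

```python
from typing import Any, Optional


def parse_prompt_cache(raw: Any) -> Optional[bool]:
    """Validate a raw config value; None means the key was not provided."""
    if raw is None:
        return None
    if not isinstance(raw, bool):
        raise ValueError("'prompt_cache' must be a boolean when provided.")
    return raw


def resolve_prompt_cache(endpoint_value: Any = None, global_value: Any = None) -> bool:
    """Assumed precedence: endpoint setting, then global setting, then True."""
    endpoint = parse_prompt_cache(endpoint_value)
    fallback = parse_prompt_cache(global_value)
    if endpoint is not None:
        return endpoint
    if fallback is not None:
        return fallback
    return True
```

Validating both values before resolving means a malformed global setting is reported even when an endpoint-level override would have hidden it.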

Surfaces cache-hit accounting end-to-end by adding cached_input_tokens to Usage/TokenUsage, updating OpenAI/Anthropic usage parsing to split cached vs uncached prompt tokens, and propagating the new field through state saving, metrics, CLI/TUI display, and documentation/tests.
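
To illustrate the cached/uncached split, here is a rough sketch over raw usage dictionaries. The Usage class below is a simplified stand-in and the function names are hypothetical. The provider field names follow the public APIs: OpenAI reports cache hits in prompt_tokens_details.cached_tokens as part of prompt_tokens, while Anthropic reports cache_read_input_tokens separately from input_tokens. How the PR treats Anthropic's cache_creation_input_tokens is not stated here, so the sketch ignores it.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Simplified stand-in for the Usage type touched by this PR.
    input_tokens: int
    output_tokens: int
    cached_input_tokens: int = 0


def usage_from_openai(raw: dict) -> Usage:
    """OpenAI's prompt_tokens includes cache hits, so subtract them."""
    prompt = int(raw.get("prompt_tokens") or 0)
    details = raw.get("prompt_tokens_details") or {}
    # Clamp so a malformed response can never yield negative input_tokens.
    cached = min(int(details.get("cached_tokens") or 0), prompt)
    return Usage(prompt - cached, int(raw.get("completion_tokens") or 0), cached)


def usage_from_anthropic(raw: dict) -> Usage:
    """Anthropic's input_tokens already excludes cache reads."""
    cached = int(raw.get("cache_read_input_tokens") or 0)
    return Usage(int(raw.get("input_tokens") or 0), int(raw.get("output_tokens") or 0), cached)
```

With either parser, input_tokens + cached_input_tokens gives the full prompt size under these assumptions.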

Reviewed by Cursor Bugbot for commit 10e0030.

Note

Add automatic prompt caching and cache-hit metrics for OpenAI, Anthropic, and OpenRouter

  • Adds a new prompt_cache_utils.py module that detects the provider from client config/URL and injects provider-specific cache control payloads into requests for Anthropic and OpenRouter before each API call.
  • All OpenAI, Anthropic, and OpenRouter clients now parse cache hit tokens from native responses and expose them as cached_input_tokens on Usage, subtracting them from reported input_tokens to avoid double-counting.
  • Adds CachedInputTokensMetric and propagates cached_input_tokens through StateUsageTracker, TokenUsage, RolloutOutput, and metadata so cache hits are tracked end-to-end.
  • Prompt caching can be disabled per-endpoint or globally via prompt_cache = false in TOML config or ClientConfig; invalid non-boolean values raise a ValueError.
  • Behavioral Change: input_tokens in Usage and TokenUsage now excludes cache-hit tokens; consumers relying on input_tokens for total prompt size should add cached_input_tokens.
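
For consumers affected by the behavioral change, a small adapter along these lines could recover the previous meaning of input_tokens. Both helper names are hypothetical and not part of this PR.

```python
def total_prompt_tokens(input_tokens: int, cached_input_tokens: int = 0) -> int:
    """Full prompt size: uncached tokens plus cache-hit tokens."""
    return input_tokens + cached_input_tokens


def cache_hit_rate(input_tokens: int, cached_input_tokens: int = 0) -> float:
    """Fraction of prompt tokens served from the cache (0.0 for an empty prompt)."""
    total = total_prompt_tokens(input_tokens, cached_input_tokens)
    return cached_input_tokens / total if total else 0.0
```

Defaulting cached_input_tokens to 0 keeps the helpers usable on records saved before this change, assuming such records simply lack the field.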

Macroscope summarized 10e0030.

Comment thread verifiers/utils/usage_utils.py
@macroscopeapp

macroscopeapp Bot commented May 17, 2026

Approvability

Verdict: Needs human review

This PR introduces a new feature that automatically enables prompt caching for supported providers by default, modifying API requests and changing how token usage is calculated/reported. New features with runtime behavior changes enabled by default warrant human review.


willccbb added 3 commits May 17, 2026 17:45
…refix-caching

# Conflicts:
#	verifiers/scripts/tui.py
#	verifiers/utils/metric_utils.py
#	verifiers/utils/save_utils.py
#	verifiers/utils/usage_utils.py

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread verifiers/scripts/eval.py
    if not isinstance(raw_prompt_cache, bool):
        raise ValueError("'prompt_cache' must be a boolean when provided.")
    return raw_prompt_cache


Skills files not updated after eval changes

Low Severity

Changes to verifiers/scripts/eval.py (adding prompt_cache config option and build_prompt_cache_enabled) and docs/evaluation.md (documenting prompt caching behavior) are listed as triggers for skills updates. No corresponding updates to skills/evaluate-environments/SKILL.md or other affected skill files appear in this PR.


Triggered by project rule: BugBot Instructions

