Summary
extract_anthropic_usage() in the Anthropic wrapper uses a hardcoded whitelist of 4 usage fields. The Anthropic Messages API now returns additional usage fields that are silently dropped, most notably server_tool_use (with web_search_requests and web_fetch_requests sub-fields) and cache_creation TTL breakdowns.
By contrast, the OpenAI and LiteLLM wrappers both use a generic iteration approach (_parse_metrics_from_usage) that automatically captures any new usage field the upstream API returns.
What is missing
The function extract_anthropic_usage() in py/src/braintrust/integrations/anthropic/_utils.py (lines 16-70) only extracts:
- input_tokens → prompt_tokens
- output_tokens → completion_tokens
- cache_read_input_tokens → prompt_cached_tokens
- cache_creation_input_tokens → prompt_cache_creation_tokens
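For reference, the extraction above amounts to something like the following. This is a minimal reconstruction assuming a plain dict input, not the shipped code; the real function also handles SDK objects and edge cases:

```python
# Sketch of the hardcoded whitelist described above (illustrative only;
# the actual implementation lives in _utils.py).
FIELD_MAP = {
    "input_tokens": "prompt_tokens",
    "output_tokens": "completion_tokens",
    "cache_read_input_tokens": "prompt_cached_tokens",
    "cache_creation_input_tokens": "prompt_cache_creation_tokens",
}

def extract_anthropic_usage(usage: dict) -> dict:
    # Only the four whitelisted keys survive; everything else is dropped.
    return {new: usage[old] for old, new in FIELD_MAP.items() if old in usage}
```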
The Anthropic API usage object now also includes:
- server_tool_use.web_search_requests (int) — number of web search requests made
- server_tool_use.web_fetch_requests (int) — number of web fetch requests made
- cache_creation.ephemeral_5m_input_tokens (int) — 5-minute TTL cache tokens
- cache_creation.ephemeral_1h_input_tokens (int) — 1-hour TTL cache tokens
- service_tier (string) — which tier handled the request
- Any future fields added to the usage object
These fields are present in API responses today (confirmed in Claude Agent SDK cassettes, e.g. py/src/braintrust/integrations/claude_agent_sdk/cassettes/test_calculator_with_multiple_operations.json lines 823-825) but are silently discarded.
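To make the gap concrete, here is a hypothetical usage payload carrying the newer fields (values invented for illustration; field names as listed above) and the top-level keys a four-field whitelist discards:

```python
# Hypothetical usage payload; values are invented for illustration.
usage = {
    "input_tokens": 120,
    "output_tokens": 45,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 80,
    "server_tool_use": {"web_search_requests": 2, "web_fetch_requests": 1},
    "cache_creation": {
        "ephemeral_5m_input_tokens": 80,
        "ephemeral_1h_input_tokens": 0,
    },
    "service_tier": "standard",
}

WHITELIST = {
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
}
dropped = sorted(set(usage) - WHITELIST)
print(dropped)  # ['cache_creation', 'server_tool_use', 'service_tier']
```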
Braintrust docs status
not_found — Braintrust docs at https://www.braintrust.dev/docs do not mention Anthropic server tool usage metrics or cache TTL breakdowns.
Upstream sources
- Anthropic Python SDK Usage type: anthropic-sdk-python usage.py
- Anthropic Python SDK ServerToolUsage type: anthropic-sdk-python server_tool_usage.py
- Anthropic web search tool docs: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/web-search-tool
Local repo files inspected
- py/src/braintrust/integrations/anthropic/_utils.py — hardcoded usage extraction
- py/src/braintrust/integrations/anthropic/tracing.py — calls extract_anthropic_usage()
- py/src/braintrust/oai.py lines 1009-1036 — OpenAI's generic _parse_metrics_from_usage() for comparison
- py/src/braintrust/wrappers/litellm.py lines 546-573 — LiteLLM's generic approach for comparison
- py/src/braintrust/integrations/claude_agent_sdk/cassettes/test_calculator_with_multiple_operations.json — cassette confirming server_tool_use in real API responses
Suggested approach
Replace the hardcoded whitelist with a generic iteration approach similar to the OpenAI wrapper, mapping known Anthropic field names to Braintrust's standard metric names while preserving any unrecognized numeric fields. Handle nested sub-objects like server_tool_use and cache_creation with a prefix pattern (e.g. server_tool_use_web_search_requests).
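A minimal sketch of that approach, under stated assumptions: the function name is hypothetical, the known-field mapping mirrors the current whitelist, and non-numeric scalars such as service_tier are skipped to match the numeric-metrics-only behavior of the OpenAI wrapper:

```python
# Hypothetical generic usage parser, modeled on the iteration approach
# the OpenAI wrapper's _parse_metrics_from_usage uses.
KNOWN_NAMES = {
    "input_tokens": "prompt_tokens",
    "output_tokens": "completion_tokens",
    "cache_read_input_tokens": "prompt_cached_tokens",
    "cache_creation_input_tokens": "prompt_cache_creation_tokens",
}

def parse_anthropic_usage_metrics(usage, prefix=""):
    """Flatten a usage object into metric names, keeping unknown fields."""
    metrics = {}
    items = usage.items() if isinstance(usage, dict) else vars(usage).items()
    for key, value in items:
        name = f"{prefix}{key}"
        if isinstance(value, bool):
            continue  # bools are ints in Python, but not metrics
        if isinstance(value, (int, float)):
            # Rename known fields; pass unrecognized numeric fields through.
            metrics[KNOWN_NAMES.get(name, name)] = value
        elif isinstance(value, dict) or hasattr(value, "__dict__"):
            # Flatten nested sub-objects (server_tool_use, cache_creation)
            # with a prefix, e.g. server_tool_use_web_search_requests.
            metrics.update(parse_anthropic_usage_metrics(value, prefix=f"{name}_"))
        # Non-numeric scalars such as service_tier are skipped.
    return metrics
```

For example, parse_anthropic_usage_metrics({"server_tool_use": {"web_search_requests": 2}}) yields {"server_tool_use_web_search_requests": 2}, so any new numeric field the API adds is captured without further code changes.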