
Anthropic: usage metrics extraction silently drops server_tool_use, cache_creation breakdown, and other new API fields #164

@braintrust-bot


Summary

extract_anthropic_usage() in the Anthropic wrapper uses a hardcoded whitelist of 4 usage fields. The Anthropic Messages API now returns additional usage fields that are silently dropped, most notably server_tool_use (with web_search_requests and web_fetch_requests sub-fields) and cache_creation TTL breakdowns.

By contrast, the OpenAI and LiteLLM wrappers both use a generic iteration approach (_parse_metrics_from_usage) that automatically captures any new usage field the upstream API returns.

What is missing

The function extract_anthropic_usage() in py/src/braintrust/integrations/anthropic/_utils.py (lines 16-70) only extracts:

  • input_tokens → prompt_tokens
  • output_tokens → completion_tokens
  • cache_read_input_tokens → prompt_cached_tokens
  • cache_creation_input_tokens → prompt_cache_creation_tokens
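To make the failure mode concrete, here is a simplified, hypothetical reconstruction of the whitelist behavior described above. The names ANTHROPIC_USAGE_WHITELIST and whitelist_extract are invented for illustration, the payload values are made up, and the real extract_anthropic_usage() may do extra work (such as totaling tokens). The point is only that any key outside the four mappings never reaches the metrics dict.

```python
from typing import Any, Dict

# Hypothetical reconstruction of the four-field mapping listed above
# (not the actual code in _utils.py).
ANTHROPIC_USAGE_WHITELIST = {
    "input_tokens": "prompt_tokens",
    "output_tokens": "completion_tokens",
    "cache_read_input_tokens": "prompt_cached_tokens",
    "cache_creation_input_tokens": "prompt_cache_creation_tokens",
}


def whitelist_extract(usage: Dict[str, Any]) -> Dict[str, int]:
    """Copy only whitelisted integer fields; every other key is silently ignored."""
    metrics: Dict[str, int] = {}
    for anthropic_name, braintrust_name in ANTHROPIC_USAGE_WHITELIST.items():
        value = usage.get(anthropic_name)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            metrics[braintrust_name] = value
    return metrics


# Illustrative usage payload as a plain dict (values are made up).
usage = {
    "input_tokens": 12,
    "output_tokens": 34,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0,
    "server_tool_use": {"web_search_requests": 1, "web_fetch_requests": 0},
    "service_tier": "standard",
}
print(whitelist_extract(usage))
# {'prompt_tokens': 12, 'completion_tokens': 34, 'prompt_cached_tokens': 0, 'prompt_cache_creation_tokens': 0}
```

The server_tool_use counts are present in the input but absent from the output, which is exactly the silent drop this issue reports.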

The Anthropic API usage object now also includes:

  • server_tool_use.web_search_requests (int) — number of web search requests made
  • server_tool_use.web_fetch_requests (int) — number of web fetch requests made
  • cache_creation.ephemeral_5m_input_tokens (int) — 5-minute TTL cache tokens
  • cache_creation.ephemeral_1h_input_tokens (int) — 1-hour TTL cache tokens
  • service_tier (string) — which tier handled the request
  • Any future fields added to the usage object

These fields are present in API responses today (confirmed in Claude Agent SDK cassettes, e.g. py/src/braintrust/integrations/claude_agent_sdk/cassettes/test_calculator_with_multiple_operations.json lines 823-825) but are silently discarded.
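One way to see exactly what a top-level whitelist loses for a given response is to walk the usage object and list every leaf it does not cover. This is a hypothetical helper (COVERED_KEYS, iter_leaf_paths and dropped_usage_fields are invented names), it assumes the usage object has already been converted to a plain dict, and the payload values are illustrative rather than copied from the cassette.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical: the four top-level keys the current extractor reads.
COVERED_KEYS = {
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
}


def iter_leaf_paths(obj: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) for every non-dict leaf in a nested usage dict."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from iter_leaf_paths(value, prefix=f"{path}.")
        else:
            yield path, value


def dropped_usage_fields(usage: Dict[str, Any]) -> List[str]:
    """Return, sorted, the leaf paths that a top-level whitelist would discard."""
    return sorted(path for path, _ in iter_leaf_paths(usage) if path not in COVERED_KEYS)


# Illustrative payload shaped like the fields listed above (values are made up).
sample_usage = {
    "input_tokens": 12,
    "output_tokens": 34,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 8,
    "cache_creation": {"ephemeral_5m_input_tokens": 8, "ephemeral_1h_input_tokens": 0},
    "server_tool_use": {"web_search_requests": 1, "web_fetch_requests": 0},
    "service_tier": "standard",
}
print(dropped_usage_fields(sample_usage))
# ['cache_creation.ephemeral_1h_input_tokens', 'cache_creation.ephemeral_5m_input_tokens',
#  'server_tool_use.web_fetch_requests', 'server_tool_use.web_search_requests', 'service_tier']
```

A check like this could be run over the usage objects in recorded cassettes to catch new upstream fields as they appear.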

Braintrust docs status

not_found — Braintrust docs at https://www.braintrust.dev/docs do not mention Anthropic server tool usage metrics or cache TTL breakdowns.

Upstream sources

Local repo files inspected

  • py/src/braintrust/integrations/anthropic/_utils.py — hardcoded usage extraction
  • py/src/braintrust/integrations/anthropic/tracing.py — calls extract_anthropic_usage()
  • py/src/braintrust/oai.py lines 1009-1036 — OpenAI's generic _parse_metrics_from_usage() for comparison
  • py/src/braintrust/wrappers/litellm.py lines 546-573 — LiteLLM's generic approach for comparison
  • py/src/braintrust/integrations/claude_agent_sdk/cassettes/test_calculator_with_multiple_operations.json — cassette confirming server_tool_use in real API responses

Suggested approach

Replace the hardcoded whitelist with a generic iteration approach similar to the OpenAI wrapper, mapping known Anthropic field names to Braintrust's standard metric names while preserving any unrecognized numeric fields. Handle nested sub-objects like server_tool_use and cache_creation with a prefix pattern (e.g. server_tool_use_web_search_requests).
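A minimal sketch of the suggested approach follows, assuming a plain-dict usage object and at most one level of nesting. KNOWN_RENAMES and generic_extract_anthropic_usage are hypothetical names, not existing Braintrust code, and the sketch does not reproduce anything else the current extractor may compute (for example derived totals).

```python
from typing import Any, Dict, Mapping

# Known Anthropic names mapped to Braintrust's standard metric names (from the list above).
KNOWN_RENAMES = {
    "input_tokens": "prompt_tokens",
    "output_tokens": "completion_tokens",
    "cache_read_input_tokens": "prompt_cached_tokens",
    "cache_creation_input_tokens": "prompt_cache_creation_tokens",
}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def generic_extract_anthropic_usage(usage: Mapping[str, Any]) -> Dict[str, float]:
    """Hypothetical generic extractor: rename known fields, keep unknown numeric
    fields under their own names, and flatten nested sub-objects as <parent>_<child>."""
    metrics: Dict[str, float] = {}
    for key, value in usage.items():
        if isinstance(value, Mapping):
            # Nested sub-object such as server_tool_use or cache_creation.
            for sub_key, sub_value in value.items():
                if _is_number(sub_value):
                    metrics[f"{key}_{sub_key}"] = sub_value
        elif _is_number(value):
            metrics[KNOWN_RENAMES.get(key, key)] = value
        # Non-numeric scalars (e.g. service_tier) and None values are skipped.
    return metrics


# Illustrative payload (values are made up).
usage = {
    "input_tokens": 12,
    "output_tokens": 34,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 8,
    "cache_creation": {"ephemeral_5m_input_tokens": 8, "ephemeral_1h_input_tokens": 0},
    "server_tool_use": {"web_search_requests": 1, "web_fetch_requests": 0},
    "service_tier": "standard",
}
metrics = generic_extract_anthropic_usage(usage)
```

Passing unrecognized numeric fields through under their own names means a future addition to the usage object shows up in metrics without a code change. service_tier is a string, so this sketch skips it; it might fit better in span metadata than in numeric metrics.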

Metadata

  • Assignees: no one assigned
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests