feat(ai-proxy): add built-in nginx variables for LLM observability by AlinsRan · Pull Request #13477 · apache/apisix

AlinsRan · 2026-06-05T01:04:07Z

Summary

Add 8 built-in nginx variables that ai-proxy sets automatically on every request. These variables expose LLM request/response metadata in the nginx access log and all logger plugins without any additional configuration.

New variables

| Variable | Type | Description |
|---|---|---|
| `$llm_total_tokens` | integer | Total tokens (prompt + completion) |
| `$llm_stream` | boolean | `true` if request is streaming |
| `$llm_has_tool_calls` | boolean | `true` if response contains tool calls |
| `$llm_tool_count` | integer | Number of tools in the upstream request body |
| `$llm_end_user_id` | string | End-user identifier from request body |
| `$llm_cache_read_input_tokens` | integer | Prompt tokens served from provider cache |
| `$llm_cache_creation_input_tokens` | integer | Prompt tokens written to provider cache |
| `$llm_reasoning_tokens` | integer | Reasoning tokens (OpenAI o1/o3, Responses API) |

Provider mapping for cache/reasoning tokens

| Variable | OpenAI Chat | OpenAI Responses | Anthropic | DeepSeek |
|---|---|---|---|---|
| `llm_cache_read_input_tokens` | `prompt_tokens_details.cached_tokens` | `input_tokens_details.cached_tokens` | `cache_read_input_tokens` | `prompt_cache_hit_tokens` |
| `llm_cache_creation_input_tokens` | — | — | `cache_creation_input_tokens` | — |
| `llm_reasoning_tokens` | `completion_tokens_details.reasoning_tokens` | `output_tokens_details.reasoning_tokens` | — | — |
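
The mapping above can be sketched as a single normalisation step. This is an illustrative Python sketch, not the Lua code in the PR: the function names `_dig` and `extract_cache_and_reasoning` are hypothetical, the lookup order across providers is an assumption, and defaulting missing fields to 0 is inferred from the commit notes that mention token variables being left at 0.

```python
from typing import Any, Dict, Optional


def _dig(obj: Any, *path: str) -> Optional[int]:
    """Follow nested dict keys; return None if any level is missing."""
    cur = obj
    for key in path:
        if not isinstance(cur, dict):
            return None
        cur = cur.get(key)
    return cur if isinstance(cur, int) else None


def extract_cache_and_reasoning(usage: Dict[str, Any]) -> Dict[str, int]:
    """Map a provider `usage` object onto the three llm_* token variables.

    The lookup order below is an assumption made for this sketch.
    """
    cache_read = (
        _dig(usage, "prompt_tokens_details", "cached_tokens")    # OpenAI Chat
        or _dig(usage, "input_tokens_details", "cached_tokens")  # OpenAI Responses
        or _dig(usage, "cache_read_input_tokens")                # Anthropic
        or _dig(usage, "prompt_cache_hit_tokens")                # DeepSeek
        or 0
    )
    # Only Anthropic reports cache-creation tokens in the table above.
    cache_creation = _dig(usage, "cache_creation_input_tokens") or 0
    reasoning = (
        _dig(usage, "completion_tokens_details", "reasoning_tokens")  # OpenAI Chat
        or _dig(usage, "output_tokens_details", "reasoning_tokens")   # OpenAI Responses
        or 0
    )
    return {
        "llm_cache_read_input_tokens": cache_read,
        "llm_cache_creation_input_tokens": cache_creation,
        "llm_reasoning_tokens": reasoning,
    }
```

Under these assumptions a provider that reports none of the fields yields zeros for all three variables.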

End-user ID extraction precedence: safety_identifier > user (OpenAI/compatible) or metadata.user_id (Anthropic Messages).
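
One possible reading of this precedence, sketched in Python rather than the Lua used by the PR. The function name `extract_end_user_id` is hypothetical, and treating the three fields as a single fallback chain (instead of branching on the protocol) is an assumption of this sketch.

```python
from typing import Any, Dict, Optional


def extract_end_user_id(body: Dict[str, Any]) -> Optional[str]:
    """Return the end-user identifier from a decoded request body, if any.

    Assumed order: safety_identifier, then user, then metadata.user_id.
    """
    # OpenAI and compatible request bodies: safety_identifier wins over user.
    for candidate in (body.get("safety_identifier"), body.get("user")):
        if isinstance(candidate, str) and candidate:
            return candidate
    # Anthropic Messages request bodies carry the ID under metadata.user_id.
    metadata = body.get("metadata")
    if isinstance(metadata, dict):
        user_id = metadata.get("user_id")
        if isinstance(user_id, str) and user_id:
            return user_id
    return None
```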

Example access log usage

log_format main '$llm_model $llm_total_tokens $llm_stream $llm_has_tool_calls $llm_end_user_id';

Also includes

- Optional `on_event` callback parameter added to `parse_streaming_response` for per-event processing (e.g., streaming tool call detection).
- `extract_usage` updated across all three protocol adapters to extract cache and reasoning token counts from provider responses.

Files changed

| File | Change |
|---|---|
| `apisix/cli/ngx_tpl.lua` | Add 8 `set $llm_*` variable definitions |
| `apisix/core/ctx.lua` | Register 8 variables in `ngx_var_names` |
| `apisix/plugins/ai-protocols/openai-chat.lua` | Extract cache/reasoning tokens from `prompt_tokens_details` / `completion_tokens_details` |
| `apisix/plugins/ai-protocols/openai-responses.lua` | Extract cache/reasoning tokens from `input_tokens_details` / `output_tokens_details` |
| `apisix/plugins/ai-protocols/anthropic-messages.lua` | Extract cache tokens from Anthropic usage fields |
| `apisix/plugins/ai-providers/base.lua` | Set new variables from usage; add `on_event` callback |
| `apisix/plugins/ai-proxy/base.lua` | Compute `llm_stream`, `llm_tool_count`, `llm_end_user_id`, `llm_has_tool_calls`, `llm_total_tokens` |
| `t/APISIX.pm` | Add `set` directives + extend `log_format main` |
| `t/plugin/ai-proxy3.t` | 6 test cases (TEST 7–12) covering all new variables |

Test plan

prove -I. -r t/plugin/ai-proxy3.t

Add 8 built-in nginx variables that ai-proxy sets automatically on every request, making LLM metadata available in access_log format and logger plugins without any additional plugin configuration.

New variables:

| Variable | Description |
|---|---|
| $llm_total_tokens | Total tokens (prompt + completion) |
| $llm_stream | true if request is streaming |
| $llm_has_tool_calls | true if response contains tool calls |
| $llm_tool_count | Number of tools in the upstream request body |
| $llm_end_user_id | End-user ID from request body |
| $llm_cache_read_input_tokens | Prompt tokens served from provider cache |
| $llm_cache_creation_input_tokens | Prompt tokens written to provider cache |
| $llm_reasoning_tokens | Reasoning tokens (OpenAI o1/o3, Responses API) |

Provider mapping for cache/reasoning tokens:

- OpenAI Chat: prompt_tokens_details.cached_tokens / completion_tokens_details.reasoning_tokens
- OpenAI Responses: input_tokens_details.cached_tokens / output_tokens_details.reasoning_tokens
- Anthropic: cache_read_input_tokens / cache_creation_input_tokens
- DeepSeek: prompt_cache_hit_tokens (as cache_read_input_tokens fallback)

End-user ID extraction precedence: safety_identifier > user (OpenAI/compatible) or metadata.user_id (Anthropic Messages).

The on_event callback parameter added to parse_streaming_response allows callers to hook into the streaming event loop for per-event processing such as tool call detection.

Eliminate redundant JSON decoding: event.data was being decoded twice, once in parse_sse_event and again in the on_stream_event callback. Instead, parse_sse_event now returns has_tool_call=true when a tool call delta is detected — consistent with how prompt/completion tokens are returned via extract_usage. ai-providers/base.lua sets ctx.var.llm_has_tool_calls from this flag, removing the need for the on_event callback and the on_stream_event closure entirely.

The has_tool_call check was placed after goto CONTINUE, so skip events (including Anthropic content_block_start/tool_use) never reached it. Move the check to before the skip guard. Anthropic content_block_start with tool_use now correctly returns {type='skip', has_tool_call=true} so it does not enter the converter/downstream pipeline.

anthropic-messages.lua does not need to be changed. The has_tool_call check in ai-providers/base.lua now runs before goto CONTINUE, so skip events with no has_tool_call field are simply ignored — no extra branch in parse_sse_event needed.
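
The ordering fix described in these commits can be illustrated with a small Python sketch of the event loop (the real loop is Lua in ai-providers/base.lua). The function name `process_events` and the dict-shaped events are assumptions of this sketch; `continue` stands in for `goto CONTINUE`.

```python
from typing import Any, Dict, List, Tuple


def process_events(events: List[Dict[str, Any]]) -> Tuple[bool, List[Dict[str, Any]]]:
    """Return (llm_has_tool_calls, events forwarded downstream)."""
    has_tool_calls = False
    forwarded: List[Dict[str, Any]] = []
    for event in events:
        # Check the flag BEFORE the skip guard, so that skip events
        # carrying has_tool_call=true are still counted.
        if event.get("has_tool_call"):
            has_tool_calls = True
        if event.get("type") == "skip":
            continue  # plays the role of `goto CONTINUE`
        forwarded.append(event)
    return has_tool_calls, forwarded
```

With the check placed after the skip guard instead, an event such as `{"type": "skip", "has_tool_call": True}` would be dropped before the flag was ever read, which is the bug the commit describes.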

…ck_start

Add content_block_start handling to anthropic-messages parse_sse_event. When content_block.type == 'tool_use', return {type='skip', has_tool_call=true}:

- skip: do not forward this internal framing event downstream
- has_tool_call: signals ai-providers/base.lua to set llm_has_tool_calls

The has_tool_call check in base.lua runs before goto CONTINUE so skip events carrying this flag are correctly handled.
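
A minimal Python sketch of the content_block_start branch described above, assuming the event data is the JSON payload of an Anthropic streaming event. The function name `parse_content_block_start` is hypothetical, and returning a plain skip for non-tool_use blocks is an assumption of this sketch rather than something the commit states.

```python
import json
from typing import Any, Dict


def parse_content_block_start(data: str) -> Dict[str, Any]:
    """Classify one content_block_start event from its JSON data string."""
    payload = json.loads(data)
    block = payload.get("content_block") or {}
    if payload.get("type") == "content_block_start" and block.get("type") == "tool_use":
        # Do not forward the framing event, but signal the tool call upward.
        return {"type": "skip", "has_tool_call": True}
    # Assumption: other content_block_start events are skipped without a flag.
    return {"type": "skip"}
```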

1. openai-responses parse_sse_event: add cache_read_input_tokens and reasoning_tokens to response.completed usage (was missing, fix #1)
2. openai-responses parse_sse_event: detect function_call items in response.output[] and set has_tool_call (streaming tool calls were never detected, fix #3)
3. merge_usage: recompute total_tokens as prompt+completion after merge to handle split Anthropic events (message_start has input_tokens, message_delta has output_tokens; before this fix total_tokens was overwritten with output_tokens only, fix #2)
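
The merge_usage fix in item 3 can be sketched as follows. This is illustrative Python, not the PR's Lua; the normalised key names `prompt_tokens` and `completion_tokens` are assumptions taken from the "prompt+completion" wording.

```python
from typing import Dict, Optional


def merge_usage(acc: Dict[str, int], update: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Merge a partial usage update into the accumulator, then recompute the total."""
    for key, value in update.items():
        if value is not None:
            acc[key] = value
    # Recompute instead of trusting the incoming total: a later event that
    # only knows the output side would otherwise overwrite the real total.
    acc["total_tokens"] = acc.get("prompt_tokens", 0) + acc.get("completion_tokens", 0)
    return acc


# Split Anthropic-style events: input side first, output side later.
usage: Dict[str, int] = {}
merge_usage(usage, {"prompt_tokens": 10, "total_tokens": 10})
merge_usage(usage, {"completion_tokens": 5, "total_tokens": 5})
```

After both merges `usage["total_tokens"]` is 15 rather than the 5 carried by the second event.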

- Fix TEST 10 regex to verify cache_creation_input_tokens and reasoning_tokens
- Add TEST 38-40 in ai-proxy.t: Responses API streaming cache tokens, reasoning tokens, and function_call tool detection
- Add fixture files: responses-streaming-with-cache.sse, responses-streaming-with-tool-call.sse

Add chat-streaming-with-tool-calls.sse fixture and TEST 13 in ai-proxy3.t to cover the choice.delta.tool_calls path in openai-chat parse_sse_event.

…ST 35

TEST 35 creates route 1 at uris=[/anything, /v1/responses]. TEST 38 was creating a new route 2, leaving both routes matching /v1/responses with different model configs (gpt-4o vs gpt-4o-mini). APISIX hit route 1 so the access log showed gpt-4o instead of gpt-4o-mini, causing TEST 39/40 to fail. Fix by reusing route ID 1 in TEST 38.

…tion

The tests in ai-proxy.t (TEST 38-40) conflicted with the existing route 1 from TEST 35 (uris=[/anything, /v1/responses]) despite the PUT overwrite, causing access_log pattern mismatch. Moving the tests to ai-proxy3.t where all built-in var tests live avoids this interference. Route URI /ai/v1/responses is used so openai-responses matches() detects it (string_sub(uri, -13) == '/v1/responses'), while staying isolated from the existing /v1/responses route in ai-proxy.t.
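
The suffix check quoted above translates directly to Python slicing, which shows why /ai/v1/responses still matches. The function name `matches_responses` is hypothetical; only the 13-character suffix comparison comes from the commit message.

```python
SUFFIX = "/v1/responses"  # 13 characters, hence the -13 in string_sub


def matches_responses(uri: str) -> bool:
    """Python equivalent of string_sub(uri, -13) == '/v1/responses'."""
    return uri[-len(SUFFIX):] == SUFFIX


results = {
    "/ai/v1/responses": matches_responses("/ai/v1/responses"),
    "/v1/responses": matches_responses("/v1/responses"),
    "/anything": matches_responses("/anything"),
}
```

Both /ai/v1/responses and /v1/responses satisfy the check, so the test route is detected as a Responses API endpoint while using a distinct route URI.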

The SSE codec delimits events on a blank line. The two Responses API streaming fixtures ended with a single newline after the last data line, so the final response.completed frame (which carries usage and the function_call output) was buffered and dropped at EOF, leaving all token vars at 0 and llm_has_tool_calls=false. Add the trailing blank line.
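
The fixture bug can be reproduced with a small Python sketch of blank-line framing. This is not the APISIX SSE codec; `split_sse_frames` is a hypothetical helper that only splits on "\n\n" and returns whatever is still buffered at EOF.

```python
from typing import List, Tuple


def split_sse_frames(stream: str) -> Tuple[List[str], str]:
    """Split on blank lines; return (complete frames, leftover buffer)."""
    frames: List[str] = []
    buffer = stream
    while "\n\n" in buffer:
        frame, buffer = buffer.split("\n\n", 1)
        if frame:
            frames.append(frame)
    return frames, buffer


# Fixture ending with a single newline: the last frame is never completed.
broken = "data: first\n\ndata: response.completed\n"
# Fixture ending with a blank line: both frames are delivered.
fixed = "data: first\n\ndata: response.completed\n\n"

broken_frames, broken_rest = split_sse_frames(broken)
fixed_frames, fixed_rest = split_sse_frames(fixed)
```

In the broken case the final frame stays in the leftover buffer, which matches the symptom in the commit message: usage and tool call data from response.completed never reach the parser.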

dosubot (bot) added the size:L (this PR changes 100-499 lines, ignoring generated files) and enhancement (new feature or request) labels on Jun 5, 2026

AlinsRan added 7 commits June 5, 2026 13:48

test(ai-proxy): add OpenAI Chat streaming tool_calls detection test

4448926


dosubot (bot) added the size:XL label (this PR changes 500-999 lines, ignoring generated files) and removed the size:L label (this PR changes 100-499 lines, ignoring generated files) on Jun 5, 2026

AlinsRan added 3 commits June 5, 2026 15:33

AlinsRan wants to merge 11 commits into apache:master from AlinsRan:feat/ai-builtin-access-log-vars
