feat(ai-proxy): add built-in nginx variables for LLM observability #13477
Open
AlinsRan wants to merge 11 commits into
Add 8 built-in nginx variables that ai-proxy sets automatically on every request, making LLM metadata available in access_log format and logger plugins without any additional plugin configuration.

New variables:

| Variable | Description |
|----------|-------------|
| $llm_total_tokens | Total tokens (prompt + completion) |
| $llm_stream | true if request is streaming |
| $llm_has_tool_calls | true if response contains tool calls |
| $llm_tool_count | Number of tools in the upstream request body |
| $llm_end_user_id | End-user ID from request body |
| $llm_cache_read_input_tokens | Prompt tokens served from provider cache |
| $llm_cache_creation_input_tokens | Prompt tokens written to provider cache |
| $llm_reasoning_tokens | Reasoning tokens (OpenAI o1/o3, Responses API) |

Provider mapping for cache/reasoning tokens:
- OpenAI Chat: prompt_tokens_details.cached_tokens / completion_tokens_details.reasoning_tokens
- OpenAI Responses: input_tokens_details.cached_tokens / output_tokens_details.reasoning_tokens
- Anthropic: cache_read_input_tokens / cache_creation_input_tokens
- DeepSeek: prompt_cache_hit_tokens (as cache_read_input_tokens fallback)

End-user ID extraction precedence: safety_identifier > user (OpenAI/compatible) or metadata.user_id (Anthropic Messages).

The on_event callback parameter added to parse_streaming_response allows callers to hook into the streaming event loop for per-event processing such as tool call detection.
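The provider mapping and end-user precedence above can be summarized in a small Python sketch. It is illustrative only: the real logic lives in the Lua protocol adapters, and the helper names `normalize_usage` / `pick_end_user_id`, the `protocol` strings, and the shape of the returned dict are hypothetical. Reading "safety_identifier > user (OpenAI/compatible) or metadata.user_id (Anthropic Messages)" as a per-protocol rule is also an assumption.

```python
from typing import Any, Dict


def _dig(obj: Dict[str, Any], *path: str) -> int:
    """Follow a key path through nested dicts; return 0 if any step is missing or null."""
    node: Any = obj
    for key in path:
        if not isinstance(node, dict) or node.get(key) is None:
            return 0
        node = node[key]
    return int(node)


def normalize_usage(protocol: str, usage: Dict[str, Any]) -> Dict[str, int]:
    """Hypothetical helper: map a provider usage object onto the llm_* token fields."""
    if protocol == "openai-chat":  # also OpenAI-compatible providers such as DeepSeek
        prompt = _dig(usage, "prompt_tokens")
        completion = _dig(usage, "completion_tokens")
        # DeepSeek's prompt_cache_hit_tokens is only used as a fallback
        cache_read = (_dig(usage, "prompt_tokens_details", "cached_tokens")
                      or _dig(usage, "prompt_cache_hit_tokens"))
        cache_creation = 0
        reasoning = _dig(usage, "completion_tokens_details", "reasoning_tokens")
    elif protocol == "openai-responses":
        prompt = _dig(usage, "input_tokens")
        completion = _dig(usage, "output_tokens")
        cache_read = _dig(usage, "input_tokens_details", "cached_tokens")
        cache_creation = 0
        reasoning = _dig(usage, "output_tokens_details", "reasoning_tokens")
    elif protocol == "anthropic-messages":
        prompt = _dig(usage, "input_tokens")
        completion = _dig(usage, "output_tokens")
        cache_read = _dig(usage, "cache_read_input_tokens")
        cache_creation = _dig(usage, "cache_creation_input_tokens")
        reasoning = 0
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
        "cache_read_input_tokens": cache_read,
        "cache_creation_input_tokens": cache_creation,
        "reasoning_tokens": reasoning,
    }


def pick_end_user_id(protocol: str, body: Dict[str, Any]) -> str:
    """Hypothetical helper: safety_identifier > user, or metadata.user_id for Anthropic."""
    if protocol == "anthropic-messages":
        metadata = body.get("metadata") or {}
        return str(metadata.get("user_id") or "")
    return str(body.get("safety_identifier") or body.get("user") or "")
```

Missing fields fall back to 0 (or an empty string for the user ID), so every variable always has a loggable value.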
Eliminate redundant JSON decoding: event.data was being decoded twice, once in parse_sse_event and again in the on_stream_event callback. Instead, parse_sse_event now returns has_tool_call=true when a tool call delta is detected — consistent with how prompt/completion tokens are returned via extract_usage. ai-providers/base.lua sets ctx.var.llm_has_tool_calls from this flag, removing the need for the on_event callback and the on_stream_event closure entirely.
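A minimal Python sketch of the single-decode approach, assuming an OpenAI Chat-style stream: the parser decodes the event data once and reports a tool-call delta as a flag in its return value, and the caller only reads that flag. `parse_chat_event`, `apply_events`, the dict shapes, and the "false" default are hypothetical stand-ins for the Lua parse_sse_event and the loop in ai-providers/base.lua.

```python
import json
from typing import Any, Dict, Iterable


def parse_chat_event(data: str) -> Dict[str, Any]:
    """Hypothetical parser: decode one SSE data payload exactly once and summarize it."""
    if data.strip() == "[DONE]":
        return {"type": "done"}
    payload = json.loads(data)  # the only JSON decode for this event
    result: Dict[str, Any] = {"type": "data", "payload": payload}
    for choice in payload.get("choices") or []:
        if (choice.get("delta") or {}).get("tool_calls"):
            result["has_tool_call"] = True  # returned like usage; no second decode needed
            break
    return result


def apply_events(datas: Iterable[str], ctx_var: Dict[str, str]) -> None:
    """Caller side: set llm_has_tool_calls from the returned flag only."""
    ctx_var.setdefault("llm_has_tool_calls", "false")
    for data in datas:
        event = parse_chat_event(data)
        if event.get("has_tool_call"):
            ctx_var["llm_has_tool_calls"] = "true"
```

Because the flag travels in the parser's return value, the caller needs no per-event callback or closure.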
The has_tool_call check was placed after goto CONTINUE, so skip events
(including Anthropic content_block_start/tool_use) never reached it.
Move the check to before the skip guard. Anthropic content_block_start
with tool_use now correctly returns {type='skip', has_tool_call=true}
so it does not enter the converter/downstream pipeline.
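The ordering issue can be reproduced with a small Python model of the event loop. This is a sketch, not the Lua code: `run_loop` and the `check_before_skip` switch are hypothetical, and `continue` plays the role of `goto CONTINUE`.

```python
from typing import Any, Dict, Iterable, List


def run_loop(events: Iterable[Dict[str, Any]], ctx_var: Dict[str, str],
             check_before_skip: bool = True) -> List[Dict[str, Any]]:
    """Forward non-skip events; optionally flag tool calls before the skip guard."""
    forwarded: List[Dict[str, Any]] = []
    for event in events:
        if check_before_skip and event.get("has_tool_call"):
            ctx_var["llm_has_tool_calls"] = "true"
        if event.get("type") == "skip":
            continue  # stands in for `goto CONTINUE`: event is not forwarded
        if not check_before_skip and event.get("has_tool_call"):
            # Old placement: skip events never reach this line
            ctx_var["llm_has_tool_calls"] = "true"
        forwarded.append(event)
    return forwarded
```

With the check before the guard, a `{"type": "skip", "has_tool_call": True}` event sets the variable and is still kept out of the forwarded list; with the old placement the flag is lost.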
anthropic-messages.lua does not need to be changed. The has_tool_call check in ai-providers/base.lua now runs before goto CONTINUE, so skip events with no has_tool_call field are simply ignored — no extra branch in parse_sse_event needed.
…ck_start
Add content_block_start handling to anthropic-messages parse_sse_event.
When content_block.type == 'tool_use', return {type='skip', has_tool_call=true}:
- skip: do not forward this internal framing event downstream
- has_tool_call: signals ai-providers/base.lua to set llm_has_tool_calls
The has_tool_call check in base.lua runs before goto CONTINUE so skip
events carrying this flag are correctly handled.
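The content_block_start branch can be sketched in Python as follows. The event shape (`content_block.type == "tool_use"`) follows Anthropic's streaming format; the function name `parse_anthropic_event` and returning a plain skip for other block types are assumptions made for the example.

```python
import json
from typing import Any, Dict


def parse_anthropic_event(event_type: str, data: str) -> Dict[str, Any]:
    """Hypothetical sketch of the content_block_start handling in anthropic-messages."""
    payload = json.loads(data)
    if event_type == "content_block_start":
        block = payload.get("content_block") or {}
        if block.get("type") == "tool_use":
            # Internal framing event: keep it out of the downstream pipeline,
            # but tell the caller that the response contains a tool call.
            return {"type": "skip", "has_tool_call": True}
        return {"type": "skip"}
    return {"type": "data", "payload": payload}
```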
1. openai-responses parse_sse_event: add cache_read_input_tokens and reasoning_tokens to response.completed usage (was missing, fix #1)
2. openai-responses parse_sse_event: detect function_call items in response.output[] and set has_tool_call (streaming tool calls were never detected, fix #3)
3. merge_usage: recompute total_tokens as prompt + completion after merge to handle split Anthropic events (message_start has input_tokens, message_delta has output_tokens; before this fix total_tokens was overwritten with output_tokens only, fix #2)
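Fixes 2 and 3 can be illustrated with a short Python sketch. `merge_usage` here is a hypothetical model of the Lua function, and the rule that missing or zero fields do not overwrite earlier values is an assumption; `completed_has_tool_call` is likewise a hypothetical helper.

```python
from typing import Any, Dict, Optional


def merge_usage(acc: Optional[Dict[str, int]], new: Dict[str, int]) -> Dict[str, int]:
    """Merge a partial usage update, then recompute total_tokens."""
    merged = dict(acc or {})
    for key, value in new.items():
        if value:  # assumption: missing or zero fields keep the earlier value
            merged[key] = value
    # Recompute instead of trusting a per-event total, which may cover output only
    merged["total_tokens"] = merged.get("prompt_tokens", 0) + merged.get("completion_tokens", 0)
    return merged


def completed_has_tool_call(response: Dict[str, Any]) -> bool:
    """True if a response.completed payload lists a function_call output item."""
    return any(item.get("type") == "function_call" for item in response.get("output") or [])
```

For example, a message_start carrying 25 prompt tokens followed by a message_delta carrying 15 completion tokens (and its own total of 15) merges to a total of 40 rather than 15.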
- Fix TEST 10 regex to verify cache_creation_input_tokens and reasoning_tokens
- Add TEST 38-40 in ai-proxy.t: Responses API streaming cache tokens, reasoning tokens, and function_call tool detection
- Add fixture files: responses-streaming-with-cache.sse, responses-streaming-with-tool-call.sse
Add chat-streaming-with-tool-calls.sse fixture and TEST 13 in ai-proxy3.t to cover the choice.delta.tool_calls path in openai-chat parse_sse_event.
…ST 35
TEST 35 creates route 1 at uris=[/anything, /v1/responses]. TEST 38 was creating a new route 2, leaving both routes matching /v1/responses with different model configs (gpt-4o vs gpt-4o-mini). APISIX hit route 1, so the access log showed gpt-4o instead of gpt-4o-mini, causing TEST 39/40 to fail. Fix by reusing route ID 1 in TEST 38.
…tion
The tests in ai-proxy.t (TEST 38-40) conflicted with the existing route 1 from TEST 35 (uris=[/anything, /v1/responses]) despite the PUT overwrite, causing an access_log pattern mismatch. Moving the tests to ai-proxy3.t, where all built-in var tests live, avoids this interference. Route URI /ai/v1/responses is used so that openai-responses matches() detects it (string_sub(uri, -13) == '/v1/responses') while staying isolated from the existing /v1/responses route in ai-proxy.t.
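The suffix check in matches() compares the last 13 characters of the URI, which is exactly the length of '/v1/responses'. A Python equivalent (the names `RESPONSES_SUFFIX` and `is_responses_uri` are hypothetical) shows why /ai/v1/responses is still detected as a Responses API request:

```python
RESPONSES_SUFFIX = "/v1/responses"  # 13 characters, hence the -13 in the Lua check


def is_responses_uri(uri: str) -> bool:
    """Mirror of string_sub(uri, -13) == '/v1/responses': compare the last 13 chars."""
    return uri[-len(RESPONSES_SUFFIX):] == RESPONSES_SUFFIX


for uri in ("/v1/responses", "/ai/v1/responses", "/anything", "/v1/chat/completions"):
    print(uri, is_responses_uri(uri))
```

Both /v1/responses and /ai/v1/responses pass the protocol check, but as distinct exact URIs they do not overlap in route matching.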
The SSE codec delimits events on a blank line. The two Responses API streaming fixtures ended with a single newline after the last data line, so the final response.completed frame (which carries usage and the function_call output) was buffered and dropped at EOF, leaving all token vars at 0 and llm_has_tool_calls=false. Add the trailing blank line.
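The effect of the missing trailing blank line can be shown with a minimal Python splitter that, like the codec described above, emits an event only once its terminating blank line arrives. `split_sse` is a hypothetical sketch, not the APISIX codec, and it handles only LF line endings.

```python
from typing import List, Tuple


def split_sse(buffer: str) -> Tuple[List[str], str]:
    """Return the data of complete events plus the unterminated tail of the buffer."""
    events: List[str] = []
    while "\n\n" in buffer:
        raw, buffer = buffer.split("\n\n", 1)
        data_lines = []
        for line in raw.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # Per the SSE format, a single leading space after the colon is dropped
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            events.append("\n".join(data_lines))
    return events, buffer
```

With a single newline after the last data line, the response.completed frame stays in the returned tail; if the stream ends there, that frame is never parsed. Appending the blank line makes it a complete event.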
Summary
Add 8 built-in nginx variables that `ai-proxy` sets automatically on every request. These variables expose LLM request/response metadata in the nginx access log and in all logger plugins, without any additional configuration.

New variables
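| Variable | Description |
|----------|-------------|
| `$llm_total_tokens` | Total tokens (prompt + completion) |
| `$llm_stream` | `true` if request is streaming |
| `$llm_has_tool_calls` | `true` if response contains tool calls |
| `$llm_tool_count` | Number of tools in the upstream request body |
| `$llm_end_user_id` | End-user ID from request body |
| `$llm_cache_read_input_tokens` | Prompt tokens served from provider cache |
| `$llm_cache_creation_input_tokens` | Prompt tokens written to provider cache |
| `$llm_reasoning_tokens` | Reasoning tokens (OpenAI o1/o3, Responses API) |

Provider mapping for cache/reasoning tokens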
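| Variable | OpenAI Chat | OpenAI Responses | Anthropic | DeepSeek |
|----------|-------------|------------------|-----------|----------|
| `llm_cache_read_input_tokens` | `prompt_tokens_details.cached_tokens` | `input_tokens_details.cached_tokens` | `cache_read_input_tokens` | `prompt_cache_hit_tokens` |
| `llm_cache_creation_input_tokens` | | | `cache_creation_input_tokens` | |
| `llm_reasoning_tokens` | `completion_tokens_details.reasoning_tokens` | `output_tokens_details.reasoning_tokens` | | |

End-user ID extraction precedence: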
`safety_identifier` > `user` (OpenAI/compatible) or `metadata.user_id` (Anthropic Messages).

Example access log usage
Also includes
- `on_event` callback parameter added to `parse_streaming_response` for per-event processing (e.g., streaming tool call detection).
- `extract_usage` updated across all three protocol adapters to extract cache and reasoning token counts from provider responses.

Files changed
| File | Change |
|------|--------|
| `apisix/cli/ngx_tpl.lua` | `set $llm_*` variable definitions |
| `apisix/core/ctx.lua` | `ngx_var_names` |
| `apisix/plugins/ai-protocols/openai-chat.lua` | `prompt_tokens_details` / `completion_tokens_details` |
| `apisix/plugins/ai-protocols/openai-responses.lua` | `input_tokens_details` / `output_tokens_details` |
| `apisix/plugins/ai-protocols/anthropic-messages.lua` | |
| `apisix/plugins/ai-providers/base.lua` | `on_event` callback |
| `apisix/plugins/ai-proxy/base.lua` | `llm_stream`, `llm_tool_count`, `llm_end_user_id`, `llm_has_tool_calls`, `llm_total_tokens` |
| `t/APISIX.pm` | `set` directives + extend `log_format main` |
| `t/plugin/ai-proxy3.t` | |

Test plan