feat(ai-proxy): add built-in nginx variables for LLM observability #13477

Open
AlinsRan wants to merge 11 commits into apache:master from AlinsRan:feat/ai-builtin-access-log-vars

@AlinsRan AlinsRan commented Jun 5, 2026

Summary

Add 8 built-in nginx variables that ai-proxy sets automatically on every request. These variables expose LLM request/response metadata in the nginx access log and all logger plugins without any additional configuration.

New variables

| Variable                         | Type    | Description                                    |
|----------------------------------|---------|------------------------------------------------|
| $llm_total_tokens                | integer | Total tokens (prompt + completion)             |
| $llm_stream                      | boolean | true if request is streaming                   |
| $llm_has_tool_calls              | boolean | true if response contains tool calls           |
| $llm_tool_count                  | integer | Number of tools in the upstream request body   |
| $llm_end_user_id                 | string  | End-user identifier from request body          |
| $llm_cache_read_input_tokens     | integer | Prompt tokens served from provider cache       |
| $llm_cache_creation_input_tokens | integer | Prompt tokens written to provider cache        |
| $llm_reasoning_tokens            | integer | Reasoning tokens (OpenAI o1/o3, Responses API) |

Provider mapping for cache/reasoning tokens

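| Variable                        | OpenAI Chat                                | OpenAI Responses                       | Anthropic                   | DeepSeek                |
|---------------------------------|--------------------------------------------|----------------------------------------|-----------------------------|-------------------------|
| llm_cache_read_input_tokens     | prompt_tokens_details.cached_tokens        | input_tokens_details.cached_tokens     | cache_read_input_tokens     | prompt_cache_hit_tokens |
| llm_cache_creation_input_tokens | -                                          | -                                      | cache_creation_input_tokens | -                       |
| llm_reasoning_tokens            | completion_tokens_details.reasoning_tokens | output_tokens_details.reasoning_tokens | -                           | -                       |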

End-user ID extraction precedence: safety_identifier > user (OpenAI/compatible) or metadata.user_id (Anthropic Messages).
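
The precedence rule can be sketched as follows. This is an illustrative Python sketch, not the Lua code in this PR: the helper name `extract_end_user_id` and the `protocol` strings are hypothetical, and it assumes one reading of the rule (`safety_identifier` before `user` for OpenAI-style bodies, `metadata.user_id` for Anthropic Messages bodies).

```python
from typing import Any, Optional


def extract_end_user_id(body: dict[str, Any], protocol: str) -> Optional[str]:
    """Hypothetical helper: pick the end-user ID from a decoded request body."""
    if protocol == "anthropic-messages":
        # Anthropic Messages carries the end user under metadata.user_id.
        metadata = body.get("metadata")
        if isinstance(metadata, dict):
            return metadata.get("user_id")
        return None
    # OpenAI and compatible bodies: safety_identifier wins over user.
    return body.get("safety_identifier") or body.get("user")
```

With this reading, a body carrying both `safety_identifier` and `user` yields the `safety_identifier` value, and a body with neither field yields no ID.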

Example access log usage

log_format main '$llm_model $llm_total_tokens $llm_stream $llm_has_tool_calls $llm_end_user_id';

Also includes

  • Optional on_event callback parameter added to parse_streaming_response for per-event processing (e.g., streaming tool call detection).
  • extract_usage updated across all three protocol adapters to extract cache and reasoning token counts from provider responses.

Files changed

| File                                               | Change                                                                                   |
|----------------------------------------------------|------------------------------------------------------------------------------------------|
| apisix/cli/ngx_tpl.lua                             | Add 8 set $llm_* variable definitions                                                    |
| apisix/core/ctx.lua                                | Register 8 variables in ngx_var_names                                                    |
| apisix/plugins/ai-protocols/openai-chat.lua        | Extract cache/reasoning tokens from prompt_tokens_details / completion_tokens_details    |
| apisix/plugins/ai-protocols/openai-responses.lua   | Extract cache/reasoning tokens from input_tokens_details / output_tokens_details         |
| apisix/plugins/ai-protocols/anthropic-messages.lua | Extract cache tokens from Anthropic usage fields                                         |
| apisix/plugins/ai-providers/base.lua               | Set new variables from usage; add on_event callback                                      |
| apisix/plugins/ai-proxy/base.lua                   | Compute llm_stream, llm_tool_count, llm_end_user_id, llm_has_tool_calls, llm_total_tokens |
| t/APISIX.pm                                        | Add set directives + extend log_format main                                              |
| t/plugin/ai-proxy3.t                               | 6 test cases (TEST 7–12) covering all new variables                                      |

Test plan

prove -I. -r t/plugin/ai-proxy3.t

Add 8 built-in nginx variables that ai-proxy sets automatically on every
request, making LLM metadata available in access_log format and logger plugins
without any additional plugin configuration.

New variables:

| Variable                        | Description                                      |
|---------------------------------|--------------------------------------------------|
| $llm_total_tokens               | Total tokens (prompt + completion)               |
| $llm_stream                     | true if request is streaming                     |
| $llm_has_tool_calls             | true if response contains tool calls             |
| $llm_tool_count                 | Number of tools in the upstream request body     |
| $llm_end_user_id                | End-user ID from request body                    |
| $llm_cache_read_input_tokens    | Prompt tokens served from provider cache         |
| $llm_cache_creation_input_tokens| Prompt tokens written to provider cache          |
| $llm_reasoning_tokens           | Reasoning tokens (OpenAI o1/o3, Responses API)   |

Provider mapping for cache/reasoning tokens:
- OpenAI Chat: prompt_tokens_details.cached_tokens / completion_tokens_details.reasoning_tokens
- OpenAI Responses: input_tokens_details.cached_tokens / output_tokens_details.reasoning_tokens
- Anthropic: cache_read_input_tokens / cache_creation_input_tokens
- DeepSeek: prompt_cache_hit_tokens (as cache_read_input_tokens fallback)
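
The mapping above could be expressed as a single normalizer, sketched here in Python for illustration. The PR itself does this per protocol adapter in Lua; the function name `extract_cache_and_reasoning` and the choice to default missing counts to 0 are assumptions of this sketch.

```python
def extract_cache_and_reasoning(usage: dict) -> dict:
    """Hypothetical normalizer: map provider usage fields to the llm_* counters."""

    def dig(*path):
        # Walk nested dicts, returning None as soon as a level is missing.
        node = usage
        for key in path:
            if not isinstance(node, dict):
                return None
            node = node.get(key)
        return node

    def first(*candidates):
        # First value that is present; 0 when no provider field matched.
        for value in candidates:
            if value is not None:
                return value
        return 0

    return {
        "cache_read_input_tokens": first(
            dig("prompt_tokens_details", "cached_tokens"),  # OpenAI Chat
            dig("input_tokens_details", "cached_tokens"),   # OpenAI Responses
            dig("cache_read_input_tokens"),                 # Anthropic
            dig("prompt_cache_hit_tokens"),                 # DeepSeek fallback
        ),
        "cache_creation_input_tokens": first(
            dig("cache_creation_input_tokens"),             # Anthropic only
        ),
        "reasoning_tokens": first(
            dig("completion_tokens_details", "reasoning_tokens"),  # OpenAI Chat
            dig("output_tokens_details", "reasoning_tokens"),      # OpenAI Responses
        ),
    }
```

Each candidate is tried in order, so a DeepSeek response that only reports `prompt_cache_hit_tokens` still populates the cache-read counter.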

End-user ID extraction precedence: safety_identifier > user (OpenAI/compatible)
or metadata.user_id (Anthropic Messages).

The on_event callback parameter added to parse_streaming_response allows
callers to hook into the streaming event loop for per-event processing such
as tool call detection.
dosubot (bot) added the size:L and enhancement labels Jun 5, 2026
AlinsRan added 7 commits June 5, 2026 13:48
Eliminate redundant JSON decoding: event.data was being decoded twice,
once in parse_sse_event and again in the on_stream_event callback.

Instead, parse_sse_event now returns has_tool_call=true when a tool call
delta is detected — consistent with how prompt/completion tokens are
returned via extract_usage. ai-providers/base.lua sets
ctx.var.llm_has_tool_calls from this flag, removing the need for the
on_event callback and the on_stream_event closure entirely.

The has_tool_call check was placed after goto CONTINUE, so skip events
(including Anthropic content_block_start/tool_use) never reached it.

Move the check to before the skip guard. Anthropic content_block_start
with tool_use now correctly returns {type='skip', has_tool_call=true}
so it does not enter the converter/downstream pipeline.

anthropic-messages.lua does not need to be changed. The has_tool_call
check in ai-providers/base.lua now runs before goto CONTINUE, so skip
events with no has_tool_call field are simply ignored — no extra branch
in parse_sse_event needed.

…ck_start

Add content_block_start handling to anthropic-messages parse_sse_event.
When content_block.type == 'tool_use', return {type='skip', has_tool_call=true}:
- skip: do not forward this internal framing event downstream
- has_tool_call: signals ai-providers/base.lua to set llm_has_tool_calls

The has_tool_call check in base.lua runs before goto CONTINUE so skip
events carrying this flag are correctly handled.
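
The ordering described in these commits (check the flag first, then apply the skip guard) can be illustrated with a small Python sketch. The function names, the `"data"` event type, and returning a plain skip for non-tool `content_block_start` events are assumptions made for illustration; the real logic lives in the Lua files named above.

```python
def parse_anthropic_event(name: str, data: dict) -> dict:
    """Hypothetical stand-in for the Anthropic parse_sse_event branch."""
    if name == "content_block_start":
        block = data.get("content_block") or {}
        if block.get("type") == "tool_use":
            # Framing event: not forwarded, but it signals a tool call.
            return {"type": "skip", "has_tool_call": True}
        return {"type": "skip"}
    return {"type": "data", "data": data}


def process_events(events: list) -> tuple:
    """Return (has_tool_calls, forwarded_events)."""
    has_tool_calls = False
    forwarded = []
    for event in events:
        # Inspect the flag BEFORE the skip guard, so skip events can carry it.
        if event.get("has_tool_call"):
            has_tool_calls = True
        if event.get("type") == "skip":
            continue
        forwarded.append(event)
    return has_tool_calls, forwarded
```

If the flag check sat after the `continue`, the `tool_use` framing event would be dropped before it could set the flag, which is the bug the earlier commit describes.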
1. openai-responses parse_sse_event: add cache_read_input_tokens and
   reasoning_tokens to response.completed usage (was missing, fix #1)

2. openai-responses parse_sse_event: detect function_call items in
   response.output[] and set has_tool_call (streaming tool calls were
   never detected, fix #3)

3. merge_usage: recompute total_tokens as prompt+completion after merge
   to handle split Anthropic events (message_start has input_tokens,
   message_delta has output_tokens; before this fix total_tokens was
   overwritten with output_tokens only, fix #2)
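
A minimal Python sketch of the merge_usage fix in item 3, assuming usage tables keyed by `prompt_tokens`, `completion_tokens`, and `total_tokens` (the key names and the sample counts are assumptions; the actual function is Lua):

```python
def merge_usage(current: dict, update: dict) -> dict:
    """Merge a later usage event into the accumulated one."""
    merged = dict(current)
    for key, value in update.items():
        if value is not None:
            merged[key] = value
    # Recompute the total from its parts instead of trusting the last event,
    # which may only have known about one side of the split.
    merged["total_tokens"] = (
        merged.get("prompt_tokens", 0) + merged.get("completion_tokens", 0)
    )
    return merged


# Split Anthropic-style events: input tokens first, output tokens later.
after_start = merge_usage({}, {"prompt_tokens": 120, "total_tokens": 120})
after_delta = merge_usage(after_start, {"completion_tokens": 30, "total_tokens": 30})
```

Here `after_delta["total_tokens"]` is 150; without the recomputation step it would have been overwritten with 30.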
- Fix TEST 10 regex to verify cache_creation_input_tokens and reasoning_tokens
- Add TEST 38-40 in ai-proxy.t: Responses API streaming cache tokens,
  reasoning tokens, and function_call tool detection
- Add fixture files: responses-streaming-with-cache.sse,
  responses-streaming-with-tool-call.sse

Add chat-streaming-with-tool-calls.sse fixture and TEST 13 in ai-proxy3.t
to cover the choice.delta.tool_calls path in openai-chat parse_sse_event.
dosubot (bot) added the size:XL label and removed the size:L label Jun 5, 2026
AlinsRan added 3 commits June 5, 2026 15:33
…ST 35

TEST 35 creates route 1 at uris=[/anything, /v1/responses]. TEST 38 was
creating a new route 2, leaving both routes matching /v1/responses with
different model configs (gpt-4o vs gpt-4o-mini). APISIX hit route 1 so
the access log showed gpt-4o instead of gpt-4o-mini, causing TEST 39/40
to fail. Fix by reusing route ID 1 in TEST 38.
…tion

The tests in ai-proxy.t (TEST 38-40) conflicted with the existing route 1
from TEST 35 (uris=[/anything, /v1/responses]) despite the PUT overwrite,
causing access_log pattern mismatch. Moving the tests to ai-proxy3.t where
all built-in var tests live avoids this interference.

Route URI /ai/v1/responses is used so openai-responses matches() detects it
(string_sub(uri, -13) == '/v1/responses'), while staying isolated from the
existing /v1/responses route in ai-proxy.t.
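
The suffix check quoted above translates to Python roughly as follows (a sketch; `matches_responses` is a hypothetical name). It shows why the prefixed test URI is still detected.

```python
SUFFIX = "/v1/responses"  # 13 characters, matching string_sub(uri, -13)


def matches_responses(uri: str) -> bool:
    """Mirror of the Lua check: compare the last 13 characters of the URI."""
    return uri[-len(SUFFIX):] == SUFFIX
```

Both `/v1/responses` and `/ai/v1/responses` satisfy the check, while `/anything` does not, so the new route can live under a distinct path without losing protocol detection.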
The SSE codec delimits events on a blank line. The two Responses API
streaming fixtures ended with a single newline after the last data line,
so the final response.completed frame (which carries usage and the
function_call output) was buffered and dropped at EOF, leaving all token
vars at 0 and llm_has_tool_calls=false. Add the trailing blank line.
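
The failure mode can be reproduced with a simplified splitter, sketched here in Python. This is not the APISIX SSE codec; `split_sse_frames` is a hypothetical helper that only models the blank-line delimiter.

```python
def split_sse_frames(stream: str) -> tuple:
    """Return (complete_frames, leftover) for a chunk of SSE text."""
    normalized = stream.replace("\r\n", "\n")
    # Every piece before the last one was terminated by a blank line;
    # the last piece is whatever is still buffered.
    *frames, leftover = normalized.split("\n\n")
    return [frame for frame in frames if frame], leftover


# Fixture ending with a single newline: the final frame stays in the buffer.
truncated = split_sse_frames("data: first\n\ndata: response.completed\n")
# Fixture ending with a blank line: the final frame is emitted.
complete = split_sse_frames("data: first\n\ndata: response.completed\n\n")
```

In the truncated case only the first frame is returned and the `response.completed` line remains as leftover, which matches the symptom of usage and tool-call data never being seen.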