Calculating the average time it takes for the user to receive successive tokens after the first token
Normalized time per output token
e2e request latency / output tokens
ms per output token
Normalizing the request latency at the output token level for comparing different use cases
Inter Token Latency (ITL)
time between output token generation within a request
ms per output token
Calculating the time it takes for the user to receive each successive token after the first token. This is more granular than TPOT, which averages the token latency within a request
Calculating the performance-to-price ratio, i.e. the throughput achieved for the cost spent
*Note: in mixed-batching cases the server handles input and output tokens together, so the cost might need to be divided between them using some factor, such as 1:4 for the cost of generating an input token vs. an output token.
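
The per-token latency metrics and the cost split from the note can be sketched in a few lines of Python. This is a minimal illustration and not the benchmark's own implementation: the `RequestTrace` record, its field names, and all function names are hypothetical. It also assumes TPOT is computed as the time from the first to the last output token divided by the number of gaps between tokens, and that the 1:4 factor is applied as a per-token weight when dividing a shared cost.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RequestTrace:
    """Hypothetical per-request record; field names are illustrative."""
    start_s: float              # time the request was sent (seconds)
    token_times_s: List[float]  # arrival time of each output token (seconds)


def tpot_ms(trace: RequestTrace) -> Optional[float]:
    """Average gap between successive tokens after the first (assumed definition)."""
    n = len(trace.token_times_s)
    if n < 2:
        return None  # no successive tokens to average over
    span_s = trace.token_times_s[-1] - trace.token_times_s[0]
    return span_s / (n - 1) * 1000.0


def normalized_tpot_ms(trace: RequestTrace) -> Optional[float]:
    """End-to-end request latency divided by the number of output tokens."""
    n = len(trace.token_times_s)
    if n == 0:
        return None
    e2e_s = trace.token_times_s[-1] - trace.start_s
    return e2e_s / n * 1000.0


def inter_token_latencies_ms(trace: RequestTrace) -> List[float]:
    """Individual gaps between consecutive output tokens (not averaged)."""
    t = trace.token_times_s
    return [(b - a) * 1000.0 for a, b in zip(t, t[1:])]


def split_cost_per_token(
    total_cost: float,
    input_tokens: int,
    output_tokens: int,
    input_weight: float = 1.0,   # assumed 1:4 input-to-output weighting
    output_weight: float = 4.0,
) -> Tuple[float, float]:
    """Divide a shared cost into per-input-token and per-output-token cost."""
    weighted = input_tokens * input_weight + output_tokens * output_weight
    if weighted == 0:
        raise ValueError("no tokens to attribute cost to")
    unit = total_cost / weighted
    return unit * input_weight, unit * output_weight


if __name__ == "__main__":
    trace = RequestTrace(start_s=0.0, token_times_s=[0.5, 0.6, 0.8, 0.9])
    print(tpot_ms(trace))
    print(normalized_tpot_ms(trace))
    print(inter_token_latencies_ms(trace))
    print(split_cost_per_token(10.0, input_tokens=1000, output_tokens=250))
```

In this sketch the normalized metric includes the time to first token while TPOT excludes it, so the two diverge most on requests with long prompts and short outputs. The ITL function returns the raw list of gaps so that percentiles can be taken across requests instead of only a per-request mean.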