These two traces contain only input and output lengths, with no prefix cache hit rate. Results measured with them therefore cannot accurately reflect real inference performance and load conditions. Could you please add the corresponding prefix cache hit rates to these traces?
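
To illustrate why the hit rate matters, here is a minimal sketch. It assumes a hypothetical per-request `hit_rate` field (the fraction of input tokens served from the prefix cache), which the current traces do not contain, and a simplified model in which cached tokens need no prefill computation at all.

```python
def effective_prefill_tokens(input_len: int, hit_rate: float) -> int:
    """Tokens that still need prefill after prefix cache hits.

    `hit_rate` is a hypothetical per-request field (fraction of input
    tokens reused from the prefix cache); it is not in the current traces.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    cached = int(input_len * hit_rate)  # tokens reused from the cache
    return input_len - cached


# Same trace entry under two assumed hit rates.
print(effective_prefill_tokens(8000, 0.0))   # 8000
print(effective_prefill_tokens(8000, 0.75))  # 2000
```

With identical input and output lengths, the prefill work in this example differs by 4x depending on the assumed hit rate, so the lengths alone leave the actual load undetermined.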