fix: fix cache length#1314
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the prompt_cache_len in the metadata to be the sum of GPU, CPU, and disk cache lengths. Feedback indicates that this change introduces inconsistencies with other parts of the function, specifically regarding metrics and logging that still rely on the GPU-only cache length variable.
| cpu_prompt_cache_len = metadata.pop("cpu_prompt_cache_len", 0) | ||
| disk_prompt_cache_len = metadata.pop("disk_prompt_cache_len", 0) | ||
| metadata["prompt_cache_len"] = prompt_cache_len | ||
| metadata["prompt_cache_len"] = prompt_cache_len + cpu_prompt_cache_len + disk_prompt_cache_len |
There was a problem hiding this comment.
The update to metadata["prompt_cache_len"] correctly calculates the total cache length by summing GPU, CPU, and disk cache lengths. However, this change introduces an inconsistency with the internal metrics and logging found later in this function.
Specifically, the local variable prompt_cache_len (which still holds only the GPU cache length) is used for:
- Calculating
prompt_cache_ratio(line 731). - Logging
gpu_prompt_cache_len(line 747). - Reporting the
lightllm_cache_lengthmetric (line 760).
If the goal is to report the total cache length consistently across the API and monitoring, these other locations should also be updated to use the sum. Note that if you update the local variable prompt_cache_len to be the sum, you should also update the log label at line 747 to avoid mislabeling the total as 'gpu'.
No description provided.