Skip to content

fix: fix cache length#1314

Merged
shihaobai merged 1 commit into
mainfrom
fix_cache_tokens
May 21, 2026
Merged

fix: fix cache length#1314
shihaobai merged 1 commit into
mainfrom
fix_cache_tokens

Conversation

@blueswhen
Copy link
Copy Markdown
Collaborator

No description provided.

@shihaobai shihaobai merged commit 4c069d3 into main May 21, 2026
1 check passed
@shihaobai shihaobai deleted the fix_cache_tokens branch May 21, 2026 11:01
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the prompt_cache_len in the metadata to be the sum of GPU, CPU, and disk cache lengths. Feedback indicates that this change introduces inconsistencies with other parts of the function, specifically regarding metrics and logging that still rely on the GPU-only cache length variable.

cpu_prompt_cache_len = metadata.pop("cpu_prompt_cache_len", 0)
disk_prompt_cache_len = metadata.pop("disk_prompt_cache_len", 0)
metadata["prompt_cache_len"] = prompt_cache_len
metadata["prompt_cache_len"] = prompt_cache_len + cpu_prompt_cache_len + disk_prompt_cache_len
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The update to metadata["prompt_cache_len"] correctly calculates the total cache length by summing GPU, CPU, and disk cache lengths. However, this change introduces an inconsistency with the internal metrics and logging found later in this function.

Specifically, the local variable prompt_cache_len (which still holds only the GPU cache length) is used for:

  • Calculating prompt_cache_ratio (line 731).
  • Logging gpu_prompt_cache_len (line 747).
  • Reporting the lightllm_cache_length metric (line 760).

If the goal is to report the total cache length consistently across the API and monitoring, these other locations should also be updated to use the sum. Note that if you update the local variable prompt_cache_len to be the sum, you should also update the log label at line 747 to avoid mislabeling the total as 'gpu'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants