

@BenjaminBraunDev
Contributor

Customers hit TTFT (time to first token) spikes caused by a misconfigured prefix cache scorer. This PR adds that failure mode to the troubleshooting guide so users can more easily diagnose and remediate similar issues.

In this case, it was not clear that the prefix cache configuration was causing the TTFT spikes until we noticed that its parameters did not match the model being served.

Does this PR introduce a user-facing change?:

NONE

@netlify

netlify bot commented Dec 23, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: f99ebb7
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/694b2c1961d2b10008fe7f46
😎 Deploy Preview: https://deploy-preview-2040--gateway-api-inference-extension.netlify.app

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BenjaminBraunDev
Once this PR has been reviewed and has the lgtm label, please assign nirrozenbaum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 23, 2025
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 23, 2025
@liu-cong
Contributor

liu-cong commented Dec 24, 2025

Since v1.2, the plugin auto-tunes these configurations from the model server metrics, so no manual tuning is required (#1748). We should recommend that users run v1.2 or later, and highlight that this manual tuning is only required for versions before v1.2.

