@mayabar
Contributor

@mayabar mayabar commented Jan 5, 2026

What this PR does / why we need it:
Currently, the prefix length stored in the prefix cache plugin is measured in blocks.

As part of enabling easy configuration for disaggregated PD support in the inference scheduler, all configuration fields will be expressed in tokens. This involves converting from characters to tokens using the average token length constant.
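The character-to-token conversion mentioned above can be sketched roughly as follows. This is only an illustration: the constant name `averageCharactersPerToken`, its value of 4, and both helper functions are assumptions, not code taken from this PR.

```go
package main

import "fmt"

// averageCharactersPerToken is an ASSUMED value for illustration only; the PR
// description just says that an "average token length constant" is used.
const averageCharactersPerToken = 4

// charsToTokens estimates a token count from a character count, rounding down.
func charsToTokens(chars int) int {
	return chars / averageCharactersPerToken
}

// tokensToChars estimates how many characters a given number of tokens spans.
func tokensToChars(tokens int) int {
	return tokens * averageCharactersPerToken
}

func main() {
	fmt.Println(charsToTokens(256)) // 64 under the assumed constant
	fmt.Println(tokensToChars(16))  // 64 under the assumed constant
}
```

Because the conversion is an estimate, a character count that is not a multiple of the constant loses its remainder when rounded down.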

Which issue(s) this PR fixes:
Fixes #2068

Does this PR introduce a user-facing change?:

Prefix Plugin Changes
- New parameter: Added `blockSizeTokens` to prefix plugin configuration, defining cache block length in tokens (replacing character-based sizing).
- Deprecation notice: The legacy `blockSize` parameter is deprecated. Instantiating the prefix plugin will fail if `blockSize` is defined without also specifying `blockSizeTokens`.
- Data unit update: Changed data stored in `PrepareRequestData` in the prefix plugin from blocks to tokens.
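A minimal sketch of the deprecation rule described in the list above, assuming the configuration is decoded from JSON. The type `prefixConfig`, the function `parsePrefixConfig`, and the error text are hypothetical and only illustrate the check; default handling when neither field is set is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// prefixConfig is a hypothetical stand-in for the prefix plugin configuration.
type prefixConfig struct {
	// BlockSize is the deprecated, character-based block size.
	BlockSize int `json:"blockSize,omitempty"`
	// BlockSizeTokens is the block size in tokens.
	BlockSizeTokens int `json:"blockSizeTokens,omitempty"`
}

// parsePrefixConfig decodes the raw config and rejects configurations that
// set only the deprecated blockSize field.
func parsePrefixConfig(raw []byte) (*prefixConfig, error) {
	var cfg prefixConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parsing prefix plugin config: %w", err)
	}
	if cfg.BlockSize != 0 && cfg.BlockSizeTokens == 0 {
		return nil, errors.New("blockSize is deprecated; set blockSizeTokens (block length in tokens) instead")
	}
	return &cfg, nil
}

func main() {
	// Only the deprecated field is set, so instantiation is refused.
	_, err := parsePrefixConfig([]byte(`{"blockSize": 64}`))
	fmt.Println("legacy only:", err)

	// The new field is set, so the configuration is accepted.
	cfg, err := parsePrefixConfig([]byte(`{"blockSizeTokens": 16}`))
	fmt.Println("tokens:", cfg.BlockSizeTokens, err)
}
```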

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mayabar
Once this PR has been reviewed and has the lgtm label, please assign kfswain for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@netlify

netlify bot commented Jan 5, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit c1cea68
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/695f6778d985ea0007581598
😎 Deploy Preview https://deploy-preview-2053--gateway-api-inference-extension.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 5, 2026
@k8s-ci-robot
Copy link
Contributor

Hi @mayabar. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 5, 2026
@mayabar mayabar changed the title WIP: Update prefix scorer to report cached prefix length in tokens Update prefix scorer to report cached prefix length in tokens Jan 5, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2026
Contributor

@ahg-g ahg-g left a comment

Can you please add more context to the description explaining why this is needed?

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 5, 2026
@mayabar mayabar requested a review from ahg-g January 5, 2026 14:03
// The input prompt is broken into sizes of BlockSizeTokens to calculate block hashes. Requests
// with length shorter than the block size will be ignored.
- BlockSize int `json:"blockSize"`
+ BlockSizeTokens int `json:"blockSize"`
Contributor

@ahg-g ahg-g Jan 5, 2026

We need to add to the description that this PR introduces a user-facing change (the user here being whoever deploys the EPP). This PR removes a config variable and adds a new one with different semantics.

In fact, we should keep the old variable, mark it as deprecated, and fail to instantiate the plugin if it is set, with an error message instructing the user to migrate to the new parameter and its new semantics.
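For context on the field under review: its comment says the prompt is split into blocks of `BlockSizeTokens` and that prompts shorter than one block are ignored. A rough sketch of that chunking follows, assuming a token is approximated by a fixed number of characters (counted as bytes here) and using FNV-1a from the standard library as a stand-in hash. The function name, the constant, and the choice of hash are illustrative assumptions rather than the plugin's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// averageCharactersPerToken is an ASSUMED conversion factor for illustration.
const averageCharactersPerToken = 4

// hashPromptBlocks (hypothetical) splits the prompt into full blocks of
// blockSizeTokens tokens, approximated in bytes, and returns one hash per
// block. Each hash also covers the previous block's hash, so equal hashes at
// position i imply that the whole prefix up to block i is equal.
func hashPromptBlocks(prompt string, blockSizeTokens int) []uint64 {
	blockSizeChars := blockSizeTokens * averageCharactersPerToken
	if blockSizeChars <= 0 || len(prompt) < blockSizeChars {
		return nil // prompts shorter than one block are ignored
	}
	hashes := make([]uint64, 0, len(prompt)/blockSizeChars)
	var prev uint64
	for start := 0; start+blockSizeChars <= len(prompt); start += blockSizeChars {
		h := fnv.New64a()
		var prevBytes [8]byte
		binary.LittleEndian.PutUint64(prevBytes[:], prev)
		h.Write(prevBytes[:]) // chain in the previous block's hash
		h.Write([]byte(prompt[start : start+blockSizeChars]))
		prev = h.Sum64()
		hashes = append(hashes, prev)
	}
	return hashes
}

func main() {
	// With blockSizeTokens = 1 and the assumed factor of 4, each block is 4 bytes.
	fmt.Println(len(hashPromptBlocks("abcdefgh", 1))) // two full blocks
	fmt.Println(len(hashPromptBlocks("abc", 1)))      // shorter than one block
}
```

A trailing partial block is dropped in this sketch, so only full blocks contribute hashes.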

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 6, 2026
@mayabar mayabar force-pushed the update-prefix-scorer branch from 40656e7 to b301782 Compare January 6, 2026 12:27
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 6, 2026
state := &SchedulingContextState{
PrefixHashes: hashes,
- PrefixCacheServers: p.matchLongestPrefix(ctx, hashes),
+ PrefixCacheServers: p.matchLongestPrefix(ctx, hashes, blockSize),
Contributor

Do we not need to set the blockSize parameter here, like we do in Score?

// A map of server to its longest prefix cache match length.
PrefixCacheServers map[ServerID]int
+ // Size of a block in tokens
+ BlockSize int
Contributor

Should we be consistent and also name this BlockSizeTokens?

- // Update servers with their longest prefix match.
- res[server]++
+ // Update servers with their longest prefix match, prefix length is in tokens.
+ res[server] += blockSize
Contributor

Why do we need to report the longest prefix in tokens? Isn't it enough to track it in terms of the number of blocks? The fewer places where blockSize is a factor, the better, right?
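To make the alternative raised in this comment concrete, here is a sketch in which the match is counted in blocks and the conversion to tokens happens in exactly one helper. The types, the in-memory index, and both function signatures are hypothetical simplifications; the real `matchLongestPrefix` is a method that also takes a context.

```go
package main

import "fmt"

type serverID string
type blockHash uint64

// matchLongestPrefix (simplified, hypothetical signature) returns, for each
// server, how many leading blocks of the request it has cached. index maps a
// block hash to the set of servers known to hold that block.
func matchLongestPrefix(hashes []blockHash, index map[blockHash]map[serverID]struct{}) map[serverID]int {
	res := make(map[serverID]int)
	for i, h := range hashes {
		matched := false
		for s := range index[h] {
			// Count the block only if the server matched every earlier block.
			if res[s] == i {
				res[s]++
				matched = true
			}
		}
		if !matched {
			break // no server can extend its prefix any further
		}
	}
	return res
}

// prefixTokens is the single place where the block size is applied.
func prefixTokens(matchedBlocks, blockSizeTokens int) int {
	return matchedBlocks * blockSizeTokens
}

func main() {
	index := map[blockHash]map[serverID]struct{}{
		1: {"a": {}, "b": {}},
		2: {"a": {}},
		3: {"b": {}},
	}
	res := matchLongestPrefix([]blockHash{1, 2, 3}, index)
	fmt.Println(res["a"], res["b"])
	fmt.Println(prefixTokens(res["a"], 16))
}
```

With this split, the block size only matters where hashes are computed and where a length is reported to a consumer that expects tokens.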


- total := len(state.PrefixHashes)
+ // total prefix length in tokens
+ total := len(state.PrefixHashes) * blockSize
Contributor

If matchLongestPrefix reports the number of matched blocks, then we don't need to multiply by blockSize here, right? Maybe I am missing something, but if we do that, wouldn't we restrict the relevance and use of blockSizeTokens to the function that computes the hashes?
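The point in this comment can be checked with a small sketch: if the score is the ratio of matched length to total length, multiplying both by the same block size does not change it. The function names below are hypothetical, and the assumption that the score is a plain ratio is inferred from the quoted snippet rather than confirmed by it.

```go
package main

import "fmt"

// scoreFromBlocks computes the matched fraction using block counts only.
func scoreFromBlocks(matchedBlocks, totalBlocks int) float64 {
	if totalBlocks == 0 {
		return 0
	}
	return float64(matchedBlocks) / float64(totalBlocks)
}

// scoreFromTokens computes the same fraction after scaling both the matched
// and total lengths by the block size in tokens.
func scoreFromTokens(matchedBlocks, totalBlocks, blockSizeTokens int) float64 {
	totalTokens := totalBlocks * blockSizeTokens
	if totalTokens == 0 {
		return 0
	}
	return float64(matchedBlocks*blockSizeTokens) / float64(totalTokens)
}

func main() {
	fmt.Println(scoreFromBlocks(3, 4))     // 3 of 4 blocks matched
	fmt.Println(scoreFromTokens(3, 4, 16)) // 48 of 64 tokens matched, same ratio
}
```

Under that assumption, reporting tokens only matters for consumers of the stored prefix length, not for the score itself.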

@mayabar mayabar requested a review from ahg-g January 6, 2026 13:26
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2026
…d of tokens

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
…contains length values in tokens.

- Add block size to SchedulingContextState of the prefix cache plugin.
- Tests partial updates

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
… defined in chars and the new one defined in tokens, update tests accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
@mayabar mayabar force-pushed the update-prefix-scorer branch from b301782 to c1cea68 Compare January 8, 2026 08:14
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 8, 2026
@k8s-ci-robot
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Change measuring units in prefix plugin

3 participants