Skip to content

Conversation

@evacchi
Copy link

@evacchi evacchi commented Jan 8, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Addresses #2012.

Does this PR introduce a user-facing change?:

NONE

The structure of the PR follows very closely what's done for FlowState, you will recognize many patterns and similar code paths. This is a very early attempt, so it might be too simplistic...

  • Config: we introduce PriorityBandGCTimeout with a defaultPriorityBandGCTimeout; this defaults to 2 * defaultFlowGCTimeout to give a "grace period" where the bands are retained even if the owning flows have been already collected, so that if a new flow with the same priority is re-created shortly after, they won't have to be reinstantiated from scratch
  • FlowRegistry: we introduce a priorityBandStates sync.Map to be used similarly to flowStates sync.Map
    • keyed with int priorities
    • valued with priorityBandState structs; this struct include a becameIdleAt time.Time field
  • executeGCCycle() now also updates and checks the "becameIdleAt" timestamp for priority bands + marks candidates + deletes them
  • isBandEmpty(priority) might be simplistic and maybe racy? it checks for len(band.queues) and uses band.len, band.byteSize to account for in-flight, buffered requests. There is no equivalent to the leaseCount. Does this make sense?
  • verifyAndSweepPriorityBands() is almost verbatim the same as verifyAndSweepFlows()

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Jan 8, 2026
@netlify
Copy link

netlify bot commented Jan 8, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit ffc09fb
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/695fbe66833255000843acca
😎 Deploy Preview https://deploy-preview-2097--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: evacchi
Once this PR has been reviewed and has the lgtm label, please assign nirrozenbaum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Welcome @evacchi!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 8, 2026
@k8s-ci-robot
Copy link
Contributor

Hi @evacchi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 8, 2026
@LukeAVanDrie
Copy link
Contributor

@evacchi, this is an excellent draft! I agree with you that the flowState pattern (leases + surgical locking) is the right model for managing lifecycle here. However, seeing this implementation laid out has helped me realize a critical circular dependency between the JIT Provisioning path (prepareNewFlow) and this Garbage Collection path (verifyAndSweepPriorityBands).

Problem: "Zombie" Band

Currently, we have two competing authorities for the existence of a band:

  1. JIT Path: Checks Config (Read Lock) → Assumes Band exists → Dispatches to Shard.
  2. GC Path: Checks Idleness (Write Lock) → Removes from Config & Shard.

Because the JIT path does not (and currently cannot) acquire the priorityBandState.gcLock, we have a race condition:

  1. GC verifies a band is empty and pauses.
  2. JIT sees the band exists in Config, validates it, and passes the "check".
  3. GC wakes up and deletes the band from the Shards.
  4. JIT proceeds to use the now-deleted band on the Shard, causing a panic or ErrBandNotFound.

Why We Can't Just Add a Lock

If we try to fix this by making the JIT path acquire band.gcLock.RLock() to prevent deletion, we create a deadlock:

  • JIT needs: Registry.mu (to find the band) → band.gcLock (to keep it alive).
  • GC needs: band.gcLock (to verify empty) → Registry.mu (to delete it from config).
  • Result: circular wait

Proposal: Split Admission from Cleanup

To solve this safely, I believe we need to decouple the admission from memory management (GC). This likely makes Issue #2011 (controller-driven lifecycle) a prerequisite, but with a twist:

  1. Soft Delete (Controller / [Flow Control] Refactor: Drive Priority Band lifecycle from InferenceObjective controller. #2011): The Controller removes the Band from the Registry.Config. New requests are rejected (503) or defaulted. This stops any new admissions.
  2. Hard Delete (Your PR): The GC scans for bands that are (a) Missing from Config AND (b) Empty / Idle. It then cleans up the memory.

This breaks the lock hierarchy cycle because the JIT path never fights the GC. The JIT only cares if it's in the Config. The GC only touches it if it's not in the Config.

Next Steps?

I think we have two options:

  1. Pivot to [Flow Control] Refactor: Drive Priority Band lifecycle from InferenceObjective controller. #2011: We implement the Controller-driven logic first, making the "Soft Delete" the authority. Then this PR becomes the future cleanup mechanism.
  2. Adapt this PR: We can try to implement a leaseCount on PriorityBandState (similar to Flows) that the JIT path increments. This effectively "holds" the band in memory even if the Config is gone.

What are your thoughts? I am happy to hop on a call to whiteboard the locking hierarchy if it helps!

state := val.(*priorityBandState)

// Acquire the exclusive write lock for this specific band, ensuring the state is stable for our check.
state.gcLock.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the specific crtiical zone. When we hold state.gcLock here, we are preventing other GC cycles, but we are not blocking the JIT path (prepareNewFlow).

The prepareNewFlow method acquires fr.mu.RLock(), checks fr.config.PriorityBands, and returns. It doesn't touch gcLock.

This means we could successfully execute fr.deletePriorityBand(priority) immediately after prepareNewFlow has returned true, but before that request reaches shard.ManagedQueue(). The request would arrive at the shard to find the queue destroyed.

To fix this natively in this PR, prepareNewFlow would need to increment a leaseCount on this priorityBandState before releasing the global config lock.


fr.perPriorityBandStats.LoadOrStore(priority, &bandStats{})

fr.priorityBandStates.Store(priority, &priorityBandState{
Copy link
Contributor

@LukeAVanDrie LukeAVanDrie Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to persist with the JIT model for now, we might need to introduce a leaseCount here, identical to how flowState works.

The flow is:

  1. JIT: Increment Band.leaseCount.
  2. JIT: Dispatch Request.
  3. Request Finish: Decrement Band.leaseCount.

Then, the GC condition (in verifyAndSweep) changes from isBandEmpty() to isBandEmpty() && leaseCount == 0.

This ensures that even if we "Soft Delete" the band from the Config, the physical memory structures remain active as long as a request is currently being set up or processed. This effectively solves the "Zombie Band" issue without needing complex locking hierarchies.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes I had the suspect we would need that sort of counter anyway 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants