[WIP] [Flow Control] Garbage Collection for Priority Bands #2097

evacchi · 2026-01-08T14:25:39Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Addresses #2012.

Does this PR introduce a user-facing change?:

NONE

The structure of the PR follows very closely what's done for FlowState, you will recognize many patterns and similar code paths. This is a very early attempt, so it might be too simplistic...

Config: we introduce PriorityBandGCTimeout with a defaultPriorityBandGCTimeout; this defaults to 2 * defaultFlowGCTimeout to give a "grace period" where the bands are retained even if the owning flows have been already collected, so that if a new flow with the same priority is re-created shortly after, they won't have to be reinstantiated from scratch
FlowRegistry: we introduce a priorityBandStates sync.Map to be used similarly to flowStates sync.Map
- keyed with int priorities
- valued with priorityBandState structs; this struct include a becameIdleAt time.Time field
executeGCCycle() now also updates and checks the "becameIdleAt" timestamp for priority bands + marks candidates + deletes them
isBandEmpty(priority) might be simplistic and maybe racy? it checks for len(band.queues) and uses band.len, band.byteSize to account for in-flight, buffered requests. There is no equivalent to the leaseCount. Does this make sense?
verifyAndSweepPriorityBands() is almost verbatim the same as verifyAndSweepFlows()

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

netlify · 2026-01-08T14:25:46Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`ffc09fb`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/695fbe66833255000843acca
😎 Deploy Preview	https://deploy-preview-2097--gateway-api-inference-extension.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

k8s-ci-robot · 2026-01-08T14:25:47Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: evacchi
Once this PR has been reviewed and has the lgtm label, please assign nirrozenbaum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2026-01-08T14:25:48Z

Welcome @evacchi!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2026-01-08T14:25:49Z

Hi @evacchi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

LukeAVanDrie · 2026-01-09T00:34:22Z

@evacchi, this is an excellent draft! I agree with you that the flowState pattern (leases + surgical locking) is the right model for managing lifecycle here. However, seeing this implementation laid out has helped me realize a critical circular dependency between the JIT Provisioning path (prepareNewFlow) and this Garbage Collection path (verifyAndSweepPriorityBands).

Problem: "Zombie" Band

Currently, we have two competing authorities for the existence of a band:

JIT Path: Checks Config (Read Lock) → Assumes Band exists → Dispatches to Shard.
GC Path: Checks Idleness (Write Lock) → Removes from Config & Shard.

Because the JIT path does not (and currently cannot) acquire the priorityBandState.gcLock, we have a race condition:

GC verifies a band is empty and pauses.
JIT sees the band exists in Config, validates it, and passes the "check".
GC wakes up and deletes the band from the Shards.
JIT proceeds to use the now-deleted band on the Shard, causing a panic or ErrBandNotFound.

Why We Can't Just Add a Lock

If we try to fix this by making the JIT path acquire band.gcLock.RLock() to prevent deletion, we create a deadlock:

JIT needs: Registry.mu (to find the band) → band.gcLock (to keep it alive).
GC needs: band.gcLock (to verify empty) → Registry.mu (to delete it from config).
Result: circular wait

Proposal: Split Admission from Cleanup

To solve this safely, I believe we need to decouple the admission from memory management (GC). This likely makes Issue #2011 (controller-driven lifecycle) a prerequisite, but with a twist:

Soft Delete (Controller / [Flow Control] Refactor: Drive Priority Band lifecycle from InferenceObjective controller. #2011): The Controller removes the Band from the Registry.Config. New requests are rejected (503) or defaulted. This stops any new admissions.
Hard Delete (Your PR): The GC scans for bands that are (a) Missing from Config AND (b) Empty / Idle. It then cleans up the memory.

This breaks the lock hierarchy cycle because the JIT path never fights the GC. The JIT only cares if it's in the Config. The GC only touches it if it's not in the Config.

Next Steps?

I think we have two options:

Pivot to [Flow Control] Refactor: Drive Priority Band lifecycle from InferenceObjective controller. #2011: We implement the Controller-driven logic first, making the "Soft Delete" the authority. Then this PR becomes the future cleanup mechanism.
Adapt this PR: We can try to implement a leaseCount on PriorityBandState (similar to Flows) that the JIT path increments. This effectively "holds" the band in memory even if the Config is gone.

What are your thoughts? I am happy to hop on a call to whiteboard the locking hierarchy if it helps!

LukeAVanDrie · 2026-01-09T00:36:50Z

pkg/epp/flowcontrol/registry/registry.go

+		state := val.(*priorityBandState)
+
+		// Acquire the exclusive write lock for this specific band, ensuring the state is stable for our check.
+		state.gcLock.Lock()


This is the specific crtiical zone. When we hold state.gcLock here, we are preventing other GC cycles, but we are not blocking the JIT path (prepareNewFlow).

The prepareNewFlow method acquires fr.mu.RLock(), checks fr.config.PriorityBands, and returns. It doesn't touch gcLock.

This means we could successfully execute fr.deletePriorityBand(priority) immediately after prepareNewFlow has returned true, but before that request reaches shard.ManagedQueue(). The request would arrive at the shard to find the queue destroyed.

To fix this natively in this PR, prepareNewFlow would need to increment a leaseCount on this priorityBandState before releasing the global config lock.

LukeAVanDrie · 2026-01-09T00:37:50Z

pkg/epp/flowcontrol/registry/registry.go


 	fr.perPriorityBandStats.LoadOrStore(priority, &bandStats{})

+	fr.priorityBandStates.Store(priority, &priorityBandState{


If we want to persist with the JIT model for now, we might need to introduce a leaseCount here, identical to how flowState works.

The flow is:

JIT: Increment Band.leaseCount.

JIT: Dispatch Request.

Request Finish: Decrement Band.leaseCount.

Then, the GC condition (in verifyAndSweep) changes from isBandEmpty() to isBandEmpty() && leaseCount == 0.

This ensures that even if we "Soft Delete" the band from the Config, the physical memory structures remain active as long as a request is currently being set up or processed. This effectively solves the "Zombie Band" issue without needing complex locking hierarchies.

ah yes I had the suspect we would need that sort of counter anyway 🤔

evacchi added 2 commits January 8, 2026 15:16

[Flow Control] Garbage Collection for Priority Bands

667e258

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

test cases

ffc09fb

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Jan 8, 2026

k8s-ci-robot requested review from elevran and liu-cong January 8, 2026 14:25

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 8, 2026

k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 8, 2026

evacchi mentioned this pull request Jan 8, 2026

[Flow Control] Implement Garbage Collection for Priority Bands. #2012

Open

LukeAVanDrie reviewed Jan 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] [Flow Control] Garbage Collection for Priority Bands #2097

[WIP] [Flow Control] Garbage Collection for Priority Bands #2097

evacchi commented Jan 8, 2026

Uh oh!

netlify bot commented Jan 8, 2026 •

edited

Loading

Uh oh!

k8s-ci-robot commented Jan 8, 2026

Uh oh!

k8s-ci-robot commented Jan 8, 2026

Uh oh!

k8s-ci-robot commented Jan 8, 2026

Uh oh!

LukeAVanDrie commented Jan 9, 2026

Uh oh!

LukeAVanDrie Jan 9, 2026

Uh oh!

LukeAVanDrie Jan 9, 2026 •

edited

Loading

Uh oh!

evacchi Jan 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		fr.perPriorityBandStats.LoadOrStore(priority, &bandStats{})

		fr.priorityBandStates.Store(priority, &priorityBandState{

[WIP] [Flow Control] Garbage Collection for Priority Bands #2097

Are you sure you want to change the base?

[WIP] [Flow Control] Garbage Collection for Priority Bands #2097

Conversation

evacchi commented Jan 8, 2026

Uh oh!

netlify bot commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for gateway-api-inference-extension ready!

Uh oh!

k8s-ci-robot commented Jan 8, 2026

Uh oh!

k8s-ci-robot commented Jan 8, 2026

Uh oh!

k8s-ci-robot commented Jan 8, 2026

Uh oh!

LukeAVanDrie commented Jan 9, 2026

Uh oh!

LukeAVanDrie Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

LukeAVanDrie Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

evacchi Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

netlify bot commented Jan 8, 2026 •

edited

Loading

LukeAVanDrie Jan 9, 2026 •

edited

Loading