[WIP] [Flow Control] Garbage Collection for Priority Bands #2097
base: main
Conversation
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
@evacchi, this is an excellent draft! I agree with you that the isBandEmpty(priority) check is racy as it stands.

Problem: "Zombie" Band

Currently, we have two competing authorities for the existence of a band: the Config (fr.config.PriorityBands), which the JIT admission path consults, and the physical band state (priorityBandStates plus the per-shard queues), which the GC reclaims.

Because the JIT path does not (and currently cannot) acquire the gcLock, the GC can delete a band after a request has been admitted but before it reaches its shard, leaving the request pointing at a destroyed queue (a "zombie" band).

Why We Can't Just Add a Lock

If we try to fix this by making the JIT path acquire the gcLock as well, we end up nesting it under fr.mu and create a lock-hierarchy cycle between the JIT path and the GC.

Proposal: Split Admission from Cleanup

To solve this safely, I believe we need to decouple admission from memory management (GC). This likely makes Issue #2011 (controller-driven lifecycle) a prerequisite, but with a twist: the Config stays the single authority for admission (removing a band from the Config is a "soft delete"), while the GC is only responsible for reclaiming the physical band structures once they are idle and unreferenced.

This breaks the lock hierarchy cycle because the JIT path never fights the GC. The JIT only cares if it's in the Config.

Next Steps?

I think we have two options: make #2011 a hard prerequisite and build band GC on top of the controller-driven lifecycle, or keep the JIT model for now and add a leaseCount to priorityBandState (mirroring flowState) so the GC can tell when a band is still referenced.

What are your thoughts? I am happy to hop on a call to whiteboard the locking hierarchy if it helps!
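To make the proposed split concrete, here is a minimal Go sketch of the idea under the assumptions above: admission only consults the Config, a "soft delete" only edits the Config, and the physical band state is left for the GC. All type and method names here (flowRegistry, admit, softDelete) are illustrative stand-ins, not the PR's actual API.

```go
// Sketch: decoupling admission (Config membership) from cleanup (GC of
// physical band state). Hypothetical types, not the real registry.
package main

import (
	"fmt"
	"sync"
)

type flowRegistry struct {
	mu                 sync.RWMutex     // guards the Config only
	priorityBands      map[int]struct{} // admitted priorities (the "Config")
	priorityBandStates sync.Map         // priority -> physical band state, owned by the GC
}

// admit is the JIT path: it reads the Config under fr.mu and never touches a
// per-band gcLock, so it cannot create a lock-ordering cycle with the GC.
func (fr *flowRegistry) admit(priority int) bool {
	fr.mu.RLock()
	defer fr.mu.RUnlock()
	_, ok := fr.priorityBands[priority]
	return ok
}

// softDelete removes the priority from the Config; new requests are rejected
// immediately, but the physical structures stay until the GC decides they are
// idle and unreferenced.
func (fr *flowRegistry) softDelete(priority int) {
	fr.mu.Lock()
	defer fr.mu.Unlock()
	delete(fr.priorityBands, priority)
}

func main() {
	fr := &flowRegistry{priorityBands: map[int]struct{}{1: {}}}
	fr.priorityBandStates.Store(1, struct{}{}) // placeholder for the band's physical state

	fmt.Println(fr.admit(1)) // true
	fr.softDelete(1)
	fmt.Println(fr.admit(1)) // false: admission stops at once
	_, stillThere := fr.priorityBandStates.Load(1)
	fmt.Println(stillThere) // true: physical state waits for the GC
}
```

Because admit never takes a per-band lock, the JIT path and the GC never nest their locks, which is the property the proposal relies on.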
state := val.(*priorityBandState)

// Acquire the exclusive write lock for this specific band, ensuring the state is stable for our check.
state.gcLock.Lock()
This is the specific critical zone. When we hold state.gcLock here, we are preventing other GC cycles, but we are not blocking the JIT path (prepareNewFlow).
The prepareNewFlow method acquires fr.mu.RLock(), checks fr.config.PriorityBands, and returns. It doesn't touch gcLock.
This means we could successfully execute fr.deletePriorityBand(priority) immediately after prepareNewFlow has returned true, but before that request reaches shard.ManagedQueue(). The request would arrive at the shard to find the queue destroyed.
To fix this natively in this PR, prepareNewFlow would need to increment a leaseCount on this priorityBandState before releasing the global config lock.
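For illustration, here is a minimal sketch of that ordering, assuming a hypothetical leaseCount field on priorityBandState and assuming the GC's sweep serializes on the same global config lock (fr.mu) before checking leases. Names and bodies are stand-ins, not the PR's actual code.

```go
// Sketch: take the lease while still holding the global config read lock, so
// the GC cannot reclaim the band between admission and the request reaching
// its shard. Hypothetical types and fields.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type priorityBandState struct {
	leaseCount atomic.Int64 // admitted requests still holding this band open
}

type flowRegistry struct {
	mu                 sync.RWMutex     // global config lock (fr.mu in the PR)
	priorityBands      map[int]struct{} // stand-in for fr.config.PriorityBands
	priorityBandStates sync.Map         // priority (int) -> *priorityBandState
}

// prepareNewFlow admits a request and returns a release function the caller
// must invoke when the request finishes.
func (fr *flowRegistry) prepareNewFlow(priority int) (release func(), ok bool) {
	fr.mu.RLock()
	defer fr.mu.RUnlock()

	if _, admitted := fr.priorityBands[priority]; !admitted {
		return nil, false
	}
	v, _ := fr.priorityBandStates.LoadOrStore(priority, &priorityBandState{})
	state := v.(*priorityBandState)

	// Crucial ordering: the lease is taken before fr.mu is released, so a
	// sweep that serializes on fr.mu can never observe leaseCount == 0 here.
	state.leaseCount.Add(1)
	return func() { state.leaseCount.Add(-1) }, true
}

// sweepBand deletes the band only if no admitted request still holds a lease.
// A real sweep would also verify the band is empty (isBandEmpty).
func (fr *flowRegistry) sweepBand(priority int) {
	fr.mu.Lock() // serializes with prepareNewFlow's RLock
	defer fr.mu.Unlock()

	if v, ok := fr.priorityBandStates.Load(priority); ok {
		if v.(*priorityBandState).leaseCount.Load() == 0 {
			fr.priorityBandStates.Delete(priority)
		}
	}
}

func main() {
	fr := &flowRegistry{priorityBands: map[int]struct{}{1: {}}}

	release, ok := fr.prepareNewFlow(1)
	fmt.Println(ok) // true: priority 1 is admitted and now holds a lease

	fr.sweepBand(1) // no-op: the lease keeps the band alive
	_, alive := fr.priorityBandStates.Load(1)
	fmt.Println(alive) // true

	release()       // request finished
	fr.sweepBand(1) // now the band can be reclaimed
	_, alive = fr.priorityBandStates.Load(1)
	fmt.Println(alive) // false
}
```

The only essential point is the ordering: the lease must become visible before any GC check that could observe the band as unreferenced.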
fr.perPriorityBandStats.LoadOrStore(priority, &bandStats{})

fr.priorityBandStates.Store(priority, &priorityBandState{
If we want to persist with the JIT model for now, we might need to introduce a leaseCount here, identical to how flowState works.
The flow is:
- JIT: Increment Band.leaseCount.
- JIT: Dispatch Request.
- Request Finish: Decrement Band.leaseCount.
Then, the GC condition (in verifyAndSweep) changes from isBandEmpty() to isBandEmpty() && leaseCount == 0.
This ensures that even if we "Soft Delete" the band from the Config, the physical memory structures remain active as long as a request is currently being set up or processed. This effectively solves the "Zombie Band" issue without needing complex locking hierarchies.
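A minimal, self-contained sketch of that lifecycle and the adjusted sweep condition follows. The isBandEmpty and deletePriorityBand bodies here are placeholders for the registry's real logic, and the leaseCount field is the assumed addition; only the condition inside the sweep is the point being illustrated.

```go
// Sketch: GC condition changes from isBandEmpty() alone to
// isBandEmpty() && leaseCount == 0. Hypothetical stand-in types.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type priorityBandState struct {
	gcLock     sync.Mutex
	leaseCount atomic.Int64
}

type flowRegistry struct {
	priorityBandStates sync.Map     // priority (int) -> *priorityBandState
	emptyBands         map[int]bool // stand-in for the real emptiness check
}

func (fr *flowRegistry) isBandEmpty(priority int) bool { return fr.emptyBands[priority] }

// deletePriorityBand would also tear down queues and stats in the real code.
func (fr *flowRegistry) deletePriorityBand(priority int) { fr.priorityBandStates.Delete(priority) }

func (fr *flowRegistry) verifyAndSweepPriorityBands() {
	fr.priorityBandStates.Range(func(key, val any) bool {
		priority := key.(int)
		state := val.(*priorityBandState)

		state.gcLock.Lock()
		// The band must have no queued work *and* no admitted request that is
		// still being set up or processed (lease held).
		if fr.isBandEmpty(priority) && state.leaseCount.Load() == 0 {
			fr.deletePriorityBand(priority)
		}
		state.gcLock.Unlock()
		return true
	})
}

func main() {
	fr := &flowRegistry{emptyBands: map[int]bool{1: true}}
	st := &priorityBandState{}
	st.leaseCount.Add(1) // JIT: a request for priority 1 has been admitted
	fr.priorityBandStates.Store(1, st)

	fr.verifyAndSweepPriorityBands()
	_, alive := fr.priorityBandStates.Load(1)
	fmt.Println(alive) // true: band is empty but still leased

	st.leaseCount.Add(-1) // Request Finish
	fr.verifyAndSweepPriorityBands()
	_, alive = fr.priorityBandStates.Load(1)
	fmt.Println(alive) // false: empty and unreferenced, so it was reclaimed
}
```

This is exactly the "Soft Delete" behavior described above: even if the band has been removed from the Config, its physical structures survive for as long as a lease is held.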
ah yes, I suspected we would need that sort of counter anyway 🤔
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Addresses #2012.
Does this PR introduce a user-facing change?:
The structure of the PR follows very closely what's done for FlowState; you will recognize many patterns and similar code paths. This is a very early attempt, so it might be too simplistic...
- We add a PriorityBandGCTimeout with a defaultPriorityBandGCTimeout; this defaults to 2 * defaultFlowGCTimeout to give a "grace period" where the bands are retained even if the owning flows have already been collected, so that if a new flow with the same priority is re-created shortly after, they won't have to be reinstantiated from scratch.
- FlowRegistry: we introduce a priorityBandStates sync.Map, to be used similarly to flowStates sync.Map; it maps int priorities to priorityBandState structs, and this struct includes a becameIdleAt time.Time field.
- executeGCCycle() now also updates and checks the becameIdleAt timestamp for priority bands, marks candidates, and deletes them.
- isBandEmpty(priority) might be simplistic and maybe racy? It checks len(band.queues) and uses band.len, band.byteSize to account for in-flight, buffered requests. There is no equivalent to the leaseCount. Does this make sense?
- verifyAndSweepPriorityBands() is almost verbatim the same as verifyAndSweepFlows().
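Purely as an illustration of the becameIdleAt / grace-period mechanics described above, here is a rough, self-contained sketch. The timeout value, the markIdleBands helper, and the emptiness callback are assumptions (the PR's executeGCCycle would do this as part of its existing cycle), and locking is omitted on the assumption that the GC runs on a single goroutine.

```go
// Sketch: stamp becameIdleAt when a band first looks empty, and treat the band
// as a GC candidate once PriorityBandGCTimeout has elapsed. Illustrative only.
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	defaultFlowGCTimeout         = 5 * time.Minute          // assumed value for illustration
	defaultPriorityBandGCTimeout = 2 * defaultFlowGCTimeout // grace period for empty bands
)

type priorityBandState struct {
	becameIdleAt time.Time // zero value means "not currently idle"
}

type flowRegistry struct {
	priorityBandStates sync.Map // priority (int) -> *priorityBandState
}

// markIdleBands returns the priorities whose bands have been idle longer than
// the GC timeout; a real executeGCCycle would then verify and sweep them.
func (fr *flowRegistry) markIdleBands(now time.Time, isBandEmpty func(int) bool) []int {
	var candidates []int
	fr.priorityBandStates.Range(func(key, val any) bool {
		priority := key.(int)
		state := val.(*priorityBandState)
		switch {
		case !isBandEmpty(priority):
			state.becameIdleAt = time.Time{} // band is active again, reset the clock
		case state.becameIdleAt.IsZero():
			state.becameIdleAt = now // band just became idle, start the grace period
		case now.Sub(state.becameIdleAt) >= defaultPriorityBandGCTimeout:
			candidates = append(candidates, priority) // idle long enough to collect
		}
		return true
	})
	return candidates
}

func main() {
	fr := &flowRegistry{}
	fr.priorityBandStates.Store(1, &priorityBandState{})
	empty := func(int) bool { return true }

	now := time.Now()
	fmt.Println(fr.markIdleBands(now, empty)) // []: grace period just started

	later := now.Add(defaultPriorityBandGCTimeout)
	fmt.Println(fr.markIdleBands(later, empty)) // [1]: candidate for sweeping
}
```

Note the reset when the band becomes active again: that restart of the grace period is what lets a flow re-created with the same priority reuse the existing band instead of rebuilding it from scratch.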