Delete assumedWorkloads field from cache #8474

Merged
k8s-ci-robot merged 8 commits into kubernetes-sigs:main from PBundyra:rm-assumed-workloads
Jan 12, 2026

Conversation

@PBundyra
Contributor

@PBundyra PBundyra commented Jan 8, 2026

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

After analyzing the code and experimenting with deleting assumedWorkloads, I came to the conclusion that the field is redundant. Kueue doesn't use the information that a Workload is assumed, and assuming a Workload doesn't differ from just adding it to the cache: AssumeWorkload eventually adds the Workload to the cache, where it is treated as if it had just been added (not assumed). Hence this PR, which simplifies the code and its maintenance.
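To illustrate the redundancy argument, here is a minimal sketch under stated assumptions. It is not Kueue's code: `toyCache`, its fields, and its methods are hypothetical names invented for this example. It only shows the general shape of the claim, namely that an "assume" path which writes the same entry as a plain add, plus a set that nothing ever reads, leaves the observable cache state identical.

```go
package main

import (
	"fmt"
	"sync"
)

// toyCache is a hypothetical, heavily simplified cache used only for
// illustration. It is NOT Kueue's Cache type.
type toyCache struct {
	sync.Mutex
	workloads map[string]int      // workload key -> requested units (illustrative)
	assumed   map[string]struct{} // extra bookkeeping that no reader consults
}

func newToyCache() *toyCache {
	return &toyCache{
		workloads: map[string]int{},
		assumed:   map[string]struct{}{},
	}
}

// AddOrUpdate stores the workload under the full cache lock.
func (c *toyCache) AddOrUpdate(key string, units int) {
	c.Lock()
	defer c.Unlock()
	c.workloads[key] = units
}

// Assume stores the workload exactly like AddOrUpdate and additionally
// records the key in the assumed set.
func (c *toyCache) Assume(key string, units int) {
	c.Lock()
	defer c.Unlock()
	c.workloads[key] = units
	c.assumed[key] = struct{}{}
}

// Usage sums the requested units; it never looks at the assumed set.
func (c *toyCache) Usage() int {
	c.Lock()
	defer c.Unlock()
	total := 0
	for _, u := range c.workloads {
		total += u
	}
	return total
}

func main() {
	a, b := newToyCache(), newToyCache()
	a.Assume("ns/wl", 4)
	b.AddOrUpdate("ns/wl", 4)
	// Both caches report the same usage regardless of which path was taken.
	fmt.Println(a.Usage() == b.Usage())
}
```

In this toy model the `assumed` map can be deleted without changing any result, which mirrors the reasoning given for removing the field.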

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 8, 2026
@netlify

netlify Bot commented Jan 8, 2026

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 3e3af50
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/6964f0b0488bb40008c1d48e

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 8, 2026
@PBundyra
Contributor Author

PBundyra commented Jan 9, 2026

cc @mimowo @Singularity23x0

@PBundyra PBundyra force-pushed the rm-assumed-workloads branch from 711fd72 to af3b85f Compare January 9, 2026 13:50
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 9, 2026
@PBundyra PBundyra changed the title [WIP] Remove assumed workloads from cache Delete assumedWorkloads field from cache Jan 9, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 9, 2026
@PBundyra PBundyra force-pushed the rm-assumed-workloads branch from af3b85f to b421657 Compare January 9, 2026 14:00
Comment thread test/integration/singlecluster/scheduler/scheduler_test.go Outdated
@PBundyra PBundyra force-pushed the rm-assumed-workloads branch from 91e395d to a31e1d7 Compare January 12, 2026 09:04
Comment thread pkg/cache/scheduler/cache.go Outdated
@PBundyra PBundyra force-pushed the rm-assumed-workloads branch from a31e1d7 to 5446f3f Compare January 12, 2026 10:08
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 12, 2026
@mimowo
Contributor

mimowo commented Jan 12, 2026

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_kueue/8474/pull-kueue-test-integration-baseline-main/2010655165719252992
I haven't seen tests failing like this before. @PBundyra, please check whether this could be related to the PR. Let me re-trigger the tests to see if this is a flake or a stable failure.

@mimowo
Contributor

mimowo commented Jan 12, 2026

/test pull-kueue-test-integration-baseline-main

@PBundyra
Contributor Author

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_kueue/8474/pull-kueue-test-integration-baseline-main/2010655165719252992 haven't seen tests failing like this before, @PBundyra please check if this could be related to the PR. Let me re-trigger the tests to see if this is a flake or stable failing

I'm looking into it

@PBundyra PBundyra force-pushed the rm-assumed-workloads branch from 5446f3f to 71d0278 Compare January 12, 2026 11:17
@PBundyra
Contributor Author

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_kueue/8474/pull-kueue-test-integration-baseline-main/2010655165719252992 haven't seen tests failing like this before, @PBundyra please check if this could be related to the PR. Let me re-trigger the tests to see if this is a flake or stable failing

It seems the newly added test was causing the problem. There I introduced a variable numCalls that was modified by the Patch call in the scheduler and read in the test at the same time, which led to a data race. I switched to an atomic int to prevent the race condition.
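A minimal sketch of that kind of fix, assuming a counter that is incremented from one goroutine and read from another. The names `fakePatcher` and `countPatches` are hypothetical and do not come from the PR; only the `numCalls` counter name is taken from the comment above. The sketch uses `atomic.Int32` from the standard `sync/atomic` package (available since Go 1.19).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fakePatcher is a hypothetical stand-in for a patch hook invoked by the
// code under test. The counter is atomic so the test goroutine can read it
// while other goroutines increment it.
type fakePatcher struct {
	numCalls atomic.Int32
}

// Patch records one call. With a plain int field, a concurrent increment
// here and a read elsewhere would be a data race.
func (p *fakePatcher) Patch() {
	p.numCalls.Add(1)
}

// countPatches calls Patch from n goroutines and returns the final count.
func countPatches(n int) int32 {
	var p fakePatcher
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.Patch()
		}()
	}
	wg.Wait()
	// Load is safe to call even while writers are still running.
	return p.numCalls.Load()
}

func main() {
	fmt.Println(countPatches(100))
}
```

Running a test like this with `go test -race` is the usual way to confirm that the race report disappears after the switch to an atomic counter.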

@mimowo
Contributor

mimowo commented Jan 12, 2026

Ah, I see. We could probably just call ginkgo.Fail from within the fakeSubResourcePatchSpec, but this is more generic, so let's go with that

return false
}

func (c *Cache) AssumeWorkload(log logr.Logger, w *kueue.Workload) error {
Contributor


Lovely to see this much code removed and all tests still pass ❤️ . Thank you for investigating 👍

Contributor

@mimowo mimowo left a comment


/lgtm
/approve
Thank you for the simplification. I couldn't see any problematic cases except for the error handling, but the new integration test shows it works well. Let's go with that.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 12, 2026
@k8s-ci-robot
Contributor

LGTM label has been added.

Details: Git tree hash: bcd41dd3f2df4f5f6c147b6c0581e292a5ecfd02

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, PBundyra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 12, 2026
@k8s-ci-robot k8s-ci-robot merged commit 907c8ee into kubernetes-sigs:main Jan 12, 2026
30 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.16 milestone Jan 12, 2026
Comment on lines -656 to -658
func (c *Cache) AssumeWorkload(log logr.Logger, w *kueue.Workload) error {
c.Lock()
defer c.Unlock()
Member


The "assume mechanism" could potentially contribute to avoiding lock-acquiring deadlock between controllers and the scheduler, which can improve kueue-controller-manager performance.

OTOH, AssumeWorkload takes the entire cache lock, as you can see here. So I believe the current "assume mechanism" is no longer meaningful.

We can revisit the "assume mechanism" once we observe the cache lock-acquiring error many times.

@PBundyra Thank you 👍

thejoeejoee pushed a commit to thejoeejoee/kueue that referenced this pull request Jan 20, 2026
* Comment assumed workloads in cache

* Modify unit tests so they dont call AssumeWorkload but AddOrUpdate just like scheduler

* Delete assumed workloads field, add an integration test

* Fix verify lint

* Cleanup ForgetWorkload func

* Fix added test

* Fix added test

* Synchronize the numCalls counter in tests
dongjiang1989 pushed a commit to dongjiang1989/kueue that referenced this pull request Feb 28, 2026
* Comment assumed workloads in cache

* Modify unit tests so they dont call AssumeWorkload but AddOrUpdate just like scheduler

* Delete assumed workloads field, add an integration test

* Fix verify lint

* Cleanup ForgetWorkload func

* Fix added test

* Fix added test

* Synchronize the numCalls counter in tests

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.


5 participants