
KEP-5278: Specify to keep NNN in case of error during binding cycle #5771

Closed
brejman wants to merge 1 commit into kubernetes:master from brejman:issue-135771-kep-update


Conversation

@brejman commented Jan 7, 2026

  • One-line PR description: Specifying behavior for NNN on binding cycle error
  • Other comments:

Right now, the implementation intends to clear NNN on a binding error: https://github.com/kubernetes/kubernetes/blob/21f7c3ff68b388b565331724d5a46920854431a8/pkg/scheduler/schedule_one.go#L392.
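
For context, the clearing goes through the scheduler framework's NominatingInfo type. A condensed sketch of the pattern at the linked line (paraphrased from schedule_one.go, not a verbatim copy):

```go
// Condensed sketch of the clearing pattern around the linked line; paraphrased,
// not verbatim k/k code. clearNominatedNode tells the failure handler to
// override the pod's NominatedNodeName with "", i.e. to clear NNN.
var clearNominatedNode = &framework.NominatingInfo{
	NominatingMode:    framework.ModeOverride,
	NominatedNodeName: "",
}

func (sched *Scheduler) handleBindingCycleError(ctx context.Context, state *framework.CycleState,
	fwk framework.Framework, podInfo *framework.QueuedPodInfo, start time.Time,
	scheduleResult ScheduleResult, status *framework.Status) {
	// ... cache cleanup and requeue bookkeeping elided ...
	sched.FailureHandler(ctx, fwk, podInfo, status, clearNominatedNode, start)
}
```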

There are two problems with this:

  1. There's a bug that causes NNN to be cleared only on the 2nd binding failure in a row (a sketch of one plausible mechanism follows this list)
  2. In case PreBind succeeded, the pod may be "locked into" a node, and clearing NNN may let another pod take its place, e.g. due to a dynamically provisioned PV (#135771). Side note: even with NNN set, a higher-priority pod may take its place, so it's not completely robust
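
On the first problem, one plausible mechanism (my reading, stated as an assumption rather than a verified root cause): the status patch is skipped whenever the scheduler's in-memory pod snapshot already appears to match the cleared value, and with KEP-5278 that snapshot predates the point where NNN was written during the binding cycle. A hypothetical sketch:

```go
// Hypothetical illustration of the "clears only on the 2nd failure" behavior;
// the helper name and guard are assumptions loosely mirroring the scheduler's
// status-update path, not the actual k/k code.
func updatePodStatus(pod *v1.Pod, nominatingInfo *framework.NominatingInfo) error {
	// pod is the snapshot taken at the start of the scheduling cycle. NNN is
	// only written to the API server later, during the binding cycle, so this
	// snapshot can still have NominatedNodeName == "".
	nnnNeedsUpdate := nominatingInfo.Mode() == framework.ModeOverride &&
		pod.Status.NominatedNodeName != nominatingInfo.NominatedNodeName
	if !nnnNeedsUpdate {
		// Clearing uses NominatedNodeName == "". Against a stale "" snapshot
		// the guard sees no change, so the first clear is silently skipped;
		// only the retry, which re-reads the pod with NNN set, clears it.
		return nil
	}
	// ... patch pod.Status.NominatedNodeName through the API server ...
	return nil
}
```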

The KEP is a bit unclear on the intended behavior in this case. It mentions:

> scheduler clears the NominatedNodeName field at the end of failed scheduling cycle if it found the nominated node unschedulable for the pod

There's a comment in the code that suggests a pod that fails to bind isn't considered unschedulable: https://github.com/kubernetes/kubernetes/blob/21f7c3ff68b388b565331724d5a46920854431a8/pkg/scheduler/schedule_one.go#L318. I wasn't able to find any docs that mention this, though.

I suggest not clearing NNN if there was an error during the binding cycle, and updating the KEP wording to make this explicit. This is based on the assumption that a binding cycle failure is either

  • transient or
  • consistent for all pods or
  • a bug in the scheduler code or one of the plugins

Even without NNN, the scheduler may keep picking the same node. The main change is that the pod will continue to take up space that lower-priority pods could otherwise use, which shouldn't matter if the assumption above holds.
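
Concretely, the suggestion would amount to passing a no-op NominatingInfo instead of the clearing one on binding-cycle errors. A minimal sketch, assuming the current handler shape (keepNominatedNode is a hypothetical name):

```go
// Minimal sketch of the suggested change, under the assumption that the
// binding-failure path keeps its current shape. ModeNoop leaves the pod's
// existing NominatedNodeName untouched instead of overriding it with "".
var keepNominatedNode = &framework.NominatingInfo{
	NominatingMode: framework.ModeNoop,
}

// Inside handleBindingCycleError, instead of:
//     sched.FailureHandler(ctx, fwk, podInfo, status, clearNominatedNode, start)
// pass:
//     sched.FailureHandler(ctx, fwk, podInfo, status, keepNominatedNode, start)
```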

@k8s-ci-robot (Contributor)

Welcome @brejman!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot (Contributor)

Hi @brejman. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Jan 7, 2026
@k8s-ci-robot requested a review from dom4ha on Jan 7, 2026 at 14:45
@k8s-ci-robot added the kind/kep label (categorizes KEP tracking issues and PRs modifying the KEP directory) on Jan 7, 2026
@k8s-ci-robot requested a review from macsko on Jan 7, 2026 at 14:45

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: brejman
Once this PR has been reviewed and has the lgtm label, please assign dom4ha for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the sig/scheduling label (categorizes an issue or PR as relevant to SIG Scheduling) on Jan 7, 2026
@k8s-ci-robot added the size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels on Jan 7, 2026

@brejman (Author) commented Jan 7, 2026

/cc @ania-borowiec @sanposhiho

@sanposhiho (Member)

> There's a bug that causes it to only clear on the 2nd binding failure in a row

Why not just fix it?

> In case PreBind succeeded, the pod may be "locked into" a node, and clearing NNN may let another pod take its place, e.g. due to dynamically provisioned PV (#135771). Side note: even with NNN, a higher priority pod may take its place, so it's not completely robust

In that case, a plugin should implement Unreserve to unlock.
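
For reference, Unreserve is the second half of the scheduler framework's ReservePlugin interface. A minimal sketch of a plugin that undoes its side effect there (ExamplePlugin and its claim/release helpers are hypothetical; the method signatures follow the framework interface):

```go
package example

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// ExamplePlugin is a hypothetical plugin that claims some external resource
// in Reserve and releases it in Unreserve.
type ExamplePlugin struct{}

func (p *ExamplePlugin) Name() string { return "ExamplePlugin" }

func (p *ExamplePlugin) Reserve(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) *framework.Status {
	if err := p.claim(pod, nodeName); err != nil {
		return framework.AsStatus(err)
	}
	return nil
}

// Unreserve runs when a later phase (Permit, PreBind, Bind) fails, and is the
// hook where a plugin should undo external side effects so the pod is not
// left "locked into" the node.
func (p *ExamplePlugin) Unreserve(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) {
	p.release(pod, nodeName)
}

// claim and release stand in for whatever external locking the plugin does.
func (p *ExamplePlugin) claim(pod *v1.Pod, nodeName string) error { return nil }
func (p *ExamplePlugin) release(pod *v1.Pod, nodeName string)     {}
```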

@brejman (Author) commented Jan 8, 2026

> Why not just fix it?

Fixing that bug would make the other issue more frequent, so we need to agree on how to address that one first.

> In that case, a plugin should implement Unreserve to unlock.

Right now, the volumebinding plugin only clears the assumed (in-memory) state in Unreserve. We could change the implementation to also unbind the PV from the PVC, but I have a couple of concerns (a sketch of what unbinding would involve follows this list):

  1. Before unbinding, we would probably need to make sure that only a single pod is using the PVC, to avoid pulling the PV out from under another pod; that check could be susceptible to race conditions
  2. We probably wouldn't want to delete the PV, to avoid data loss, so it would continue taking up storage. I suspect the same can already happen with preemption, but that case is a bit different because the PVC would remain bound (?)
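
"Unbinding the PV from the PVC" would roughly mean reverting the PVC to its pre-bind state. A hedged sketch of the idea using client-go (treat the exact revert steps as an assumption; the annotations are the standard binding annotations):

```go
package example

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// unbindPVC is a hypothetical illustration of the idea under discussion, not
// an implemented or endorsed approach; it deliberately ignores the
// single-user check and race conditions from concern 1.
func unbindPVC(ctx context.Context, client kubernetes.Interface, pvc *v1.PersistentVolumeClaim) error {
	pvc = pvc.DeepCopy()
	// Revert the binding: drop the bound volume and the binding annotations.
	// Note: the API server treats most of a PVC's spec as immutable once set,
	// so a plain update like this may well be rejected, which is part of why
	// unbinding is nontrivial.
	pvc.Spec.VolumeName = ""
	delete(pvc.Annotations, "pv.kubernetes.io/bind-completed")
	delete(pvc.Annotations, "pv.kubernetes.io/bound-by-controller")
	_, err := client.CoreV1().PersistentVolumeClaims(pvc.Namespace).Update(ctx, pvc, metav1.UpdateOptions{})
	// The PV itself is left intact (concern 2): deleting it risks data loss,
	// so the provisioned storage keeps existing.
	return err
}
```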

@sanposhiho (Member)

Can you open an issue on the kubernetes/kubernetes repo with all of your arguments, and tag @/kubernetes/sig-scheduling-leads and @/kubernetes/sig-storage-misc there?
Currently KEP-5278 doesn't change the previous behavior, which is that binding cycle failures could result in pods being scheduled elsewhere in subsequent scheduling cycles. So I don't think it's appropriate to discuss this as part of this KEP, here in the k/enhancements repo.

@brejman (Author) commented Jan 8, 2026

Let's move the discussion to kubernetes/kubernetes#135771
