
After a node becomes unready and all its pods are marked as unready, KWOK fails to update pods to ready #1464

@serathius

Description

How to use it?

  • kwok
  • kwokctl --runtime=docker (default runtime)
  • kwokctl --runtime=binary
  • kwokctl --runtime=nerdctl
  • kwokctl --runtime=kind

What happened?

Noticed that with a large number of nodes (4k), KWOK sometimes fails to update a node lease, resulting in the node becoming unready. This resolves automatically after a while, but there is a side effect on pods.

When a node becomes unready, all its pods are marked unready too. KWOK fails to update the pods back to ready, leaving some pods stuck in an unready state forever.
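
For context on the lease side of this: while KWOK is failing to renew the lease, the node's Lease object stops advancing. This can be checked directly with plain kubectl (kube-node-lease is the standard namespace for node leases; the node name matches the example below):

    $ kubectl get lease kwok-node-1 -n kube-node-lease -o jsonpath='{.spec.renewTime}{"\n"}'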

Example:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2025-10-09T08:21:21Z"
  generateName: nginx-
  generation: 1
  labels:
    app: nginx
    apps.kubernetes.io/pod-index: "3"
    controller-revision-hash: nginx-d6df65d5b
    statefulset.kubernetes.io/pod-name: nginx-3
  name: nginx-3
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: nginx
    uid: 14bac6df-c3ed-43db-b636-4a8587aa75e3
  resourceVersion: "496007"
  uid: ea786112-54bf-4761-a1da-dce13ab4c265
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: type
            operator: In
            values:
            - kwok
  containers:
  - image: registry.k8s.io/nginx-slim:0.21
    imagePullPolicy: IfNotPresent
    name: nginx
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jr9h9
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: nginx-3
  nodeName: kwok-node-1
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-jr9h9
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-10-09T08:22:16Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-10-09T08:27:40Z"
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-10-09T08:22:16Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-10-09T08:22:14Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: registry.k8s.io/nginx-slim:0.21
    imageID: ""
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2025-10-09T08:22:16Z"
  hostIP: 10.0.0.1
  phase: Running
  podIP: 10.0.16.123
  podIPs:
  - ip: 10.0.16.123
  qosClass: BestEffort
  startTime: "2025-10-09T08:22:16Z"
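
Note the inconsistency in the status above: the container reports ready: true and ContainersReady is True, yet the pod-level Ready condition stayed False after the node recovered. Pods stuck this way can be listed with plain kubectl (a sketch; the filter prints each Running pod's name and Ready condition):

    $ kubectl get pods --field-selector=status.phase=Running \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
      | grep -w False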

What did you expect to happen?

KWOK should ensure that all pods it is responsible for have their Ready condition set to True.
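
Until this is fixed, one possible manual workaround is to patch the Ready condition back via the pod's status subresource (a sketch, not documented KWOK behavior; it needs a kubectl with --subresource support, and KWOK's controller may overwrite the patch again):

    $ kubectl patch pod nginx-3 --subresource=status --type=strategic \
        -p '{"status":{"conditions":[{"type":"Ready","status":"True"}]}}'

Strategic merge keys pod conditions by type, so this touches only the Ready condition.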

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a cluster with KWOK (in my case, a kind cluster with KWOK running outside the cluster). Ensure all nodes are ready:
    $ kubectl get nodes
    NAME                 STATUS   ROLES           AGE   VERSION
    kind-control-plane   Ready    control-plane   45m   v1.34.1-dirty
    kwok-node-1          Ready    agent           42m   fake
    
  2. Schedule pods (in my case via a StatefulSet) and wait for them to become ready, which can be confirmed by running kubectl get statefulset:
    $ kubectl get statefulset nginx
    NAME        READY   AGE
    nginx       4/4     31m
    
  3. Stop KWOK for a couple of seconds, long enough for the node lease to expire, then start it again and notice the node flip between NotReady and Ready (see the watch sketch after this list):
    $ kubectl get nodes 
    NAME                 STATUS     ROLES           AGE   VERSION
    kind-control-plane   Ready      control-plane   45m   v1.34.1-dirty
    kwok-node-1          NotReady   agent           44m   fake
    
  4. After the node becomes Ready again, notice that the pods never become ready:
    $ kubectl get statefulset nginx
    NAME        READY   AGE
    nginx       0/4     36m
    
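For step 3, a terminal watch makes the flip easy to catch. A minimal sketch, assuming kwok runs as a local foreground binary that can simply be interrupted and re-run:

    # in one terminal: watch the node's status transitions
    $ kubectl get node kwok-node-1 -w

    # in another: stop kwok for a few seconds (Ctrl-C if it runs in the
    # foreground), then restart it and watch the node flip to NotReady
    # and back to Ready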

Anything else we need to know?

No response

Kwok version

$ kwok --version
kwok version v0.7.0 go1.24.1 (linux/amd64)

OS version

N/A


Labels: kind/bug, lifecycle/rotten
