WIP: debug rebase in place vpa failure #2672

Open
haircommander wants to merge 2965 commits into openshift:master from haircommander:fix-deferred-resize

@haircommander haircommander commented May 20, 2026

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Summary by CodeRabbit

  • New Features

    • Sharded list/watch support (alpha, behind ShardedListAndWatch) added across multiple APIs.
    • New admission resources: MutatingAdmissionPolicy and MutatingAdmissionPolicyBinding.
    • Scheduling additions: PodGroup and Workload in scheduling.k8s.io v1alpha2.
    • New resource APIs: DeviceTaintRule and ResourcePoolStatusRequest.
  • Updates

    • Go toolchain pinned to 1.26.2.
    • API discovery and OpenAPI docs enhanced for sharding and list semantics.
    • Pull request template updated to include dependency kind; gitignore rules adjusted.
    • CI build image tag updated.
  • Deprecations

    • PodCertificateRequest: pkixPublicKey and proofOfPossession deprecated in favor of stubPKCS10Request.
  • Chores

    • Project ownership/alias lists updated.

dims and others added 30 commits March 21, 2026 15:30
The fast-delete pod status tests currently require the intentionally failing
"fail" container to report exit code 1. In CI, some runtimes occasionally
report exit code 2 with reason=Error even though the tested invariant still
holds: the container failed and the blocked workload container never started.

The latest dims/test-k8s failure on master showed exactly that state: the pod
remained Failed, Initialized=False, the blocked container reported
started=false, and only the failing init container drifted from exit 1 to exit
2. This matches kubernetes/kubernetes issue 135713 and the related
pending-container history in PR 131605.

Accept exit code 2 in this verifier so the test continues to assert the
behavior it is meant to cover instead of a lower-layer exit-code detail.

Fixes issue 135713

Tested:
- hack/verify-gofmt.sh
- hack/verify-test-code.sh
- hack/verify-typecheck.sh ./test/e2e/node/...
- go test ./test/e2e/node -run TestNonExistent -count=1

Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
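
The tolerance described in the commit message above can be sketched as follows. This is a minimal, standard-library-only illustration, not the actual e2e verifier: `terminatedState`, `verifyFailed`, and `acceptableFailureExitCodes` are hypothetical names, and only the accepted exit codes 1 and 2 come from the commit message.

```go
package main

import "fmt"

// acceptableFailureExitCodes lists the exit codes tolerated for the
// intentionally failing container. Code 2 is included because, per the
// commit message, some runtimes occasionally report it instead of 1.
var acceptableFailureExitCodes = map[int32]bool{1: true, 2: true}

// terminatedState is a hypothetical stand-in for the terminated-container
// fields that a status verifier would inspect.
type terminatedState struct {
	ExitCode int32
	Reason   string
}

// verifyFailed returns nil when the container failed with an accepted exit
// code, and an error otherwise (including a successful exit code 0).
func verifyFailed(t terminatedState) error {
	if !acceptableFailureExitCodes[t.ExitCode] {
		return fmt.Errorf("unexpected exit code %d (reason %q)", t.ExitCode, t.Reason)
	}
	return nil
}

func main() {
	for _, code := range []int32{0, 1, 2, 137} {
		err := verifyFailed(terminatedState{ExitCode: code, Reason: "Error"})
		fmt.Printf("exit=%d accepted=%t\n", code, err == nil)
	}
}
```

Keeping the accepted codes in one set means the check still asserts "the container failed" while leaving the exact code to the runtime.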
Replace plain bool with sync/atomic.Bool for the useStreaming field
in remoteRuntimeService and remoteImageService to eliminate a data
race when multiple goroutines concurrently read/write the field
during Unimplemented fallback.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
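
A minimal sketch of the pattern this commit describes, assuming a simplified client: `streamingClient`, `list`, and `runConcurrently` are hypothetical, and only the `useStreaming` field name and the use of `sync/atomic.Bool` come from the commit message. The simulated call disables streaming the way an Unimplemented fallback might.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// streamingClient is a hypothetical stand-in for remoteRuntimeService; only
// the useStreaming field name comes from the commit message.
type streamingClient struct {
	useStreaming atomic.Bool
}

// list simulates one RPC. When streaming is enabled it pretends the server
// answered Unimplemented, disables streaming, and falls back to unary.
func (c *streamingClient) list() string {
	if c.useStreaming.Load() {
		c.useStreaming.Store(false)
		return "unary (fallback)"
	}
	return "unary"
}

// runConcurrently issues n simulated calls from separate goroutines.
func runConcurrently(c *streamingClient, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.list()
		}()
	}
	wg.Wait()
}

func main() {
	c := &streamingClient{}
	c.useStreaming.Store(true)
	runConcurrently(c, 8)
	fmt.Println("useStreaming after fallback:", c.useStreaming.Load())
}
```

With a plain `bool` field, the concurrent read and write in `list` would be flagged by `go run -race`; the atomic loads and stores make each access safe without a mutex.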
…logquery-lock-defualt

[FeatureGate] Promote NodeLogQuery to GA in v1.36 and lock default to `true`
…-flake

Set WithSerial on HPA tests that conflict api registration
…able-tolerance-e2e-deterministic-cpu-load

fix: [sig-autoscaling] flaky HPAConfigurableTolerance e2e should scale up but should not scale down
…-pod-status-exit-2

test/e2e/node: tolerate exit code 2 in pod status flake
gRPC defaults to the DNS resolver for bare targets passed to
NewClient. For CRI socket endpoints, GetAddressAndDialer returns a
socket path plus a custom dialer, but handing the bare path to
grpc.NewClient still lets gRPC resolve the target first.

That breaks unix socket clients with errors like "name resolver error:
produced zero addresses" before the custom dialer ever sees the raw
path. Use the passthrough resolver for socket-style addresses so the
runtime and image clients hand the original endpoint directly to the
custom dialer.

Add a regression test for unix sockets, Windows named pipes, and TCP
addresses.

Precedent:
https://github.com/etcd-io/etcd/blob/v3.3.27/clientv3/client.go#L266-L270
https://github.com/grpc/grpc-go/blob/v1.72.2/dialoptions.go#L448-L451

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
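
The target selection described above might look roughly like this. It is a sketch using only the standard library: `grpcTarget` and `isSocketStyle` are hypothetical helpers, the prefix checks used to detect socket-style addresses are an assumption, and leaving TCP addresses unchanged is also an assumption rather than the confirmed behavior of the patch. The `passthrough` scheme itself is a resolver that ships with grpc-go.

```go
package main

import (
	"fmt"
	"strings"
)

// isSocketStyle reports whether addr looks like a unix socket path or a
// Windows named pipe. The prefix checks are an illustrative assumption.
func isSocketStyle(addr string) bool {
	return strings.HasPrefix(addr, "/") || strings.HasPrefix(addr, `\\`)
}

// grpcTarget returns the string that would be handed to grpc.NewClient.
// Socket-style addresses get the passthrough scheme so that gRPC skips DNS
// resolution and passes the raw endpoint to the custom dialer.
func grpcTarget(addr string) string {
	if isSocketStyle(addr) {
		return "passthrough:///" + addr
	}
	// Assumption: host:port addresses are left for the default resolver.
	return addr
}

func main() {
	for _, addr := range []string{
		"/run/crio/crio.sock",
		"//./pipe/containerd-containerd",
		"127.0.0.1:3735",
	} {
		fmt.Printf("%-34s -> %s\n", addr, grpcTarget(addr))
	}
}
```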
…nt-unix-socket-dialing

cri-client: use passthrough resolver for socket endpoints
KEP-5729: DRA: ResourceClaim Support for Workloads
…g-fixes

[InPlacePodLevelResourcesVerticalScaling] Plr ippr kubelet bug fixes
[InPlacePodLevelResourcesVerticalScaling] Ippr flaky test
[PodLevelResources] Graduate InPlacePodLevelResourcesVerticalScaling feature to beta
cri-client: use atomic.Bool for useStreaming to fix data race
Fix restartable init container startup race
Co-authored-by: Omar Sayed <omarsayed@google.com>
stbenjam and others added 23 commits May 23, 2026 15:01
After openshift/origin#30786 added ibmcloud to the provider switch in
openshift-tests, the provider name is now correctly passed through to
k8s-tests-ext. However, k8s-tests-ext only registers upstream Kubernetes
providers (aws, azure, gce, kubemark, openstack, vsphere) via the
test/e2e/providers.go import. OpenShift-specific providers like ibmcloud
are not registered, causing framework.AfterReadingAllFlags to call
SetupProviderConfig which fails with "Unknown provider" and Exit(1),
crashing every test process.

This registers all OpenShift-specific cloud providers (baremetal, ovirt,
kubevirt, alibabacloud, nutanix, ibmcloud, external) as NullProviders in
k8s-tests-ext. These providers don't require special setup for upstream
kube e2e tests.
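
The failure mode and the fix can be modeled with a small registry. This is a standard-library-only sketch, not the real `test/e2e/framework` API: `Provider`, `nullProvider`, `registerProvider`, `setupProvider`, and `registerOpenShiftProviders` are hypothetical stand-ins, and only the provider names are taken from the commit message.

```go
package main

import "fmt"

// Provider is a hypothetical, minimal stand-in for the framework's
// provider interface.
type Provider interface{ Name() string }

// nullProvider does no cloud-specific setup, mirroring the idea of a
// NullProvider.
type nullProvider struct{ name string }

func (p nullProvider) Name() string { return p.name }

type factory func() (Provider, error)

// registry maps provider names to factories.
var registry = map[string]factory{}

func registerProvider(name string, f factory) { registry[name] = f }

// setupProvider fails for names that were never registered, which is the
// "Unknown provider" failure described in the commit message.
func setupProvider(name string) (Provider, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return f()
}

// registerOpenShiftProviders registers each OpenShift-specific provider
// name with a factory that returns a no-op provider.
func registerOpenShiftProviders() {
	names := []string{"baremetal", "ovirt", "kubevirt", "alibabacloud", "nutanix", "ibmcloud", "external"}
	for _, name := range names {
		name := name // capture a per-iteration copy for the closure
		registerProvider(name, func() (Provider, error) {
			return nullProvider{name: name}, nil
		})
	}
}

func main() {
	_, err := setupProvider("ibmcloud")
	fmt.Println("before registration:", err)

	registerOpenShiftProviders()
	p, err := setupProvider("ibmcloud")
	fmt.Println("after registration:", p.Name(), err)
}
```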

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e2e test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The WatchList test “[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] should be requested by metadatainformer when WatchListClient is enabled” works by fetching an expected (initial) state of secrets, starting an informer, and polling until the context times out for the informer to converge to that expected state. If any other secret in the namespace changes while the test is running, the informer never converges to that state and the test times out. This change limits the listed secrets to only those relevant to the test.
Signed-off-by: Shaza Aldawamneh <shaza.aldawamneh@hotmail.com>
…e when claims.email is used in username expression

Signed-off-by: Shaza Aldawamneh <shaza.aldawamneh@hotmail.com>
RouteExternalCertificate is now enabled by default. This update removes
references to the feature gate and hardcodes its value to true.
It is also a prerequisite for removing the gate from
openshift/cluster-ingress-operator.

This feature gate was added in 1448e1c
Signed-off-by: jubittajohn <jujohn@redhat.com>
To be squashed with the following commit later: "UPSTREAM: <carry>: Add OpenShift tooling, images, configs and docs"

Signed-off-by: jubittajohn <jujohn@redhat.com>
…er_manager_linux_test.go

Squash into: UPSTREAM: <carry>: disable load balancing on created cgroups when managed is enabled
…s in flagz_test.go and statusz_test.go

Squash into: UPSTREAM: <carry>: apiserver: add system_client=kube-{apiserver,cm,s} to apiserver_request_total
…acheGC is enabled

Squash into UPSTREAM: <carry>: create termination events
Could squash into UPSTREAM: <carry>: emit event when readyz goes true
Squash into: UPSTREAM: <carry>: add management support to kubelet
kuberc subcommand is not yet registered in oc. Tests will be re-enabled after oc is bumped to 1.36
To be squashed with the commit UPSTREAM: <carry>: Add OpenShift tooling, images, configs and docs before 1.36 rebase bump merges

Signed-off-by: jubittajohn <jujohn@redhat.com>
… driver when not enabled

The upstream csi-hostpath-plugin.yaml manifest now includes a csi-snapshot-metadata sidecar container and volume (added in k/k#130918). Upstream PR k/k#137057 added conditional stripping of these when CapSnapshotMetadata is not enabled, but only for the upstream hostpathCSIDriver. The OpenShift-specific groupSnapshotHostpathCSIDriver was never updated, causing the driver pod to fail with "secret csi-snapshot-metadata-server-certs not found" and all csi-hostpath-groupsnapshot tests to fail in techpreview jobs.

Signed-off-by: jubittajohn <jujohn@redhat.com>
Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
instead, check whether the pod is allocated, and return that when we return allocated pods

Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: jubittajohn <jujohn@redhat.com>
Signed-off-by: jubittajohn <jujohn@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
@haircommander haircommander force-pushed the fix-deferred-resize branch from 1a73e8d to 23e4870 Compare May 27, 2026 16:57
@openshift-ci openshift-ci Bot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) May 27, 2026
@openshift-ci-robot

@haircommander: the contents of this pull request could not be automatically validated.

The following commits are valid:

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@haircommander

/test e2e-aws-ovn-techpreview-serial-1of2
/test k8s-e2e-gcp-serial

@openshift-ci

openshift-ci Bot commented May 28, 2026

@haircommander: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/verify-commits | 23e4870 | link | true | `/test verify-commits` |

@haircommander

/test e2e-aws-ovn-techpreview-serial-1of2
/test k8s-e2e-gcp-serial

Labels

- backports/unvalidated-commits: Indicates that not all commits come to merged upstream PRs.
- do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
- vendor-update: Touching vendor dir or related files
