ci(mlflow): speed up k3s CI jobs#140

Open
adamancini wants to merge 3 commits into main from feat/mlflow-ci-speedup

@adamancini
Member

Summary

  • Use 1-node k3s clusters instead of 3 nodes — single node provisions faster and pulls images once instead of three times; GKE matrix entry keeps 3 nodes
  • Remove skip-preflights and debug inputs from kots-install@v1.17.0 — these are not valid inputs for this action version and were being silently ignored (preflights were running anyway, adding time)
  • Replace the 5-minute service-poll loop in the port-forward step with kubectl wait --for=condition=Available — by the time KOTS install completes, the deployment exists; no need to poll for the Service object
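The readiness check described in the last bullet could look roughly like the sketch below. The Deployment name `mlflow`, the `default` namespace, and the 300-second timeout are assumptions for illustration; the PR text does not state the actual values.

```shell
# Hypothetical defaults; replace with the real Deployment name and namespace.
DEPLOYMENT="${DEPLOYMENT:-mlflow}"
NAMESPACE="${NAMESPACE:-default}"
TIMEOUT="${TIMEOUT:-300s}"

# Block until the Deployment reports condition Available, or until the
# timeout expires (kubectl then exits non-zero and the step fails).
wait_for_available() {
  kubectl wait --for=condition=Available \
    "deployment/${DEPLOYMENT}" \
    --namespace "${NAMESPACE}" \
    --timeout="${TIMEOUT}"
}
```

A workflow step would call `wait_for_available` once after the KOTS install step. Because `kubectl wait` blocks server-side on the condition, no hand-written sleep loop is needed for this part.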

Test plan

  • k3s helm-install-test completes faster than the current ~15m baseline
  • k3s kots-install-test completes without the "Run Application Tests" failure
  • GKE matrix entries are unaffected (still 3 nodes)

- Use 1-node k3s clusters instead of 3-node; GKE keeps 3 nodes
- Remove invalid skip-preflights and debug inputs from kots-install action
- Replace 5-minute service poll loop with kubectl wait --for=condition=Available
Copilot AI review requested due to automatic review settings March 25, 2026 20:22

Copilot AI left a comment

Pull request overview

This PR optimizes the MLflow CI workflow runtime by reducing k3s cluster size and simplifying the post-KOTS-install readiness logic to avoid unnecessary polling.

Changes:

  • Update the cluster matrix to use 1-node k3s clusters (while keeping GKE at 3 nodes) and pass node count via matrix.cluster.nodes.
  • Remove unsupported skip-preflights and debug inputs from replicatedhq/replicated-actions/kots-install@v1.17.0.
  • Replace the service polling loop with kubectl wait --for=condition=Available and a shorter health-check loop before port-forward-based tests.
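As an illustration of the shorter health-check loop mentioned above, the following sketch retries an HTTP request a bounded number of times and fails the step if the service never answers. The URL, port 5000, and the `/health` path are assumptions about how the port-forward and MLflow server are set up; the function name `wait_for_http` is hypothetical.

```shell
# Hypothetical URL: assumes the port-forward exposes MLflow on localhost:5000
# and that the server answers on /health.
HEALTH_URL="${HEALTH_URL:-http://localhost:5000/health}"

# Return 0 as soon as one request succeeds with a non-error status.
# Return 1 after the given number of attempts so the CI step exits non-zero.
wait_for_http() {
  url="$1"
  attempts="${2:-12}"   # number of tries
  delay="${3:-5}"       # seconds between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Service at $url not reachable after $attempts attempts" >&2
  return 1
}
```

A step could run `wait_for_http "$HEALTH_URL" 12 5` after starting `kubectl port-forward` in the background and before launching the application tests.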

adamancini and others added 2 commits March 25, 2026 16:41
- Poll for deployment existence before kubectl wait (avoids immediate
  failure if KOTS hasn't created the Deployment yet)
- Add || true to kubectl diagnostics in error handler (prevents bash -e
  from swallowing the original error when describe returns non-zero)
- Fail port-forward step explicitly if MLflow never becomes reachable,
  with pod logs and port-forward log printed before exiting non-zero
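The first two items in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: the function names, the Deployment name `mlflow`, the namespace, and the retry counts are all hypothetical.

```shell
# Hypothetical defaults; not taken from the PR.
DEPLOYMENT="${DEPLOYMENT:-mlflow}"
NAMESPACE="${NAMESPACE:-default}"

# Poll until the Deployment object exists, so that a later `kubectl wait`
# does not fail immediately because the resource is not found yet.
wait_for_deployment_to_exist() {
  attempts="${1:-30}"   # number of tries
  delay="${2:-5}"       # seconds between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl get deployment "$DEPLOYMENT" --namespace "$NAMESPACE" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Best-effort diagnostics: `|| true` keeps a failing describe or logs call
# from aborting the script under `set -e` before the real failure is reported.
dump_diagnostics() {
  kubectl describe deployment "$DEPLOYMENT" --namespace "$NAMESPACE" || true
  kubectl logs "deployment/$DEPLOYMENT" --namespace "$NAMESPACE" --tail=50 || true
}
```

A step might call `wait_for_deployment_to_exist 30 5 || { dump_diagnostics; exit 1; }` ahead of `kubectl wait`. The explicit `exit 1` after the diagnostics keeps the original failure as the step's result, regardless of what the diagnostic commands returned.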

2 participants