KEP-897: First-Class MLflow Integration for Experiment Tracking in Kubeflow by mprahl · Pull Request #892 · kubeflow/community

mprahl · 2025-08-01T19:13:04Z

GitHub issue: #897

Instead of building a new experiment tracking backend inside Kubeflow, the KEP proposes that Kubeflow deeply integrate with MLflow as a strong open-source option with an active community. The proposal focuses on making MLflow Kubernetes-native for Kubeflow through donation of the Kubernetes plugins, alignment with Kubeflow Profiles and multi-tenancy, a supported MLflow image and deployment path, and a UI strategy based on either launching out to MLflow or embedding it in the dashboard.

google-oss-prow · 2025-08-01T19:13:10Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign juliusvonkohout for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

andreyvelich · 2025-08-08T18:59:18Z

Thank you for driving this @mprahl! Could you please create a tracking issue under kubeflow/community, so you can get the KEP number?

It would also be good to mention the history, as I mentioned here: #783

As previously discussed in

andreyvelich

cc WGs to review
@kubeflow/wg-pipeline-leads @kubeflow/wg-data-leads @kubeflow/wg-automl-leads @kubeflow/wg-data-leads @kubeflow/kubeflow-steering-committee @kubeflow/wg-manifests-leads @kubeflow/wg-notebooks-leads

mprahl · 2025-09-05T21:01:36Z

I'm closing this KEP because my team no longer has capacity to take this on. If others want to pursue this, feel free to fork the KEP and I'll be happy to review and advise. 😄

juliusvonkohout · 2025-09-26T13:09:44Z

@mprahl may we keep it open for now? Just to have it tracked.

The stalebot will close it anyway if there is no activity on this topic

andreyvelich · 2025-09-26T14:02:23Z

I agree with @juliusvonkohout!

Maybe we should put out a call for contributors to help us add Experiment Tracking support via MLFlow for Kubeflow sub-projects.
This feels like a really important capability that many of our users are asking for, and moving it forward would have a big impact on usability and Kubeflow adoption.

cc @kubeflow/wg-training-leads @kubeflow/wg-pipeline-leads @kubeflow/kubeflow-steering-committee @kubeflow/wg-manifests-leads @kubeflow/wg-notebooks-leads @kubeflow/wg-data-leads @kubeflow/kubeflow-sdk-team @kubeflow/kubeflow-outreach-committee @jbottum

tarilabs · 2025-09-26T14:17:49Z

Rather than tying it strictly to MLflow as the implementation choice, I believe it would be very helpful to add an SPI (strongly inspired by MLflow Experiments/Runs to begin with) so that if one day you want to add other integrations in this area, you can.

Not to dispute MLflow's leading popularity, but in other community discussions other alternatives also have their market share, so an SPI would also prepare the ground for additional contributors, in line with what Andrey just said.

What would be the @kubeflow/kubeflow-steering-committee pov on this?

andreyvelich · 2025-09-26T14:38:23Z

I fully agree - designing an extensible architecture makes sense, since it will let us easily swap between experiment tracking solutions (e.g., MLflow, W&B, or even a custom option).
My only question is: in the short to medium term, what approach should we take to deliver the most value to users?

tarilabs · 2025-09-26T14:47:14Z

> My only question is: in the short to medium term, what approach should we take to deliver the most value to users?

IMHO, in the short term: an SPI that maps 1:1 to the MLflow API (with the MLflow integration as its implementation).
I'm aware this is very limiting and naive, but at least it forces us to identify where the boundary for this integration lies. In turn, it should make it easier to "direct" contributors/GSoC students if they want to integrate W&B (found the #892 (comment) ! 😄 ) or another tracking system next.
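A minimal sketch of what such an SPI boundary could look like, assuming a Python interface. Every name here (`ExperimentTracker`, `InMemoryTracker`, `record_training`) is hypothetical and not part of the KEP; the method names only borrow MLflow's experiment/run vocabulary, and the in-memory backend stands in for a real MLflow or W&B implementation.

```python
from dataclasses import dataclass, field
from typing import Protocol
import uuid


class ExperimentTracker(Protocol):
    """Hypothetical SPI: the smallest surface a tracking backend must provide."""

    def create_experiment(self, name: str) -> str: ...
    def create_run(self, experiment_id: str) -> str: ...
    def log_param(self, run_id: str, key: str, value: str) -> None: ...
    def log_metric(self, run_id: str, key: str, value: float) -> None: ...


@dataclass
class InMemoryTracker:
    """Illustrative backend only; a real one would delegate to MLflow, W&B, etc."""

    experiments: dict = field(default_factory=dict)  # experiment id -> name
    runs: dict = field(default_factory=dict)         # run id -> run record

    def create_experiment(self, name: str) -> str:
        experiment_id = uuid.uuid4().hex
        self.experiments[experiment_id] = name
        return experiment_id

    def create_run(self, experiment_id: str) -> str:
        if experiment_id not in self.experiments:
            raise KeyError(f"unknown experiment: {experiment_id}")
        run_id = uuid.uuid4().hex
        self.runs[run_id] = {"experiment_id": experiment_id, "params": {}, "metrics": {}}
        return run_id

    def log_param(self, run_id: str, key: str, value: str) -> None:
        self.runs[run_id]["params"][key] = value

    def log_metric(self, run_id: str, key: str, value: float) -> None:
        self.runs[run_id]["metrics"][key] = value


def record_training(tracker: ExperimentTracker, experiment: str, lr: float, loss: float) -> str:
    """Caller code depends only on the SPI, never on a concrete backend."""
    experiment_id = tracker.create_experiment(experiment)
    run_id = tracker.create_run(experiment_id)
    tracker.log_param(run_id, "learning_rate", str(lr))
    tracker.log_metric(run_id, "loss", loss)
    return run_id
```

The point of the sketch is the boundary: `record_training` would work unchanged if `InMemoryTracker` were replaced by any other class with the same four methods.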

rareddy · 2025-09-26T16:38:52Z

Experiment tracking depends heavily on a Registry and a UI to support visualizations and to track models, versions, and metrics. What are the thoughts on that when we talk about this SPI-based integration?

Do we say the SPI enables users to capture data and lets them use the native tools they integrated with, for example using the MLflow UI separately? My next question is: how do we foresee bringing the champion model back into Kubeflow Model Registry for deployment or management? Or do we need to? For me, this also defines the scope of Model Registry activities going forward. Thoughts?

franciscojavierarceo · 2026-04-08T15:35:40Z

I approve this KEP, great work @mprahl

Collaborating with the MLFlow community would be wonderful. 👏

andreyvelich

Thanks a lot for this @mprahl, overall looks awesome.
Kubeflow users have wanted this capability since 2019 🚀

cc @kubeflow/kubeflow-trainer-team @akshaychitneni @nabuskey @kubeflow/wg-data-leads @kubeflow/kubeflow-sdk-team @kubeflow/kubeflow-kale-team @kubeflow/wg-notebooks-leads appreciate your review too!

andreyvelich · 2026-04-08T16:54:38Z

+
+### Donated Kubernetes Plugins
+
+Kubeflow should accept donation of


+1 on this.
Shall we discuss maintainers?

Tighten the experiment tracking KEP around shared MLflow conventions, trusted-ingress authorization, and follow-up terminology decisions so reviewers can evaluate one consistent direction. Signed-off-by: mprahl <mprahl@users.noreply.github.com>

DaoDaoNoCode · 2026-04-10T13:25:51Z

@mprahl Hello, our team is interested in contributing to the community and would like to work on this! cc @bobbravo2

mprahl · 2026-04-10T13:38:08Z

> @mprahl Hello, our team is interested in contributing to the community and would like to work on this! cc @bobbravo2

Thanks! Specifically, this team is willing to contribute to the UI embedded work. 😄

Clarify namespace mapping, UI scope, and terminology alignment in the experiment tracking KEP while keeping the proposed MLflow deployment and auth model consistent. Signed-off-by: mprahl <mprahl@users.noreply.github.com>

jonburdo · 2026-04-10T21:49:49Z

+## Alternatives
+
+### Expand Model Registry
+
+Kubeflow could continue with the earlier direction of enhancing Model Registry to become the experiment tracking backend
+for the platform.


What is the role of Kubeflow Model Registry in the Kubeflow ecosystem, given the direction of this KEP? This may warrant a KEP of its own, but I want to mention it here and hear any high-level thoughts.

For example, does kf model registry continue to serve model lifecycle needs separately from mlflow's experiment tracking and model registry features? Or does mlflow's own model registry eventually replace kf model registry? Maybe the tooling/infra around kubeflow model registry and kubeflow sdk serve to integrate various components, extending mlflow's capability and usefulness within kubeflow.

I think coexistence is the most pragmatic near-term path.

My current view is that MLflow can cover the experiment tracking experience and, optionally, the model registry experience if the admin wants it to, while Kubeflow Model Registry can continue to cover registry-oriented and deployment-oriented capabilities. I'd rather get community feedback on how Kubeflow is deployed with MLflow before making any further decisions unless the Kubeflow Model Registry working group has a strong opinion.

For now, I could see us extracting pieces like the catalog experience and the KServe storage adapter into a more backend-agnostic repo so they can work with either MLflow or Kubeflow Model Registry. That would reduce coupling and let us evaluate, based on real usage, whether those capabilities should remain shared, move more toward MLflow, or stay primarily aligned with Model Registry.

@kubeflow/wg-data-leads do you have thoughts?

> extracting pieces like the catalog experience and the KServe storage adapter into a more backend-agnostic repo so they can work with either MLflow or Kubeflow Model Registry

+1. Model Catalog, MCP Catalog, and Catalog capabilities in general are already isolated, so in my view that part is already more or less covered, although the rename currently in progress will help:

KEP-907: Renaming "Model Registry" to reflect Registry and Catalog use-cases #907 (review)

KEP-0003: Technical implementation strategy for Kubeflow Hub rename hub#2239

What's not yet in progress, in my view:

- extracting the Storage (especially OCI ModelCar) and Signature capabilities into self-sufficient capabilities, possibly in the KF SDK, would be nice
- making the async-upload job work not only with KF MR as the metadata store, but also with MLflow as the metadata store (of the Models)

One part that needs clarification from the authors of this KEP, @mprahl (imho): once MLflow is integrated as the model metadata store in Kubeflow, how would deployment tracking work (in MLflow)?

That would let us assess how best to proceed with the Isvc reconciler capabilities of KF MR, which need an indication of how to remap the metadata currently mapped to KF MR onto MLflow.

> whether those capabilities should remain shared, move more toward MLflow, or stay primarily aligned with Model Registry.

@mprahl with the two MR solutions, I'm wondering if you have any suggestions for pulling model metadata from MLflow to show in KF MR for a better user experience. If there is a plugin we can write, it may be worthwhile IMO.

@tarilabs @rareddy Good questions. I think this is out of scope for this KEP. The intent here is to define MLflow as the first-class experiment tracking integration for Kubeflow, including the shared platform contract around tenancy, auth, deployment, UI hand-off, and alignment with MLflow's GenAI direction.

My view is that Kubeflow Model Registry can continue to cover model registry and model deployment capabilities. This KEP does not currently propose deeper metadata or deployment-tracking integration between MLflow and Kubeflow Model Registry. If, after adoption, we see a strong need for that, I think it would be reasonable optional follow-up work for the Kubeflow Data WG to pursue. I'm happy to help and provide guidance there too.

Signed-off-by: mprahl <mprahl@users.noreply.github.com>

andreyvelich

Thanks for the updates @mprahl!
Overall, +1 on this to move forward. Left a few small comments.

andreyvelich · 2026-04-23T21:23:41Z

+Kubeflow and OpenDataHub maintainers should agree on transferring the repository to Kubeflow community ownership, with
+[`mprahl`](https://github.com/mprahl), [`HumairAK`](https://github.com/HumairAK), and any additional volunteers serving
+as the initial maintainer group with clear release responsibilities.


Shall we say that WG Pipelines initially own this code?
We can add this repo to the WG assets: https://github.com/kubeflow/community/blob/master/wgs.yaml#L455-L465

andreyvelich · 2026-04-23T21:33:56Z

+- MLflow experiment: the shared grouping for related work across Kubeflow tools
+- Kubeflow Pipelines pipeline run: one parent MLflow run, with nested MLflow runs for component tasks and loop
+iterations
+- TrainJob or SparkApplication execution: one MLflow run for that execution


@mprahl @kramaranya I am also curious how we can map the MLFlow Experiment concept when TrainJob or OptimizationJob is submitted via KFP?
Not a blocker, we can discuss it later.
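As an illustration of the run mapping quoted in the diff above, here is a small sketch that plans the run hierarchy for one Kubeflow Pipelines run. It does not call MLflow; `PlannedRun`, `plan_pipeline_runs`, and `flatten` are hypothetical names, and nesting loop iterations under their task (rather than directly under the parent run) is an assumption the KEP text does not settle. In MLflow itself, child runs of this kind are typically created with `mlflow.start_run(nested=True)`.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedRun:
    """Hypothetical record of one MLflow run that would be created."""

    name: str
    children: list = field(default_factory=list)


def plan_pipeline_runs(pipeline_run: str, tasks: dict) -> PlannedRun:
    """Map a pipeline run to one parent run with nested child runs.

    `tasks` maps a component task name to its number of loop iterations
    (0 means the task does not run inside a loop).
    """
    parent = PlannedRun(name=pipeline_run)
    for task, iterations in tasks.items():
        child = PlannedRun(name=task)
        # Assumption: each loop iteration becomes a nested run under its task.
        child.children = [PlannedRun(name=f"{task}[{i}]") for i in range(iterations)]
        parent.children.append(child)
    return parent


def flatten(run: PlannedRun, depth: int = 0) -> list:
    """Render the planned hierarchy as indented lines for inspection."""
    lines = ["  " * depth + run.name]
    for child in run.children:
        lines.extend(flatten(child, depth + 1))
    return lines
```

A TrainJob or SparkApplication execution would, under the same convention, be a single `PlannedRun` with no children; how that interacts with a TrainJob submitted via KFP is the open question raised in the comment above.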

andreyvelich · 2026-04-23T21:36:08Z

+A concrete example of that ingress pattern looks like:
+
+```yaml
+apiVersion: security.istio.io/v1beta1


Do we have dependency on Istio in that case? I remember we talked before that we would like to integrate with Gateway API moving forward.

cc @juliusvonkohout @thesuperzapper

andreyvelich · 2026-04-23T21:39:56Z

@@ -0,0 +1,11 @@
+{


Remove this?

juliusvonkohout · 2026-04-24T15:29:10Z

As a member of the KSC, I vote in favor in general. The technical details and open questions are not concerning enough for me to delay the vote. We will find a way to integrate this at the platform level. I can help with the maintenance as well, since I need to deal with the integration into the Kubeflow platform anyway.

andreyvelich

+1 for this, just left a few small comments @mprahl.

chasecadet · 2026-04-24T15:41:04Z

Looks good! Building a deep integration with MLflow as our go-to ML tracking and supporting a Helm chart would benefit the e2e story. I see this as a plugin or integration. Let's call out that Kubeflow (the tools/components) integrates with MLflow, and that MLflow is not a Kubeflow project. It's a dependency, and we are essentially saying "MLflow won" the OSS registry war here, and we want to provide that functionality to our community. Excited to see this in action and help folks build, deploy, and serve models in a more deterministic manner wherever they see fit. I vote yes!

google-oss-prow Bot requested a review from johnugeorge August 1, 2025 19:13

google-oss-prow Bot requested a review from terrytangyuan August 1, 2025 19:13

google-oss-prow Bot added the size/XL label Aug 1, 2025

mprahl mentioned this pull request Aug 1, 2025

WIP: Propose centralized experiment tracking in Kubeflow mprahl/kubeflow-community#1

Closed

mprahl force-pushed the experiment-tracking branch from 0eefb9d to 1b99df6 Compare August 1, 2025 20:03

tarilabs reviewed Aug 5, 2025

Comment thread proposals/892-experiment-tracking/README.md Outdated

andreyvelich reviewed Aug 8, 2025

mprahl mentioned this pull request Aug 12, 2025

KEP-897: Centralized experiment tracking store in Kubeflow #897

Open

mprahl force-pushed the experiment-tracking branch from 1b99df6 to 56dd509 Compare August 12, 2025 17:05

mprahl changed the title ~~KEP: Propose centralized experiment tracking in Kubeflow~~ KEP-897: Propose centralized experiment tracking in Kubeflow Aug 12, 2025

mprahl force-pushed the experiment-tracking branch from 56dd509 to 52bd338 Compare August 12, 2025 17:48

mprahl requested a review from andreyvelich August 12, 2025 17:48

MattiaSarti mentioned this pull request Aug 14, 2025

Model Registry Exploration for Charmifying It canonical/bundle-kubeflow#1282

Closed

juliusvonkohout reviewed Aug 15, 2025

Comment thread proposals/897-experiment-tracking/README.md Outdated

juliusvonkohout reviewed Aug 15, 2025

Comment thread proposals/897-experiment-tracking/README.md

kramaranya mentioned this pull request Aug 18, 2025

Experiment Tracking for Kubeflow SDK kubeflow/sdk#63

Open

tarilabs mentioned this pull request Aug 27, 2025

give Model two tier naming hierarchy kubeflow/hub#1530

Closed

mprahl closed this Sep 5, 2025

rareddy mentioned this pull request Sep 12, 2025

Add MLflow SDK support for Model Registry as a Tracking Store, fixes #1225 kubeflow/hub#1337

Closed

8 tasks

juliusvonkohout reopened this Sep 26, 2025

google-oss-prow Bot assigned ederign Sep 26, 2025

andreyvelich reviewed Apr 8, 2026

juliusvonkohout reviewed Apr 8, 2026

Comment thread proposals/897-experiment-tracking/README.md

juliusvonkohout reviewed Apr 8, 2026

Comment thread proposals/897-experiment-tracking/README.md Outdated

juliusvonkohout reviewed Apr 8, 2026

Comment thread proposals/897-experiment-tracking/README.md Outdated

HumairAK reviewed Apr 9, 2026

Comment thread proposals/897-experiment-tracking/README.md Outdated

Comment thread proposals/897-experiment-tracking/README.md Outdated

Comment thread proposals/897-experiment-tracking/README.md Outdated

mprahl force-pushed the experiment-tracking branch from 54e5f6b to 83dace6 Compare April 9, 2026 19:06

mprahl requested review from HumairAK, andreyvelich and juliusvonkohout April 9, 2026 19:06

mprahl commented Apr 9, 2026

Comment thread proposals/897-experiment-tracking/README.md Outdated

andreyvelich reviewed Apr 10, 2026

Comment thread proposals/897-experiment-tracking/README.md

andreyvelich reviewed Apr 10, 2026

Comment thread proposals/897-experiment-tracking/README.md Outdated

Comment thread proposals/897-experiment-tracking/README.md

mprahl force-pushed the experiment-tracking branch from e1d6fe2 to 6bf21e8 Compare April 10, 2026 15:18

Refine experiment tracking proposal details.

18cf3e2

mprahl force-pushed the experiment-tracking branch from 6bf21e8 to 18cf3e2 Compare April 10, 2026 15:19

mprahl requested review from andreyvelich and kramaranya April 10, 2026 15:19

andreyvelich mentioned this pull request Apr 10, 2026

feat: add kfp-client wrapper proposal kubeflow/sdk#343

Open

1 task

jonburdo reviewed Apr 10, 2026

DaoDaoNoCode reviewed Apr 12, 2026

Comment thread proposals/897-experiment-tracking/README.md Outdated

Note the limitations around the iframe approach

e8797fe

Signed-off-by: mprahl <mprahl@users.noreply.github.com>

andreyvelich reviewed Apr 23, 2026

andreyvelich reviewed Apr 24, 2026

lizzzcai mentioned this pull request Apr 27, 2026

Enhancing KServe and MLflow Integration for Improved Usability kserve/kserve#4251

Open

