plugin authn/authz rfc#19

Open
PatrickKoss wants to merge 2 commits into mlflow:main from PatrickKoss:rfc/enterprise-authn-authz

@PatrickKoss

RFC 0006: Pluggable Authentication and Authorization

Tracking issue: mlflow/mlflow#21240

Summary

Adds a new RFC proposing two small plugin contracts — AuthenticationProvider
and AuthorizationBackend — to replace MLflow's single authorization_function
hook. The split separates who you are from what you may do, and keeps the
load-bearing route → requirement mapping in core so plugins never need to
track MLflow's routing surface.
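
A minimal sketch of how the two contracts could be shaped, to make the split concrete. Everything here is an illustrative assumption rather than the RFC's actual interface: the `Principal` and `Requirement` records, the `authenticate` and `is_allowed` method names, and the two toy implementations are all hypothetical. Only the contract names and the `(resource_type, resource_id, action, workspace)` tuple come from the RFC.

```python
from dataclasses import dataclass, field
from typing import Mapping, Optional, Protocol


@dataclass(frozen=True)
class Principal:
    """Hypothetical identity record: richer than a bare username."""
    username: str
    groups: tuple[str, ...] = ()
    claims: Mapping[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Requirement:
    """The normalized tuple a plugin sees (field names from the RFC summary)."""
    resource_type: str
    resource_id: Optional[str]
    action: str  # one of READ / USE / EDIT / MANAGE
    workspace: Optional[str]


class AuthenticationProvider(Protocol):
    """Answers "who are you?" (method name is an assumption)."""
    def authenticate(self, headers: Mapping[str, str]) -> Optional[Principal]: ...


class AuthorizationBackend(Protocol):
    """Answers "what may you do?" (method name is an assumption)."""
    def is_allowed(self, principal: Principal, requirement: Requirement) -> bool: ...


class StaticTokenProvider:
    """Toy provider: maps fixed bearer tokens to principals."""
    def __init__(self, tokens: Mapping[str, Principal]):
        self._tokens = dict(tokens)

    def authenticate(self, headers: Mapping[str, str]) -> Optional[Principal]:
        scheme, _, token = headers.get("Authorization", "").partition(" ")
        if scheme.lower() != "bearer":
            return None
        return self._tokens.get(token)


class GroupGrantBackend:
    """Toy backend: grants are (group, resource_type, action) triples."""
    def __init__(self, grants: set[tuple[str, str, str]]):
        self._grants = grants

    def is_allowed(self, principal: Principal, requirement: Requirement) -> bool:
        return any(
            (group, requirement.resource_type, requirement.action) in self._grants
            for group in principal.groups
        )


def authorize_request(provider, backend, headers, requirement) -> bool:
    """Core-side glue: authenticate first, then ask the backend."""
    principal = provider.authenticate(headers)
    if principal is None:
        return False
    return backend.is_allowed(principal, requirement)
```

Neither toy class knows anything about routes; each only sees headers or the normalized requirement, which is the boundary the RFC argues for.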

This is the extension point that RFC 0005 ("Role-Based Access Control for
MLflow OSS") flagged as future work. It builds on 0005's role model and
resolver surface, and the default plugins reproduce today's behavior
byte-for-byte — operators who upgrade and change nothing see no difference.

What's in this PR

  • New file: rfcs/0006-pluggable-auth/0006-pluggable-auth.md (823 lines, one
    commit on top of main).

No code changes, no implementation — this is the design document. Reference
adapters described in the RFC (OIDC, Kubernetes TokenReview /
SubjectAccessReview, OPA, upstream proxy headers) are sketched in enough
detail to validate the interface shape but are not built here.

Why now

The existing surface has three structural problems:

  1. One hook does two jobs. authorization_function returns a
    werkzeug.datastructures.Authorization carrying only a username — too thin
    for bearer tokens, OIDC claims, group membership, or JIT provisioning.
  2. FastAPI silently ignores it. The FastAPI request path refuses any
    non-default function (mlflow/server/auth/__init__.py:4141), so the hook
    is effectively Flask-only.
  3. Route → permission knowledge is fused into ~200 validators across six
    dispatch structures. Any external authorization system (Kubernetes SAR,
    OPA, a corporate policy engine) has to rediscover and re-sync that mapping
    every time MLflow adds a route.

Design rule worth calling out

Core retains sole ownership of the route → requirement mapping via a single
authoritative OPERATION_REGISTRY. Plugins only ever see the normalized tuple
(resource_type, resource_id, action, workspace) — never a route, a protobuf
class, or a GraphQL field. A CI guard fails the build if any route ships
without a declared requirement.
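
To illustrate the rule, here is a rough sketch of a registry lookup and the CI guard check. The registry layout (keyed by HTTP method and route template), the route paths, and the helper names `resolve_requirement` and `undeclared_routes` are assumptions made for this example; the RFC itself defines the real shape of OPERATION_REGISTRY.

```python
from typing import NamedTuple, Optional


class Requirement(NamedTuple):
    """The normalized tuple handed to plugins."""
    resource_type: str
    resource_id: Optional[str]
    action: str
    workspace: Optional[str]


# Hypothetical layout: (method, route template) -> (resource_type, action, id path param).
# The routes below are placeholders, not MLflow's real routing surface.
OPERATION_REGISTRY = {
    ("GET", "/experiments/{experiment_id}"): ("experiment", "READ", "experiment_id"),
    ("PATCH", "/experiments/{experiment_id}"): ("experiment", "EDIT", "experiment_id"),
}


def resolve_requirement(method, route, path_params, workspace=None):
    """Core-side translation from a matched route to the plugin-facing tuple."""
    resource_type, action, id_param = OPERATION_REGISTRY[(method, route)]
    return Requirement(resource_type, path_params.get(id_param), action, workspace)


def undeclared_routes(served_routes):
    """CI guard helper: routes the server exposes but the registry does not declare."""
    return sorted(set(served_routes) - set(OPERATION_REGISTRY))
```

A CI job along these lines would collect every served route and fail the build whenever `undeclared_routes` returns a non-empty list.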

Out of scope (intentionally)

  • Changing RFC 0005's role storage or permission levels.
  • New permission semantics beyond READ / USE / EDIT / MANAGE.
  • Multi-tenant data isolation at the storage layer.
  • A built-in policy DSL.

Reviewer guide

Suggested reading order if you're short on time:

  1. Summary + Basic example (lines 15–115) — the operator-facing shape.
  2. Motivation (117–161) — the three structural problems, with file refs.
  3. The three layers (184–211) — the contract boundary in one diagram.
  4. Core keeps owning route → requirement (466–547) — the centerpiece; the
    rest of the design hangs off this.
  5. OPERATION_REGISTRY + CI guard (548–636) — how core stays the source
    of truth as routes evolve.
  6. Drawbacks / Alternatives / Open questions (698–end) — where I'd most
    like pushback.

Open questions I'd like input on

These are spelled out at the bottom of the RFC; flagging them here so they
don't get lost:

  • Whether authn_providers should be an ordered chain or a single provider
    with explicit fallback rules.
  • How fine-grained workspace should be for the Kubernetes SAR adapter
    (namespace? label selector? both?).
  • Whether the CI guard belongs in this RFC or as a follow-up.
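
For the first question, a small sketch of the ordered-chain option may help frame the trade-off. This is only one possible reading, with hypothetical provider functions and a made-up `demo-` token format: each provider returns a username or `None`, and the first non-`None` answer wins.

```python
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical provider signature: request headers in, username (or None) out.
Provider = Callable[[Mapping[str, str]], Optional[str]]

_DEMO_PREFIX = "Bearer demo-"  # made-up token format, for illustration only


def bearer_provider(headers: Mapping[str, str]) -> Optional[str]:
    """Toy bearer provider: treats 'Bearer demo-<name>' as proof of identity."""
    value = headers.get("Authorization", "")
    if value.startswith(_DEMO_PREFIX):
        return value[len(_DEMO_PREFIX):] or None
    return None


def proxy_header_provider(headers: Mapping[str, str]) -> Optional[str]:
    """Toy upstream-proxy provider: trusts a forwarded user header."""
    return headers.get("X-Forwarded-User") or None


def authenticate_chain(providers: Sequence[Provider], headers: Mapping[str, str]) -> Optional[str]:
    """Try providers in order; the first one that recognizes the request wins."""
    for provider in providers:
        username = provider(headers)
        if username is not None:
            return username
    return None
```

With a chain, the configured order decides which identity wins when a request carries more than one credential, and that implicit precedence is what explicit fallback rules would spell out instead.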

Checklist

  • RFC follows 0000-template.md structure
  • start_date set, mlflow_issue linked, rfc_pr left empty per
    template instructions
  • Motivation references concrete code paths in mlflow/server/auth/
  • Builds on (does not contradict) RFC 0005
  • Default behavior is byte-for-byte compatible with today

Signed-off-by: Patrick Koss <pati.koss@gmx.de>
@jwm4

jwm4 commented Jun 8, 2026

Hi! I've updated #10 to renumber the RFCs 5 and 6 it contained to RFCs 8 and 9. This avoids conflicts with the now-merged RFC 5, this PR, and #13, which proposes an RFC 7. In the future, I'd recommend the following to avoid more numbering conflicts:

  1. Check the open PR list to see which RFC numbers are already in progress.
  2. Put your RFC numbers in the PR title so other people can see what RFC numbers you are using.

Of course that only works if everybody does it, but I think it's worth trying. In my opinion, a better solution would be to stop numbering the RFCs, but presumably that's a broader community discussion.
