RFC-0008 and RFC-0009: Skill Registry and Harness Integration#10
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds RFC-0005 proposing a new governed, metadata-first Skill Registry for MLflow, including versioning, lifecycle/publish-state governance, typed source pointers (Git/OCI/ZIP), and first-class skill groups.
Changes:
- Introduces the Skill Registry conceptual model (skills, skill versions, tags, aliases) and publish-state lifecycle.
- Defines skill groups with versioned membership snapshots, plus associated tags/aliases.
- Specifies proposed DB schema (12 tables) and API surfaces (REST, Python SDK, CLI, UI).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| published skills, not the process of writing them. | ||
| - **Skill format specification.** The registry is format-agnostic. It | ||
| does not define or enforce what a skill looks like (SKILL.md, plugin | ||
| manifests, etc.). |
There was a problem hiding this comment.
Could we consider MLflow artifact storage as a first-class source type here? That would give us a natural UI upload flow and keeps the door open for more MLflow-native packaging/optimization later, even if Git and OCI stay the main paths.
There was a problem hiding this comment.
Good idea. We've added mlflow as an explicit future source type in the "Source type extensibility" section — it would allow storing skill content directly in MLflow artifact storage, providing a natural UI upload flow. It's deferred from the initial implementation to keep the registry metadata-first, but can be added as a follow-up without breaking changes. The adoption strategy's "Follow-up" section also calls this out.
— This comment was posted by Claude Code under the supervision of Bill Murdock.
| RETIRED = "retired" | ||
|
|
||
|
|
||
| class SkillSourceType(StrEnum): |
There was a problem hiding this comment.
Can we line this lifecycle up more closely with the MCP registry RFC in PR 12 (#12)? Right now this introduces draft, published, deprecated, and retired plus a separate skill-level status, which makes the cross-registry pattern feel less consistent unless there’s a strong skill-specific reason to diverge.
There was a problem hiding this comment.
Fully aligned — replaced the 4-state publish lifecycle (draft/published/deprecated/retired) with RFC-0004's 3-state model using status (active/deprecated/deleted). SkillStatus is now the only lifecycle enum. Parent entity status (Skill.status, SkillGroup.status) is derived from the latest version rather than set independently.
— This comment was posted by Claude Code under the supervision of Bill Murdock.
| does not perform scans. Scanning tools are separate. | ||
| - **Agent harness integration.** How a specific agent harness (Claude | ||
| Code, Codex, Cursor, etc.) installs or loads skills from the registry | ||
| is outside this RFC. The registry provides the metadata; harness |
There was a problem hiding this comment.
Do we want one short paragraph on external skill conventions we expect to interoperate with? Even if the registry stays format-agnostic, it would help to say whether we’re aligning with existing Claude/Cursor/GitHub-style skill packaging or deliberately avoiding standardization for now.
There was a problem hiding this comment.
Done — fully aligned with RFC-0004. Version lifecycle now uses status with three states (active, deprecated, deleted), matching MCPStatus. New versions default to active. Skill.status and SkillGroup.status are now read-only, derived from the latest version's status. Also added latest_version_alias on both Skill and SkillGroup, following RFC-0004's pattern for deterministic "latest" resolution. Store interface switched from AbstractSkillRegistryStore with @abstractmethod to SkillRegistryMixin with raise NotImplementedError, matching MCPServerRegistryMixin.
(Comment from Claude Code under the supervision of Bill Murdock.)
| #### SkillVersion | ||
|
|
||
| A versioned record containing a typed source pointer, publish state, | ||
| and tags. |
There was a problem hiding this comment.
I initially expected aliases to live on SkillVersion, but after checking MLflow model registry, the existing pattern is that aliases are owned by the top-level entity and point to a version. If the goal is consistency with MLflow and PR 12 (#12), I think this design is fine. It may just be worth making that pattern more explicit in the text.
There was a problem hiding this comment.
Aligned — set_skill_alias takes (name, alias, version) and sets the alias on the Skill entity, pointing to a specific version. Same pattern for set_skill_group_alias. This matches the MCP RFC's approach where aliases are owned by the parent entity rather than the version.
— This comment was posted by Claude Code under the supervision of Bill Murdock.
| ) | ||
| ``` | ||
|
|
||
| ## Create a skill group with a versioned membership snapshot |
There was a problem hiding this comment.
What's the justification for skill groups? Why not just using tags on skills
There was a problem hiding this comment.
Added a "Why groups instead of tags?" section to the RFC. The key reasons: (1) groups pin specific member versions for reproducibility, (2) groups have their own lifecycle and aliases independent of member status, (3) groups can reference capabilities across registries (skill registry + MCP registry), (4) groups map naturally to the "plugin" concept in agent harnesses, and (5) tags are flat key-value pairs that can't express ordered, versioned membership.
— This comment was posted by Claude Code under the supervision of Bill Murdock.
There was a problem hiding this comment.
Quick terminology update: "skill groups" have been renamed to "skill bundles" in the latest revision, and the section is now titled "Why bundles instead of tags?" The reasoning in my earlier reply still applies, just with the updated name.
This comment was posted by Bill Murdock with assistance from Claude Code.
| trace data back to a governed skill record to understand adoption | ||
| across an organization. | ||
|
|
||
| ### Use cases |
There was a problem hiding this comment.
Could we add explicit JTBDs per persona (developer, admin etc.)? It would make it easier to evaluate which parts of the data model are required
There was a problem hiding this comment.
Done — replaced the abstract use case bullets with three end-to-end persona flows: a platform administrator (registers, scans, groups, aliases, deprecates), a developer (searches, resolves alias, pulls), and a security engineer (queries scan tags, deprecates failed versions, tracks compliance). These ground the data model in concrete user workflows.
— This comment was posted by Claude Code under the supervision of Bill Murdock.
There was a problem hiding this comment.
The persona flows have been updated in the latest revision to reflect the new entity model. "Skill groups" are now "skill bundles," and the platform administrator scenario now references subagents explicitly (e.g., bundling a code-review skill with a security-auditor subagent and a GitHub MCP server). The three personas and their workflows are otherwise unchanged.
This comment was posted by Bill Murdock with assistance from Claude Code.
Update: Review Feedback Round 1Pushed changes addressing the straightforward review feedback: YAML frontmatter, mermaid ER diagram, field renames ( Still openStructural:
Scope:
Content:
(Comment from Claude Code under the supervision of Bill Murdock.) |
Status update (2026-04-27)Changes since last updateRFC-0005 (Skill Registry):
RFC-0006 (Skill Registry Harness Integration) — new:
Resolved review comments (newly replied)
Still open
(Comment from Claude Code under the supervision of Bill Murdock.) |
|
Also, I've converted this to draft given all the open issues listed above. |
Status update (2026-04-29)Changes since last updateRFC-0005 (Skill Registry):
RFC-0006 (Harness Integration):
Still open (deferred to future pass)
(Comment from Claude Code under the supervision of Bill Murdock.) |
Status update (2026-04-29, part 2)The four items listed as "still open" in the previous update are now all addressed:
All review comments from mprahl, serena-ruan, and Copilot now have replies. I don't have further edits planned at this time. The RFC is ready for another round of review. (Comment from Claude Code under the supervision of Bill Murdock.) |
| rfc_pr: https://github.com/mlflow/rfcs/pull/10 | ||
| --- | ||
|
|
||
| # RFC: Skill Registry |
There was a problem hiding this comment.
@Bill Murdock — Reading through the RFC and the proposed API, one thing struck me: given that the registry already supports kind=[skill, agent, mcp-server, hook], should we consider a more generic name than "skills" / "skill groups"?
The data model is already kind-agnostic — a Skill entity with kind=agent is conceptually an asset that happens to be an agent, not a skill. Calling the registry an Asset Registry (or Capability Registry) and using create_asset() / create_asset_group() instead of create_skill() / create_skill_group() would better reflect what the API actually does and avoid the naming tension where "skill" is both a specific kind and the umbrella term for all kinds.
The REST paths and CLI would follow naturally: mlflow assets create, mlflow asset-groups create-version, etc.
There was a problem hiding this comment.
+1
I'd like to also propose bundles. It satisfies the generic 'kind-agnostic' semantics of assets, while also implying plurality even in the singular term (skill-group and bundle both imply grouping of some asset).
I also agree that having an independent name (assets/bundles/capabilities) greatly improves function discoverability in the SDK
There was a problem hiding this comment.
Alternatively, if none of the names appeal, I'd suggest at least in the CLI dropping skill-groups in favor of skills with a --group flag. Something like:
mlflow skills pull --group pr-workflow --alias production
There was a problem hiding this comment.
+1 to @etirelli here, the umbrella term should not be a skill. Currently in the RFC the skill that is a kind/skill is more consistent with the industry's usage of the term, but the top level usage of the word skill would be very confusing.
My preference is asset registry or capability registry, or something equally broader.
There was a problem hiding this comment.
after further discussion last week, +1 to using the term bundle or skill bundle as per @khaledsulayman 's suggestion above.
There was a problem hiding this comment.
Resolved. Rather than renaming the umbrella term, we split the single Skill(kind=...) entity into three separate types: Skill, Subagent, and Hook. Each has full lifecycle (versions, aliases, tags, status). SkillGroup is renamed to SkillBundle with typed member lists, following the "skill bundle" suggestion from this thread. The registry is still "Skill Registry" and the SDK namespace stays mlflow.genai.skills.*. See commit d16abe0.
Bill Murdock, with assistance from Claude Code
There was a problem hiding this comment.
Adopted. SkillGroup is now SkillBundle throughout both RFCs. See commit d16abe0 for the full restructuring.
This comment was posted by Bill Murdock with assistance from Claude Code.
There was a problem hiding this comment.
Adopted a variant of this: the CLI now uses --bundle instead of --group, following the SkillGroup-to-SkillBundle rename. So mlflow skills pull --bundle pr-workflow --alias production.
This comment was posted by Bill Murdock with assistance from Claude Code.
| Add a Skill Registry to MLflow: a governed, metadata-first registry for | ||
| AI agent capabilities. The registry stores metadata and typed source | ||
| pointers (to Git repos, OCI registries, ZIP archives, etc.) rather | ||
| than artifacts directly. It provides enterprise governance on top of |
There was a problem hiding this comment.
Could we support both approaches? The user would either set pointers to external storage (Git, OCI, ZIP) or store assets directly in MLflow's internal artifact store.
The reason: source pointers are the right default for customers who already manage assets in Git or other systems. But for customers who don't have that infrastructure in place, or who already use MLflow to store models and want their agent capabilities alongside them, or who operate in airgapped or restricted environments where reaching external sources isn't practical — the ability to store assets directly in the registry would be a better fit.
There was a problem hiding this comment.
Agreed. The latest revision adds source_type="mlflow" as a first-class source type in the initial implementation. Content is stored as a directory tree of individual files in MLflow's artifact store, consistent with how model artifacts are stored. Users can choose external source pointers (Git, OCI, ZIP) or direct artifact storage depending on their environment. This covers the airgapped and "capabilities alongside models" use cases you described.
-- This comment was posted by Claude Code under the supervision of Bill Murdock.
|
|
||
| ### Out of scope | ||
|
|
||
| - **Artifact storage.** The registry stores metadata and source |
There was a problem hiding this comment.
given that mlflow already has an "artifact store" to store models and traces, I think the proposal could just embrace it and support it as an option, as per my previous comment.
There was a problem hiding this comment.
Addressed in the same change. See reply to your earlier comment.
-- This comment was posted by Claude Code under the supervision of Bill Murdock.
| permissions. | ||
| - **Alias management requires MANAGE.** Aliases like `production` | ||
| control which version downstream consumers resolve to. Changing an | ||
| alias has the same blast radius as a status transition. |
There was a problem hiding this comment.
Given the impact and blast radius of alias management, should alias changes be auditable?
Currently, last_updated_by / last_updated_timestamp on the parent Skill captures that something changed, but doesn't record alias-specific history (which alias, what it pointed to before, who changed it, when). For enterprise governance scenarios — "who promoted this to production and when?" or "what was production pointing to yesterday before the incident?" — an alias change log would be valuable.
This could be as lightweight as an append-only skill_alias_history table (alias, old_version, new_version, changed_by, timestamp), or even just emitting an event that external audit systems can consume. Worth considering for the initial design, since retrofitting audit onto alias mutations is harder than including it upfront.
There was a problem hiding this comment.
Added. Every alias mutation (set or delete) now appends a record to an append-only history table with the alias name, old version, new version, who changed it, and when. This covers "who promoted this to production?" and "what was production pointing to before the incident?" queries.
The design includes:
SkillAliasHistoryandSkillGroupAliasHistoryentitiesskill_alias_historyandskill_group_alias_historydatabase tables (append-only, no updates or deletes through the API)get_skill_alias_history()andget_skill_group_alias_history()store methods- REST endpoints at
/{name}/aliases/historyand/{name}/aliases/{alias}/history
History is recorded automatically by the store on every alias mutation. No separate API call needed.
-- This comment was posted by Claude Code under the supervision of Bill Murdock.
| cross-registry concern that applies equally to skills, MCP servers, | ||
| and other AI asset registries. It is expected to be solved at the | ||
| platform level across all MLflow registries rather than piecemeal in | ||
| each one. |
There was a problem hiding this comment.
Cross-workspace visibility can be managed at the infrastructure level (e.g., Kubernetes RBAC in OpenShift AI), so deferring that is reasonable. However, the RFC should consider supporting export/import of assets between workspaces — or at minimum, a move/duplicate operation. Use case: a platform team develops and tests assets in a staging workspace or even on a staging instance, then promotes them to a production workspace or instance. Without export/import, they'd need to re-register everything manually, including versions, tags, aliases, and group memberships. Even a simple mlflow skills export --name code-review --output bundle.json / mlflow skills import --input bundle.json --workspace production would cover this.
There was a problem hiding this comment.
Good use case. We'd like to follow the other MLflow registries on this. Neither the model registry nor the MCP registry (RFC-0004) has export/import yet, and we want to stay consistent rather than designing a serialization format in isolation. If export/import lands as a cross-registry capability in MLflow, the skill registry should adopt the same approach.
Added a mention in the adoption strategy's follow-up section. The data model is fully serializable, so this can be built on top of the existing API without schema changes.
-- This comment was posted by Claude Code under the supervision of Bill Murdock.
Status update (2026-04-30)Changes since last updateRFC-0005 (Skill Registry):
RFC-0006 (Harness Integration):
Open discussion
All review comments from mprahl, serena-ruan, Copilot, and etirelli now have replies (except the naming question, which is under active discussion). (Comment from Claude Code under the supervision of Bill Murdock.) |
| class SkillStatus(StrEnum): | ||
| ACTIVE = "active" | ||
| DEPRECATED = "deprecated" | ||
| DELETED = "deleted" |
There was a problem hiding this comment.
We added back a fourth DRAFT status in the mcp registry proposal fyi. If feedback on it is positive, might be useful here too?
bef24e7
There was a problem hiding this comment.
Added. The RFC now uses a 4-state lifecycle (draft, active, deprecated, deleted) aligned with the MCP Registry RFC. New versions default to draft, and activation requires MANAGE permission. See the updated status table and transitions in the latest push.
(This comment is from Claude Code under the supervision of Bill Murdock.)
| schema changes since the column stores a string value. `kind` is | ||
| immutable after creation. | ||
|
|
||
| **MCP servers: two registration paths.** The MCP server registry |
There was a problem hiding this comment.
Do we need both paths? I can see the benefit when embedded as part of an artifact, but wanted to double check if we can provide a cleaner path like a way for an embedded mcp to be automatically registered. Sorry if this was already discussed!
There was a problem hiding this comment.
Good question. The two paths serve different ownership models:
MCP Registry (RFC-0004) is for standalone MCP servers that have their own lifecycle, versioning, and deployment bindings. Skill groups reference these via registry="mcp" in their membership. This is the recommended path for most MCP servers.
kind=mcp-server in this registry is for MCP configs that are embedded in a group-level artifact (e.g., an .mcp.json inside an OCI image containing a complete plugin bundle). These configs exist only as part of their containing artifact and do not have an independent lifecycle.
Auto-registering embedded configs into the MCP registry would create governance confusion: the embedded config's lifecycle is tied to its containing group version, but an MCP registry entry has its own independent lifecycle. If someone deprecates the group version, the auto-registered MCP entry would still appear active in the MCP registry. Keeping them separate avoids that mismatch.
In practice, we expect most MCP servers to use the MCP registry path. The kind=mcp-server path is a narrow escape hatch for bundled artifacts where separating the config from its plugin would be artificial.
(This comment is from Claude Code under the supervision of Bill Murdock.)
There was a problem hiding this comment.
Update: the kind=mcp-server path described in the previous reply has been removed. MCP servers are now handled in two ways, both cleaner than before:
- Standalone MCP servers are registered in the MCP registry (RFC-0004) and referenced via the
mcp_serverslist onSkillBundleVersion. - Embedded MCP configs (e.g.,
.mcp.jsoninside a bundle-level OCI artifact) are just part of the artifact. They do not need separate registry entries or membership references.
See commit d16abe0.
Bill Murdock, with assistance from Claude Code
|
|
||
| ```python | ||
| @dataclass | ||
| class SkillGroup: |
There was a problem hiding this comment.
Could it make sense for skill group to simply be a Skill with kind=group? A skill can already be pointing to any number of things. Could potentially simplify/unify the data model a bit. Just putting it out there as an idea!
There was a problem hiding this comment.
Thanks for the suggestion! We discussed this on Slack and agreed to keep them separate. The main concern is naming clarity: collapsing groups into Skill with kind=group makes the "Skill" name even more confusing, since a skill group already includes not just skills but also sub-agents, hooks, and MCP configurations. People generally don't refer to a group of skills as a single "skill." With a name like "module" the composition would feel natural, but "skill" carries a more atomic connotation.
Keeping them separate also stays consistent with other MLflow registries (the MCP registry has separate server and server group entities) and avoids conditional complexity in the data model (groups have a membership table that individual skills don't need).
(This comment is from Claude Code under the supervision of Bill Murdock.)
There was a problem hiding this comment.
The naming clarity and data model complexity arguments make sense to me, but I'm a little confused about the "server group entities" analogy. What exactly does this refer to in the MCP registry rfc?
There was a problem hiding this comment.
Oh, sorry, that was an error in the comment that I should have caught! I've crossed it out. My main point was the other one. Thank you for catching this!
|
Status update (2026-05-07) Addressed jonburdo's three review comments:
The naming question (etirelli's comment on "skills" vs. a more generic name) remains open pending alignment on the broader question of unified vs. separate registries in MLflow. No further edits planned at this time. The RFC is ready for additional review. (This comment is from Claude Code under the supervision of Bill Murdock.) |
|
Status update (2026-05-12) The MCP Registry RFC (PR #12) has merged. Updated RFC-0005 to align with its final design: Changes made:
Reviewed and kept as-is (ahead of MCP RFC, not incompatible):
Differs by design (not a gap):
(This comment is from Claude Code under the supervision of Bill Murdock.) |
khaledsulayman
left a comment
There was a problem hiding this comment.
A few questions and some naming suggestions, including:
registerinstead ofcreatebundlesinstead ofskill-groups- verb-based skill status lifecycling (
publish,unpublish,deprecate,delete)
| skill = mlflow.genai.skills.create_skill( | ||
| name="code-review", | ||
| description="Reviews pull requests for correctness, style, and security", | ||
| ) | ||
|
|
||
| # Register a version pointing to a Git source | ||
| version = mlflow.genai.skills.create_skill_version( | ||
| name="code-review", | ||
| version="1.0.0", | ||
| source_type="git", | ||
| source="https://github.com/acme/agent-skills/tree/v1.0.0/code-review", | ||
| content_digest="sha256:a3f2b8c...", | ||
| ) | ||
| # version.status == "draft" | ||
|
|
There was a problem hiding this comment.
*assuming the register naming from my above comment, but also works for create:
It's not clear to me why these can't be consolidated into one entrypoint. For an unregistered skill, calling mlflow.genai.skills.register() (or mlflow skills register in the CLI) without specifying a version should suffice as the current create_skill() operation, whereas specifying a version flag fulfills create_skill_version(). For already registered skills, it seems reasonable for mlflow to internally handle idempotency in the case of an existing skill/skill-version.
If I'm missing something as to why the python API should split this across two methods, I'd still push for at least having one register/create verb in the CLI, if possible.
There was a problem hiding this comment.
Addressed in the same change as the register naming comment above. register_skill() is now a single entrypoint that handles both skill creation and version registration with idempotency. The CLI equivalent is mlflow skills register --name X --version Y ....
(This comment is from Claude Code under the supervision of Bill Murdock.)
| mlflow skills pull-group --name pr-workflow --alias production \ | ||
| --destination ./plugins/pr-workflow | ||
| ``` |
There was a problem hiding this comment.
Could we unify pull and pull-group under a single pull command?
The harness RFC already does this for install, so it makes sense for pull to follow the same convention:
mlflow skills pull --name code-review --alias production --destination ./skills/
mlflow skills pull --group pr-workflow --alias production --destination ./plugins/
There was a problem hiding this comment.
Alternatively, if the skill-group naming resolves to assets or bundles:
mlflow bundles pull --name pr-workflow --alias production --destination ./plugins/
There was a problem hiding this comment.
Done. pull now uses --name vs --group flags, matching how RFC-0006's install already works.
(This comment is from Claude Code under the supervision of Bill Murdock.)
There was a problem hiding this comment.
I adopted your --group flag suggestion from the naming thread. The CLI now uses mlflow skills pull --group pr-workflow --alias production rather than a separate skill-groups command. This addresses the CLI ergonomics regardless of how the top-level naming question resolves: if the namespace stays skills, the command works as-is; if it changes to bundles or assets, the --group flag pattern carries over directly.
(This comment is from Claude Code under the supervision of Bill Murdock.)
There was a problem hiding this comment.
Terminology update: --group is now --bundle after the entity model restructuring. The CLI is mlflow skills pull --bundle pr-workflow --alias production. Same pattern you suggested, just with the updated name.
This comment was posted by Bill Murdock with assistance from Claude Code.
There was a problem hiding this comment.
Resolved: the entity is now called SkillBundle, and the CLI uses mlflow skills pull --bundle pr-workflow --alias production. Thanks for pushing on this naming question.
This comment was posted by Bill Murdock with assistance from Claude Code.
| **MLflow artifact storage (`source_type="mlflow"`).** In addition to | ||
| external source pointers, the registry supports storing skill content | ||
| directly in MLflow's artifact storage. This serves users who do not | ||
| have external Git/OCI infrastructure, who want agent capabilities | ||
| stored alongside their models, or who operate in airgapped | ||
| environments where external sources are not reachable. | ||
|
|
||
| Content is stored as a directory tree of individual files under an | ||
| artifact path, consistent with how MLflow stores model artifacts. For | ||
| example, a skill with a SKILL.md, scripts, and reference material is | ||
| stored as separate artifacts under a version-specific prefix: | ||
|
|
||
| ``` | ||
| skills/code-review/1.0.0/ | ||
| SKILL.md | ||
| scripts/analyze.sh | ||
| scripts/lint-config.json | ||
| reference/style-guide.md | ||
| ``` | ||
|
|
||
| The `source` field contains the MLflow artifact URI (e.g., | ||
| `mlflow-artifacts:/skills/code-review/1.0.0/`). Pull downloads the | ||
| directory tree from the artifact store. The MLflow UI can browse | ||
| individual files within a stored skill version. | ||
|
|
||
| The upload API accepts a local directory path and stores each file as | ||
| a separate artifact. The `content_digest` is computed over the full | ||
| directory contents at upload time. |
There was a problem hiding this comment.
I'm a bit unclear on the actual upload path here. Initially, I assumed this would be done via create_skill()/create_skill_version() by passing (..., source_type=mlflow, source=filepath), but in keeping with the semantics of git and oci, source should probably be reserved for the URI in MLflow's artifact storage.
Do we need to add an explicit skills registry API around uploads, or do we delegate it to users manually calling mlflow.log_artifacts() and maybe document that path?
There was a problem hiding this comment.
The register_skill() convenience function now has a content_path parameter. When provided, it uploads the local directory to MLflow artifact storage and sets source_type and source automatically. No manual log_artifacts() call needed.
At the store layer, create_skill_version() also accepts content_path. The server stores each file as a separate artifact under a version-specific prefix (e.g., skills/code-review/1.0.0/SKILL.md) and computes the content_digest at upload time. See the "MLflow artifact storage" subsection in the data model for details.
This comment was posted by Bill Murdock with assistance from Claude Code.
| **MCP servers: two registration paths.** The MCP server registry | ||
| (RFC-0004) is the default and recommended path for registering MCP | ||
| servers. It provides deployment tracking via hosted bindings, | ||
| deduplication across skill groups, and the full MCP governance model. | ||
| Skill groups reference MCP registry entries via `registry="mcp"` in | ||
| their membership. | ||
|
|
||
| `kind=mcp-server` in this registry is reserved for MCP configs that | ||
| are embedded in a group-level artifact (e.g., an OCI image containing | ||
| a complete plugin with an `.mcp.json` file). These are not | ||
| independently managed and exist only as part of their containing | ||
| artifact. Standalone MCP servers should always be registered in the | ||
| MCP registry, not as skills. |
There was a problem hiding this comment.
Apologies if it's already implied but should we explicitly add a guard against registering a standalone mcp server via mlflow skills <create/register> --name mcp-name --kind mcp-server --source-type git --source https://...? Such attempts should probably fail and redirect the user to mlflow mcp-servers register
There was a problem hiding this comment.
Now moot. The kind field has been removed entirely. Skill, Subagent, and Hook are separate entity types with separate registration functions (register_skill(), register_subagent(), register_hook()). MCP servers are only referenced via cross-registry pointers in SkillBundleVersion.mcp_servers, not registered through the skill registry. See commit d16abe0.
Bill Murdock, with assistance from Claude Code
| skill = mlflow.genai.skills.create_skill( | ||
| name="code-review", | ||
| description="Reviews pull requests for correctness, style, and security", | ||
| ) |
There was a problem hiding this comment.
nit: to me, create reads as authoring. Could we go with register instead? This also mimics RFC-0004's convention.
There was a problem hiding this comment.
Added. The RFC now provides register_skill() as a convenience function that auto-creates the parent Skill entity if needed and registers a version in one call, matching the MCP RFC's register_mcp_server() pattern. The store layer retains create_skill() / create_skill_version() for fine-grained operations, consistent with how the MCP RFC structures its store interface vs. SDK convenience layer.
(This comment is from Claude Code under the supervision of Bill Murdock.)
| rfc_pr: https://github.com/mlflow/rfcs/pull/10 | ||
| --- | ||
|
|
||
| # RFC: Skill Registry |
There was a problem hiding this comment.
Alternatively, if none of the names appeal, I'd suggest at least in the CLI dropping skill-groups in favor of skills with a --group flag. Something like:
mlflow skills pull --group pr-workflow --alias production
| status lifecycle, security scan tracking, and federated discovery. | ||
| The two approaches are complementary. | ||
|
|
||
| # Adoption strategy |
There was a problem hiding this comment.
Is it worth adding a note about shared base extraction as per the MCP RFC? If even just to say we will coordinate with MCP registry implementation.
There was a problem hiding this comment.
Added. The adoption strategy now includes a note about coordinating with the MCP Registry's shared base extraction effort where practical to reduce duplication. See the "Shared base extraction" bullet in the adoption strategy section.
(This comment is from Claude Code under the supervision of Bill Murdock.)
| def get_skill_group_version( | ||
| self, name: str, version: str, | ||
| ) -> SkillGroupVersion: | ||
| raise NotImplementedError | ||
|
|
||
| def get_skill_group_version_by_alias( | ||
| self, name: str, alias: str, | ||
| ) -> SkillGroupVersion: | ||
| raise NotImplementedError | ||
|
|
||
| def get_latest_skill_group_version( | ||
| self, name: str, | ||
| ) -> SkillGroupVersion: | ||
| raise NotImplementedError |
There was a problem hiding this comment.
Same rationale as my reply to your get_skill_version consolidation suggestion: keeping separate methods to match the MCP Registry pattern. The group get methods parallel the skill get methods, which parallel the MCP server get methods.
(This comment is from Claude Code under the supervision of Bill Murdock.)
There was a problem hiding this comment.
Same terminology update: "group get methods" are now "bundle get methods" (get_skill_bundle_version, get_skill_bundle_version_by_alias, etc.).
This comment was posted by Bill Murdock with assistance from Claude Code.
| # Search active skill versions | ||
| mlflow skills search-versions --name code-review \ | ||
| --filter "status = 'active'" | ||
|
|
||
| # Search active groups | ||
| mlflow skill-groups search --filter "status = 'active'" |
There was a problem hiding this comment.
I'd consider a unified list command which will list all skills or skill-groups/assets/bundles unless provided --name or --group, in which case it will list the registered versions of that skill/skill-group/asset/bundle
There was a problem hiding this comment.
I like the ergonomics of a unified list, but I'm keeping separate search_skills() and search_skill_versions() to match the MCP Registry's pattern (search_mcp_servers() / search_mcp_server_versions()). Consistency across MLflow's registries is a priority so users who learn one API can transfer that knowledge.
In the CLI, this maps to mlflow skills search and mlflow skills search-versions. Skill groups have their own parallel commands.
(This comment is from Claude Code under the supervision of Bill Murdock.)
There was a problem hiding this comment.
Terminology update: "skill groups" are now "skill bundles" with their own parallel commands (mlflow skill-bundles search). The rationale for keeping separate search commands is unchanged.
This comment was posted by Bill Murdock with assistance from Claude Code.
| | `POST` | `/ajax-api/3.0/mlflow/skills/{name}/install` | Install a single capability for a harness | | ||
| | `POST` | `/ajax-api/3.0/mlflow/skill-groups/{name}/install` | Install a skill group for a harness | |
There was a problem hiding this comment.
Do we need REST endpoints for install? Would this not be a client-side operation?
There was a problem hiding this comment.
Agreed. Install is now a client-side operation: the SDK resolves from the registry, pulls content, and writes harness-specific manifests locally. The only server-side endpoint is the marketplace.json GET, which harnesses query to discover available plugins.
(This comment is from Claude Code under the supervision of Bill Murdock.)
|
|
||
| - **Skills** (SKILL.md) — reusable agent instructions | ||
| - **Agents** (agent .md) — sub-agent definitions | ||
| - **MCP servers** (JSON config) — tool server integrations |
There was a problem hiding this comment.
Now that the MCP registry RFC has been merged, is this still in scope?
There was a problem hiding this comment.
No. I removed kind=mcp-server from the skill registry entirely in the previous commit, in response to a similar point from Khaled. MCP servers belong exclusively in the MCP Registry (RFC-0004). Skill groups reference MCP registry entries via registry="mcp" in their membership, and harness-specific .mcp.json configs are generated at install time by RFC-0006.
-- Bill Murdock (with assistance from Claude Code)
| Add a Skill Registry to MLflow: a governed, metadata-first registry for | ||
| AI agent capabilities. The registry stores metadata and typed source | ||
| pointers (to Git repos, OCI registries, ZIP archives, etc.) rather | ||
| than artifacts directly. It provides enterprise governance on top of |
There was a problem hiding this comment.
Could you clarify this? When I first read it, it seemed like artifact repository was out of scope but later on I saw that it was.
There was a problem hiding this comment.
Clarified. The summary now says the registry stores metadata and typed source pointers, but can also store content directly via MLflow artifact storage. The primary design is metadata-first. The source_type="mlflow" section in the detailed design explains the artifact storage path.
-- Bill Murdock (with assistance from Claude Code)
|
|
||
| The list view shows skills and skill groups in a card-based or table | ||
| layout, with name, description, latest version, status, and tags. Users | ||
| can filter by status, source type, and search by name or description. A |
There was a problem hiding this comment.
It would be useful to track install counts and surface them in the MLflow UI. That would help enterprises understand which capabilities are actually being adopted, and it would also let end users sort or rank skills / skill groups by popularity.
🤖 Generated with Claude
There was a problem hiding this comment.
Good idea. We added install count tracking and surfacing in the UI to the adoption strategy section, enabling users to sort or rank capabilities by adoption.
-- Bill Murdock (with assistance from Claude Code)
| "version": "1.0.0", | ||
| "workspace": "default" | ||
| }, | ||
| "security-auditor": { |
There was a problem hiding this comment.
** 🤖 AI Review **
security-auditor is introduced earlier as a subagent, but this manifest puts it under "skills". RFC-0008 also says mlflow.skill_context() applies to skills only, not subagents/hooks/bundles. If the trace hooks consume this manifest as written, subagent execution would be mislabeled as a SKILL span. Please either keep only true skills in the manifest or add explicit member types so tracing can skip non-skill entries.
There was a problem hiding this comment.
Fixed. Replaced security-auditor (a subagent) with style-check (a skill) in the manifest example. The manifest is only used for trace instrumentation via skill_context(), which applies to skills only.
(This comment is from Bill Murdock with assistance from Claude Code.)
There was a problem hiding this comment.
Fixed. The manifest now contains only skills, not subagents. The security-auditor subagent was removed from the manifest example. The manifest is specifically for trace instrumentation of skill invocations, so only skills belong there.
Posted by Bill Murdock with assistance from Claude Code.
| "alias": "production", | ||
| "members": [ | ||
| { | ||
| "name": "code-review", |
There was a problem hiding this comment.
** 🤖 AI Review **
The typed registry model gets flattened away here: bundle members record only name/version for non-MCP entries, so replay cannot distinguish whether a member is a skill, subagent, or hook. The same ambiguity also shows up in install(name=...) and RFC-0008's pull(name=...), where the public contract has no type discriminator even though those namespaces are separate. Please add an explicit member_type / entity_type everywhere the API currently collapses typed assets to just name.
There was a problem hiding this comment.
Added a "kind" field to lock file members: "skill", "subagent", "hook", or "mcp_server". This makes type discrimination unambiguous, since the registry has separate namespaces for each type.
(This comment is from Bill Murdock with assistance from Claude Code.)
There was a problem hiding this comment.
The lock file already has a kind field on each member entry (e.g., "kind": "skill", "kind": "subagent", "kind": "mcp_server"). See the lock file format in implementation-details.md lines 210-232. The IntrospectedMember dataclass in the main RFC (line 172) also carries kind. The install(name=...) and pull(name=...) APIs operate on a flat namespace where names are unique across types, so no discriminator is needed there.
Posted by Bill Murdock with assistance from Claude Code.
|
|
||
| - Registry operations (covered in RFC-0008). | ||
| - Extending harness functionality (e.g., adding hook support). | ||
| - Automatic harness detection (follow-up). |
There was a problem hiding this comment.
** 🤖 AI Review **
Later sections of this RFC still rely on auto-detection for import when harness is omitted, including the CLI and UI flows. That means implementers either have to ship a follow-up feature in the initial design or ignore documented workflows. Please either move automatic harness detection into scope or remove the no-harness flows from this RFC.
There was a problem hiding this comment.
Made harness required for import in the initial release. Updated the SDK signature, CLI example, and UI wizard to match. Automatic harness detection remains listed as a follow-up feature.
(This comment is from Bill Murdock with assistance from Claude Code.)
There was a problem hiding this comment.
Addressed. Automatic harness detection is explicitly listed as out of scope (line 135). The --harness parameter is required in all CLI commands (install, install-bundle, introspect, import). The SDK functions also take harness as a required-in-practice parameter (it defaults to "claude-code" but is always explicit in examples).
Posted by Bill Murdock with assistance from Claude Code.
| | `workspace` | `String(63)` | PK | | ||
| | `bundle_name` | `String(256)` | PK, FK to `skill_bundle_versions` | | ||
| | `bundle_version` | `String(256)` | PK, FK to `skill_bundle_versions` | | ||
| | `member_type` | `String(20)` | PK; `skill`, `subagent`, `hook`, or `mcp` | |
There was a problem hiding this comment.
** 🤖 AI Review **
This uses mcp as the bundle member type, but RFC-0009's adapter contract and UI text use mcp_server. That mismatch will leak into adapter mapping, import/export, and bundle serialization because the same concept has two enum values across the design. Please standardize on one spelling everywhere.
There was a problem hiding this comment.
Standardized to mcp_server everywhere. Updated the database schema member_type column and the lock file "kind" values to match the adapter contract and UI text.
(This comment is from Bill Murdock with assistance from Claude Code.)
| 5. **No trace-to-skill linkage.** MLflow already traces agent | ||
| conversations (Claude Code via `mlflow autolog claude`, SDK | ||
| applications via framework autologgers such as | ||
| `mlflow.langchain.autolog()` and `mlflow.anthropic.autolog()`). These traces capture LLM calls, |
There was a problem hiding this comment.
Have you considered if we can enhance autolog to detect skill usage by Claude or other agents and automatically link the trace to a registry entity?
There was a problem hiding this comment.
The RFC already covers this. Added a forward reference at this point in the text to the Trace integration section, which describes the install-time trace manifest (mlflow-skills-manifest.json) and harness hooks that automatically create SKILL spans when a registered skill is invoked.
(This comment is from Bill Murdock with assistance from Claude Code.)
There was a problem hiding this comment.
Yes, this is already designed in RFC-0009. mlflow skills install writes an mlflow-skills-manifest.json that maps installed skill names to {workspace, name, version} registry coordinates. For Claude Code, pre/post tool-use hooks integrate with the mlflow autolog claude trace pipeline to create SKILL spans automatically when a registered skill is invoked. See the "Trace integration" section in RFC-0009 and the "Trace instrumentation details" section in RFC-0009's implementation-details.md for the full design.
(This comment is from Bill Murdock with assistance from Claude Code.)
There was a problem hiding this comment.
Follow-up: the earlier replies on this thread described hook-based instrumentation that turned out to be technically incorrect (hooks run in separate processes with no access to thread-local trace context). The RFC now recommends exactly what you suggested: an in-process autologger that detects skill usage and links it to the active trace. See the "Automatic skill-span instrumentation challenge" section in RFC-0009's implementation-details.md.
Posted by Bill Murdock with assistance from Claude Code.
| 1. Install the bundle for a harness | ||
| ([RFC-0009](../0009-skill-harness-integration/0009-skill-harness-integration.md)): | ||
| ```bash | ||
| mlflow skills install --bundle pr-workflow --alias production \ |
There was a problem hiding this comment.
If mlflow skills install were to either inject MLflow registry identity information in the skills as they get installed or some private file different from the lockfile that maps MLflow registry identities to file paths on the system, we could have MLflow autolog functionality intercept skills usage and then automatically link the trace to it.
I think this would be a feature that could drive adoption.
There was a problem hiding this comment.
This is exactly what the install-time trace manifest does. mlflow skills install writes mlflow-skills-manifest.json mapping installed skill names to their {workspace, name, version} registry coordinates. Harness hooks (e.g., Claude Code's PreToolUse/PostToolUse) use this manifest to automatically create SKILL spans. Added a forward reference at this point in the user journey to make the connection clearer.
(This comment is from Bill Murdock with assistance from Claude Code.)
There was a problem hiding this comment.
See my reply above. The install-time manifest is exactly that private file mapping registry identities to installed skills. Agreed that this could drive adoption.
(This comment is from Bill Murdock with assistance from Claude Code.)
| **Why bundles instead of tags?** Tags could express "these skills | ||
| are related" but cannot provide versioned membership snapshots | ||
| (reproducible point-in-time combinations), cross-registry references | ||
| (MCP servers from RFC-0004), bundle-level source pointers (a single |
There was a problem hiding this comment.
I still think bundles should only be a group of entities and not have a source itself.
What do you think @HumairAK ?
There was a problem hiding this comment.
I see it was discussed here:
#10 (comment)
@jwm4 , how about if we keep the source, let's make it so that a skill bundle cannot have a source and other entities? I feel like it'd be confusing otherwise.
There was a problem hiding this comment.
Agreed. Bundle versions are now mutually exclusive: a monolithic bundle has its own source and no individual member references, while an assembled bundle has member references and no bundle-level source. This eliminates the source mismatch confusion that Humair raised and simplifies pull semantics.
(This comment is from Bill Murdock with assistance from Claude Code.)
There was a problem hiding this comment.
Yes, this was done in the previous commit. Bundle versions are now mutually exclusive: a version is either monolithic (has a source, no member references) or assembled (has member references, no source). The API rejects attempts to set both. See the "SkillBundle" section in RFC-0008 and the "Bundle-level source" section in implementation-details.md.
(This comment is from Bill Murdock with assistance from Claude Code.)
There was a problem hiding this comment.
Follow-up: after thinking more about tracing, I've revised my position on this. Here's what the RFC now says:
- Assembled bundles have member references where each member has its own source. No bundle-level source.
- Monolithic bundles have a bundle-level source and member references where the members do NOT have their own sources (their content is embedded in the bundle artifact).
The rule is: a bundle version cannot have both a bundle-level source and members with member-level sources. I think this addresses your original concern about it being confusing, but let me know if I'm wrong. I feel that not letting monolithic bundles have member references at all was an overreaction on my part. Without registry entries for the embedded skills, there's no way to attribute SKILL spans to individual skills inside a monolithic bundle. The member references are what make per-skill tracing and governance work even when the content is delivered as a single artifact.
Posted by Bill Murdock with assistance from Claude Code.
| #### List view | ||
|
|
||
| The list view shows skills, subagents, hooks, and bundles together | ||
| using a card-based layout consistent with other MLflow pages. Each |
There was a problem hiding this comment.
I think MCP registry is the first to use cards in MLflow.
There was a problem hiding this comment.
Fixed. Updated to say "consistent with the MCP Server Registry (RFC-0004)" instead of "consistent with other MLflow pages."
— Bill Murdock with assistance from Claude Code
| - Status badge with color coding: draft (gray), active (green), | ||
| deprecated (amber) | ||
| - Source type indicator (Git, OCI, ZIP, MLflow) | ||
| - Tag chips |
There was a problem hiding this comment.
Nit: it seemed too busy in MCP registry so it was removed in the prototype.
There was a problem hiding this comment.
Good call. Removed the grid/list toggle line.
— Bill Murdock with assistance from Claude Code
| |---|---| | ||
| | `READ` | Search entities, get versions, resolve aliases, list tags and memberships | | ||
| | `EDIT` | Create entities, create versions, set and delete tags, update description | | ||
| | `MANAGE` | Status transitions (activate, deprecate, delete), set and delete aliases, delete versions, delete entities, manage permissions | |
There was a problem hiding this comment.
This looks a bit stricter than current MLflow model-registry / prompt auth. In OSS/workspace auth today, status transitions (for example TransitionModelVersionStage) and setting/moving aliases are gated by UPDATE / EDIT, while destructive operations use DELETE / MANAGE. Here both status transitions and alias changes require MANAGE.
Is that divergence intentional?
There was a problem hiding this comment.
Good catch. I aligned permissions with the model registry and MCP Server Registry: status transitions and alias setting are now gated by EDIT (can_update), destructive operations (deletes) by MANAGE (can_delete).
— Bill Murdock with assistance from Claude Code
| "Register Skill" button (with a dropdown for subagent, hook, or | ||
| bundle) initiates registration. | ||
|
|
||
| #### Detail view: skills, subagents, hooks |
There was a problem hiding this comment.
Probably out of scope for MVP, but I'm thinking it'd be very nice to render download links to the content where possible.
For example, if it's GitHub, we can generate the raw content URL for it. If it's an MLflow artifact, you can directly view the contents somehow.
There was a problem hiding this comment.
Good idea. I agree this would be a nice enhancement for the detail view. Out of scope for MVP but worth tracking for a follow-up.
(This comment is from Bill Murdock with assistance from Claude Code.)
| @abstractmethod | ||
| def install_skill_bundle( | ||
| self, | ||
| bundle_version: SkillBundleVersion, |
There was a problem hiding this comment.
My concern with this interface is it requires every HarnessAdapter to know how to download from every skill source (e.g. Git, OCI, etc.).
Would it make sense to provide the already downloaded contents by the MLflow client somehow?
There was a problem hiding this comment.
Agreed. I introduced a PulledMember dataclass so the MLflow client handles all source fetching (Git clone, OCI pull, etc.) and passes pre-fetched local paths to adapters. Adapters only need to know about directory layout and manifest generation, not source types.
— Bill Murdock with assistance from Claude Code
| 2. For each discovered skill, subagent, or hook, registers the | ||
| capability in the skill registry if it does not already exist, | ||
| with source pointing to the member's location within the | ||
| artifact. For MCP server configs, import creates cross-registry |
There was a problem hiding this comment.
We need the whole server.json to be able to register an MCP server version, so I think this may not be possible.
There was a problem hiding this comment.
Yes, this was addressed in a new commit earlier today. Import does not try to register MCP servers from embedded configs since they lack the full server.json the MCP registry requires. It only creates cross-registry bundle references to MCP servers that already exist in the MCP registry (RFC-0004). If no matching MCP server exists, the embedded config is preserved in the bundle but not registered. See the import operation step 2 in RFC-0009.
(This comment is from Bill Murdock with assistance from Claude Code.)
| artifact and returns an `IntrospectedBundle` listing each | ||
| discovered member with its type, source path, and any metadata the | ||
| adapter can extract. | ||
| 2. For each discovered skill, subagent, or hook, registers the |
There was a problem hiding this comment.
It seems like a user may want to list what in the bundle gets registered. It may be a better experience if the user directly calls import and then can pass IntrospectedBundle or IntrospectedMember to the create skill/bundle method.
There was a problem hiding this comment.
Done. Added introspect_bundle as a standalone read-only operation (SDK and CLI). import_bundle now accepts an optional members parameter for selective import. The CLI uses --include flags that default to all members.
— Bill Murdock with assistance from Claude Code
There was a problem hiding this comment.
Follow-up: I've removed selective import. Since import creates monolithic bundles (the bundle artifact is the single source of truth), cherry-picking individual members doesn't fit the model. The introspect_bundle operation is still there for previewing what an artifact contains before importing, but import_bundle no longer accepts a members parameter or --include flags. It registers the whole bundle with all discovered members.
Posted by Bill Murdock with assistance from Claude Code.
| (RFC-0004) rather than registering new MCP servers. If no | ||
| matching MCP server exists, the embedded config is preserved in | ||
| the bundle but not registered. | ||
| 3. Creates a skill bundle and bundle version whose members reference |
There was a problem hiding this comment.
Are we assuming this is only supported for MLflow stored skills/bundles and that there is an upload step implied here?
There was a problem hiding this comment.
Clarified. Import requires a remotely accessible source (git, OCI, ZIP, or MLflow artifact URI). For local content, the user uploads to MLflow artifact storage first, then imports using the artifact URI.
— Bill Murdock with assistance from Claude Code
| When an existing registry entry conflicts with an imported element | ||
| (e.g., same name but different source), the import operation reports | ||
| the conflict and skips that element. The caller can resolve conflicts | ||
| by renaming elements or using the `--force` flag to overwrite. |
There was a problem hiding this comment.
We should never be able to overwrite content in a registry. This would defeat the purpose of governance around versioning.
There was a problem hiding this comment.
Agreed. Removed the --force flag. Import now creates a new version if content has changed, or reuses the existing version if it has not. No overwriting.
— Bill Murdock with assistance from Claude Code
| mlflow skills import --source ./my-plugin --harness claude-code | ||
| ``` | ||
|
|
||
| #### UI |
There was a problem hiding this comment.
I think I'd prefer this be client side only and the UI shows import instructions using the MLflow SDK.
I'd be concerned about the MLflow server downloading random URLs that users put in.
There was a problem hiding this comment.
Agreed. Import is now CLI and SDK only, no UI. The server should not fetch arbitrary user-supplied URLs.
— Bill Murdock with assistance from Claude Code
|
|
||
| The MLflow UI provides a multi-step import wizard: | ||
|
|
||
| 1. **Source selection.** The user enters a source (local path or URL) |
There was a problem hiding this comment.
Local path does not seem secure.
There was a problem hiding this comment.
Agreed. Removed local path as a source for import. Introspect still works on local paths (read-only preview), but import requires a remote source so registered members have pullable source pointers.
— Bill Murdock with assistance from Claude Code
| "type": "bundle", | ||
| "name": "pr-workflow", | ||
| "version": "1.0.0", | ||
| "alias": "production", |
There was a problem hiding this comment.
I think the resolved version makes sense unless there is mlflow skills update command to bump up using the alias.
Edit: It looks like this may be the case but it'd be good to be explicit of this purpose.
There was a problem hiding this comment.
Added a clarification: the lock file records resolved versions, not aliases. The --update flag re-resolves the alias to pick up newer versions, then updates the lock file with the new resolved version.
— Bill Murdock with assistance from Claude Code
| ```python | ||
| def install( | ||
| name: str | None = None, | ||
| bundle: str | None = None, |
There was a problem hiding this comment.
Why name and bundle? Should we consider having separate install method for installing a skill vs installing a bundle?
There was a problem hiding this comment.
Good suggestion. Split into install() for a single skill and install_bundle() for bundles, in both the SDK and CLI (mlflow skills install vs mlflow skills install-bundle).
— Bill Murdock with assistance from Claude Code
| "hooks": [ | ||
| { | ||
| "type": "command", | ||
| "command": "mlflow skills trace-start --manifest .mlflow-skills-manifest.json" |
There was a problem hiding this comment.
How would MLflow know the current trace/session the skill was executed in?
I'm thinking another way is to expand MLflow autolog for Claude to monkeypatch the methods used to execute a skill and then MLflow records sets the skill context, then continues the execution.
There was a problem hiding this comment.
Clarified the mechanism: the hooks attach to the active trace that mlflow autolog claude already manages via MLflow's standard thread-local trace context, so no trace ID needs to be passed explicitly.
For the Agent SDK, I adopted your suggestion: when mlflow.autolog(flavor="claude") is active and a manifest is present, the autologger automatically injects the skill tracing hooks. No manual configuration needed. For the CLI path, hooks remain the mechanism since there is no Python method to monkeypatch.
— Bill Murdock with assistance from Claude Code
There was a problem hiding this comment.
Follow-up correction: the claim about hooks attaching to thread-local trace context was wrong. Claude Code hooks run as separate child processes via spawn/exec, so they have no access to the parent process's MLflow thread-local trace context. The RFC now treats automatic skill-span instrumentation as an implementation challenge, with the in-process autologger as the recommended direction. An autologger that observes skill tool invocations in the same process that owns the active trace avoids the cross-process problem entirely.
Posted by Bill Murdock with assistance from Claude Code.
| "hooks": [ | ||
| { | ||
| "type": "command", | ||
| "command": "mlflow skills trace-start --manifest .mlflow-skills-manifest.json" |
There was a problem hiding this comment.
How does MLflow map the used skill to the entry in .mlflow-skills-manifest.json?
There was a problem hiding this comment.
Clarified. Claude Code passes the tool input to hooks via stdin as JSON. When the Skill tool fires, the input includes the skill name (e.g., {"skill": "code-review"}). The hook reads the skill name from this input and looks it up in the manifest.
— Bill Murdock with assistance from Claude Code
There was a problem hiding this comment.
Follow-up: this described hooks reading the skill name from stdin JSON, but that mechanism is no longer in the RFC. Since hooks run in separate processes, they can't create SKILL spans attached to the active trace. The manifest still exists and still maps skill names to registry coordinates, but how the instrumentation reads it is now an implementation challenge. The recommended direction is an in-process autologger that can read the manifest and observe skill invocations directly.
Posted by Bill Murdock with assistance from Claude Code.
| from claude_code_sdk import ClaudeAgentOptions | ||
|
|
||
| async def on_skill_start(input_data, tool_use_id, context): | ||
| skill_name = input_data["tool_input"].get("skill") |
There was a problem hiding this comment.
Can MLflow autolog do this automatically (perhaps opt-in)?
There was a problem hiding this comment.
Yes. Updated the Agent SDK section: when mlflow.autolog(flavor="claude") is active and a mlflow-skills-manifest.json is present, the autologger automatically injects the hooks. For cases where autolog is not used, developers can call mlflow.skill_context() directly.
— Bill Murdock with assistance from Claude Code
There was a problem hiding this comment.
Follow-up: yes, autolog is still the recommended direction, but it would not "inject hooks" as I said. Instead, the autologger would observe skill tool invocations in-process and use the manifest to create SKILL spans directly. The RFC now frames this as the recommended implementation approach rather than describing a specific hook injection mechanism.
Posted by Bill Murdock with assistance from Claude Code.
mprahl
left a comment
There was a problem hiding this comment.
This looks really good! I just left a few more comments.
|
Agreed. I've aligned with the approach from PR #21: version strings now require semantic versioning, I'm planning to work through all the other feedback from your review next. (This comment is from Bill Murdock with assistance from Claude Code.) |
RFC-0008 adds a governed, metadata-first Skill Registry to MLflow for AI agent capabilities (skills, subagents, hooks, skill bundles). RFC-0009 adds harness-specific installation, bundle import, lock files, and trace instrumentation for Claude Code, Codex CLI, Cursor, and other agent harnesses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6bf377d to
b69cf18
Compare
Summary
Two companion RFCs for adding a governed Skill Registry to MLflow:
RFC-0008 (Skill Registry): A metadata-first registry for AI agent capabilities (skills, subagents, hooks). Stores metadata and typed source pointers (Git, OCI, ZIP, MLflow artifacts). Provides lifecycle management, usage analytics via MLflow traces (
mlflow.skill_context()), and federated discovery. Skill bundles group related capabilities into versioned units that map to agent harness plugins, with cross-registry references to MCP servers (RFC-0004).RFC-0009 (Harness Integration): Adds
mlflow skills installto generate harness-specific manifests and place files in the correct directories. Adapters for Claude Code, Codex CLI, and Cursor. Bundle import for onboarding existing plugins into the registry. Trace instrumentation via install-time manifests and harness hooks.Related Issues
This PR is from Bill Murdock with assistance from Claude Code.