Skip to content

feat(delegate): File-based agent definitions with markdown + YAML frontmatter#2183

Merged
VascoSch92 merged 17 commits intomainfrom
vasco/issue-2049
Feb 25, 2026
Merged

feat(delegate): File-based agent definitions with markdown + YAML frontmatter#2183
VascoSch92 merged 17 commits intomainfrom
vasco/issue-2049

Conversation

@VascoSch92
Copy link
Contributor

@VascoSch92 VascoSch92 commented Feb 23, 2026

Summary

This PR introduces the ability to define and load agents via Markdown files and performs a structural refactor to centralize agent-loading logic. (ref #2049 )

Key Features

  • Markdown Agent Definitions: Added support for loading agent configurations directly from .md files.
  • Loading Hierarchy: Implemented a file-loading priority consistent with the existing Skills hierarchy.

Architectural Refactor
Moved all logic related to agent discovery and instantiation from plugin and tools.delegate into a dedicated subagents module.

Remarks
This PR focuses on the core infrastructure. Several advanced features for Markdown-based agents are currently unsupported and are being tracked in the primary Epic #2054.

docs
OpenHands/docs#358

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:4f812b3-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-4f812b3-python \
  ghcr.io/openhands/agent-server:4f812b3-python

All tags pushed for this build

ghcr.io/openhands/agent-server:4f812b3-golang-amd64
ghcr.io/openhands/agent-server:4f812b3-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:4f812b3-golang-arm64
ghcr.io/openhands/agent-server:4f812b3-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:4f812b3-java-amd64
ghcr.io/openhands/agent-server:4f812b3-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:4f812b3-java-arm64
ghcr.io/openhands/agent-server:4f812b3-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:4f812b3-python-amd64
ghcr.io/openhands/agent-server:4f812b3-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:4f812b3-python-arm64
ghcr.io/openhands/agent-server:4f812b3-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:4f812b3-golang
ghcr.io/openhands/agent-server:4f812b3-java
ghcr.io/openhands/agent-server:4f812b3-python

About Multi-Architecture Support

  • Each variant tag (e.g., 4f812b3-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 4f812b3-python-amd64) are also available if needed

@github-actions
Copy link
Contributor

github-actions bot commented Feb 23, 2026

API breakage checks (Griffe)

Result: Passed

Action log

all-hands-bot

This comment was marked as outdated.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 23, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk
   __init__.py21290%75–76
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py3432193%283, 288, 316, 359, 377, 390, 455, 604–605, 608, 754, 762, 764, 775, 777–779, 804, 966, 973–974
openhands-sdk/openhands/sdk/plugin
   plugin.py2051692%408–409, 416–417, 434–435, 438–440, 458–460, 476–477, 495–496
   types.py198796%59, 62, 65, 121, 246, 662, 670
openhands-sdk/openhands/sdk/subagent
   load.py37294%154–155
   registry.py101397%79, 188–189
   schema.py41197%88
openhands-tools/openhands/tools
   __init__.py11281%34–35
openhands-tools/openhands/tools/delegate
   definition.py27774%81–82, 85, 88–89, 98, 101
   impl.py12010115%29, 31–32, 41–42, 46, 52–53, 56–59, 61, 73–74, 78–80, 84–85, 92–94, 103–104, 114–118, 122, 124, 128–130, 135–137, 139, 145, 148, 153, 158, 162, 167–169, 186–187, 194–196, 205, 207–209, 212–217, 219, 226–227, 229–230, 233–236, 238–239, 243–246, 249–251, 256–257, 260–261, 264, 266–270, 272, 275–277, 279–280, 283, 285, 290–292
openhands-tools/openhands/tools/task
   definition.py562555%67, 73–75, 77–79, 81–82, 90, 95–96, 99, 101, 146, 194, 196–198, 200, 204–205, 207–208, 214
   manager.py1167337%64–66, 70–72, 79, 81, 84, 87–88, 92–93, 97, 101–104, 107–109, 111, 135–136, 138–139, 144, 150, 157–158, 163–165, 173, 180, 189–191, 199, 205, 215–216, 218–221, 223, 239–240, 242, 244–245, 247, 251–252, 254–257, 259–267, 269, 271, 275–276, 278
TOTAL19472604868% 

all-hands-bot

This comment was marked as outdated.

Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Clean refactoring with solid architecture.

Key strengths:

  • Right data structures: clean separation (schema, load, registry)
  • Solves real problem: multi-source agent registration with clear priority rules
  • No unnecessary complexity: straightforward loading and deduplication logic
  • Well-tested: comprehensive coverage of priority logic, edge cases, and error paths

Architecture is sound:

  • Moving agent definitions to SDK makes sense (tools should use SDK, not define core abstractions)
  • File-based loading with .agents/ priority over .openhands/ is pragmatic
  • Error handling is adequate (try/except with logging in load.py:136-139)

One future improvement: Integration test for LocalConversation._ensure_plugins_loaded() would verify the auto-registration wiring, but unit coverage is thorough enough for merge.

Verdict: ✅ Worth merging

Key insight: The 6-level priority system (programmatic > plugin > project .agents > project .openhands > user .agents > user .openhands) is complex but justified - it solves real composition needs without forcing special cases. The deduplication in _load_agents_from_dirs() handles it cleanly.

@VascoSch92 VascoSch92 marked this pull request as ready for review February 23, 2026 13:56
Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Clean refactoring with solid architecture.

Key strengths:

  • Right data structures: clean separation (schema, load, registry)
  • Solves real problem: multi-source agent registration with clear priority rules
  • No unnecessary complexity: straightforward loading and deduplication logic
  • Well-tested: comprehensive coverage of priority logic, edge cases, and error paths

Architecture is sound:

  • Moving agent definitions to SDK makes sense (tools should use SDK, not define core types)
  • Priority hierarchy is clear and well-documented: programmatic > plugin > file-based (project > user)
  • Thread-safe with minimal locking overhead

Worth merging

@VascoSch92
Copy link
Contributor Author

Note that the docs tests are not passing because of missing documentations for examples/01_standalone_sdk/40_acp_agent_example.py. The docs for this feature should be merged soon.

@enyst

This comment was marked as resolved.

@enyst

This comment was marked as outdated.

@openhands-ai

This comment was marked as outdated.

@openhands-ai

This comment was marked as duplicate.

@enyst
Copy link
Collaborator

enyst commented Feb 23, 2026

(HUMAN: pasting the result here, but please feel free to smack it when it's over the top on nothing really - I feel that might be the case)


Taste Rating: Needs improvement

This PR is trying to solve a real problem (declarative subagents), but it does it with a couple of “global state + first-wins” shortcuts that will absolutely bite you in any long-running process (agent-server), and it breaks existing public imports without a compatibility shim.


[CRITICAL ISSUES] (must fix)

1) You broke userspace: removed public APIs without a compatibility layer

Two concrete breakages:

  • openhands.tools.delegate.register_agent is gone
    Previously exported in openhands-tools/openhands/tools/delegate/__init__.py (and tested). Now the module exports only tool classes (new file shows no register_agent, lines 3–18). Anyone with:

    from openhands.tools.delegate import register_agent

    now gets an ImportError.

  • openhands.sdk.plugin.AgentDefinition is gone
    openhands-sdk/openhands/sdk/plugin/__init__.py no longer re-exports AgentDefinition (current file exports many types, but not that one; see lines ~16–48). That was a documented import surface before.

If you want to refactor internals, fine, but keep the old import paths working (even if they just re-export and warn) or bump versions and formally deprecate. Right now it’s “surprise, your code doesn’t import anymore”.


2) Global registry + register_*_if_absent() destroys your priority model and breaks “reload on new conversation”

You claim a priority order in openhands-sdk/openhands/sdk/subagent/load.py (docstring lines 29–36) and in register_plugin_agents() (registry.py lines 210–236), but the actual implementation is basically:

  • global _agent_factories dict (registry.py line 53)
  • register_plugin_agents() uses register_agent_if_absent() (registry.py lines 225–235)
  • register_file_agents() uses register_agent_if_absent() (registry.py lines 193–206)
  • LocalConversation calls them once per conversation (local_conversation.py lines 374–379)

That means the first thing to register a given name wins forever, across:

  • different workspaces/projects in the same process
  • later conversations
  • later plugin loads
  • edits to .md agent definitions

This directly violates your own stated priority (“plugin agents higher priority than file-based”) in any scenario where:

  1. conversation A starts without plugins → file agent “foo” registered
  2. conversation B starts with plugin providing “foo” → plugin cannot override, because “foo” already exists

It also fails the issue’s “reload on new conversation” requirement for updated definitions: if the .md file changes, a new conversation will still silently keep the old definition because duplicates are skipped.

This is the core design smell: you can’t implement priority and reloading with a single global dict and a no-overwrite registration primitive. The data structure is wrong for the problem.


3) The “default agent” used for delegation changed semantics (likely a regression)

get_agent_factory(None) now resolves to openhands.sdk.subagent.builtins.default.get_default_agent() (registry.py lines 253–256), which builds an Agent via _agent_definition_to_factory() (builtins/default.py lines 24–33).

This default sub-agent is not the same as the old preset default agent (which set condenser, system_prompt_kwargs, etc.). If the intent is “delegation subagents are lightweight”, that’s OK—but it’s a behavioral change that should be explicit and tested, because it will change output quality and context handling.


[IMPROVEMENT OPPORTUNITIES] (should fix)

4) One of the registry tests is structurally wrong (so it doesn’t test what it claims)

tests/sdk/subagent/test_subagent_registry.py::test_register_file_agents_project_priority creates the user agent file here:

  • user_home = tmp_path / "fake_home" / "agents" (line 49)
  • then writes to user_home / ".agents" / "shared-agent.md" (lines 51–55)

…but load_user_agents() searches Path.home() / ".agents/agents" (load.py lines 81–96). That test never places the file under .agents/agents/, so the “user version” likely isn’t loaded at all. The test can pass even if user-loading is broken.

This is exactly how you end up merging something that “has tests” but still doesn’t work.


5) Tool naming / UX mismatch is unresolved

Your schema/tests use tool names like ReadTool, GlobTool, Read, Write (e.g., loader tests lines 30–56; schema tests lines 13–33). Real tool registry names in this repo are typically snake_case derived from class names (e.g., GlobTool.name == "glob" per ToolDefinition naming logic).

So users will write agent files like the Claude docs (Read, Grep, Glob, Bash) and it’ll parse, but may fail at runtime when tool resolution happens. If you want Claude compatibility, you need an alias/normalization layer or very explicit docs.


6) The “example usage” in the new registry module is stale

openhands-sdk/openhands/sdk/subagent/registry.py docstring still says:

from openhands.tools.delegate import register_agent, Skill

(line 5)
That’s now wrong, because you removed that symbol from openhands.tools.delegate. This is basic “don’t ship docs that don’t run”.


7) Loader error handling is “log-and-ignore everything”

_load_agents_from_dir() catches Exception (load.py lines 133–140) and continues. For user config, that’s sometimes fine, but you’ll make debugging miserable. At least consider surfacing which field failed validation (or returning structured errors), especially since this is a brand new feature people will misconfigure.


[TESTING GAPS]

  • You removed plugin-loading tests (tests/sdk/plugin/test_plugin_loading.py deleted) but didn’t replace coverage for “plugin agents get registered and take precedence”.
  • There’s no test that proves your stated precedence order actually holds across multiple conversations in the same process (which is where the global registry bug shows up).
  • There’s no test for “edit agent file → new conversation picks up new content” (currently it won’t).

VERDICT: Needs rework

The feature direction is fine, but the current implementation is built on a global registry that can’t express priority or reloading correctly, and you broke public imports without providing a compatibility path.

Key insight: You’re trying to model layered configuration with precedence using a single “first one wins” global dict. That’s the wrong data structure, and it forces broken semantics (no override, no reload, cross-workspace leakage).

@VascoSch92
Copy link
Contributor Author

Review of the roasted-review

1) You broke userspace: removed public APIs without a compatibility layer
True, but we discussed that it was Ok :-)

2) Global registry + register_*_if_absent() destroys your priority model and breaks “reload on new conversation”
I believe this is not True. If we resume a conversation, it makes sense to use old hooks and plugins and not new one. On the other hands, if we start a new conversation, everything is loaded another time following the hierarcy.

3) The “default agent” used for delegation changed semantics (likely a regression)
This is True. But just because we don't have a field for the condenser now in the Markdown.
There is already an issue to add that in the parsing.

4) One of the registry tests is structurally wrong (so it doesn’t test what it claims)
This is True. Corrected!

5) Tool naming / UX mismatch is unresolved
This is True but a very small nit

6) The “example usage” in the new registry module is stale
This is also True. Corrected!

7) Loader error handling is “log-and-ignore everything”
This is not the case as we have exc_info=True

@VascoSch92

This comment was marked as resolved.

@enyst
Copy link
Collaborator

enyst commented Feb 23, 2026

Do you mean creaiting docs for the file-based agents? Like skills and plugin?

I mean an example script in examples/ directory (and maybe it's worth it to be a directory for that example, so that it has the markdown file separately, easy to understand... maybe)

@enyst
Copy link
Collaborator

enyst commented Feb 23, 2026

1) You broke userspace: removed public APIs without a compatibility layer
True, but we discussed that it was Ok :-)

Can we decide what is public API now, and add it to the SDK's __init__ ?

@VascoSch92
Copy link
Contributor Author

Do you mean creaiting docs for the file-based agents? Like skills and plugin?

I mean an example script in examples/ directory (and maybe it's worth it to be a directory for that example, so that it has the markdown file separately, easy to understand... maybe)

Added :-)

@VascoSch92 VascoSch92 requested a review from enyst February 24, 2026 10:41
@VascoSch92 VascoSch92 enabled auto-merge (squash) February 25, 2026 15:04
Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the work on this!

I have a proposal that is… not clear enough, sorry, as I’m reading through it: the way this works is understandable to me, and the code design looks OK. But I feel agents need to read a lot of code to get how it should work. I wonder if we can inform the agent about this and its intentions, somewhere, in a simple and clear way.

Maybe an AGENTS.md. Maybe a page in docs that is meant for the code design behind the code. Kinda prefer both… if that makes sense.

@VascoSch92 VascoSch92 merged commit ab8f5f2 into main Feb 25, 2026
21 of 22 checks passed
@VascoSch92 VascoSch92 deleted the vasco/issue-2049 branch February 25, 2026 15:15
@VascoSch92
Copy link
Contributor Author

Thank you for the work on this!

I have a proposal that is… not clear enough, sorry, as I’m reading through it: the way this works is understandable to me, and the code design looks OK. But I feel agents need to read a lot of code to get how it should work. I wonder if we can inform the agent about this and its intentions, somewhere, in a simple and clear way.

Maybe an AGENTS.md. Maybe a page in docs that is meant for the code design behind the code. Kinda prefer both… if that makes sense.

open an issue and I can do that ;-) just to I don't forget.

@enyst
Copy link
Collaborator

enyst commented Feb 25, 2026

@OpenHands open an issue as said in #2183 (comment)

Look at the PR and understand what I’m talking about, and include in the issue 2-3 proposals how we can document the code design / assumptions / invariants that this PR actually brings.

@openhands-ai
Copy link

openhands-ai bot commented Feb 25, 2026

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai
Copy link

openhands-ai bot commented Feb 25, 2026

No code changes were made and no branches were pushed/updated (this task was only to open the issue and propose documentation approaches).

@openhands-ai openhands-ai bot mentioned this pull request Mar 5, 2026
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants