feat(delegate): File-based agent definitions with markdown + YAML frontmatter by VascoSch92 · Pull Request #2183 · OpenHands/software-agent-sdk

VascoSch92 · 2026-02-23T13:04:24Z

Summary

This PR introduces the ability to define and load agents via Markdown files and performs a structural refactor to centralize agent-loading logic (ref #2049).

Key Features

Markdown Agent Definitions: Added support for loading agent configurations directly from .md files.
Loading Hierarchy: Implemented a file-loading priority consistent with the existing Skills hierarchy.
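
To make the "Markdown + YAML frontmatter" idea concrete, here is a minimal sketch of splitting such a file into metadata and a prompt body. It is not the SDK's loader: `AgentSpec`, `parse_agent_markdown`, and the frontmatter keys (`name`, `description`, `tools`) are assumptions for illustration, and the toy parser only handles flat `key: value` pairs where a real loader would use a proper YAML parser.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """Hypothetical container; the SDK's real schema may differ."""
    name: str
    description: str = ""
    tools: list[str] = field(default_factory=list)
    system_prompt: str = ""


def parse_agent_markdown(text: str) -> AgentSpec:
    """Split '---' delimited frontmatter from the Markdown body.

    Only flat `key: value` pairs and comma-separated lists are handled.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter opening '---'")
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("missing frontmatter closing '---'") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        meta[key.strip()] = value.strip()

    if "name" not in meta:
        raise ValueError("frontmatter must define 'name'")

    tools = [t.strip() for t in meta.get("tools", "").split(",") if t.strip()]
    body = "\n".join(lines[end + 1:]).strip()
    return AgentSpec(meta["name"], meta.get("description", ""), tools, body)


EXAMPLE = """\
---
name: code-reviewer
description: Reviews diffs for style issues
tools: glob, grep
---
You are a careful code reviewer.
"""

spec = parse_agent_markdown(EXAMPLE)
print(spec.name, spec.tools)
```

In this sketch the Markdown body becomes the agent's system prompt, which is one plausible reading of the format rather than a confirmed detail of the PR.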

Architectural Refactor
Moved all logic related to agent discovery and instantiation from plugin and tools.delegate into a dedicated subagents module.

Remarks
This PR focuses on the core infrastructure. Several advanced features for Markdown-based agents are currently unsupported and are being tracked in the primary Epic #2054.

docs
OpenHands/docs#358

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works?
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
Is the github CI passing?

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | `eclipse-temurin:17-jdk` | Link |
| python | amd64, arm64 | `nikolaik/python-nodejs:python3.12-nodejs22` | Link |
| golang | amd64, arm64 | `golang:1.21-bookworm` | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:4f812b3-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-4f812b3-python \
  ghcr.io/openhands/agent-server:4f812b3-python

All tags pushed for this build

ghcr.io/openhands/agent-server:4f812b3-golang-amd64
ghcr.io/openhands/agent-server:4f812b3-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:4f812b3-golang-arm64
ghcr.io/openhands/agent-server:4f812b3-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:4f812b3-java-amd64
ghcr.io/openhands/agent-server:4f812b3-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:4f812b3-java-arm64
ghcr.io/openhands/agent-server:4f812b3-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:4f812b3-python-amd64
ghcr.io/openhands/agent-server:4f812b3-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:4f812b3-python-arm64
ghcr.io/openhands/agent-server:4f812b3-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:4f812b3-golang
ghcr.io/openhands/agent-server:4f812b3-java
ghcr.io/openhands/agent-server:4f812b3-python

About Multi-Architecture Support

Each variant tag (e.g., 4f812b3-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 4f812b3-python-amd64) are also available if needed

github-actions · 2026-02-23T13:04:52Z

API breakage checks (Griffe)

Result: Passed

Action log

github-actions · 2026-02-23T13:10:56Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk
__init__.py	21	2	90%	75–76
openhands-sdk/openhands/sdk/conversation/impl
local_conversation.py	343	21	93%	283, 288, 316, 359, 377, 390, 455, 604–605, 608, 754, 762, 764, 775, 777–779, 804, 966, 973–974
openhands-sdk/openhands/sdk/plugin
plugin.py	205	16	92%	408–409, 416–417, 434–435, 438–440, 458–460, 476–477, 495–496
types.py	198	7	96%	59, 62, 65, 121, 246, 662, 670
openhands-sdk/openhands/sdk/subagent
load.py	37	2	94%	154–155
registry.py	101	3	97%	79, 188–189
schema.py	41	1	97%	88
openhands-tools/openhands/tools
__init__.py	11	2	81%	34–35
openhands-tools/openhands/tools/delegate
definition.py	27	7	74%	81–82, 85, 88–89, 98, 101
impl.py	120	101	15%	29, 31–32, 41–42, 46, 52–53, 56–59, 61, 73–74, 78–80, 84–85, 92–94, 103–104, 114–118, 122, 124, 128–130, 135–137, 139, 145, 148, 153, 158, 162, 167–169, 186–187, 194–196, 205, 207–209, 212–217, 219, 226–227, 229–230, 233–236, 238–239, 243–246, 249–251, 256–257, 260–261, 264, 266–270, 272, 275–277, 279–280, 283, 285, 290–292
openhands-tools/openhands/tools/task
definition.py	56	25	55%	67, 73–75, 77–79, 81–82, 90, 95–96, 99, 101, 146, 194, 196–198, 200, 204–205, 207–208, 214
manager.py	116	73	37%	64–66, 70–72, 79, 81, 84, 87–88, 92–93, 97, 101–104, 107–109, 111, 135–136, 138–139, 144, 150, 157–158, 163–165, 173, 180, 189–191, 199, 205, 215–216, 218–221, 223, 239–240, 242, 244–245, 247, 251–252, 254–257, 259–267, 269, 271, 275–276, 278
TOTAL	19472	6048	68%

all-hands-bot

🟢 Good taste - Clean refactoring with solid architecture.

Key strengths:

Right data structures: clean separation (schema, load, registry)
Solves real problem: multi-source agent registration with clear priority rules
No unnecessary complexity: straightforward loading and deduplication logic
Well-tested: comprehensive coverage of priority logic, edge cases, and error paths

Architecture is sound:

Moving agent definitions to SDK makes sense (tools should use SDK, not define core abstractions)
File-based loading with .agents/ priority over .openhands/ is pragmatic
Error handling is adequate (try/except with logging in load.py:136-139)

One future improvement: Integration test for LocalConversation._ensure_plugins_loaded() would verify the auto-registration wiring, but unit coverage is thorough enough for merge.

Verdict: ✅ Worth merging

Key insight: The 6-level priority system (programmatic > plugin > project .agents > project .openhands > user .agents > user .openhands) is complex but justified - it solves real composition needs without forcing special cases. The deduplication in _load_agents_from_dirs() handles it cleanly.
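
The six-level order quoted above can be sketched as a single pass over sources from highest to lowest priority, keeping the first definition seen for each name. The source labels and `resolve_agents` are hypothetical and are not taken from `_load_agents_from_dirs()`.

```python
# Sources listed from highest to lowest priority, as described in the review.
PRIORITY = [
    "programmatic",
    "plugin",
    "project/.agents",
    "project/.openhands",
    "user/.agents",
    "user/.openhands",
]


def resolve_agents(found: dict[str, list[str]]) -> dict[str, str]:
    """Map each agent name to the highest-priority source that defines it."""
    resolved: dict[str, str] = {}
    for source in PRIORITY:
        for name in found.get(source, []):
            # First hit wins because sources are walked in priority order.
            resolved.setdefault(name, source)
    return resolved


found = {
    "user/.openhands": ["reviewer", "planner"],
    "project/.agents": ["reviewer"],
    "plugin": ["planner"],
}
winners = resolve_agents(found)
print(winners)
```

Note that "first hit wins" only yields the intended precedence when all sources are visited together in priority order, which is the assumption this sketch makes.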

all-hands-bot

🟢 Good taste - Clean refactoring with solid architecture.

Key strengths:

Right data structures: clean separation (schema, load, registry)
Solves real problem: multi-source agent registration with clear priority rules
No unnecessary complexity: straightforward loading and deduplication logic
Well-tested: comprehensive coverage of priority logic, edge cases, and error paths

Architecture is sound:

Moving agent definitions to SDK makes sense (tools should use SDK, not define core types)
Priority hierarchy is clear and well-documented: programmatic > plugin > file-based (project > user)
Thread-safe with minimal locking overhead
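
As an illustration of what "thread-safe with minimal locking" can mean for a registry, the sketch below guards the check-and-insert with one lock so that concurrent registrations of the same name cannot both succeed. `AgentRegistry` is a hypothetical class, not the module's actual code.

```python
import threading
from typing import Callable


class AgentRegistry:
    """Hypothetical registry; not the SDK's actual implementation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._factories: dict[str, Callable[[], object]] = {}

    def register_if_absent(self, name: str, factory: Callable[[], object]) -> bool:
        # The check and the write happen under one lock, so two threads
        # cannot both observe the name as missing.
        with self._lock:
            if name in self._factories:
                return False
            self._factories[name] = factory
            return True

    def names(self) -> list[str]:
        with self._lock:
            return sorted(self._factories)


registry = AgentRegistry()
results: list[bool] = []
results_lock = threading.Lock()


def worker() -> None:
    stored = registry.register_if_absent("reviewer", lambda: "agent")
    with results_lock:
        results.append(stored)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(True), registry.names())
```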

✅ Worth merging

VascoSch92 · 2026-02-23T14:06:11Z

Note that the docs tests are not passing because documentation is missing for examples/01_standalone_sdk/40_acp_agent_example.py. The docs for this feature should be merged soon.

openhands-sdk/openhands/sdk/subagent/registry.py

enyst · 2026-02-23T16:08:46Z

(HUMAN: pasting the result here, but please feel free to smack it when it's over the top on nothing really - I feel that might be the case)

Taste Rating: Needs improvement

This PR is trying to solve a real problem (declarative subagents), but it does it with a couple of “global state + first-wins” shortcuts that will absolutely bite you in any long-running process (agent-server), and it breaks existing public imports without a compatibility shim.

[CRITICAL ISSUES] (must fix)

1) You broke userspace: removed public APIs without a compatibility layer

Two concrete breakages:

openhands.tools.delegate.register_agent is gone
Previously exported in openhands-tools/openhands/tools/delegate/__init__.py (and tested). Now the module exports only tool classes (new file shows no register_agent, lines 3–18). Anyone with:
```
from openhands.tools.delegate import register_agent
```
now gets an ImportError.
openhands.sdk.plugin.AgentDefinition is gone
openhands-sdk/openhands/sdk/plugin/__init__.py no longer re-exports AgentDefinition (current file exports many types, but not that one; see lines ~16–48). That was a documented import surface before.

If you want to refactor internals, fine, but keep the old import paths working (even if they just re-export and warn) or bump versions and formally deprecate. Right now it’s “surprise, your code doesn’t import anymore”.
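
A minimal sketch of the "re-export and warn" shim described here, assuming the function moved to a new module. The stand-in `_new_register_agent`, its signature, and the new path string are hypothetical, since the real signature and location are not shown in this thread.

```python
import functools
import warnings
from typing import Any, Callable


def deprecated_reexport(
    func: Callable[..., Any], old_path: str, new_path: str
) -> Callable[..., Any]:
    """Wrap `func` so calls through the old import path still work but warn."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        warnings.warn(
            f"{old_path} is deprecated; import from {new_path} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)

    return wrapper


# Stand-in for the relocated function (hypothetical signature).
def _new_register_agent(name: str, registry: dict[str, str]) -> None:
    registry[name] = "registered"


# What the old package's __init__.py could keep exposing under the old name.
register_agent = deprecated_reexport(
    _new_register_agent,
    old_path="openhands.tools.delegate.register_agent",
    new_path="openhands.sdk.subagent",  # assumed new location
)
```

Old call sites keep working and emit a `DeprecationWarning`, which gives users a release cycle to migrate before the alias is removed.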

2) Global registry + `register_*_if_absent()` destroys your priority model and breaks “reload on new conversation”

You claim a priority order in openhands-sdk/openhands/sdk/subagent/load.py (docstring lines 29–36) and in register_plugin_agents() (registry.py lines 210–236), but the actual implementation is basically:

global _agent_factories dict (registry.py line 53)
register_plugin_agents() uses register_agent_if_absent() (registry.py lines 225–235)
register_file_agents() uses register_agent_if_absent() (registry.py lines 193–206)
LocalConversation calls them once per conversation (local_conversation.py lines 374–379)

That means the first thing to register a given name wins forever, across:

different workspaces/projects in the same process
later conversations
later plugin loads
edits to .md agent definitions

This directly violates your own stated priority (“plugin agents higher priority than file-based”) in any scenario where:

conversation A starts without plugins → file agent “foo” registered
conversation B starts with plugin providing “foo” → plugin cannot override, because “foo” already exists

It also fails the issue’s “reload on new conversation” requirement for updated definitions: if the .md file changes, a new conversation will still silently keep the old definition because duplicates are skipped.

This is the core design smell: you can’t implement priority and reloading with a single global dict and a no-overwrite registration primitive. The data structure is wrong for the problem.

3) The “default agent” used for delegation changed semantics (likely a regression)

get_agent_factory(None) now resolves to openhands.sdk.subagent.builtins.default.get_default_agent() (registry.py lines 253–256), which builds an Agent via _agent_definition_to_factory() (builtins/default.py lines 24–33).

This default sub-agent is not the same as the old preset default agent (which set condenser, system_prompt_kwargs, etc.). If the intent is “delegation subagents are lightweight”, that’s OK—but it’s a behavioral change that should be explicit and tested, because it will change output quality and context handling.

[IMPROVEMENT OPPORTUNITIES] (should fix)

4) One of the registry tests is structurally wrong (so it doesn’t test what it claims)

tests/sdk/subagent/test_subagent_registry.py::test_register_file_agents_project_priority creates the user agent file here:

user_home = tmp_path / "fake_home" / "agents" (line 49)
then writes to user_home / ".agents" / "shared-agent.md" (lines 51–55)

…but load_user_agents() searches Path.home() / ".agents/agents" (load.py lines 81–96). That test never places the file under .agents/agents/, so the “user version” likely isn’t loaded at all. The test can pass even if user-loading is broken.

This is exactly how you end up merging something that “has tests” but still doesn’t work.
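
The path mismatch is easy to see with `pathlib`. The sketch below assumes the loader scans `<home>/.agents/agents`, as the review states, and uses a hypothetical `list_agents` helper in place of `load_user_agents()`.

```python
import tempfile
from pathlib import Path


def user_agents_dir(home: Path) -> Path:
    """Directory the loader is described as scanning (per the review)."""
    return home / ".agents" / "agents"


def list_agents(home: Path) -> list[str]:
    """Hypothetical stand-in for the loader: list *.md files it would see."""
    directory = user_agents_dir(home)
    return sorted(p.name for p in directory.glob("*.md")) if directory.is_dir() else []


with tempfile.TemporaryDirectory() as tmp:
    fake_home = Path(tmp) / "fake_home"

    # Layout from the test as described: fake_home/agents/.agents/shared-agent.md
    user_home = fake_home / "agents"
    wrong_file = user_home / ".agents" / "shared-agent.md"
    wrong_file.parent.mkdir(parents=True)
    wrong_file.write_text("---\nname: shared-agent\n---\nuser version\n")

    # Neither candidate home directory makes the loader see that file.
    seen_with_fake_home = list_agents(fake_home)
    seen_with_user_home = list_agents(user_home)

    # A layout that matches the scanned directory.
    right_file = user_agents_dir(fake_home) / "shared-agent.md"
    right_file.parent.mkdir(parents=True)
    right_file.write_text("---\nname: shared-agent\n---\nuser version\n")
    seen_after_fix = list_agents(fake_home)

print(seen_with_fake_home, seen_with_user_home, seen_after_fix)
```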

5) Tool naming / UX mismatch is unresolved

Your schema/tests use tool names like ReadTool, GlobTool, Read, Write (e.g., loader tests lines 30–56; schema tests lines 13–33). Real tool registry names in this repo are typically snake_case derived from class names (e.g., GlobTool.name == "glob" per ToolDefinition naming logic).

So users will write agent files like the Claude docs (Read, Grep, Glob, Bash) and it’ll parse, but may fail at runtime when tool resolution happens. If you want Claude compatibility, you need an alias/normalization layer or very explicit docs.
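
One shape the suggested alias/normalization layer could take is sketched below. The case-conversion rule follows the `GlobTool` → `glob` example given above. The `ALIASES` entry is purely illustrative and is not taken from the repository's tool registry.

```python
import re


def normalize_tool_name(name: str) -> str:
    """Map names like 'GlobTool' or 'Read' to a snake_case registry-style name."""
    base = name[:-4] if name.endswith("Tool") and len(name) > 4 else name
    # Insert '_' at lower-to-upper boundaries, then lowercase.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", base)
    return snake.lower()


# Hypothetical alias table for names whose registry name is not a simple
# case conversion; this mapping is an assumption for illustration only.
ALIASES = {"bash": "terminal"}


def resolve_tool_name(name: str) -> str:
    """Normalize first, then apply any explicit alias."""
    normalized = normalize_tool_name(name)
    return ALIASES.get(normalized, normalized)


for raw in ["GlobTool", "Read", "FileEditorTool", "Bash"]:
    print(raw, "->", resolve_tool_name(raw))
```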

6) The “example usage” in the new registry module is stale

openhands-sdk/openhands/sdk/subagent/registry.py docstring still says:

from openhands.tools.delegate import register_agent, Skill

(line 5)
That’s now wrong, because you removed that symbol from openhands.tools.delegate. This is basic “don’t ship docs that don’t run”.

7) Loader error handling is “log-and-ignore everything”

_load_agents_from_dir() catches Exception (load.py lines 133–140) and continues. For user config, that’s sometimes fine, but you’ll make debugging miserable. At least consider surfacing which field failed validation (or returning structured errors), especially since this is a brand new feature people will misconfigure.
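
A sketch of the "structured errors" option: the loader keeps going past bad files but returns what failed and why, so callers can surface it. `LoadError`, `load_agents_from_dir`, and `toy_parse` are hypothetical names; the real `_load_agents_from_dir()` signature is not shown in this thread.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class LoadError:
    path: Path
    message: str


def load_agents_from_dir(
    directory: Path, parse: Callable[[str], dict]
) -> tuple[list[dict], list[LoadError]]:
    """Parse every *.md file; collect failures instead of only logging them."""
    agents: list[dict] = []
    errors: list[LoadError] = []
    for path in sorted(directory.glob("*.md")):
        try:
            agents.append(parse(path.read_text()))
        except ValueError as exc:  # narrow on purpose; unexpected errors propagate
            errors.append(LoadError(path, str(exc)))
    return agents, errors


def toy_parse(text: str) -> dict:
    """Stand-in parser: requires a 'name:' line."""
    for line in text.splitlines():
        if line.startswith("name:"):
            return {"name": line.split(":", 1)[1].strip()}
    raise ValueError("missing required field 'name'")


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "good.md").write_text("name: reviewer\n")
    (d / "bad.md").write_text("description: no name here\n")
    agents, errors = load_agents_from_dir(d, toy_parse)

for err in errors:
    print(f"{err.path.name}: {err.message}")
```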

[TESTING GAPS]

You removed plugin-loading tests (tests/sdk/plugin/test_plugin_loading.py deleted) but didn’t replace coverage for “plugin agents get registered and take precedence”.
There’s no test that proves your stated precedence order actually holds across multiple conversations in the same process (which is where the global registry bug shows up).
There’s no test for “edit agent file → new conversation picks up new content” (currently it won’t).

VERDICT: Needs rework

The feature direction is fine, but the current implementation is built on a global registry that can’t express priority or reloading correctly, and you broke public imports without providing a compatibility path.

Key insight: You’re trying to model layered configuration with precedence using a single “first one wins” global dict. That’s the wrong data structure, and it forces broken semantics (no override, no reload, cross-workspace leakage).

VascoSch92 · 2026-02-23T17:01:15Z

Review of the roasted-review

1) You broke userspace: removed public APIs without a compatibility layer
True, but we discussed that it was Ok :-)

2) Global registry + register_*_if_absent() destroys your priority model and breaks “reload on new conversation”
I believe this is not true. If we resume a conversation, it makes sense to use the old hooks and plugins rather than new ones. On the other hand, if we start a new conversation, everything is loaded again following the hierarchy.

3) The “default agent” used for delegation changed semantics (likely a regression)
This is true, but only because the Markdown format does not yet have a field for the condenser.
There is already an issue to add that to the parsing.

4) One of the registry tests is structurally wrong (so it doesn’t test what it claims)
This is True. Corrected!

5) Tool naming / UX mismatch is unresolved
This is True but a very small nit

6) The “example usage” in the new registry module is stale
This is also True. Corrected!

7) Loader error handling is “log-and-ignore everything”
This is not the case as we have exc_info=True
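
For reference, this is what `exc_info=True` contributes in the standard `logging` module: the traceback of the active exception is appended to the log record, so the failure reason is visible even though loading continues. The logger name and message below are made up for the demonstration.

```python
import io
import logging

# Capture log output in memory so it can be inspected.
stream = io.StringIO()
logger = logging.getLogger("loader_demo")
logger.setLevel(logging.WARNING)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

try:
    raise ValueError("field 'tools' must be a list")
except Exception:
    # exc_info=True appends the active exception's traceback to the record.
    logger.warning("Failed to load agent file %s", "broken.md", exc_info=True)

output = stream.getvalue()
print(output)
```

The traceback helps whoever reads the logs, though it does not by itself tell the caller which files were skipped.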

enyst · 2026-02-23T18:36:07Z

Do you mean creating docs for the file-based agents? Like skills and plugins?

I mean an example script in examples/ directory (and maybe it's worth it to be a directory for that example, so that it has the markdown file separately, easy to understand... maybe)

enyst · 2026-02-23T18:36:50Z

1) You broke userspace: removed public APIs without a compatibility layer
True, but we discussed that it was Ok :-)

Can we decide what is public API now, and add it to the SDK's __init__ ?

VascoSch92 · 2026-02-24T10:38:31Z

Do you mean creating docs for the file-based agents? Like skills and plugins?

I mean an example script in examples/ directory (and maybe it's worth it to be a directory for that example, so that it has the markdown file separately, easy to understand... maybe)

Added :-)

openhands-sdk/openhands/sdk/subagent/load.py

openhands-sdk/openhands/sdk/subagent/registry.py

openhands-sdk/openhands/sdk/subagent/builtins/default.py

openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py

enyst

Thank you for the work on this!

I have a proposal that is… not clear enough, sorry, as I’m reading through it: the way this works is understandable to me, and the code design looks OK. But I feel agents need to read a lot of code to get how it should work. I wonder if we can inform the agent about this and its intentions, somewhere, in a simple and clear way.

Maybe an AGENTS.md. Maybe a page in docs that is meant for the code design behind the code. Kinda prefer both… if that makes sense.

VascoSch92 · 2026-02-25T15:16:48Z

Thank you for the work on this!

I have a proposal that is… not clear enough, sorry, as I’m reading through it: the way this works is understandable to me, and the code design looks OK. But I feel agents need to read a lot of code to get how it should work. I wonder if we can inform the agent about this and its intentions, somewhere, in a simple and clear way.

Maybe an AGENTS.md. Maybe a page in docs that is meant for the code design behind the code. Kinda prefer both… if that makes sense.

Open an issue and I can do that ;-) just so I don't forget.

enyst · 2026-02-25T15:27:30Z

@OpenHands open an issue as said in #2183 (comment)

Look at the PR and understand what I’m talking about, and include in the issue 2-3 proposals how we can document the code design / assumptions / invariants that this PR actually brings.

openhands-ai · 2026-02-25T15:28:09Z

I'm on it! enyst can track my progress at all-hands.dev

openhands-ai · 2026-02-25T15:30:14Z

I inspected PR #2183 (“feat(delegate): File-based agent definitions with markdown + YAML frontmatter”) to understand the new subagent file-loading/registration behavior and the design assumptions it introduces (scan locations + rules, precedence order, schema semantics).
I retrieved and used the referenced PR comment (issuecomment-3960012826) as the motivating context for the documentation request.
I opened a new GitHub issue in OpenHands/software-agent-sdk: “Document subagent loader design & invariants introduced by file-based agents (PR #2183)” (#2200).
In the issue, I documented the key invariants that should be made explicit and included 3 proposals for how to document the design:
1. Add a scoped AGENTS.md next to the subagent implementation for agentic contributors,
2. Add a dedicated “Subagents / delegation design” page in the docs repo,
3. Add an ADR capturing assumptions/rationale and keep precedence rules enforced via tests.

No code changes were made and no branches were pushed/updated (this task was only to open the issue and propose documentation approaches).

VascoSch92 added 4 commits February 18, 2026 16:17

first version MD agents

ba87b4e

update

ce6b225

update

475881c

update after discussion

6d892f7

VascoSch92 requested a review from all-hands-bot February 23, 2026 13:04

VascoSch92 requested a review from all-hands-bot February 23, 2026 13:14

all-hands bot feedback

e0fb41b

VascoSch92 force-pushed the vasco/issue-2049 branch from 0d6d277 to e0fb41b Compare February 23, 2026 13:44

VascoSch92 requested a review from all-hands-bot February 23, 2026 13:45

all-hands-bot approved these changes Feb 23, 2026

VascoSch92 requested a review from simonrosenberg February 23, 2026 13:56

VascoSch92 marked this pull request as ready for review February 23, 2026 13:56

all-hands-bot approved these changes Feb 23, 2026

VascoSch92 mentioned this pull request Feb 23, 2026

feat(delegation): Advanced Features for Markdown-based Agents #2186

Open

13 tasks

enyst reviewed Feb 23, 2026

openhands-sdk/openhands/sdk/subagent/registry.py Outdated Show resolved Hide resolved

roasted-review feedback

455c8ff

VascoSch92 added 3 commits February 24, 2026 10:37

docstrings

0754db0

add example

ea1863e

change number of example

36fdf3e

VascoSch92 requested a review from enyst February 24, 2026 10:41

enyst reviewed Feb 24, 2026

openhands-sdk/openhands/sdk/subagent/load.py Outdated Show resolved Hide resolved

Apply suggestion from @enyst

d38a9e5

enyst reviewed Feb 24, 2026

openhands-sdk/openhands/sdk/subagent/registry.py Outdated Show resolved Hide resolved

enyst reviewed Feb 24, 2026

openhands-sdk/openhands/sdk/subagent/builtins/default.py Outdated Show resolved Hide resolved

enyst mentioned this pull request Feb 25, 2026

[PRD] Support multiple agent functions in .openhands/ directory #1743

Open

VascoSch92 requested a review from enyst February 25, 2026 08:26

VascoSch92 and others added 2 commits February 25, 2026 09:26

comments and feedback

b1495ad

Merge branch 'main' into vasco/issue-2049

6fe32d4

simonrosenberg reviewed Feb 25, 2026

openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Show resolved Hide resolved

VascoSch92 added 2 commits February 25, 2026 15:11

fix after rebase and discussion

03e4cda

nit

b33defb

simonrosenberg approved these changes Feb 25, 2026

VascoSch92 and others added 2 commits February 25, 2026 15:28

fix tests

1aace23

Merge branch 'main' into vasco/issue-2049

d40971b

VascoSch92 enabled auto-merge (squash) February 25, 2026 15:04

enyst approved these changes Feb 25, 2026

fix after rebase

38acb66

VascoSch92 merged commit ab8f5f2 into main Feb 25, 2026
21 of 22 checks passed

VascoSch92 deleted the vasco/issue-2049 branch February 25, 2026 15:15

enyst mentioned this pull request Feb 25, 2026

Document subagent loader design & invariants introduced by file-based agents (PR #2183) #2200

Closed

openhands-ai bot mentioned this pull request Mar 5, 2026

Release v1.12.0 #2302

Merged

11 tasks
