feat(agents): add a fork tool so Horton can branch the current session#4489
feat(agents): add a fork tool so Horton can branch the current session#4489kevin-dp wants to merge 10 commits into
fork tool so Horton can branch the current session#4489Conversation
Electric Agents Mobile BuildLocal mobile checks ran for commit The EAS Android preview build was skipped because the |
samwillis
left a comment
There was a problem hiding this comment.
API shape: align ctx.fork with ctx.spawn
I think we should tighten the public ctx.fork API before this lands. Right now the shapes are quite different:
ctx.spawn(type, id, args?, opts?)
ctx.fork(targetEntityUrl?, opts?)That makes fork feel like a minimal Horton-specific helper rather than a general runtime API. In spawn, id names the new entity being created. In fork, targetEntityUrl names the source entity being copied, while the new fork's instance_id is not exposed.
Could we make fork mirror spawn more closely, e.g.:
ctx.fork(sourceEntityUrl, id, opts?)where id maps to the fork endpoint's instance_id, so id consistently means "the new entity being created".
For self-fork ergonomics, I'd also suggest a convenience wrapper rather than overloading the first string argument:
ctx.forkSelf(id, opts?)That keeps ctx.fork(sourceEntityUrl, id, opts?) unambiguous while still making the common self-fork case concise.
Options: reuse the spawn options where possible
The opts object should probably mirror spawn as much as the semantics allow:
type ForkOptions = {
initialMessage?: unknown
wake?: Wake
tags?: Record<string, string>
observe?: boolean
sandbox?: SpawnSandboxOption
}initialMessage feels especially important. The expected workflow today is fork() then send(), but that makes branch creation a two-step operation and can leave idle forks behind if the second step fails. Making the initial message part of the fork operation would be cleaner and easier to reason about.
Similarly, wake should probably be configurable rather than hardcoded to runFinished + includeResponse; observe: false can remain the opt-out.
Reliability: partial success is currently reported as failure
There's also a reliability issue in the current implementation: ctx.fork() creates the fork first, then registers the auto-observe wake.
const { entityUrl: forkUrl } =
await wiringConfig.forkEntity(targetEntityUrl)
if (opts?.observe !== false) {
await doObserve(entity(forkUrl), {
on: `runFinished`,
includeResponse: true,
})
}If wake registration fails, the call rejects even though the fork already exists. Horton then reports "Error forking session" without the fork URL. A retry can create duplicate idle forks, and the original fork will not wake the parent.
If we add initialMessage and wake to fork, I'd prefer fork creation, initial dispatch, and wake registration to be handled atomically server-side. If that's too much for this PR, we should at least return/report the fork URL when observation setup fails so callers can recover.
Test coverage
Please add focused tests for the new anchor: "latest_completed_run" path:
- resolving the latest completed run into the expected fork pointer
- rejecting when the source has no completed run
- forwarding
anchorthrough the HTTP route - ideally, covering the fork + observe failure behavior above
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4489 +/- ##
===========================================
- Coverage 69.21% 55.55% -13.66%
===========================================
Files 77 301 +224
Lines 9238 34417 +25179
Branches 2878 9713 +6835
===========================================
+ Hits 6394 19121 +12727
- Misses 2826 15278 +12452
Partials 18 18
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
@samwillis i addressed all your comments. Please have another look. |
samwillis
left a comment
There was a problem hiding this comment.
Thanks for the update. This does address a good chunk of the earlier feedback: ctx.fork now uses an opts bag, has spawn-like initialMessage / wake / tags / observe options, moves wake registration into the server-side fork path with rollback, and adds useful coverage for the latest_completed_run anchor and wake rollback paths.
I think a few things still need tightening before this is ready:
Align ctx.fork with ctx.spawn by exposing branch id and target URL as positional args
The main API alignment point is still missing. spawn makes the new entity id explicit:
ctx.spawn(type, id, args?, opts?)but ctx.fork still does not let the caller name the fork, even though the HTTP endpoint already supports instance_id:
ctx.fork({
targetEntityUrl?,
initialMessage?,
wake?,
tags?,
observe?,
})I'd prefer fork to follow the same positional-arguments-plus-options shape as spawn, with explicit positional args for the new branch id and source entity URL:
ctx.fork(id, targetEntityUrl, opts?)Where:
idmaps to the endpoint'sinstance_idand names the new fork/branch.targetEntityUrlis the source entity to fork.optscarries the spawn-like behavior options.
For self-fork ergonomics, I'd make that explicit with a convenience wrapper rather than defaulting targetEntityUrl to the current entity:
ctx.forkSelf(id, opts?)For example:
await ctx.forkSelf("branch-a", {
initialMessage: { text: "Try approach A" },
})
await ctx.fork("branch-b", "/horton/source-session", {
initialMessage: { text: "Try approach B" },
tags: { experiment: "fork-api" },
})This keeps id consistently meaning "the new entity being created" across spawn and fork, while avoiding implicit self-fork behavior.
Validate parent and wake subscriber inputs on the fork route
The route now accepts parent and wake from the request body and passes them straight into forkSubtree. forkSubtree then sets the new root fork's parent and registers a wake using wake.subscriberUrl.
That differs from spawn, which resolves/validates the parent entity before attaching the new entity to it. A direct API caller should not be able to attach a fork under an arbitrary parent URL or register a wake for an arbitrary subscriber without validation/authorization.
Please validate at least:
parentexists- the caller is allowed to create/attach a child under that parent
wake.subscriberUrlmatches/is allowed for that parent/caller, or is derived server-side instead of accepted from the body
Non-blocking: clarify initialMessage atomicity
The comments say the initial message is atomic with the fork RPC and prevents idle partial forks, but implementation still sends it after forkSubtree returns and after dispatch subscription linking:
const result = await ctx.entityManager.forkSubtree(...)
for (const forkedEntity of result.entities) {
await linkEntityDispatchSubscription(ctx, forkedEntity)
}
if (parsed.initialMessage !== undefined) {
await ctx.entityManager.send(result.root.url, ...)
}If linking or send fails, the endpoint returns an error after the fork has already been created. That may be acceptable if making this truly transactional is hard on the server, but we should at least make the comments/docs precise so callers know initialMessage is delivered after fork creation rather than atomically rolled back with it.
Return shape parity
Less blocking, but still worth considering: spawn returns an EntityHandle, while fork returns only { url }. If forks are now children and use the same parent-ownership model as spawned workers, it would be cleaner if the handler API exposed a similar handle surface where practical.
|
Forking at run boundaries makes a lot of sense actually... that's the only coherent place to do it really. This would make it easier to fork at older points too as the UI could expose forking at each run point and then the run object itself could list its final offset to bridge from that to the offset. |
`POST /_electric/entities/<type>/<id>/fork` gains an optional
`anchor: 'latest_completed_run'` body field as an alternative to
`fork_pointer`. When present, the server scans the source root's
`main` history, finds the last `runs` row with `status === 'completed'`,
derives the matching `{ offset, sub_offset }` pointer, and runs the
existing pointer-fork path with it. Mutually exclusive with
`fork_pointer`; 400 if no completed run exists. This lets callers
without access to the source's per-row pointer side-table (e.g. an
agent forking via a tool) fork at the same anchor the per-row
"Fork from here" UI uses.
`HandlerContext` gains `fork(targetEntityUrl?, opts?)`. Defaults
`targetEntityUrl` to `ctx.entityUrl` for self-fork. Auto-observes the
new fork with `runFinished` + `includeResponse` so the caller wakes
when the fork's next run finishes; opt out with `observe: false`.
Wired through a new `RuntimeServerClient.forkEntity` method and a new
`WiringConfig.forkEntity` injection point alongside `createOrGetChild`.
Horton gets a `fork` tool (optional `entityUrl` parameter) that
delegates to `ctx.fork`, plus a "When to fork (vs spawn_worker)"
section in the system prompt framing the distinction: spawn for an
isolated subtask with an empty context, fork for parallel exploration
that needs the conversation's full history. Includes the
end-turn-first / send-different-prompts / wait-for-all-responses
workflow for parallel-exploration patterns.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Change `fork` from creating a sibling (top-level) session to creating
a child of the forking entity, mirroring the spawn primitive's
parent-ownership model.
Reply delivery now uses the same manifest-anchored wake path spawn
uses: server registers a runFinished + includeResponse subscription
on the new fork at fork time, runtime writes a `kind: 'child'`
manifest row on the parent's stream, and the parent wakes with the
fork's response inline when the fork's next run finishes. The
previous out-of-band `ctx.observe` auto-call is gone.
Wire shape:
- `forkBodySchema` gains `parent` and `wake` (same shapes as
`spawnBodySchema`); forkSubtreeInner overrides the new root fork's
parent and registers the wake via `wakeRegistry.register`.
- `RuntimeServerClient.forkEntity` and `WiringConfig.forkEntity`
accept the new fields; `doFork` passes `parent: ctx.entityUrl` +
`wake: { subscriberUrl, condition: 'runFinished', includeResponse:
true }` and writes the manifest-child row via
`wakeSession.registerManifestEntry`.
- `ctx.fork` loses its `observe` option (no longer needed — the
manifest-anchored wake is the delivery path).
Horton's "When to fork (vs spawn_worker)" prompt section reframes
fork as the sibling primitive to spawn_worker — both create a child
the parent owns and gets replies from; the difference is only what
the child boots with (empty context vs. inherited history). Adds an
explicit trigger pattern ("prefer fork when generating multiple
variants the user wants to compare; don't inline") to route the
"give me three takes" shape to fork instead of collapsing it into
one inline response (the regression observed on the first end-to-end
test of the previous sibling-fork version).
Also fixes a chat-render gap exposed by agent-to-agent sends: the
`send` tool's `payload` description now spells out the canonical
`{ text: "..." }` shape, and `readInboxText` falls back to
`message` / `content` keys when `text` isn't present — so a fork
that receives a follow-up prompt via `send` actually shows the
prompt in the chat instead of rendering a blank bubble.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ctx.fork` was `(targetEntityUrl?: string) => …` — a single optional
positional arg. Future knobs (the deferred explicit-anchor parameter)
would force callers to thread `undefined` through the positional spot
to set a later arg. Switch to an options bag now so the upgrade path
is additive.
`ctx.fork()` self-fork stays a no-arg call. Cross-entity fork
becomes `ctx.fork({ targetEntityUrl: '...' })`. Threaded through
`HandlerContextConfig.doFork` and the agents `fork` tool wrapper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tags, observe
Bring `ctx.fork`'s opts shape into line with `ctx.spawn`'s. Sam's review
flagged the asymmetry: spawn exposes a settled set of knobs for
parent-owned child creation; fork — which is the same lifecycle pattern
with a different starting context — was missing the corresponding ones.
`ctx.fork(opts?)` now mirrors spawn where the semantics map:
- `initialMessage?: unknown` — atomic dispatch. The fork's inbox row
is delivered server-side via `entityManager.send` AFTER
`linkEntityDispatchSubscription` runs (same ordering spawn uses),
so the dispatcher is subscribed before the inbox lands and the
fork actually wakes on it. Folds the previous fork+send two-step
into one tool call and closes the "send fails, parent has an
idle fork on its manifest" reliability gap.
- `wake?: Wake` — overrides the default `runFinished + includeResponse`.
Translated through the same `normalizeWake` logic createOrGetChild
uses for spawn.
- `tags?: Record<string, string>` — stamped on the new root fork in
addition to those copied from the source.
- `observe?: boolean` (default `true`) — `false` opts out of the
parent relationship entirely: no parent URL, no wake registration,
no manifest child row. Fire-and-forget mirror of spawn's semantics.
Server side: `forkBodySchema` grows the corresponding fields; the
route handler delivers `initialMessage` via `entityManager.send(rootUrl,
…)` after the link-dispatch loop, matching spawn's pattern exactly.
`sandbox` is deferred to a follow-up — applying it requires the
`resolveSandboxForSpawn` resolver chain (inherit handling, runner
allowance), which is a non-trivial surface and not load-bearing for
the current demo flow.
Tool side: exposes `initialMessage` and `tags` to the model; keeps
`wake`/`observe` runtime-only. Updates the description and Horton's
system prompt to lead with `initialMessage` so the parallel-exploration
loop becomes "fork-with-prompt N times" instead of "fork then send N
times."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses Sam's test-coverage ask on the PR. Four focused tests for
the fork-tool surface this PR added:
- `anchor 'latest_completed_run' resolves to a fork pointer at the
latest completed run` — feeds a stream of events covering an idle
→ started → completed → started cycle and asserts the pointer fed
to streamClient.fork is `{ offset: <prior entry>, subOffset: 1 }`
on the completed-row entry, matching the runtime's onBeforeBatch
pointer-minting convention.
- `anchor 'latest_completed_run' rejects when the source has no
completed run` — same setup but with only `started`/no `completed`
rows. Expects a 400 with a "no completed run" message.
- `rolls back the fork when wake registration fails` — wakeRegistry
.register throws, forkSubtree rejects, and the rollback path
deletes the newly-created root fork entity + its `main`/`error`
streams. The source root is left intact.
- `forwards anchor, parent, wake, initialMessage, and tags through
to forkSubtree (and sends the initial message)` — HTTP route test
asserting (a) the new body fields land in the right places in
forkSubtree's options, (b) `initialMessage` is NOT passed into
forkSubtree but instead delivered via `entityManager.send` after
the dispatch link, mirroring spawn's pattern.
- (Bonus) `rejects when fork_pointer and anchor are both present` —
the mutual-exclusion validation also gets coverage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…registry changes After rebasing on top of main, two main-side changes that landed since this branch's base broke a handful of tests that were green at branch time. Aligning the mocks so CI doesn't false-alarm on this PR. 1. `add permission enforcement` (#4475) made the spawn route 401 for any non-bypass principal that doesn't hold the required entity-type grant. The runtime-dsl test framework was using `system:runtime-dsl-test`, which isn't a built-in bypass — switch to `system:dev-local` (the same bypass principal the desktop's local runtime uses). 70 snapshots auto-regenerated; the diff is purely the `from` field, nothing semantic. The dispatch-policy- routing test setup gets the same swap, plus its mocked runner's `owner_principal` flipped to match (so the `assertDispatchPolicyAllowed` owner check passes). 2. A new `registry.replaceSharedStateLink` call landed in `syncManifestLinks` without updating the registry mocks in agents-server's status / write-validation / server-start tests. Added `replaceSharedStateLink: vi.fn()` (or a no-op method on the fake registry class) to each. None of these tests are exercised by the recent main commits' CI matrix (TS tests are scoped by affected workspace, and main has only touched agents-desktop / agents-mobile recently), which is why they appear to "pass on main" but red on this PR's CI. The fixes above keep them passing and let our actual change get a clean signal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ute tests Same pattern as the previous rebase-fixup commit: another test file landed on main with a non-bypass principal (`system:test`), and the new permission-enforcement code (#4475) now reaches for `registry.hasEntityPermission` via canAccessEntity — which the test's mock registry doesn't expose. Switching the principal to `system:dev-local` (a built-in bypass) sidesteps the entity-permission check, same way the desktop's local runtime and the dispatch-policy-routing tests do. These tests assert subscription routing, not authz, so the bypass is the minimal fix and matches the convention established in the previous commit on this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Self(id, opts?)
Aligns ctx.fork with ctx.spawn's positional shape:
ctx.spawn(type, id, args?, opts?)
ctx.fork(sourceEntityUrl, id, opts?)
ctx.forkSelf(id, opts?) // convenience for sourceEntityUrl=self
Both fork methods now require `id` (mirroring spawn). `id` maps to the
server's `instance_id` body field and gives callers two things spawn
already provides:
- Idempotency on retry: same id → server-side deduplication (no
duplicate forks on a transient network error).
- Predictable URLs: caller knows `/horton/<id>` before the request
returns; no need to wait for the response to learn the new URL.
The model-facing surface stays simple: the `fork` tool's `id` param
remains optional and is auto-generated via `nanoid(10)` when the model
doesn't supply one — same pattern createSpawnWorkerTool uses for the
worker id. So the model still calls `fork({ initialMessage: ... })`
without ceremony; the tool always satisfies the library API's id
requirement.
Also introduces `ForkOptions` (initialMessage, wake, tags, observe) as
a named exported type, mirroring how spawn's opts bag looks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A direct HTTP caller could previously attach a fork under an arbitrary `parent` URL or register a wake firing to an arbitrary subscriber. The agents-runtime always passes its own entity URL for both fields, so this hadn't bitten in practice — but it's a real authz hole on the endpoint itself. Mirroring the spawn route's parent-validation flow: - When `parent` is set, look it up via `registry.getEntity` (404 if it doesn't exist) and run `canAccessEntity(ctx, parent, 'spawn')` (401 if the caller can't attach a child there). - When `wake` is set, `parent` is required (the only sensible target for a fork's wake is its parent — same semantics spawn assumes), and `wake.subscriberUrl` must equal `parent`. This prevents a caller from registering a wake firing to an entity they don't own. If we ever need subscriber-flexibility, this becomes a proper canAccessEntity check on the subscriber URL. Tests cover the three rejection paths: parent not found (404), wake without parent (400), and wake.subscriberUrl mismatch (401). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spawn returns `EntityHandle`; fork was returning `{ url: string }`.
Now that forks are children with the same parent-ownership model as
spawned workers (and they report back via the same manifest-anchored
wake), there's no good reason for fork's return type to be poorer than
spawn's. Bring them into parity.
`setupCtx.fork(sourceUrl, id, opts)` is added alongside
`setupCtx.spawn` and mirrors its flow:
- Builds an `EntityHandle` with deferred run state.
- Registers a spawn handle on `wakeSession` so a `runFinished` wake
resolves the handle's `run` promise.
- Calls `wiring.forkEntity(...)` to do the server-side fork.
- Creates a child `EntityStreamDB` to observe the fork's stream and
hooks completed/failed run events back to the same resolver.
- Writes a `kind: 'child'` manifest row on this entity's stream
(parity with spawn's bookkeeping).
- Caches the handle in `observeHandleCache`.
`observe: false` falls through the same way spawn's does — no parent,
no wake, no manifest, no DB observation; the handle's `.run` /
`.text` reject with a clear "opted out" message.
`process-wake.ts`'s `doFork` becomes a thin delegate to
`setupCtx.fork`, same pattern `doSpawn` uses. The user-facing `Wake`
→ wakeRegistry-shape translation moves to a `normalizeForkWake`
module-level helper that the `wiring.forkEntity` impl calls.
Tool surface stays the same — `createForkTool` now uses
`handle.entityUrl` to build the response text but the model still
just gets a `Forked at <url>.` confirmation; the EntityHandle surface
is for programmatic callers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
anchor: 'latest_completed_run'field on the fork endpoint so callers without access to the source's per-row pointer side-table can still fork at the same anchor the UI uses.ctx.fork(targetEntityUrl?, opts?)onHandlerContext, with auto-observe (runFinished+includeResponse) so the caller wakes when the fork's next run finishes. DefaultstargetEntityUrltoctx.entityUrlfor self-fork.forktool plus a "When to fork (vs spawn_worker)" section in the system prompt framing the distinction: spawn for an isolated subtask with empty context, fork for parallel exploration that needs the conversation's full history.How it composes
The minimal flow exercised against
localhost:4437:fork(), thensendthe fork '9 × 9', then end your turn."runFinishedwakes him with the response.The "demo loop" the design was scoped to (analyze → fork N → synthesise the winner) needs an end-turn-then-fork pattern because the fork anchor is the latest completed run — anything mid-turn is excluded. The system prompt's new "When to fork" workflow walks Horton through this.
Wire shape
Server-side: reads the source root's
mainhistory, walks for the lastrunsrow withstatus === 'completed', derives{ offset, sub_offset }using the same pointer convention the runtime mints fromonBeforeBatch, then runs the existing pointer-fork path with it. Mutually exclusive withfork_pointer(400 if both). 400 if no completed run exists.Server-side rollout
This depends on the agents-server having the new
anchorfield. Local self-hosted servers built from this PR work immediately. Cloud (Stratovolt) needs to be redeployed on the new package before the tool works against it — withoutanchorhandling, the old endpoint falls through to the HEAD-clone path'swaitForIdleSubtree, which times out at 120s when an agent calls fork from inside its own turn (source can't become idle while the request is pending). Confirmed during local testing.Known follow-ups
forkcurrently always anchors at "latest completed run." AmessagesAgoor similar parameter to address earlier anchors is parked for a follow-up. The simplest case fits the demo flow; explicit addressing waits for a concrete use case to shape the API.fork. The "When to fork" section is in place but may need stronger directive language for ambiguous "give me three takes" prompts. Iterating on the wording is a small follow-up that doesn't change the tool surface.Test plan
npx tsc --noEmitclean acrossagents-server,agents-runtime,agents.agents-runtime777/779 (2 pre-existing skips),agents54/54.agents-server360/402 with one pre-existing flake unrelated to this change (a docker port collision on the local box).localhost:4437: minimalfork()+send()round-trip; auto-observe wakes the parent with the fork's response; new entity tile appears in the desktop UI.@electric-ax/agents-serverto Stratovolt before announcing the tool works against Cloud.🤖 Generated with Claude Code