Skip to content

refactor(subagents): consolidate sub-agent liveness — guard, delete suspend flag, stop parent over-supervision#1481

Merged
Aaronontheweb merged 6 commits into
devfrom
fix/approval-as-state
Jun 25, 2026
Merged

refactor(subagents): consolidate sub-agent liveness — guard, delete suspend flag, stop parent over-supervision#1481
Aaronontheweb merged 6 commits into
devfrom
fix/approval-as-state

Conversation

@Aaronontheweb

Copy link
Copy Markdown
Collaborator

What & why

The subagent-boundary half of the #1472 consolidation ("addition through subtraction" of the sub-agent liveness machinery). Builds on #1473 (step 1, already merged). Three independently-tested commits, in dependency order:

Step 3 — fail loud if a SelfMonitoring tool resolves to Opaque (5ff03bbf)

The watchdog budget is chosen from GetLivenessMode; a SelfMonitoring-declared tool that silently resolved to Opaque would be put on a wall-clock budget and could kill a healthy self-monitoring run. ToolLivenessValidator runs at startup and fails loud on any such mismatch. (Adapted from the goal's literal "throw in GetLivenessMode" — unsafe, because unknown tools legitimately reach it before ExecuteStreamAsync rejects them — to a startup assertion that achieves the same intent.)

Step 2 — delete the SuspendsInactivityWatchdog flag (b0765503)

Its only consumer was the parent's watchdog on the spawn_agent stream; its only producer was the sub-agent's approval wait. With step 3 guaranteeing spawn_agent resolves to SelfMonitoring, the parent already stops watching after the first item, so the flag was dead weight. The sub-agent's own approval survival is via ProcessingWatchdog.Stop, unaffected.

Step 4 — parent no longer supervises self-monitoring tools (1d2d4f53)

Self-monitoring tools own their liveness end to end: the sub-agent's prefill/no-progress watchdog self-terminates stalls, its guards fast-fail, and SubAgentActor.PostStop guarantees a terminal SubAgentResult on any stop — so the parent's Ask always completes. The parent drains the stream to its terminal item under caller cancellation only, and the FirstItemOnly budget mode is removed.

Contract change: "self-monitoring" now means a tool owns its liveness completely and always yields a terminal item. The parent's blind-startup backstop is gone; spawn_agent satisfies the contract via the PostStop guarantee.

Net effect

Net −67 LOC (my diff: +183 / −250 across 11 files). The flag, the parent's dual-supervision of sub-agents, and the FirstItemOnly mode are all gone; the three invariants (healthy sub-agent never killed; approvals never affect the clock; no silent liveness downgrade) hold by construction at the sub-agent boundary.

Scope / follow-ups

Per checkpoint, the two system-wide steps of #1472 are deferred to their own legs:

  • Step 5 — replace the off-thread volatile StreamingToolWatchdog (now used only for opaque tools' wall-clock budget) with actor-owned liveness.
  • Step 6 — unify the daemon's logging path (remove the Akka-async-vs-MEL-sync split + the SessionDiagnosticsContext AsyncLocal gate).

Verification

  • Full Netclaw.Actors.Tests suite green (2443 passed) before rebase; rebuild + relevant sweep (130) green after rebasing onto latest dev.
  • dotnet slopwatch analyze clean; copyright headers present.

Part of #1472.

@Aaronontheweb Aaronontheweb added reliability Retries, resilience, graceful degradation cleanup Code quality improvements and tech debt reduction subagents spawn_agent, SubAgentActor, definition loader, discovery context layer, and related features labels Jun 24, 2026
@Aaronontheweb Aaronontheweb marked this pull request as ready for review June 24, 2026 17:40
Aaronontheweb added a commit that referenced this pull request Jun 24, 2026
- Cancel handling: a self-monitoring drain cancelled by the caller no longer becomes
  a tool-result "Error executing tool: The operation was canceled." fed back to the
  model; it rethrows to the batch handler (clean turn failure). The outer relabel to
  a wall-clock TimeoutException is kept on purpose — the tool-execution token is also
  the session's own timeout signal, so the two cannot be distinguished there.
- Vacuous test: the old "no parent timeout" test advanced a FakeTimeProvider the drain
  path never reads, so it proved nothing. Replaced with two honest tests —
  runs-to-completion, and bounded-only-by-caller-cancellation (the latter also covers
  the cancel-handling fix above).
- Dead code: drop the never-assigned `result` local in ConsumeAsync (after FirstItemOnly
  was removed, `return result ?? throw` was always the throw).
- Stale comment: SpawnAgentTool no longer claims a parent watchdog keeps the delegated
  run alive — there is none.
- Overstated justification: correct the pipeline comment — the drain's terminal item
  depends on SpawnAsync's finally completing the activity channel, not on PostStop's
  reply alone (PostStop only unblocks the spawner Ask).
- Skill: netclaw-operations now states spawn_agent has no parent timeout (the subagent
  owns its liveness; bounded by its own watchdog or by cancelling the turn). Bump
  version 2.15.0 -> 2.16.0.

By design (deliberately NOT "fixed"): no parent backstop and an unbounded approval
wait. A self-monitoring tool owns its liveness end to end and approval blocks until
answered — removing the moving part is the goal.

Relates to #1481, #1472.
Aaronontheweb added a commit that referenced this pull request Jun 24, 2026
The startup guard only flagged "declared SelfMonitoring but resolved to a
wall-clock mode" (the kill-mid-run direction). With the parent no longer
supervising self-monitoring tools at all, the reverse is now equally dangerous:
a tool that resolves SelfMonitoring without declaring it via the attribute would
be drained with NO watchdog. ToolLivenessValidator now fails loud on either
mismatch; adds a test for the reverse direction.

Addresses review finding #9. Relates to #1481, #1472.
Aaronontheweb added a commit that referenced this pull request Jun 24, 2026
The streaming-tool-call-execution OpenSpec change and SPEC-016 still described the
original two-phase watchdog / "parent first-item startup guard for self-monitoring
tools" design, which #1481 superseded. Reconcile the in-flight (unarchived) change's
delta spec to the implemented two-class model so it archives correct:

- netclaw-tools delta spec: replace "Per-call two-phase inactivity watchdog" with
  "Per-call liveness by tool class" — opaque tools get one wall-clock budget;
  self-monitoring tools get NO parent watchdog (drained to a terminal item, bounded
  only by their own watchdog or caller cancellation); both-direction startup liveness
  validation. New scenarios cover each.
- design.md: D3 update recording the "addition through subtraction" refinement.
- tasks.md: Phase H for the consolidation work.
- SPEC-016: superseded note pointing at the reconciled OpenSpec requirement.

openspec validate passes. Relates to #1481, #1472, #1450.
The per-tool watchdog budget is chosen from IToolExecutor.GetLivenessMode. A
SelfMonitoring-declared tool that silently resolves to Opaque would be put on a
wall-clock budget and could kill a healthy self-monitoring run — e.g. spawn_agent
killing a sub-agent mid-approval. The source generator emits LivenessMode from the
[NetclawTool(Liveness=...)] attribute, so the two can only diverge via stale
generated code or a hand-rolled override.

Add ToolLivenessValidator.AssertSelfMonitoringConsistency and run it at startup
(ToolIndexUpdater) over the assembled registry: any tool declaring SelfMonitoring
that resolves to a different mode fails loud at startup rather than mis-supervising
at runtime. This is the precondition for removing the SuspendsInactivityWatchdog
flag (next commit).

Note: a literal "make GetLivenessMode throw on miss" is unsafe — unknown tools
legitimately reach GetLivenessMode before ExecuteStreamAsync rejects them, so a
throw there would break the graceful unknown-tool path. A startup assertion
achieves the same intent (no silent downgrade) without that regression.

Step 3 of #1472.
The flag's only consumer was the parent's per-tool-call watchdog on the spawn_agent
stream; its only producer was the sub-agent's approval wait. With the step-3 guard
guaranteeing spawn_agent resolves to SelfMonitoring (FirstItemOnly, which disables
the parent watchdog after the first stream item), the parent no longer watches
during an approval wait, so the flag is dead weight. The sub-agent's own approval
survival is via ProcessingWatchdog.Stop, unaffected.

Remove:
- ToolActivityUpdate.SuspendsInactivityWatchdog (the flag)
- the StreamingToolWatchdog consumer branch
- the SubAgentActor producer (EmitActivity param + the approval-wait call)
- the now-obsolete suspend watchdog tests (the behaviour no longer exists)

The sub-agent still surfaces "awaiting human approval" / "approval resolved"
activities to the parent stream for visibility; the renamed actor test covers that.
FirstItemOnly survival of a long approval stays covered by
First_item_only_budget_disables_parent_liveness_after_startup.

Step 2 of #1472.
Self-monitoring tools (spawn_agent) own their liveness end to end: the sub-agent's
prefill/no-progress watchdog self-terminates stalls, its guards fast-fail, and
SubAgentActor.PostStop guarantees a terminal SubAgentResult on any stop — so the
parent's Ask always completes. The parent therefore does not supervise such a tool
at all: it drains the stream to its terminal item under caller (turn/user)
cancellation only.

- SessionToolExecutionPipeline: self-monitoring tools are drained without a watchdog
  (DrainToCompletionAsync); opaque tools keep the one wall-clock budget.
- Remove the FirstItemOnly budget mode (ToolWatchdogResetMode.FirstItemOnly + its
  factory + the two switch cases) — no longer used.

Contract change: "self-monitoring" now means the tool owns its liveness completely
and always yields a terminal item; the parent's blind-startup backstop is gone. The
removed Self_monitoring_tool_still_times_out_when_startup_never_emits test (and its
SelfMonitoringNeverStartsExecutor stub) exercised exactly that backstop. spawn_agent
satisfies the contract via the PostStop guarantee; affected tests now drain rather
than watch.

Step 4 of #1472.
- Cancel handling: a self-monitoring drain cancelled by the caller no longer becomes
  a tool-result "Error executing tool: The operation was canceled." fed back to the
  model; it rethrows to the batch handler (clean turn failure). The outer relabel to
  a wall-clock TimeoutException is kept on purpose — the tool-execution token is also
  the session's own timeout signal, so the two cannot be distinguished there.
- Vacuous test: the old "no parent timeout" test advanced a FakeTimeProvider the drain
  path never reads, so it proved nothing. Replaced with two honest tests —
  runs-to-completion, and bounded-only-by-caller-cancellation (the latter also covers
  the cancel-handling fix above).
- Dead code: drop the never-assigned `result` local in ConsumeAsync (after FirstItemOnly
  was removed, `return result ?? throw` was always the throw).
- Stale comment: SpawnAgentTool no longer claims a parent watchdog keeps the delegated
  run alive — there is none.
- Overstated justification: correct the pipeline comment — the drain's terminal item
  depends on SpawnAsync's finally completing the activity channel, not on PostStop's
  reply alone (PostStop only unblocks the spawner Ask).
- Skill: netclaw-operations now states spawn_agent has no parent timeout (the subagent
  owns its liveness; bounded by its own watchdog or by cancelling the turn). Bump
  version 2.15.0 -> 2.16.0.

By design (deliberately NOT "fixed"): no parent backstop and an unbounded approval
wait. A self-monitoring tool owns its liveness end to end and approval blocks until
answered — removing the moving part is the goal.

Relates to #1481, #1472.
The startup guard only flagged "declared SelfMonitoring but resolved to a
wall-clock mode" (the kill-mid-run direction). With the parent no longer
supervising self-monitoring tools at all, the reverse is now equally dangerous:
a tool that resolves SelfMonitoring without declaring it via the attribute would
be drained with NO watchdog. ToolLivenessValidator now fails loud on either
mismatch; adds a test for the reverse direction.

Addresses review finding #9. Relates to #1481, #1472.
The streaming-tool-call-execution OpenSpec change and SPEC-016 still described the
original two-phase watchdog / "parent first-item startup guard for self-monitoring
tools" design, which #1481 superseded. Reconcile the in-flight (unarchived) change's
delta spec to the implemented two-class model so it archives correct:

- netclaw-tools delta spec: replace "Per-call two-phase inactivity watchdog" with
  "Per-call liveness by tool class" — opaque tools get one wall-clock budget;
  self-monitoring tools get NO parent watchdog (drained to a terminal item, bounded
  only by their own watchdog or caller cancellation); both-direction startup liveness
  validation. New scenarios cover each.
- design.md: D3 update recording the "addition through subtraction" refinement.
- tasks.md: Phase H for the consolidation work.
- SPEC-016: superseded note pointing at the reconciled OpenSpec requirement.

openspec validate passes. Relates to #1481, #1472, #1450.
@Aaronontheweb Aaronontheweb force-pushed the fix/approval-as-state branch from 5f4713f to 9c8c9e6 Compare June 25, 2026 00:43

@Aaronontheweb Aaronontheweb left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - mostly competing timeout ideas being deleted.

Status: Planning note for issue #1450. The corresponding OpenSpec artifacts must
be updated through the OpenSpec workflow before implementation is closed.

> **Superseded in part by #1472 / PR #1481 ("addition through subtraction").** The

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

time,
onActivity: _ => firstActivitySeen.TrySetResult(),
TestContext.Current.CancellationToken);
async Task<string?> DrainAsync()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// SpawnAsync return even on a crash. (Note: PostStop alone only unblocks
// the spawner Ask — the terminal stream item depends on that finally
// running.) Opaque tools remain bounded by one wall-clock budget.
if (executor.GetLivenessMode(toolCall) == ToolLivenessMode.SelfMonitoring)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Aaronontheweb Aaronontheweb merged commit 75e7ea2 into dev Jun 25, 2026
21 checks passed
@Aaronontheweb Aaronontheweb deleted the fix/approval-as-state branch June 25, 2026 01:33
Aaronontheweb added a commit that referenced this pull request Jun 25, 2026
…out (#1472 step 5) (#1487)

After #1481 the off-thread StreamingToolWatchdog was used for exactly one thing:
bounding opaque tool calls with a flat wall-clock budget. In that mode its two
shared Volatile fields were written once and never changed, so the 1s polling timer
only ever computed "has the budget elapsed since start?" — which is exactly what a
CancellationTokenSource timeout token does. (The "why is this volatile instead of
actor state?" smell.)

Replace it with a TimeProvider-driven timeout token — new CancellationTokenSource(
timeout, timeProvider) linked to the caller token — draining through the same
DrainToCompletionAsync helper self-monitoring tools already use. A budget trip is
converted to the same TimeoutException message; caller (turn/user) cancellation is
rethrown unchanged (preserves #1481). Per call, so a slow tool times out without
affecting siblings, and TimeProvider-virtualizable so tests stay deterministic.

- Delete StreamingToolWatchdog, ToolWatchdogBudget, ToolWatchdogResetMode — removes
  the volatile, the polling timer, and the dead ResetOnItem/FirstItem/InterItem
  machinery (review follow-up #13).
- Unify the two drain paths on DrainToCompletionAsync (review follow-up #11).
- Point SpawnAgentStreamingTests at the production drain instead of the watchdog
  (review follow-up #12).
- Opaque wall-clock timeout + concurrent independence are already covered at the
  pipeline level (FakeTimeProvider); add one test for the stream-without-a-completion-
  item contract; gut the watchdog unit tests, keeping the INetclawTool default-adapter
  test.

No behavior change: opaque tools still time out at the same wall-clock budget with the
same message and per-tool isolation; self-monitoring tools are untouched. The spec-of-
record ("Per-call liveness by tool class") already says opaque tools get one wall-clock
budget — only the mechanism changed.

Net: roughly -150 LOC. Part of #1472.
@Aaronontheweb Aaronontheweb mentioned this pull request Jun 25, 2026
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cleanup Code quality improvements and tech debt reduction reliability Retries, resilience, graceful degradation subagents spawn_agent, SubAgentActor, definition loader, discovery context layer, and related features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant