refactor(subagents): consolidate sub-agent liveness — guard, delete suspend flag, stop parent over-supervision#1481
Merged
Merged
Conversation
Aaronontheweb
added a commit
that referenced
this pull request
Jun 24, 2026
- Cancel handling: a self-monitoring drain cancelled by the caller no longer becomes a tool-result "Error executing tool: The operation was canceled." fed back to the model; it rethrows to the batch handler (clean turn failure). The outer relabel to a wall-clock TimeoutException is kept on purpose — the tool-execution token is also the session's own timeout signal, so the two cannot be distinguished there. - Vacuous test: the old "no parent timeout" test advanced a FakeTimeProvider the drain path never reads, so it proved nothing. Replaced with two honest tests — runs-to-completion, and bounded-only-by-caller-cancellation (the latter also covers the cancel-handling fix above). - Dead code: drop the never-assigned `result` local in ConsumeAsync (after FirstItemOnly was removed, `return result ?? throw` was always the throw). - Stale comment: SpawnAgentTool no longer claims a parent watchdog keeps the delegated run alive — there is none. - Overstated justification: correct the pipeline comment — the drain's terminal item depends on SpawnAsync's finally completing the activity channel, not on PostStop's reply alone (PostStop only unblocks the spawner Ask). - Skill: netclaw-operations now states spawn_agent has no parent timeout (the subagent owns its liveness; bounded by its own watchdog or by cancelling the turn). Bump version 2.15.0 -> 2.16.0. By design (deliberately NOT "fixed"): no parent backstop and an unbounded approval wait. A self-monitoring tool owns its liveness end to end and approval blocks until answered — removing the moving part is the goal. Relates to #1481, #1472.
Aaronontheweb
added a commit
that referenced
this pull request
Jun 24, 2026
The startup guard only flagged "declared SelfMonitoring but resolved to a wall-clock mode" (the kill-mid-run direction). With the parent no longer supervising self-monitoring tools at all, the reverse is now equally dangerous: a tool that resolves SelfMonitoring without declaring it via the attribute would be drained with NO watchdog. ToolLivenessValidator now fails loud on either mismatch; adds a test for the reverse direction. Addresses review finding #9. Relates to #1481, #1472.
Aaronontheweb
added a commit
that referenced
this pull request
Jun 24, 2026
The streaming-tool-call-execution OpenSpec change and SPEC-016 still described the original two-phase watchdog / "parent first-item startup guard for self-monitoring tools" design, which #1481 superseded. Reconcile the in-flight (unarchived) change's delta spec to the implemented two-class model so it archives correct: - netclaw-tools delta spec: replace "Per-call two-phase inactivity watchdog" with "Per-call liveness by tool class" — opaque tools get one wall-clock budget; self-monitoring tools get NO parent watchdog (drained to a terminal item, bounded only by their own watchdog or caller cancellation); both-direction startup liveness validation. New scenarios cover each. - design.md: D3 update recording the "addition through subtraction" refinement. - tasks.md: Phase H for the consolidation work. - SPEC-016: superseded note pointing at the reconciled OpenSpec requirement. openspec validate passes. Relates to #1481, #1472, #1450.
The per-tool watchdog budget is chosen from IToolExecutor.GetLivenessMode. A SelfMonitoring-declared tool that silently resolves to Opaque would be put on a wall-clock budget and could kill a healthy self-monitoring run — e.g. spawn_agent killing a sub-agent mid-approval. The source generator emits LivenessMode from the [NetclawTool(Liveness=...)] attribute, so the two can only diverge via stale generated code or a hand-rolled override. Add ToolLivenessValidator.AssertSelfMonitoringConsistency and run it at startup (ToolIndexUpdater) over the assembled registry: any tool declaring SelfMonitoring that resolves to a different mode fails loud at startup rather than mis-supervising at runtime. This is the precondition for removing the SuspendsInactivityWatchdog flag (next commit). Note: a literal "make GetLivenessMode throw on miss" is unsafe — unknown tools legitimately reach GetLivenessMode before ExecuteStreamAsync rejects them, so a throw there would break the graceful unknown-tool path. A startup assertion achieves the same intent (no silent downgrade) without that regression. Step 3 of #1472.
The flag's only consumer was the parent's per-tool-call watchdog on the spawn_agent stream; its only producer was the sub-agent's approval wait. With the step-3 guard guaranteeing spawn_agent resolves to SelfMonitoring (FirstItemOnly, which disables the parent watchdog after the first stream item), the parent no longer watches during an approval wait, so the flag is dead weight. The sub-agent's own approval survival is via ProcessingWatchdog.Stop, unaffected. Remove: - ToolActivityUpdate.SuspendsInactivityWatchdog (the flag) - the StreamingToolWatchdog consumer branch - the SubAgentActor producer (EmitActivity param + the approval-wait call) - the now-obsolete suspend watchdog tests (the behaviour no longer exists) The sub-agent still surfaces "awaiting human approval" / "approval resolved" activities to the parent stream for visibility; the renamed actor test covers that. FirstItemOnly survival of a long approval stays covered by First_item_only_budget_disables_parent_liveness_after_startup. Step 2 of #1472.
Self-monitoring tools (spawn_agent) own their liveness end to end: the sub-agent's prefill/no-progress watchdog self-terminates stalls, its guards fast-fail, and SubAgentActor.PostStop guarantees a terminal SubAgentResult on any stop — so the parent's Ask always completes. The parent therefore does not supervise such a tool at all: it drains the stream to its terminal item under caller (turn/user) cancellation only. - SessionToolExecutionPipeline: self-monitoring tools are drained without a watchdog (DrainToCompletionAsync); opaque tools keep the one wall-clock budget. - Remove the FirstItemOnly budget mode (ToolWatchdogResetMode.FirstItemOnly + its factory + the two switch cases) — no longer used. Contract change: "self-monitoring" now means the tool owns its liveness completely and always yields a terminal item; the parent's blind-startup backstop is gone. The removed Self_monitoring_tool_still_times_out_when_startup_never_emits test (and its SelfMonitoringNeverStartsExecutor stub) exercised exactly that backstop. spawn_agent satisfies the contract via the PostStop guarantee; affected tests now drain rather than watch. Step 4 of #1472.
- Cancel handling: a self-monitoring drain cancelled by the caller no longer becomes a tool-result "Error executing tool: The operation was canceled." fed back to the model; it rethrows to the batch handler (clean turn failure). The outer relabel to a wall-clock TimeoutException is kept on purpose — the tool-execution token is also the session's own timeout signal, so the two cannot be distinguished there. - Vacuous test: the old "no parent timeout" test advanced a FakeTimeProvider the drain path never reads, so it proved nothing. Replaced with two honest tests — runs-to-completion, and bounded-only-by-caller-cancellation (the latter also covers the cancel-handling fix above). - Dead code: drop the never-assigned `result` local in ConsumeAsync (after FirstItemOnly was removed, `return result ?? throw` was always the throw). - Stale comment: SpawnAgentTool no longer claims a parent watchdog keeps the delegated run alive — there is none. - Overstated justification: correct the pipeline comment — the drain's terminal item depends on SpawnAsync's finally completing the activity channel, not on PostStop's reply alone (PostStop only unblocks the spawner Ask). - Skill: netclaw-operations now states spawn_agent has no parent timeout (the subagent owns its liveness; bounded by its own watchdog or by cancelling the turn). Bump version 2.15.0 -> 2.16.0. By design (deliberately NOT "fixed"): no parent backstop and an unbounded approval wait. A self-monitoring tool owns its liveness end to end and approval blocks until answered — removing the moving part is the goal. Relates to #1481, #1472.
The startup guard only flagged "declared SelfMonitoring but resolved to a wall-clock mode" (the kill-mid-run direction). With the parent no longer supervising self-monitoring tools at all, the reverse is now equally dangerous: a tool that resolves SelfMonitoring without declaring it via the attribute would be drained with NO watchdog. ToolLivenessValidator now fails loud on either mismatch; adds a test for the reverse direction. Addresses review finding #9. Relates to #1481, #1472.
The streaming-tool-call-execution OpenSpec change and SPEC-016 still described the original two-phase watchdog / "parent first-item startup guard for self-monitoring tools" design, which #1481 superseded. Reconcile the in-flight (unarchived) change's delta spec to the implemented two-class model so it archives correct: - netclaw-tools delta spec: replace "Per-call two-phase inactivity watchdog" with "Per-call liveness by tool class" — opaque tools get one wall-clock budget; self-monitoring tools get NO parent watchdog (drained to a terminal item, bounded only by their own watchdog or caller cancellation); both-direction startup liveness validation. New scenarios cover each. - design.md: D3 update recording the "addition through subtraction" refinement. - tasks.md: Phase H for the consolidation work. - SPEC-016: superseded note pointing at the reconciled OpenSpec requirement. openspec validate passes. Relates to #1481, #1472, #1450.
5f4713f to
9c8c9e6
Compare
Aaronontheweb
commented
Jun 25, 2026
Aaronontheweb
left a comment
Collaborator
Author
There was a problem hiding this comment.
LGTM - mostly competing timeout ideas being deleted.
| Status: Planning note for issue #1450. The corresponding OpenSpec artifacts must | ||
| be updated through the OpenSpec workflow before implementation is closed. | ||
|
|
||
| > **Superseded in part by #1472 / PR #1481 ("addition through subtraction").** The |
| time, | ||
| onActivity: _ => firstActivitySeen.TrySetResult(), | ||
| TestContext.Current.CancellationToken); | ||
| async Task<string?> DrainAsync() |
| // SpawnAsync return even on a crash. (Note: PostStop alone only unblocks | ||
| // the spawner Ask — the terminal stream item depends on that finally | ||
| // running.) Opaque tools remain bounded by one wall-clock budget. | ||
| if (executor.GetLivenessMode(toolCall) == ToolLivenessMode.SelfMonitoring) |
This was referenced Jun 25, 2026
Closed
Aaronontheweb
added a commit
that referenced
this pull request
Jun 25, 2026
…out (#1472 step 5) (#1487) After #1481 the off-thread StreamingToolWatchdog was used for exactly one thing: bounding opaque tool calls with a flat wall-clock budget. In that mode its two shared Volatile fields were written once and never changed, so the 1s polling timer only ever computed "has the budget elapsed since start?" — which is exactly what a CancellationTokenSource timeout token does. (The "why is this volatile instead of actor state?" smell.) Replace it with a TimeProvider-driven timeout token — new CancellationTokenSource( timeout, timeProvider) linked to the caller token — draining through the same DrainToCompletionAsync helper self-monitoring tools already use. A budget trip is converted to the same TimeoutException message; caller (turn/user) cancellation is rethrown unchanged (preserves #1481). Per call, so a slow tool times out without affecting siblings, and TimeProvider-virtualizable so tests stay deterministic. - Delete StreamingToolWatchdog, ToolWatchdogBudget, ToolWatchdogResetMode — removes the volatile, the polling timer, and the dead ResetOnItem/FirstItem/InterItem machinery (review follow-up #13). - Unify the two drain paths on DrainToCompletionAsync (review follow-up #11). - Point SpawnAgentStreamingTests at the production drain instead of the watchdog (review follow-up #12). - Opaque wall-clock timeout + concurrent independence are already covered at the pipeline level (FakeTimeProvider); add one test for the stream-without-a-completion- item contract; gut the watchdog unit tests, keeping the INetclawTool default-adapter test. No behavior change: opaque tools still time out at the same wall-clock budget with the same message and per-tool isolation; self-monitoring tools are untouched. The spec-of- record ("Per-call liveness by tool class") already says opaque tools get one wall-clock budget — only the mechanism changed. Net: roughly -150 LOC. Part of #1472.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What & why
The subagent-boundary half of the #1472 consolidation ("addition through subtraction" of the sub-agent liveness machinery). Builds on #1473 (step 1, already merged). Three independently-tested commits, in dependency order:
Step 3 — fail loud if a SelfMonitoring tool resolves to Opaque (
5ff03bbf)The watchdog budget is chosen from
GetLivenessMode; aSelfMonitoring-declared tool that silently resolved toOpaquewould be put on a wall-clock budget and could kill a healthy self-monitoring run.ToolLivenessValidatorruns at startup and fails loud on any such mismatch. (Adapted from the goal's literal "throw inGetLivenessMode" — unsafe, because unknown tools legitimately reach it beforeExecuteStreamAsyncrejects them — to a startup assertion that achieves the same intent.)Step 2 — delete the
SuspendsInactivityWatchdogflag (b0765503)Its only consumer was the parent's watchdog on the
spawn_agentstream; its only producer was the sub-agent's approval wait. With step 3 guaranteeingspawn_agentresolves toSelfMonitoring, the parent already stops watching after the first item, so the flag was dead weight. The sub-agent's own approval survival is viaProcessingWatchdog.Stop, unaffected.Step 4 — parent no longer supervises self-monitoring tools (
1d2d4f53)Self-monitoring tools own their liveness end to end: the sub-agent's prefill/no-progress watchdog self-terminates stalls, its guards fast-fail, and
SubAgentActor.PostStopguarantees a terminalSubAgentResulton any stop — so the parent'sAskalways completes. The parent drains the stream to its terminal item under caller cancellation only, and theFirstItemOnlybudget mode is removed.Contract change: "self-monitoring" now means a tool owns its liveness completely and always yields a terminal item. The parent's blind-startup backstop is gone;
spawn_agentsatisfies the contract via thePostStopguarantee.Net effect
Net −67 LOC (my diff: +183 / −250 across 11 files). The flag, the parent's dual-supervision of sub-agents, and the
FirstItemOnlymode are all gone; the three invariants (healthy sub-agent never killed; approvals never affect the clock; no silent liveness downgrade) hold by construction at the sub-agent boundary.Scope / follow-ups
Per checkpoint, the two system-wide steps of #1472 are deferred to their own legs:
StreamingToolWatchdog(now used only for opaque tools' wall-clock budget) with actor-owned liveness.SessionDiagnosticsContextAsyncLocal gate).Verification
Netclaw.Actors.Testssuite green (2443 passed) before rebase; rebuild + relevant sweep (130) green after rebasing onto latestdev.dotnet slopwatch analyzeclean; copyright headers present.Part of #1472.