Skip to content

Schema-aware coercion of per-call meta arguments (fix Qwen near-miss tool-arg rejections)#1470

Merged
Aaronontheweb merged 8 commits into
netclaw-dev:devfrom
Aaronontheweb:worktree-coerce-meta-arg-names
Jun 24, 2026
Merged

Schema-aware coercion of per-call meta arguments (fix Qwen near-miss tool-arg rejections)#1470
Aaronontheweb merged 8 commits into
netclaw-dev:devfrom
Aaronontheweb:worktree-coerce-meta-arg-names

Conversation

@Aaronontheweb

Copy link
Copy Markdown
Collaborator

Problem

Self-hosted Qwen3.6 emits per-call meta hints (_rationale/_timeout_seconds/_background) using ChatGPT-style names without the underscore — TimeoutSeconds, Timeout, Rationale. Since #1398's loud tool-argument validation, those near-misses are hard-rejected ("the tool was NOT executed"), which pushed the model off tools entirely and into emitting ChatGPT-style sandbox:/… download links instead of calling attach_file.

Observed in production sessions D0AC6CKBK5K/1781746527.976389 and C0B1GKNLZD3/1782215358.474099: 26 "Unrecognized argument" rejections across the fleet in the days after the 0.24 deploy (Rationale_rationale, TimeoutSeconds_timeout_seconds, …), and a cluster of dead sandbox: links the user kept correcting by hand.

Approach

Make per-call meta-argument resolution schema-aware, applied uniformly to every tool:

A key is a meta field iff it's an exact canonical name (_timeout_seconds…), or it resolves to a meta name and is not a declared parameter of this tool.

The daemon loads the full schema for every tool — native and MCP — so the declared-param check is authoritative everywhere; no native/MCP special-casing. The no-silent-discard invariant holds for all tools: every supplied value is either consumed (correctly mapped) or loudly rejected (unknown key / invalid value / ambiguous spelling).

What's in here (3 commits)

  1. Schema-aware coercionToolArgumentValidator.ResolveMetaField(tool, key) resolves near-miss meta names but yields to a tool's own declared parameter (so a third-party MCP timeout is forwarded, never hijacked). Validation and extraction share this one resolver. Validator cache switched to per-instance (ConditionalWeakTable) because MCP tools share a type but carry distinct schemas.
  2. Unified InterpretToolCall seam — one IToolExecutor method (validate + extract in one step, resolving the tool and resolver once) that both the main pipeline and the sub-agent route through, so neither can skip a step. Fixes the sub-agent silently dropping timeout hints (it previously didn't extract meta at all). The meta-ambiguity guard moved to ValidateMetaValues, which runs for every tool (it was native-only, leaving an MCP silent-drop).
  3. Re-drive preserves metaToAiMessage(reinjectMeta: true) re-injects the persisted MetaJson hints on the approval re-drive path, so a re-executed call honors its timeout instead of falling back to the default (a pre-existing latent bug surfaced by the deep review; MetaJson was written but never read by the re-drive dispatch).

Persistence now records tool calls exactly as the executor interprets them (LlmSessionActor passes the schema-aware interpreter to ChatMessageConverter.FromAiMessage), so history matches what ran.

Review

Authored against an xhigh code review (10 finder angles) plus a focused deep review of the persistence/re-drive seam (3 adversarial tracers). Findings either fixed (ambiguity gap, persistence divergence, re-drive meta loss, stale docs, dead code, dup work) or accepted with rationale in the commit bodies (undeclared near-miss on additionalProperties:true MCP schemas — off the declared interface, narrow; CWT vs ConcurrentDictionary micro-cost — required for per-instance MCP schemas).

Verification

  • Netclaw.Actors.Tests: 2477 passed; Netclaw.Daemon.Tests: 738 passed.
  • New coverage: MCP ambiguity rejection, InterpretToolCall success/rejection, schema-aware resolution yields to declared params, persistence near-miss round-trip, re-drive hint restoration, meta-field switch-coverage guard.
  • dotnet slopwatch analyze clean; copyright headers verified.
  • netclaw-operations skill bumped to v2.15.0 (spelling-tolerant arg guidance).

Eval/smoke suites N/A (no identity/TUI/skill-matching surface changed). The eval case tool_timeout_arg_recovery still passes by construction; worth a run against the live Qwen endpoint before release.

ChatGPT-trained models (Qwen3.6) name the per-call meta fields the ChatGPT
way — TimeoutSeconds/Rationale/bare Timeout instead of the canonical
_timeout_seconds/_rationale/_background. Since netclaw-dev#1398's loud validation, those
near-misses are hard-rejected ("tool was NOT executed"), which pushes the model
off tools and into emitting ChatGPT-style sandbox: download links instead of
calling attach_file (observed in sessions D0AC6CKBK5K_1781746527 and
C0B1GKNLZD3_1782215358).

Resolve a supplied key to a meta field when its normalized form matches a
canonical name AND the tool does not declare a real parameter that binds to it.
This is schema-aware: a third-party MCP server's own 'timeout' parameter is
forwarded untouched, while the model's near-miss is still coerced where no such
parameter exists. The value is consumed, never silently dropped — preserving the
no-silent-fallback invariant netclaw-dev#1398 established. Ambiguous double-spellings and
invalid values are still rejected loudly.

- ToolArgumentValidator.ResolveMetaField(tool, key): single schema-aware resolver;
  validator cache switched to instance-keyed (ConditionalWeakTable) so MCP tools
  sharing McpToolAdapter no longer conflate schemas.
- ToolCallMeta.ExtractFrom / ValidateMetaValues / Extract take a resolver delegate
  (default exact, schema-blind) used by persistence; the executor passes the
  schema-aware resolver via IToolExecutor.PrepareToolCall.
- Sub-agent loop now runs ValidateToolCall -> PrepareToolCall -> apply timeout hint,
  the same shared seam as the main pipeline (it previously skipped extraction and
  silently dropped the hint).
- Loud default in the meta switch; dead SuggestFor branch removed; recognition
  loops merged.
- netclaw-operations skill v2.15.0; tool_timeout_arg_recovery eval comment updated.

Full Netclaw.Actors.Tests green (2471); Daemon provider-boundary green (91);
slopwatch clean; headers verified.
…review findings

Follow-up to the schema-aware meta-arg change (d43a16f), addressing an xhigh
code review. Three substantive fixes plus cleanups.

Correctness — ambiguity guard now covers MCP. The 'two keys map to one meta
field' guard lived only in ToolArgumentValidator.ValidateArgumentKeys, which the
executor skips for MCP tools; but MCP gets schema-aware near-miss extraction, so a
double-spelled meta key on an MCP call silently last-write-wins. Moved the guard
into ToolCallMetaExtractor.ValidateMetaValues (runs for every tool via
DispatchingToolExecutor.ValidateArguments), checked ahead of value errors so it
keeps priority. Removed the now-redundant guard from ValidateArgumentKeys.

Altitude — one InterpretToolCall seam. Added IToolExecutor.InterpretToolCall
(returns rejection? + meta + cleaned), which resolves the tool and builds the meta
resolver ONCE, then validates and (only on success) extracts via a shared
ValidateCore. The main pipeline and the sub-agent loop both route through it instead
of calling ValidateToolCall then PrepareToolCall as two un-ordered steps (removes the
double registry lookup and the extract-without-validate footgun). Timeout-hint
application centralized in ToolExecutionContext.ApplyMeta (used by the sub-agent's
two sites). Sub-agent's intentional ignore of _background/_rationale is now
commented.

Persistence matches runtime — ChatMessageConverter.FromAiMessage takes an optional
schema-aware interpreter; LlmSessionActor passes the executor's PrepareToolCall at
the tool-call persistence site, so a near-miss meta key is stripped and captured in
MetaJson in recorded history (was persisted raw, schema-blind). Default stays exact
for tests/replay.

Cleanups: PrepareToolCall default delegates to ToolCallMetaExtractor.Extract;
ValidateArgumentKeys computes NormalizeKey once; removed the stale 'exact-match for
MCP' docs and the duplicate XML-doc on ResolveMetaField; added a loud default in the
ExtractFrom meta switch plus a test that every MetaFieldNames entry is bound.

Accepted (documented): undeclared near-miss on additionalProperties:true MCP
schemas (off the declared interface; narrow); ConditionalWeakTable vs
ConcurrentDictionary micro-cost (CWT required for per-instance MCP schemas).

Tests: MCP ambiguity rejection; InterpretToolCall success/rejection; persistence
near-miss round-trip; exact-resolver tests renamed off the misleading 'MCP' name;
meta-field coverage guard. Full Netclaw.Actors.Tests green (2476); Daemon
provider-boundary green (91); slopwatch clean; headers verified.
A tool call that needs approval, whose session passivates before approval and
then cold-recovers, is re-driven through ChatMessageConverter.ToAiMessage, which
rebuilt the call from ArgumentsJson only and never re-injected the meta hints that
persistence had stripped into MetaJson. So the re-driven execution extracted no meta
and silently fell back to the default timeout — dropping the agent's _timeout_seconds
(and _background/_rationale). This reproduced the original 'long call killed at the
default timeout' symptom on the post-approval re-drive path.

Pre-existing (it affected canonical _timeout_seconds before the meta-arg work too;
MetaJson was written and round-tripped but only ever read by CompactionPromptBuilder,
never by the re-drive dispatch). Surfaced by the deep persistence review.

ToAiMessage gains a reinjectMeta flag (default false): when set it parses the
persisted MetaJson and re-injects the canonical keys (_rationale/_timeout_seconds/
_background) into the reconstructed arguments, so the re-dispatched call's extraction
reapplies the hint. RedriveToolBatchForApproval opts in; the other four ToAiMessage
callers (outbound provider history, compaction, sub-agent) keep it false — the model
must never receive meta keys. Re-drive reconstructs for execution only (not
re-persisted), so no meta keys leak back into the journal.

Test: ToAiMessage_reinjectMeta_restores_hint_for_redrive_but_not_outbound proves
the strip→reinject→re-extract round-trip restores the 1800s hint on re-drive while
outbound history stays clean. Full Netclaw.Actors.Tests green (2477); slopwatch clean;
headers verified.
@Aaronontheweb Aaronontheweb added bug Something isn't working mcp Model context protocol server / client issues. sessions LLM session actor, turn lifecycle, pipelines tools Issues related to agent tools: file_read, web_search, shell_execute, image processing, etc. labels Jun 23, 2026
@Aaronontheweb Aaronontheweb enabled auto-merge (squash) June 23, 2026 23:53
@Aaronontheweb Aaronontheweb disabled auto-merge June 23, 2026 23:53
@Aaronontheweb

Copy link
Copy Markdown
Collaborator Author

Need to run evals for this before I review this PR

@Aaronontheweb Aaronontheweb enabled auto-merge (squash) June 24, 2026 02:46
@Aaronontheweb Aaronontheweb merged commit 79bf66d into netclaw-dev:dev Jun 24, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working mcp Model context protocol server / client issues. sessions LLM session actor, turn lifecycle, pipelines tools Issues related to agent tools: file_read, web_search, shell_execute, image processing, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant